What I Learned About Working with Shapefiles in Tableau

By: Eric Parker


Eric lives in Seattle and has been teaching Tableau and Alteryx for 5 years. He's helped thousands of students solve their most pressing problems. If you have a question, feel free to reach out to him directly via email.

Background

I grew up in the Issaquah/Sammamish area of Washington State, about 30 minutes (60, depending on traffic) east of Seattle. When I was growing up there in the 1990s and 2000s, I remember hearing a lot of people say, “this area has the highest amount of high school students per capita anywhere in the state.”

Back then, I didn’t have the skills (or the time or interest) to investigate that claim for myself, so I took it to be true. Confirmation bias seemed to back it up anyway: I saw high schoolers all the time! (I wasn’t a critical enough thinker to realize that being in high school myself influenced my perception.)

Years later, that claim came to mind again and I decided to do some digging. The data was hard to find, so the accompanying figures are based on 2010 values. The total population by school district comes from the Spokesman-Review, and the public high school population figures were downloaded directly from the State of Washington OSPI. I can’t vouch for the accuracy of the data, but in broad strokes it makes sense.

79-1.png

First, I was able to debunk the claim. Only 5.2% of the people living within the Issaquah School District boundaries were in high school in 2010, nowhere near the top of the list.

Next, I noticed that the district with the most students per capita, “Quillayute Valley,” had 50% of its population in high school! After some research, I learned it’s a small community with a large online school that draws thousands of students from across the state, inflating those numbers.

Then, I wondered what would happen if we took small communities out of the mix. I filtered the dashboard to show only communities with a total population of at least 15,000. After doing that, I found that Monroe tops the list of larger communities, with 9% of its populace in high school.

79-2.png
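For anyone who wants to try the same calculation outside of Tableau, here is a minimal Python sketch of the per capita share and the population filter. It is not how the dashboard was built, and the district names and numbers are invented for illustration; `District`, `hs_share`, and `rank_by_hs_share` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class District:
    name: str
    total_population: int
    high_school_students: int

    @property
    def hs_share(self) -> float:
        # Share of the district's total population that is enrolled in high school.
        return self.high_school_students / self.total_population


def rank_by_hs_share(districts, min_population=0):
    """Return districts at or above min_population, highest share first."""
    eligible = [d for d in districts if d.total_population >= min_population]
    return sorted(eligible, key=lambda d: d.hs_share, reverse=True)


# Made-up numbers, not the 2010 figures used in the dashboard.
districts = [
    District("Small Town (hypothetical)", 4_000, 2_000),
    District("Mid City (hypothetical)", 20_000, 1_800),
    District("Big City (hypothetical)", 100_000, 5_200),
]

# Same idea as the dashboard filter: only communities of at least 15,000 people.
for d in rank_by_hs_share(districts, min_population=15_000):
    print(f"{d.name}: {d.hs_share:.1%}")
```

Without the filter, the small community with the outsized school tops the ranking; with it, the comparison is limited to larger communities.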

Shapefile Stuff

I showed my work to a stakeholder, who found it interesting and asked, “Are there any geographic trends in the data?” I couldn’t easily answer that question, so I started looking for a shapefile covering Washington State school districts. I quickly found what I was looking for and tried to bring the shapefile into my Tableau Prep workflow, only to find out that Tableau Prep can’t incorporate shapefiles.

I then tried data blending in Tableau Desktop, only to realize that was a mistake. The district names didn’t always match across the three separate sources, so blending meant data was being lost (and I couldn’t see it or fix it). Finally, I settled on a cross-database join.

I wish I could have brought the shapefile into Tableau Prep, because the data wasn’t consistent from one data source to the next. For instance, a district might be called “Seattle Public Schools” in one data source and “Seattle School District” in the next. There were numerous issues like this, and without a way to alter the underlying data of the shapefile, I had to return to the Tableau Prep flow and make all the census and OSPI data match the district names in the shapefile.
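One way to surface mismatches like these before joining is to reduce each name to a comparison key and list whatever has no partner on the other side. The Python below is a rough sketch of that idea, not the Tableau Prep steps used here; the suffix list, the helper names `normalize` and `unmatched`, and the sample lists are all assumptions made for illustration.

```python
import re

# Assumed naming variants; a real dataset would likely need more rules than this.
SUFFIXES = ("public schools", "school district", "schools")


def normalize(name: str) -> str:
    """Lowercase, collapse whitespace, and strip one known trailing suffix."""
    key = re.sub(r"\s+", " ", name.strip().lower())
    for suffix in SUFFIXES:
        if key.endswith(suffix):
            key = key[: -len(suffix)].strip()
            break
    return key


def unmatched(left_names, right_names):
    """Names on the left with no normalized match on the right."""
    right_keys = {normalize(n) for n in right_names}
    return [n for n in left_names if normalize(n) not in right_keys]


# Illustrative lists only, not the actual contents of either source.
shapefile_names = ["Seattle School District", "Issaquah School District"]
ospi_names = ["Seattle Public Schools", "Issaquah School District", "Monroe School District"]

print(unmatched(ospi_names, shapefile_names))  # ['Monroe School District']
```

Anything the check prints is a row that a join on district name would drop or leave unmatched, which is exactly the kind of loss that was invisible with blending.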

After numerous bouts of tinkering (and a Tableau Desktop join calculation for good measure), I got all the data to join correctly!

Creating this dashboard required a multi-step analytic process:

  1. Start with a premise.

  2. Gather data.

  3. Prepare data.

  4. Visualize.

  5. Clean data.

  6. Investigate results.

For a full walkthrough of that analytic process join us for a 30-minute webinar on Friday, April 12th at 9 am PDT.