Preparing survey data for analysis in a data visualization tool is notoriously difficult. It's not as simple as being able to perform a single pivot or filter.
If you need to pivot data from a SQL table for use in Tableau, Tableau Desktop’s default pivot feature can’t help you.
You might never have heard of a self join but you might need one. Occasionally, tables in a database are structured so that it makes sense to join a copy of a table to itself…
I frequently hear the question, “Can Tableau show my missing data?”. Generally when I get this question, people want to either see a 0 or a blank where there should be missing data.
Joins can be a sticky business, especially if…
● You haven’t used them much before.
● You are working with data that is new to you.
● You don’t trust your data cleanliness.
“How do you calculate a headcount at a moment in time when you only have a start and end date?” I’ve gotten this question several times. My answer used to be “Ideally, you’d want a row of data for an individual for every possible date unit you’d want to count them at.”
After scouring the internet, I couldn’t find a good reference for beginners on how to prepare your data for Tableau so I figured, it looks like I’m going to have to do this myself! This is meant to be an introductory guide only, if you have more detailed questions feel free to send me an email.
I’m going to demonstrate how to prepare your data in a few different ways. I would like to be clear, as of January, 2017, I recommend preparing your data as much as possible before bringing it into Tableau. If you can bring all your data into Tableau as a single data source, that is going to make your life immensely easier. The ways that I’ll demonstrate preparing data are with Tableau, Alteryx and SQL.
First, let’s start with some definitions.
Fact Table: A fact table is a set of data that consists of measurements, metrics or facts of business processes. These could include sales, quantities, # of uses, basically, and kind of metric you’ll add, subtract, multiply or divide to gain additional insight. More on that here: https://en.wikipedia.org/wiki/Fact_table.
Dimension Table: A dimension table contains descriptive information which is likely to be text-based. Dimensional fields include dates, geographies, business units and IDs. More on that here: https://en.wikipedia.org/wiki/Dimension_(data_warehouse).
Join: A join is a way to combine two tables of data on (a) shared dimension(s). This process is used when combining two dimension tables or a dimension and fact table. One example would be a fact table with sales information that has a “Customer ID” field. This table could be joined to a dimension table on “Customer ID” with additional customer information like name and address. More on joins here: https://en.wikipedia.org/wiki/Join_(SQL). To learn more about different types of joins: http://www.sql-join.com/sql-join-types/.
Union: A union is a way to combine two fact tables by essentially stacking one on top of the other. All columns that have the same name will be matched up (to allow you to seamlessly filter across different data sources with a single field like “Date” or “State”). Any columns that don’t have a match in a secondary table will be filled in with a “null” or empty cell of data. More on unions here: https://www.techonthenet.com/sql/union.php.
Alright, let’s dig into an example together. Let’s say we are analyzing data for a company that manages printers. They need to stay up to date on the total volume of prints their customers have done to ensure they keep their fleets at optimal health.
Let’s imagine we have three data tables. Below are the names and a list of the fields in each table.
Customer Table (Dimension):
· Customer ID
· Customer Name
Current Prints Table (Fact)
· Customer ID
· Device Serial #
· Total Prints
Historical Prints Table (Fact)
· Customer ID
· Device Serial #
· Prints (since previous read)
Prepare Your Data in Tableau
As mentioned earlier, it is preferable, especially with complex data, to prepare it before it reaches Tableau. However, sometimes that’s not an option and you have to make do with what you have. In this simple scenario, Tableau should work just fine. Typically, I would start by doing a union followed by a join, but we’ll need to reverse the order to get Tableau to work properly (as of January, 2017).
I’ll start by joining the Current Prints table to the Customer table on the field “Customer ID”.
I’ll follow that by unioning the Historical Prints table to the Current Prints table.
Tableau creates one, larger table that allows the end user to work more effectively across multiple data sources.
Prepare Your Data in Alteryx
I’m biased because I’ve really grown to love using this tool over the last couple years, but I think Alteryx is the best tool for someone who is new to ETL. It’s straightforward and easy to learn quickly. It has native connectors to most major data sources and also has some great capabilities for handling non-standard data like xml parsing or connecting to APIs.
After pulling both tables into Alteryx with an input tool, I add a Formula tool for both to indicate what table the data come from. From there, I feed both data sources into a union tool where fields are automatically configured by name.
The result looks like this:
From there, we can join the customer data with a join tool. I join the unioned data sources to the customer table on “Customer ID”.
We have the three original data sources consolidated into a single workflow for easier analysis in Tableau. I typically output the data from Alteryx as a .tde file for consumption in Tableau.
Prepare Your Data with SQL
Depending on your level of SQL expertise, this work can be accomplished with SQL as well. Best practice is to create a view in a SQL database that you pull into Tableau. You will apply all the same logic discussed above. Feel free to study the SQL statement below to get a feel for how you write the join, unions, create a “Source” field and properly union fields that don’t exist in all data sources. Please note, this SQL query was written against an Access database so I was more limited in my options. You could also utilize a subquery to consolidate the two join clauses into one if you are writing against a SQL Server database.
Join and Union
The SQL query returns a table that looks identical to the results from the Tableau and Alteryx data preparation walkthroughs.
I hope this basic walkthrough is a helpful jumping off-point for preparing your data for Tableau.
If you run into further complexities that aren’t explained here or have any questions, please feel free to reach out.