By: Eric Parker
Eric lives in Seattle and has been teaching Tableau and Alteryx for 5 years. He's helped thousands of students solve their most pressing problems. If you have a question, feel free to reach out to him directly via email.
Working with data that isn’t clean can be a pain. One of the most frequent data issues I come across is dealing with duplicated rows of data. They are particularly tricky because they can be hard to spot and can inflate your values. Sometimes these duplicates can be solved by creating a join with multiple conditions (i.e. joining on Store ID and Date) but sometimes they are inherent data issues.
For instance, imagine you are analyzing recent orders placed to your company to determine how many have been processed and shipped.
The Orders table looks like this:
The Order Status table looks like this:
Each Order can be listed up to three times in the Order Status table. There will be a row for “Placed” another row for “Processed” and one fo “Shipped” as well.
The Orders table contains 500 rows of data but the Order Status table contains 1,274. The result is that when these tables are joined on Order ID, there result set will contain 1,274 rows.
That’s great news if we want to create a “funnel” of sorts to see how many orders are at each order status.
An issue arises however when summarizing sales:
Going back to the Excel sheet and summarizing the sales figures from the Orders table, it’s clear that the total sales is $1.7M not $4.3M.
Because the join structure was many (up to 3 statuses) to one (each Order ID is unique in the Orders table) duplicates were often created. See Order 1000 as an example:
In reality, the sales amount for this order was $3,020, but since it was tripled it’s showing up as $9,060.
One of the fastest ways of solving this issue in Tableau is using the FIXED function. Here’s an example:
This formula tells Tableau to only return the minimum sales amount once per Order ID so that if the value was summarized, Order ID 1000 would be counted as $3,020 not $9,060. Here it is in action:
While there are alternative methods for dealing with duplicate records at the database level, this is the preferred method in Tableau Desktop.
Want help with your own Tableau calculation questions? Sign up for an office hour and we’ll help you!