Using SQL for Tableau Performance Improvement

By: Eric Parker

Pro Headshot.jpg

Eric Parker lives in Seattle and has been teaching Tableau and Alteryx since 2014. He's helped thousands of students solve their most pressing problems. If you have a question, feel free to reach out to him directly via email.

When teaching, I often get asked some variation of the question, “How many rows of data can Tableau handle?”. It’s a well-intended question but it’s incomplete. Tableau technically doesn’t have a row count limitation, but row count is one of the factors that can impact performance.

Here are some of the key factors which will impact performance:

●        Row Count

●        Column Count

●        Data Type

●        Data Source Type

●        Computer Memory

Let’s dive into the question about row count and how SQL can help us.

First, you should know that even if you apply a data source or extract filter, those rows of data you are filtering out are still being brought into your local memory in Tableau before being removed.

Tableau’s website provides an Order of Operations image but it only starts at the Extract Filters level.

 
147-1.png
 

In reality, there are some data preparation steps in Tableau Desktop that occur before the steps listed above.

Those steps are roughly like this.

 
147-2.png
 

As you can see, all data from the tables you accessed is brought in, then it is processed, and only then is it filtered out.

That might have a huge impact on performance. Let’s say for instance you only need to access 2 million rows in a 50 million row table. Using this process you’ll have to process a query against all 50 million first. Not ideal, right?

This is where using Custom SQL (either in Tableau or your database) can be advantageous.

WHERE

The SQL WHERE expression acts like a filter.

For instance, if you only want to keep rows of data from a table that are tagged as active, you might write a Custom SQL Statement with a WHERE clause like this.

147-3.png

If you aren’t even sure where to get started writing Custom SQL, Tableau makes it easy to convert a table you are already connected to into a SQL statement. (Note, this works for Microsoft SQL Server database connections and some others. It won’t work for local files like Excel).

Simply drag the table you want to use into the Tableau Desktop data preparation view and select “Convert to Custom SQL” from the Data dropdown on the toolbar.

 
147-4.png
 

Utilizing this process will ensure the inactive rows get filtered at the database level, not the Tableau layer on your local computer.

 

GROUP BY

Another method you can utilize to reduce row count is the GROUP BY expression. If you’ve used Tableau Prep, GROUP BY is similar to the Aggregate tool. It allows you to aggregate some fields and group by others.

Here are some examples where I’ve used GROUP BY:

●        Calculating number of college credits earned by student per year. The table was at a quarterly level and I used GROUP BY to roll it up to a yearly level.

●        Rolling transaction data (e.g. restaurant sales) up from an item-by-item level to a receipt-level summary.

●        Rolling baseball data up from a pitch-by-pitch level to an “at-bat” level summary.

Let’s look at a simple way that we could write a SQL statement to solve for the first bullet point listed above.

SELECT

FROM [schemaname].[tablename]

[tablename].[StudentID] as [StudentID],

[tablename].[AcademicYear] as [AcademicYear],

SUM([tablename].[Credits]) as [AnnualCredits]

GROUP BY [StudentID],[AcademicYear]

The theoretical table above might contain 30 columns, but we ignored all the ones we didn’t need. We used the Sum function to aggregate credits and then grouped at the Student/Year level.

One of the best things about SQL is how flexible it is! You can even pair WHERE and GROUP BY clauses together if you want to limit rows and aggregate. It seems daunting at first because it requires learning a new language and syntax, but if you understand how Tableau calculations work, you are halfway to knowing SQL already!

*Monitor local performance as you utilize Custom SQL in Tableau Desktop. Tableau is not as efficient at processing SQL as a SQL Server. I have seen overuse of Custom SQL in Tableau Desktop lead to slow load times. If that happens you’ll want to look at moving your SQL Statements into a table or view in the SQL Server.

Want more SQL tips? Check out our upcoming webinar, Tableau vs. SQL: The Data Manipulation Showdown.

Want help implementing or improving your SQL? Book an hour of our resident expert Brian Pohl’s time via SQL Office Hours!

Tableau vs. SQL: The Data Manipulation Showdown - Webinar Recording

How to Create a Dynamic Measure with Multiple Number Formats in Tableau

0