By: Eric Parker and Brian Pohl
Eric Parker is a founding member of OneNumber. He lives in Seattle and has been teaching Tableau and Alteryx since 2014. If you have a question, feel free to reach out to him directly via email.
Brian Pohl is a credentialed SQL and Snowflake expert. He’s worked with some of the largest organizations in the world crafting solutions to problems they long ago wrote off as unsolvable.
I was recently working with a client when we hit a roadblock. We came across a situation where we needed to create a multi-row formula and Tableau Desktop is not equipped to do that. Let me walk you through the situation.
The client was looking to understand how long it took for employees to be promoted. They wanted to break down that average time to get promoted by a number of dimensions including by Career Level and by Employee.
In a simplified format we were dealing with 2 primary tables.
Employee Transaction
The Employee Transaction table returns a row of data for each major employee transaction (e.g. hiring, promotions, terminations).
Employee Details
The Employee Details table contains basic dimensional information about each employee.
Let’s say we want to put together a table which displays Promotion Count and Average Time Before Promotion. (Because this sample dataset is so small these will be low figures).
I’ll start by joining the tables together in Tableau.
Now, we’ll need some calculations.
Promotion Count:
You might be wondering what the MID functions are for. In the situation we worked on, someone could change career levels without being promoted if they transferred to another organization but remained at the same number level.
For instance, in the two records below, the career level changed but the number at the end stayed the same. So even though there are 9 records with the label “Career Level Change” only 7 are truly promotions.
Now I want to calculate average time before promotion. In order to return the right value, I need to find the difference between their last promotion (or hire) date and the date of their current promotion.
For simplification, here is the Employee Transaction table sorted by Employee ID and then EffectiveDate.
So for Employee 101, their time before their first promotion should be the difference between 1/1/2013 and 12/30/2015 or nearly 3 years.
Unfortunately, Tableau calculations have no native way of creating multi-row calculations to solve this problem (unless you show all the data in a worksheet and use a LOOKUP table calculation function but that severely restricts what kinds of visuals you can build).
This is where Custom SQL can be so helpful. SQL lets us use window functions. A window function is a function used in the SELECT part of a SQL query that allows you to reference values in other rows.
Before we use a window function, we start with this SQL query:
SELECT
EmployeeID
,EffectiveDate
,Period
,TransactionType
,CurrentLevel
,ProposedLevel
,CASE WHEN CAST(RIGHT(ProposedLevel,1) AS INT) > CAST(RIGHT(CurrentLevel,1) AS INT) OR TransactionType = ‘Hire’
THEN EffectiveDate
ELSE NULL
END AS PromoDate
FROM EmployeeTransaction
The CASE statement in this query returns a new column, PromoDate, looking like this:
This PromoDate column returns the same value as EffectiveDate, but only for rows that mark a promotion or rows that mark a hire. It uses RIGHT to retrieve the numerical value from the job levels and then compares their values. Note that it is blank in the two rows where the Employee’s title changed, but they did not receive a promotion (outlined in red above). Next, we will remove the non-promotion/hire rows:
SELECT
*
FROM (
SELECT
EmployeeID
,EffectiveDate
,Period
,TransactionType
,CurrentLevel
,ProposedLevel
,CASE WHEN CAST(RIGHT(ProposedLevel,1) AS INT) > CAST(RIGHT(CurrentLevel,1) AS INT) OR TransactionType = ‘Hire’
THEN EffectiveDate
ELSE NULL
END AS PromoDate
FROM EmployeeTransaction
) et
WHERE PromoDate IS NOT NULL
Now, we’re going to use the window function I mentioned earlier. The structure of a window function is like this:
AGGREGATION(ColumnName) OVER (ORDER BY OrderColumns… PARTITION BY PartitionColumns…)
While window functions can accept the aggregation functions you’re used to, like SUM, MIN, and MAX, there are also aggregations that are unique to window functions. To solve our problem with the promotion data, we will use LAG.
LAG will retrieve the PromoDate from the previous row. We partition the data by EmployeeID - this means LAG will only grab values within the scope of a single EmployeeID, never referencing a PromoDate for a different employee. The data is ordered by EffectiveDate, which defines what we mean by the “next” or “previous” row.
SELECT
*
,LAG(PromoDate) OVER (PARTITION BY EmployeeID ORDER BY EffectiveDate) AS PreviousPromoDate
FROM (
SELECT
EmployeeID
,EffectiveDate
,Period
,TransactionType
,CurrentLevel
,ProposedLevel
,CASE WHEN CAST(RIGHT(ProposedLevel,1) AS INT) > CAST(RIGHT(CurrentLevel,1) AS INT)
THEN EffectiveDate
ELSE NULL
END AS PromoDate
FROM EmployeeTransaction
) et
WHERE PromoDate IS NOT NULL
Nice! The PreviousPromoDate column grabs the PromoDate from the previous row, partitioned by the EmployeeID. But for the first row for each employee (the hire), it is blank, because there is no “previous row” for it to reference. That’s ok, because it wouldn’t make sense to calculate this on the hire date anyway.
Now that the data has come together, you may have spotted what the strategy will be. We can calculate the difference between PromoDate and PreviousPromoDate to determine how long elapsed before this promotion. This data can be joined back to our original dataset, so we don’t have to modify any of our existing work. Let’s bring this Custom SQL query into our Tableau workbook with a Left Join.
Here is the data in Tableau. Note that there are many null values in the PromoDate and PreviousPromoDate columns. That’s ok - for those rows, there is no promotion duration that we need to calculate.
Let’s calculate the promotion times:
I’ve pulled this into my table, and voilà! Using the combined power of SQL and Tableau, we are able to create multi-row functions, retrieving and aggregating values across sets of rows that we define.
Want help with your own calculations? Sign up for a SQL Office Hour or Tableau Office Hour to get expert help!