How to Create a Multi-Row Formula in SQL (Because Tableau Can't)

By: Eric Parker and Brian Pohl

Eric and Brian Headshot v2.PNG

Eric Parker is a founding member of OneNumber. He lives in Seattle and has been teaching Tableau and Alteryx since 2014. If you have a question, feel free to reach out to him directly via email.

Brian Pohl is a credentialed SQL and Snowflake expert. He’s worked with some of the largest organizations in the world crafting solutions to problems they long ago wrote off as unsolvable.

I was recently working with a client when we hit a roadblock. We came across a situation where we needed to create a multi-row formula and Tableau Desktop is not equipped to do that. Let me walk you through the situation.

The client was looking to understand how long it took for employees to be promoted. They wanted to break down that average time to get promoted by a number of dimensions including by Career Level and by Employee.

In a simplified format we were dealing with 2 primary tables.

 

Employee Transaction

The Employee Transaction table returns a row of data for each major employee transaction (e.g. hiring, promotions, terminations).

 
152-1.png
 

 

Employee Details

The Employee Details table contains basic dimensional information about each employee.

 
152-2.png
 

 

Let’s say we want to put together a table which displays Promotion Count and Average Time Before Promotion. (Because this sample dataset is so small these will be low figures).

I’ll start by joining the tables together in Tableau.

152-3.png


Now, we’ll need some calculations.

Promotion Count:

 
152-4.png
 

You might be wondering what the MID functions are for. In the situation we worked on, someone could change career levels without being promoted if they transferred to another organization but remained at the same number level.

For instance, in the two records below, the career level changed but the number at the end stayed the same. So even though there are 9 records with the label “Career Level Change” only 7 are truly promotions.

 
152-5.png
 

Now I want to calculate average time before promotion. In order to return the right value, I need to find the difference between their last promotion (or hire) date and the date of their current promotion.

For simplification, here is the Employee Transaction table sorted by Employee ID and then EffectiveDate.

 
152-6.png
 

So for Employee 101, their time before their first promotion should be the difference between 1/1/2013 and 12/30/2015 or nearly 3 years.

Unfortunately, Tableau calculations have no native way of creating multi-row calculations to solve this problem (unless you show all the data in a worksheet and use a LOOKUP table calculation function but that severely restricts what kinds of visuals you can build).

This is where Custom SQL can be so helpful. SQL lets us use window functions. A window function is a function used in the SELECT part of a SQL query that allows you to reference values in other rows.

Before we use a window function, we start with this SQL query:

SELECT
     EmployeeID
    ,EffectiveDate
    ,Period
    ,TransactionType
    ,CurrentLevel
    ,ProposedLevel
    ,CASE WHEN CAST(RIGHT(ProposedLevel,1) AS INT) > CAST(RIGHT(CurrentLevel,1) AS INT) OR TransactionType = ‘Hire’
        THEN EffectiveDate
        ELSE NULL
        END AS PromoDate
FROM EmployeeTransaction

The CASE statement in this query returns a new column, PromoDate, looking like this:

 
152-7.png
 

This PromoDate column returns the same value as EffectiveDate, but only for rows that mark a promotion or rows that mark a hire. It uses RIGHT to retrieve the numerical value from the job levels and then compares their values. Note that it is blank in the two rows where the Employee’s title changed, but they did not receive a promotion (outlined in red above). Next, we will remove the non-promotion/hire rows:

SELECT
     *
FROM (

    SELECT
         EmployeeID
        ,EffectiveDate
        ,Period
        ,TransactionType
        ,CurrentLevel
        ,ProposedLevel
        ,CASE WHEN CAST(RIGHT(ProposedLevel,1) AS INT) > CAST(RIGHT(CurrentLevel,1) AS INT) OR TransactionType = ‘Hire’
            THEN EffectiveDate
            ELSE NULL
            END AS PromoDate
    FROM EmployeeTransaction
) et
WHERE PromoDate IS NOT NULL

 
152-8.png
 

Now, we’re going to use the window function I mentioned earlier. The structure of a window function is like this:

AGGREGATION(ColumnName) OVER (ORDER BY OrderColumns… PARTITION BY PartitionColumns…)

While window functions can accept the aggregation functions you’re used to, like SUM, MIN, and MAX, there are also aggregations that are unique to window functions. To solve our problem with the promotion data, we will use LAG.

LAG will retrieve the PromoDate from the previous row. We partition the data by EmployeeID - this means LAG will only grab values within the scope of a single EmployeeID, never referencing a PromoDate for a different employee. The data is ordered by EffectiveDate, which defines what we mean by the “next” or “previous” row.

SELECT
     *
    ,LAG(PromoDate) OVER (PARTITION BY EmployeeID ORDER BY EffectiveDate) AS PreviousPromoDate
FROM (
    SELECT
         EmployeeID
        ,EffectiveDate
        ,Period
        ,TransactionType
        ,CurrentLevel
        ,ProposedLevel
        ,CASE WHEN CAST(RIGHT(ProposedLevel,1) AS INT) > CAST(RIGHT(CurrentLevel,1) AS INT)
            THEN EffectiveDate
            ELSE NULL
            END AS PromoDate
    FROM EmployeeTransaction
) et
WHERE PromoDate IS NOT NULL

152-9.png

Nice! The PreviousPromoDate column grabs the PromoDate from the previous row, partitioned by the EmployeeID. But for the first row for each employee (the hire), it is blank, because there is no “previous row” for it to reference. That’s ok, because it wouldn’t make sense to calculate this on the hire date anyway.

Now that the data has come together, you may have spotted what the strategy will be. We can calculate the difference between PromoDate and PreviousPromoDate to determine how long elapsed before this promotion. This data can be joined back to our original dataset, so we don’t have to modify any of our existing work. Let’s bring this Custom SQL query into our Tableau workbook with a Left Join.

 
152-10.png
 

Here is the data in Tableau. Note that there are many null values in the PromoDate and PreviousPromoDate columns. That’s ok - for those rows, there is no promotion duration that we need to calculate.

 
152-11.png
 

Let’s calculate the promotion times:

 
152-12.png
 

I’ve pulled this into my table, and voilà! Using the combined power of SQL and Tableau, we are able to create multi-row functions, retrieving and aggregating values across sets of rows that we define.

152-13.png

Want help with your own calculations? Sign up for a SQL Office Hour or Tableau Office Hour to get expert help!

Adding Arrows and Colors to Tableau Summary Tiles

How to Create a SQL LISTAGG Function with Google Sheets

0