By: Eric Parker
Eric Parker lives in Seattle and has been teaching Tableau and Alteryx since 2014. He's helped thousands of students solve their most pressing problems. If you have a question, feel free to reach out to him directly via email. You can also sign up for a Tableau Office Hour to work with him directly!
While working with personally identifiable information, you may need to suppress sensitive data. Let’s say that you are working with healthcare data and want to suppress patient names.
First, you’ll want to create an aggregate step to get a unique list of names.
Next, you will output the list as a .csv.
After getting a row count of the names that need suppression, you can use a tool (like Mockaroo) to generate a list with the same number of random names.
Next, you’ll put the two name columns side-by-side in a table, so that each original name is paired with exactly one generated name.
Last, you can join the generated file into the original data flow on the original name field, then remove the original name field after the join. Now, all that is left are the generated names.
If you do this, make sure you remove the original name from the workflow before outputting your table! This works for any field, not just names: item descriptions, IDs, dates, and amounts can all be randomized in the same manner.
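The steps above can also be sketched in a few lines of Python, which may help if you want to see the logic outside of a visual workflow. This is only a minimal illustration, not the workflow itself: the sample rows, the `patient_name` field, the placeholder names, and the helper functions `build_name_map` and `anonymize` are all hypothetical, and in practice the replacement names would come from a generated file (for example, one exported from Mockaroo).

```python
import random

# Hypothetical sample rows standing in for the healthcare data.
rows = [
    {"patient_name": "Ann Lee", "visit_cost": 120},
    {"patient_name": "Bob Cruz", "visit_cost": 80},
    {"patient_name": "Ann Lee", "visit_cost": 45},
]

# Hypothetical replacement names; normally loaded from a generated file.
fake_names = ["Patient A", "Patient B", "Patient C"]


def build_name_map(rows, field, fake_names, seed=0):
    """Pair each unique original value with one generated replacement."""
    unique = sorted({row[field] for row in rows})  # the "aggregate" step
    if len(fake_names) < len(unique):
        raise ValueError("need at least one generated name per unique name")
    shuffled = list(fake_names)
    random.Random(seed).shuffle(shuffled)
    return dict(zip(unique, shuffled))  # the side-by-side lookup table


def anonymize(rows, field, name_map):
    """Join on the original field, then keep only the replacement value."""
    result = []
    for row in rows:
        cleaned = {k: v for k, v in row.items() if k != field}  # drop original
        cleaned[field] = name_map[row[field]]                   # add replacement
        result.append(cleaned)
    return result


name_map = build_name_map(rows, "patient_name", fake_names)
anonymized = anonymize(rows, "patient_name", name_map)
```

Because the lookup is built from the unique list, repeated patients keep the same replacement name on every row, so counts and groupings by name still work after the swap.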
This exact approach is great for a one-time anonymization, but it won’t automate well if the list of names grows and changes often, because every new name needs a new row in the lookup table. However, this same approach can be scaled using other tools for complete and automated anonymization of data.
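As one possible illustration of what an automated version could look like, the sketch below swaps the hand-built lookup table for a keyed hash, so any new name gets a stable replacement without regenerating a file. This is a different technique from the join described above, not the author's method. The `pseudonym` function, the `SECRET_KEY` value, and the `Patient-` prefix are assumptions made for the example.

```python
import hashlib
import hmac

# Hypothetical key; in real use, store it privately and never in the workflow.
SECRET_KEY = b"replace-with-a-private-key"


def pseudonym(value, key=SECRET_KEY, prefix="Patient"):
    """Derive a stable replacement label from a value with HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated for readability; a shorter label raises the chance of collisions.
    return f"{prefix}-{digest[:10]}"


label_first = pseudonym("Ann Lee")
label_again = pseudonym("Ann Lee")
label_other = pseudonym("Bob Cruz")
```

The same input always produces the same label, so new rows stay consistent with old ones. The trade-off is that anyone holding the key can recompute the label for a known name, so the key needs the same protection as the original data.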