Protecting sensitive data
Matillion ETL software integrates with virtually any data source, ingests data into leading cloud data platforms, and transforms data so it can be used by leading analytics and BI tools and synced back to the business.
ALTR (al·tr [/ôltər/]) is the only automated data access control and security solution that allows you to easily govern and protect sensitive data in the cloud – so you can gain more insights and value from more data in less time.
A step-by-step approach to securing your data
Protecting sensitive data is becoming a critical aspect of any organization’s data processes. Sensitive information, such as financial data, personal information, and confidential business information, must be kept secure to prevent unauthorized access, theft or misuse.
Of course, by implementing robust security measures and technologies, such as data loss prevention tools, network protection, and strong access controls, companies can significantly reduce the risk of a breach and protect sensitive data.
Tokenization can be layered on top of these ‘traditional’ security measures. It protects sensitive data by physically replacing each original value at the database level with a unique identifier, or token. The token can later be used to reverse the process and retrieve the original value on the fly.
Sounds like data masking? Yes and no. With masking, the underlying data stays in clear text and is only hidden when it is displayed, whereas tokenization physically alters the stored data. Tokenization therefore goes one step further than masking.
Detokenization is the process of reversing tokenization by taking the token and returning the original data. This process is typically only done in secure systems where the data is needed for legitimate purposes, such as for a financial transaction.
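To make the difference between tokenization, detokenization and masking concrete, here is a toy sketch. It uses a hypothetical in-memory "vault" (`ToyTokenVault`) that maps random tokens to original values. This is only an illustration of the concept under that assumption; it does not reflect how ALTR actually generates or stores tokens.

```python
import secrets


class ToyTokenVault:
    """Hypothetical in-memory vault mapping random tokens to original values."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Return the existing token so equal values always share one token
        if value in self._value_to_token:
            return self._value_to_token[value]
        # The token is random, so it reveals nothing about the original value
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token: str) -> str:
        # Only a caller with access to the vault can reverse a token
        return self._token_to_value[token]


def mask_email(value: str) -> str:
    """Masking: the stored value stays in clear; only the displayed copy is hidden."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


vault = ToyTokenVault()
original = "jane.doe@example.com"
stored = vault.tokenize(original)   # this token replaces the email in the table
print(stored)
print(vault.detokenize(stored))     # jane.doe@example.com
print(mask_email(original))         # j***@example.com
```

The key point is where the clear value lives. With masking, the table still holds `jane.doe@example.com`. With tokenization, the table holds only the token, and the original value can be recovered only through the vault (in practice, the tokenization service).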
Codex Consulting prioritizes the protection of sensitive data and is dedicated to implementing tokenization and detokenization in a straightforward manner, without the need for complex protocols.
In this blog, we combine three tools. ALTR (https://www.altr.com) is our go-to solution for data security, data governance and monitoring. Matillion is the data integration and productivity tool we use to streamline data pipelines and deliver the protection organizations need. Snowflake is the Data Cloud platform to which we want to add another layer of security and protection.
Our goal is to share our expertise on seamlessly incorporating tokenization and detokenization to secure sensitive data within your Snowflake environment.
Let’s take the example where customer emails require protection and only specific roles have access to the clear data.
These are the steps of tokenization & detokenization:
Create an API integration.
Create an external function for tokenization and grant the USAGE permission on the function to the PUBLIC role.
Create an external function for detokenization and grant the USAGE permission on the function to the PUBLIC role.
Create a stored procedure to manage the masking policy.
Create an Orchestration pipeline in Matillion to invoke the tokenization and detokenization functions.
Finally, check the data with specific roles.
Tokenization
For initial setup, we create an API integration "ALTR_TOKENIZATION" in the Snowflake environment.
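As a minimal sketch of this step, the helper below builds a statement of the general shape Snowflake's `CREATE API INTEGRATION` expects. It assumes that ALTR's endpoint sits behind AWS API Gateway. The role ARN and URL prefix are hypothetical placeholders, and the real values would come from your ALTR setup. Creating an integration typically requires ACCOUNTADMIN or the CREATE INTEGRATION privilege.

```python
def api_integration_sql(name: str, aws_role_arn: str, allowed_prefix: str) -> str:
    """Build a Snowflake CREATE API INTEGRATION statement for an AWS API Gateway proxy."""
    return (
        f"CREATE OR REPLACE API INTEGRATION {name}\n"
        "  API_PROVIDER = aws_api_gateway\n"
        f"  API_AWS_ROLE_ARN = '{aws_role_arn}'\n"
        f"  API_ALLOWED_PREFIXES = ('{allowed_prefix}')\n"
        "  ENABLED = TRUE;"
    )


# Hypothetical placeholder values; the real ones come from your ALTR setup
sql = api_integration_sql(
    "ALTR_TOKENIZATION",
    "arn:aws:iam::123456789012:role/example-altr-role",
    "https://example.execute-api.us-east-1.amazonaws.com/",
)
print(sql)
```

In the usual AWS setup, `DESCRIBE INTEGRATION ALTR_TOKENIZATION` then shows the IAM user ARN and external ID that the API owner adds to the role's trust policy.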
We create an external function called "ALTR_PROTECT_TOKENIZE", and we grant the USAGE permission on the function to the PUBLIC role, allowing any user or role to use the function.
We also create an external function called "ALTR_PROTECT_DETOKENIZE", and we grant the USAGE permission on the function to the PUBLIC role, allowing any user or role to use the function.
The purpose of this function is to detokenize sensitive data that has been previously tokenized using the ALTR_PROTECT_TOKENIZE function.
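Both external functions follow the same pattern, so one helper can sketch them. It assumes each function takes and returns a single VARCHAR. The endpoint URLs (`.../tokenize`, `.../detokenize`) are hypothetical, and the argument name `val` is illustrative. The actual endpoints and signatures would come from ALTR.

```python
def external_function_sql(fn_name: str, integration: str, endpoint_url: str) -> str:
    """Build CREATE EXTERNAL FUNCTION for a VARCHAR -> VARCHAR function plus a PUBLIC grant."""
    return (
        f"CREATE OR REPLACE EXTERNAL FUNCTION {fn_name}(val VARCHAR)\n"
        "  RETURNS VARCHAR\n"
        f"  API_INTEGRATION = {integration}\n"
        f"  AS '{endpoint_url}';\n"
        # The grant must repeat the argument types to identify the function
        f"GRANT USAGE ON FUNCTION {fn_name}(VARCHAR) TO ROLE PUBLIC;"
    )


# Hypothetical endpoint; it must fall under the integration's API_ALLOWED_PREFIXES
BASE_URL = "https://example.execute-api.us-east-1.amazonaws.com/prod"

statements = [
    external_function_sql("ALTR_PROTECT_TOKENIZE", "ALTR_TOKENIZATION", f"{BASE_URL}/tokenize"),
    external_function_sql("ALTR_PROTECT_DETOKENIZE", "ALTR_TOKENIZATION", f"{BASE_URL}/detokenize"),
]
for statement in statements:
    print(statement, end="\n\n")
```

Granting USAGE to PUBLIC only lets every role *call* the functions. Who actually sees clear values is decided later by the masking policy and the ALTR lock.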
We create a stored procedure, SP_MODIFY_MASKING, that creates a masking policy for a specific column in a table and applies different types of masking based on the column's value (script in Appendix).
Once that’s done, we can secure the data early in the pipeline. Let’s see how through a simple Matillion job.
Let’s create a script (a SQL Script component in Matillion) to call the Snowflake function we just created.
We run the script in the Snowflake environment with the SQL Script component, selecting the email column we want to protect.
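One plausible shape for that script is sketched below. It assumes the email column is overwritten in place with an `UPDATE`, using the table named later in this post. This is an illustration only, not the exact script used here.

```python
def tokenize_column_sql(table: str, column: str, fn_name: str = "ALTR_PROTECT_TOKENIZE") -> str:
    """Build an UPDATE that overwrites a column in place with tokens from the external function."""
    # NULLs are skipped so the service is only called for real values
    return f"UPDATE {table} SET {column} = {fn_name}({column}) WHERE {column} IS NOT NULL;"


print(tokenize_column_sql("PRIVACY.STAGING.CUSTOMER_DETAILS", "EMAIL"))
# UPDATE PRIVACY.STAGING.CUSTOMER_DETAILS SET EMAIL = ALTR_PROTECT_TOKENIZE(EMAIL) WHERE EMAIL IS NOT NULL;
```

Note that a statement like this is not idempotent. Running it twice would tokenize the tokens themselves, so in a Matillion pipeline it belongs in a step that only touches newly loaded rows.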
Now let’s check the email column in Snowflake. We can see that the email field is now protected.
But we also want to make sure that only authorized groups can see the data in clear text.
Detokenization
As we have seen, the email data has been physically modified (tokenized) in Snowflake. We now want specific groups to be able to see the original values while the data remains tokenized for everyone else. Is it possible to reverse the process and restore the original values? Yes, definitely!
These are the steps in Snowflake and ALTR:
For the initial detokenization setup, we first run a script in the Snowflake environment from Matillion (SQL Script component) that calls the detokenization function.
Run the following script in the SQL component:
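A minimal sketch of such a script, assuming the function and table names used in this post, selects each stored token next to its detokenized value so that you can confirm the round trip works. The column aliases and the row limit are illustrative choices.

```python
def detokenize_check_sql(table: str, column: str,
                         fn_name: str = "ALTR_PROTECT_DETOKENIZE", limit: int = 10) -> str:
    """Build a query that shows each stored token next to its detokenized value."""
    return (
        f"SELECT {column} AS tokenized_value,\n"
        f"       {fn_name}({column}) AS original_value\n"
        f"FROM {table}\n"
        f"LIMIT {limit};"
    )


print(detokenize_check_sql("PRIVACY.STAGING.CUSTOMER_DETAILS", "EMAIL"))
```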
Next, in ALTR, open the “Data Management” page under the “Data Configuration” section.
Click the column that was previously added to ALTR.
Remove this column from ALTR with the “Disconnect Column” button.
Add your new column to ALTR with the “Add New” button on this page.
In our example, this is the "PRIVACY"."STAGING"."CUSTOMER_DETAILS"."EMAIL" column.
Run the following command in Snowflake to configure your masking policy for automatic detokenization.
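The ALTR-provided command is not reproduced here. As a hedged illustration of the underlying Snowflake mechanism (external tokenization through a masking policy that calls an external function), a generic policy might look like the sketch below. The policy name `EMAIL_DETOKENIZE_POLICY` and the hard-coded role list are hypothetical. In this walkthrough, ALTR manages the roles through its locks instead.

```python
def detokenize_policy_sql(policy: str, table: str, column: str,
                          allowed_roles: list[str],
                          fn_name: str = "ALTR_PROTECT_DETOKENIZE") -> str:
    """Build a masking policy that detokenizes for allowed roles, plus the statement attaching it."""
    roles = ", ".join(f"'{role}'" for role in allowed_roles)
    return (
        f"CREATE MASKING POLICY {policy} AS (val VARCHAR) RETURNS VARCHAR ->\n"
        # Allowed roles get the original value; everyone else keeps seeing the token
        f"  CASE WHEN CURRENT_ROLE() IN ({roles}) THEN {fn_name}(val) ELSE val END;\n"
        f"ALTER TABLE {table} MODIFY COLUMN {column} SET MASKING POLICY {policy};"
    )


print(detokenize_policy_sql("EMAIL_DETOKENIZE_POLICY",
                            "PRIVACY.STAGING.CUSTOMER_DETAILS", "EMAIL", ["SYSADMIN"]))
```

Plain `CREATE MASKING POLICY` is used rather than `CREATE OR REPLACE`, because Snowflake does not let you replace a policy that is already attached to a column.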
In ALTR, open the “Locks” page under the “Data Policy” section.
Create a lock called “Allow Detokenization.”
Pick the “Snowflake” application.
Pick the “SYSADMIN” role (to allow that role to see plain text values)
Switch the “Tag” to “Column”
Pick your new email address field.
Set your masking policy to “No Mask.”
Now let’s check the email column in Snowflake again. The email values remain protected, but what each user sees now depends on their role.
With the CodexAdmin role, we see tokenized (scrambled) values.
With the SYSADMIN role, we see plain text values.
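The role-dependent result above can be modeled in a few lines of Python. This is a toy model of the decision the masking policy makes. The token `tok_a1b2` and its lookup table are hypothetical stand-ins for the stored value and the detokenization service.

```python
from typing import Callable


def visible_value(stored_token: str, current_role: str,
                  allowed_roles: set[str], detokenize: Callable[[str], str]) -> str:
    # Allowed roles get the original value; all other roles see the stored token
    return detokenize(stored_token) if current_role in allowed_roles else stored_token


lookup = {"tok_a1b2": "jane.doe@example.com"}  # hypothetical token -> value mapping
for role in ("CODEXADMIN", "SYSADMIN"):
    print(role, visible_value("tok_a1b2", role, {"SYSADMIN"}, lookup.__getitem__))
# CODEXADMIN tok_a1b2
# SYSADMIN jane.doe@example.com
```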
Ultimately, Matillion and ALTR make tokenization and detokenization both effective and straightforward to implement.
The automation these tools provide saves data engineers a lot of time: sensitive columns stay protected, while authorized users can access and use the cloud data in a matter of minutes.