Data Security: JSON & Dynamic Data Masking (Part 1 of 2)

Sunny Malik
4 min read · Aug 1, 2021

As part of its Data Security and Governance suite, Snowflake has added a host of features, including:

  1. Dynamic Data Masking
  2. Row-Level Security
  3. Access History (Metadata View)
  4. Data Classification (announced at Summit 2021) and Tags
  5. End-to-end encryption of data in flight and at rest, using the Tri-Secret Secure feature (you can also bring your own encryption key).

In this blog we will discuss Dynamic Data Masking in a little more detail.

Dynamic Data Masking allows users to mask the values of a particular column by using a reusable component known as a “policy”.

This powerful feature can be applied to your datasets as they are ingested, providing security on data the moment it lands in Snowflake. These policies can be easily associated with structured or semi-structured data such as JSON.

You can bring data in using the Snowflake Kafka connector, ingest near-real-time telemetry with Snowpipe, or load data in batch mode, but it’s important to have the right security model built in for your sensitive data. Only authorized users should be able to access and see the datasets (the least-privilege principle). However, the growing trend of using generic service accounts during ingestion or ETL processes may unintentionally expose your sensitive data, leading to compliance and governance issues.

Let’s see how we can solve this problem by creating and applying policies on semi-structured data for enhanced data security, and then review some best practices around Dynamic Data Masking.

Let’s get started…

Create a table to host the semi-structured data and load sample data
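
A minimal sketch of that setup follows; the table name customer_events, the payload column, and the sample attributes (email, ssn) are illustrative assumptions rather than the original post’s dataset.

```sql
-- Illustrative table with a VARIANT column to hold semi-structured (JSON) data
CREATE OR REPLACE TABLE customer_events (
    event_id NUMBER AUTOINCREMENT,
    payload  VARIANT
);

-- Load a couple of sample JSON records (PARSE_JSON requires INSERT ... SELECT)
INSERT INTO customer_events (payload)
SELECT PARSE_JSON(column1)
FROM VALUES
  ('{"event": "signup",   "customer": {"name": "Jane Doe", "email": "jane.doe@example.com", "ssn": "123-45-6789"}}'),
  ('{"event": "purchase", "customer": {"name": "John Roe", "email": "john.roe@example.com", "ssn": "987-65-4321"}}');
```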

Along with policies, we can create a UDF as a reusable component, adding the business logic that protects the sensitive columns and controls which user/role has the ability to see the unmasked dataset.

This UDF takes the VARIANT column as an input parameter and returns masked values for the sensitive attributes found in the semi-structured dataset.
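
Here is a hedged sketch of such a UDF; the function name mask_pii_variant and the attribute paths (customer.email, customer.ssn) are assumptions based on the sample payload above, not the original post’s logic.

```sql
-- Sketch: return the payload with the sensitive attributes overwritten
CREATE OR REPLACE FUNCTION mask_pii_variant(v VARIANT)
RETURNS VARIANT
AS
$$
    OBJECT_INSERT(
        v::OBJECT,
        'customer',
        OBJECT_INSERT(
            OBJECT_INSERT(v:customer::OBJECT, 'email', '***MASKED***', TRUE),
            'ssn', '***MASKED***', TRUE
        ),
        TRUE
    )::VARIANT
$$;
```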

You can now optionally validate the UDF response by querying the semi-structured data. Make sure to grant usage privileges on the UDF to the appropriate role.
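
For example, continuing with the assumed names from the sketches above (analyst_role is a placeholder for whichever role needs it):

```sql
-- Compare the raw payload with the UDF output
SELECT payload, mask_pii_variant(payload) AS masked_payload
FROM customer_events;

-- Grant usage on the UDF to the appropriate role
GRANT USAGE ON FUNCTION mask_pii_variant(VARIANT) TO ROLE analyst_role;
```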

Now it’s time to create the masking policy and attach it to a table column.
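
A sketch of what that could look like, with the role names (SYSADMIN, DATA_STEWARD) as placeholder assumptions:

```sql
-- Masking policy on a VARIANT column: privileged roles see the raw payload,
-- everyone else gets the UDF-masked version
CREATE OR REPLACE MASKING POLICY mask_json_pii AS (val VARIANT)
RETURNS VARIANT ->
  CASE
    WHEN CURRENT_ROLE() IN ('SYSADMIN', 'DATA_STEWARD') THEN val
    ELSE mask_pii_variant(val)
  END;

-- Attach the policy to the VARIANT column of the table
ALTER TABLE customer_events
  MODIFY COLUMN payload SET MASKING POLICY mask_json_pii;
```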

Once the policy is applied to the table successfully, we can validate the masking of data in the table. It’s important to note that the masking policy is applied according to the business logic specified in the policy definition.
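
One way to validate this, assuming the placeholder roles from the policy sketch above:

```sql
-- As a role not listed in the policy, the sensitive attributes come back masked
USE ROLE analyst_role;
SELECT payload FROM customer_events;

-- As a role listed in the policy, the original values are returned
USE ROLE data_steward;
SELECT payload FROM customer_events;
```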

We can apply the same policy to multiple tables, but since this policy is specialized (it works on a VARIANT column with specific attributes identified for masking), reuse can be difficult. So try to keep the policy requirements generic where possible before creating specialized policies.
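
For instance, attaching the same policy to a second table is a single statement, but it only makes sense if that table’s payload has the same shape (customer_events_archive is a hypothetical example):

```sql
-- Reuse the same policy on another table with an identically shaped VARIANT column
ALTER TABLE customer_events_archive
  MODIFY COLUMN payload SET MASKING POLICY mask_json_pii;
```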

Often we come across a scenario where the semi-structured data is hierarchical, with sensitive data at multiple levels.

You can choose to mask the entire payload in your policy definition, or create another utility to first identify the attributes and their corresponding levels in the payload before applying the masking logic. More to follow on this scenario in Part 2 (stay tuned).

Best practices on Dynamic Data Masking

  1. Define sensitivity levels, identify sensitive columns in your datasets, and classify them (e.g., Public, Confidential, Restricted, Highly Restricted).
  2. Start by creating generic policies that can be applied to multiple objects in the account.
  3. Delegate a separate role for managing policies in your account (e.g., a Policy Admin role).
  4. Create policies using proper naming conventions.
  5. Validate the effect of the policy on the object before granting access to users.

Follow along for masking semi-structured data with hierarchies in JSON & Dynamic Data Masking (Part 2 of 2).

………………It’s Snowing on a SunnyDay…!!! ……………….
