Streamlining Data Metrics and Alerts in Snowflake with Streamlit
Snowflake’s powerful data warehousing capabilities, combined with Streamlit’s intuitive app-building framework, offer a robust solution for managing data metrics and setting up alerts. In this blog, we’ll explore how to create a Streamlit app that simplifies these processes within Snowflake.
Creating a Streamlit App for Snowflake Metrics and Alerts
Key Data Quality Challenges
- Compliance Risk: Organizations, especially in regulated industries like banking, face risks if they cannot verify the state of their data for compliance reporting
- Business Risk: Late or failed data jobs can lead to missed SLAs and impact downstream consumers and clients
- Credibility Issues: Lack of trust in data quality, especially when multiple teams are involved in managing data ingestion and processing
- Consistency: Different data quality requirements across teams can lead to inconsistencies
- Privacy and Security: Implementing data quality checks without compromising sensitive data
- Cost Concerns: Implementing data quality monitoring at scale can have cost implications
How Data Metric Functions Help
Snowflake’s Data Metric Functions provide several capabilities to address these challenges:
- Predefined Metrics: Out-of-the-box metrics for standard checks like uniqueness and null values
- Custom Metrics: Ability to create custom metrics for specific business rules
- Configurable Measurements: Flexibility to set measurement frequency based on data update patterns
- Monitoring and Alerting: Built-in capabilities to monitor data quality and set up alerts for issues
- Early Detection: Helps detect data quality issues earlier in the data lifecycle
- Comprehensive Checks: Allows scanning of entire datasets instead of relying on sampling techniques
- Performance Optimization: Leverages Snowflake’s computing capabilities to perform data quality checks efficiently
- Security Enhancement: Processes data natively within Snowflake, reducing the need for data movement and associated security risks
By providing these features, Snowflake’s Data Metric Functions aim to help data engineers, data stewards, and administrators maintain high data quality while addressing common challenges around compliance, credibility, consistency, and cost-effectiveness.
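To make this concrete, below is a minimal Snowpark sketch of registering a custom DMF and calling one of the predefined system DMFs ad hoc. The database, schema, function, table, and column names are illustrative placeholders, not the app’s actual objects.

```python
# Minimal sketch (placeholder names): register a custom DMF and call a
# predefined system DMF ad hoc. Assumes a default Snowflake connection.
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()

# A custom DMF is a scalar SQL expression evaluated over a table argument.
session.sql("""
    CREATE OR REPLACE DATA METRIC FUNCTION dmf_db.dmf_schema.invalid_email_count(
        arg_t TABLE(arg_c STRING))
    RETURNS NUMBER
    AS 'SELECT COUNT_IF(arg_c NOT LIKE ''%@%.%'') FROM arg_t'
""").collect()

# Predefined metrics such as SNOWFLAKE.CORE.NULL_COUNT can be run on demand:
rows = session.sql(
    "SELECT SNOWFLAKE.CORE.NULL_COUNT(SELECT customer_email FROM dmf_db.sales.orders)"
).collect()
print(rows)
```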
Setting Up the Environment
First, ensure you have the necessary privileges and connections set up in Snowflake.
For creating ALERTs in Snowflake you’ll need:
- EXECUTE ALERT privilege on the account
- USAGE and CREATE ALERT privileges on the schema
- USAGE privilege on the database and warehouse
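A minimal sketch of the corresponding grants, assuming a hypothetical DMF_ADMIN role and placeholder object names:

```python
# Sketch: grants for the alert privileges above. The role, database, schema,
# and warehouse names are placeholders; substitute your own.
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()  # assumes a configured default connection

for stmt in [
    "GRANT EXECUTE ALERT ON ACCOUNT TO ROLE dmf_admin",
    "GRANT USAGE, CREATE ALERT ON SCHEMA dmf_db.dmf_schema TO ROLE dmf_admin",
    "GRANT USAGE ON DATABASE dmf_db TO ROLE dmf_admin",
    "GRANT USAGE ON WAREHOUSE dmf_wh TO ROLE dmf_admin",
]:
    session.sql(stmt).collect()
```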
For DATA METRIC FUNCTIONS you’ll need:
- CREATE DATA METRIC FUNCTION: This privilege is required on the schema where you want to create the DMF
- EXECUTE DATA METRIC FUNCTION: This is a global (account-level) privilege that enables using serverless compute resources when calling a DMF
- USAGE: This privilege is needed on the data metric function itself to enable calling the DMF
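And a hedged sketch of those grants, again with placeholder role and object names; the USAGE grant references the DMF by its TABLE(...) signature:

```python
# Sketch: grants for the DMF privileges above (placeholder names throughout).
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()

for stmt in [
    "GRANT CREATE DATA METRIC FUNCTION ON SCHEMA dmf_db.dmf_schema TO ROLE dmf_admin",
    "GRANT EXECUTE DATA METRIC FUNCTION ON ACCOUNT TO ROLE dmf_admin",
    "GRANT USAGE ON FUNCTION dmf_db.dmf_schema.invalid_email_count(TABLE(STRING)) "
    "TO ROLE dmf_admin",
]:
    session.sql(stmt).collect()
```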
Log into your Snowflake account and run the setup script to create a dedicated database, schema, and required objects.
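The script itself isn’t reproduced here, but a sketch of the kind of objects it creates looks like this; apart from the DMF_FILES stage (which the app expects), the names are placeholders:

```python
# Sketch of the setup objects; only the DMF_FILES stage name is significant.
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()

for stmt in [
    "CREATE DATABASE IF NOT EXISTS dmf_db",
    "CREATE SCHEMA IF NOT EXISTS dmf_db.dmf_schema",
    "CREATE WAREHOUSE IF NOT EXISTS dmf_wh WITH WAREHOUSE_SIZE = 'XSMALL'",
    "CREATE STAGE IF NOT EXISTS dmf_db.dmf_schema.dmf_files",
]:
    session.sql(stmt).collect()
```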
Building the App Interface
Using the code from app.py, update your Streamlit application. Upload all other library files into your Snowflake stage, DMF_FILES.
(You can find more details on how to upload custom packages/.py files in Streamlit in my other blog — Stepper Bar in Snowflake Streamlit.)
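For reference, one quick way to stage the files is Snowpark’s file API; the local file name below is a placeholder for the library files in the repo:

```python
# Sketch: upload a helper module to the DMF_FILES stage (placeholder file name).
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()

session.file.put(
    "create_dmfs.py",                # local library file to upload
    "@dmf_db.dmf_schema.dmf_files",  # the DMF_FILES stage from the setup
    auto_compress=False,             # keep .py files importable as-is
    overwrite=True,
)
```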
Once we import all the .py files from the stage, we can stitch everything together. (An abstract version of the app.py code is shown below for reference.)
In the code snippet below, we initialize a Snowflake session, set up a background image for the app, and define the app pages. We make use of the .py files we uploaded to the Snowflake stage and orchestrate the flow to create and manage Data Metric Functions in Snowflake.
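The full code isn’t reproduced here, but a minimal sketch of that skeleton might look like the following. The page modules (create_dmfs, manage_alerts), their render() functions, and the image file name are hypothetical stand-ins for the staged library files:

```python
# app.py (abridged sketch): hypothetical module and file names throughout.
import base64
from pathlib import Path

import streamlit as st
from snowflake.snowpark.context import get_active_session

# Streamlit in Snowflake hands us an already-authenticated session.
session = get_active_session()

def set_background(image_file: str) -> None:
    # Base64-encode a bundled image and inject it as the app background via CSS.
    encoded = base64.b64encode(Path(image_file).read_bytes()).decode()
    css = (
        "<style>.stApp {"
        f"background-image: url('data:image/png;base64,{encoded}');"
        "background-size: cover;}</style>"
    )
    st.markdown(css, unsafe_allow_html=True)

set_background("background.png")

# These modules come from the .py files staged in DMF_FILES (names hypothetical):
import create_dmfs      # "Create DMFs" page: create, test, associate, schedule
import manage_alerts    # "Alerts" page: create and monitor alerts

page = st.sidebar.radio("Navigate", ["Create DMFs", "Alerts"])
if page == "Create DMFs":
    create_dmfs.render(session)
else:
    manage_alerts.render(session)
```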
On the “Create DMFs” page, you have the option to create a new custom DMF and “Test” it on the data before “Associating” it with a compatible table column (all of which is baked into the app). You can set the “Schedule” to run these DMFs periodically using a CRON schedule, a time-based schedule (1 Minute, …), or my personal favorite option, “TRIGGER_ON_CHANGES”, to manage your platform’s data quality needs.
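Under the hood, the “Associate” and “Schedule” actions boil down to ALTER TABLE statements like the ones below; the table and column names are illustrative:

```python
# Sketch of the statements behind "Associate" and "Schedule" (placeholder names).
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()

# Set the schedule on the table; here, run the DMFs whenever the data changes.
# Alternatives: 'USING CRON 0 8 * * * UTC' or an interval such as '60 MINUTE'.
session.sql(
    "ALTER TABLE dmf_db.sales.orders "
    "SET DATA_METRIC_SCHEDULE = 'TRIGGER_ON_CHANGES'"
).collect()

# Attach a DMF to a compatible column of the table:
session.sql(
    "ALTER TABLE dmf_db.sales.orders ADD DATA METRIC FUNCTION "
    "dmf_db.dmf_schema.invalid_email_count ON (customer_email)"
).collect()
```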
Setting Up Alerts
Once you have associated your DMFs in the app, you have the option to set up system alerts, where a notification is triggered when the <alert_condition> is met (in this scenario, a data violation or a threshold breach).
The current implementation sends out an email notification for an alert, which can optionally be configured to support Slack or PagerDuty (natively supported in Snowflake).
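A hedged sketch of what such an alert can look like, assuming an email notification integration named my_email_integration already exists, and reading DMF results from the SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS view:

```python
# Sketch: alert that emails when any DMF reported a nonzero value since the
# alert last ran successfully. Integration and recipient are placeholders.
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()

session.sql("""
    CREATE OR REPLACE ALERT dmf_db.dmf_schema.dq_breach_alert
      WAREHOUSE = dmf_wh
      SCHEDULE = '60 MINUTE'
      IF (EXISTS (
        SELECT 1
        FROM SNOWFLAKE.LOCAL.DATA_QUALITY_MONITORING_RESULTS
        WHERE measurement_time > SNOWFLAKE.ALERT.LAST_SUCCESSFUL_SCHEDULED_TIME()
          AND value > 0))
      THEN CALL SYSTEM$SEND_EMAIL(
        'my_email_integration', 'dq-team@example.com',
        'Data quality breach', 'A DMF reported violations; check the app.')
""").collect()

# Alerts are created suspended; resume to start the schedule.
session.sql("ALTER ALERT dmf_db.dmf_schema.dq_breach_alert RESUME").collect()
```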
Main App Logic
The complete code can be found in my GitHub repository.
Best Practices and Tips
- Define Clear Metrics: Ensure your metric functions are well-defined and capture meaningful data points
- Optimize Alert Schedules: Balance between timely notifications and system load when setting alert schedules
- Use Custom Roles: Create and use custom roles for managing alerts to maintain organizational security best practices and policies
- Leverage Snowflake Functions: Utilize Snowflake’s built-in functions like SNOWFLAKE.ALERT.LAST_SUCCESSFUL_SCHEDULED_TIME() for more sophisticated alert conditions
- Monitor Alert History: Implement a feature to view alert history for better tracking and analysis (see the sketch below)
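For that last point, here is a small sketch of querying recent alert executions through the INFORMATION_SCHEMA.ALERT_HISTORY table function; the database name is a placeholder:

```python
# Sketch: review the last week of alert executions for tracking and analysis.
from snowflake.snowpark import Session

session = Session.builder.getOrCreate()

history = session.sql("""
    SELECT name, state, scheduled_time, completed_time
    FROM TABLE(dmf_db.INFORMATION_SCHEMA.ALERT_HISTORY(
      SCHEDULED_TIME_RANGE_START =>
        DATEADD('day', -7, CURRENT_TIMESTAMP())::TIMESTAMP_LTZ))
    ORDER BY scheduled_time DESC
""").collect()

for row in history:
    print(row)
```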
Conclusion
By combining Streamlit’s user-friendly interface with Snowflake’s robust data processing capabilities, you can create a powerful tool for managing data metrics and alerts. This approach not only simplifies the process of setting up and monitoring data-driven alerts but also provides a centralized platform for data teams to collaborate and maintain oversight of critical business metrics. Remember to continuously refine your metrics and alerts based on evolving business needs and data patterns.
With this Streamlit app, you’ll have a flexible and scalable solution for keeping your finger on the pulse of your data in Snowflake.
……………….………………It’s Snowing on a SunnyDay…!!! ………………………….