Data Quality Validations
This feature is currently in Public Preview.
- Available for Tecton on Databricks and EMR. Coming to Rift in a future release.
Data Quality Validations help detect feature data issues once a Feature View has
been materialized. If validation results indicate that feature data failed to
meet expectations during a materialization interval, an alert email is sent to
the address specified as alert_email in the Feature View declaration.
Terminology
- Data Quality Metrics are statistics that describe feature values output by a Feature View during materialization. See Data Quality Metrics for more information.
- Expectations are verifiable assertions about metrics. For example, “Expect that <100% of values for a given feature are null”.
- Validations check whether the set of expectations was met when a Feature View was materialized. A validation either passes or fails.
- Alerts notify the specified user when validation fails.
This document covers Data Quality Expectations, Validations, and Alerts.
Default Expectations
By default, Tecton defines the following expectations for all Batch and Stream Feature Views.
For Stream Feature Views, Data Quality Metrics and Expectations only apply to offline materialized feature data.
Expectation | Applicable to | Explanation |
---|---|---|
Feature View row count > 0 | Feature Views | Expect feature rows to be produced when a Feature View is materialized. |
A feature has any non-null values | All types of features | Expect a feature to have at least one non-null value when there are feature rows. |
A feature has any non-zero values | Numerical features | Expect a feature to have at least one non-zero value when there are feature rows. |
A feature has any non-empty values | String or Array features | Expect a feature to have at least one non-empty string or array value when there are feature rows. |
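The default expectations in the table can be illustrated with a small, self-contained Python sketch. This is not Tecton's implementation: the function name `check_default_expectations`, the list-of-dicts row format, and the `schema` type labels are all hypothetical, and the sketch assumes that null values count as neither non-zero nor non-empty.

```python
def check_default_expectations(rows, schema):
    """Evaluate the four default expectations over materialized rows.

    rows:   list of dicts, one per feature row (hypothetical representation).
    schema: dict mapping feature name -> "numeric", "string", "array", or "other".
    Returns a dict mapping an expectation label -> True (passed) / False (failed).
    """
    results = {"row_count > 0": len(rows) > 0}
    if not rows:
        # The per-feature expectations only apply when there are feature rows.
        return results

    for name, kind in schema.items():
        # Assumption: nulls are excluded before the non-zero / non-empty checks.
        non_null = [row[name] for row in rows if row.get(name) is not None]
        results[f"{name}: any non-null"] = bool(non_null)
        if kind == "numeric":
            results[f"{name}: any non-zero"] = any(v != 0 for v in non_null)
        elif kind in ("string", "array"):
            results[f"{name}: any non-empty"] = any(len(v) > 0 for v in non_null)
    return results


# Example interval: amount is always zero and tags is always null.
rows = [
    {"amount": 0, "merchant": "", "tags": None},
    {"amount": 0, "merchant": "acme", "tags": None},
]
schema = {"amount": "numeric", "merchant": "string", "tags": "array"}
results = check_default_expectations(rows, schema)
failed = sorted(label for label, passed in results.items() if not passed)
print(failed)
```

In this example the row count and `merchant` checks pass, while `amount` fails the non-zero check and `tags` fails both the non-null and non-empty checks.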
Enable Validations and Alert Emails
The default expectations apply to every Batch and Stream Feature View. To disable
validation for a Feature View, set skip_default_expectations=True in its declaration.
Email alerting is enabled when alert_email is specified in a Batch or Stream
Feature View definition. Alert emails are sent at most once every 6 hours per
Feature View. To disable all email alerts for a Feature View, including other
types of materialization alerts, leave this field unset.
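The per-Feature-View alert limit can be pictured as a simple throttle. The sketch below only illustrates the "at most once every 6 hours" rule. The class `AlertThrottle` and its method are hypothetical, and treating an alert exactly 6 hours after the previous one as allowed is an assumption, not documented behavior.

```python
from datetime import datetime, timedelta

ALERT_INTERVAL = timedelta(hours=6)


class AlertThrottle:
    """Tracks the last alert time per Feature View (hypothetical helper)."""

    def __init__(self, interval=ALERT_INTERVAL):
        self.interval = interval
        self._last_sent = {}  # feature view name -> datetime of last alert

    def should_send(self, feature_view, now):
        """Return True and record `now` if an alert may be sent, else False."""
        last = self._last_sent.get(feature_view)
        if last is not None and now - last < self.interval:
            return False  # still inside the 6-hour window for this Feature View
        self._last_sent[feature_view] = now
        return True


throttle = AlertThrottle()
t0 = datetime(2024, 1, 1, 0, 0)
first = throttle.should_send("fv_a", t0)                           # first failure alerts
repeat = throttle.should_send("fv_a", t0 + timedelta(hours=5))     # suppressed
other = throttle.should_send("fv_b", t0 + timedelta(hours=5))      # separate Feature View
later = throttle.should_send("fv_a", t0 + timedelta(hours=6))      # window has elapsed
print(first, repeat, other, later)
```

Because the window is tracked per Feature View, a failure in one Feature View does not suppress alerts for another.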
Viewing Validation Results
You can view the validation results for all Feature Views in a workspace by selecting Data Quality in the left navigation panel of the Tecton web UI.