Monitoring Materialization
Materialization processes are run by Tecton to keep production features up-to-date. Monitoring these materialization processes helps ensure that feature pipelines are continuously delivering data to your models.
Tecton offers various tools to facilitate materialization monitoring, including dashboards in the Web UI, email alerts, and the Metrics API.
Monitor Spark Stream Materialization
For Stream Feature Views, Tecton orchestrates Spark Structured Streaming jobs to continuously update feature values as new data arrives. Tecton Monitoring helps protect against models operating on out-of-date features by identifying scenarios where the stream process is down, falling behind, or otherwise failing to output updated feature values.
View Job Status
To view the status of stream jobs, navigate to the Materialization tab for the relevant Stream Feature View.
For currently running jobs, the best way to track job progress is to follow the details link from the jobs table to see the detailed job information from your Spark compute provider.
Materialization Failure Alerts
If alert_email is set for the Feature View, Tecton will send email alerts when failures occur for stream materialization processes.
The types of failure alerts are:
- FeatureViewBatchMaterializationFailures: stream materialization jobs have failed 2 or more times.
- FeatureViewTooManyFailures: stream materialization jobs have failed too many times based on the retry policy, and Tecton will no longer make any retry attempts.
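As a rough sketch, alert_email can be set when the Feature View is defined. The exact decorator signature differs between Tecton SDK versions (in some versions monitoring parameters are grouped on a MonitoringConfig object), and transactions_stream and user below are hypothetical placeholders for objects defined elsewhere in your feature repository:

```python
from datetime import datetime, timedelta

from tecton import stream_feature_view

from my_feature_repo.data_sources import transactions_stream  # hypothetical
from my_feature_repo.entities import user                     # hypothetical


# Sketch only: required parameters vary by Tecton SDK version.
@stream_feature_view(
    source=transactions_stream,
    entities=[user],
    mode="spark_sql",
    online=True,
    feature_start_time=datetime(2023, 1, 1),
    ttl=timedelta(days=30),
    alert_email="ml-oncall@example.com",  # failure alerts are emailed here
)
def user_transaction_features(transactions):
    return f"""
        SELECT user_id, amount, timestamp
        FROM {transactions}
    """
```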
Monitor Stream Progress
Even if the stream job is running, it may be failing to produce up-to-date features. For example, the stream job may be falling behind new events because it is under-resourced, or the upstream data source itself may have no events.
The Stream Feature View Monitoring tab contains several metrics to help assess the progress of your Stream Feature View.
These metrics are also available through the Metrics API. Ingesting them into your Application Performance Monitoring system allows you to create your own custom dashboards and alerts.
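As a rough sketch of what such an integration could look like, the poller below pulls one metric and emits it for an APM system to pick up. The endpoint path, auth header, metric name, and response shape are illustrative assumptions rather than the actual Metrics API contract; consult the Metrics API documentation for the real details:

```python
import time

import requests

TECTON_URL = "https://yourcluster.tecton.ai"  # hypothetical cluster URL
API_TOKEN = "..."                             # a Tecton API token


def fetch_processed_event_age(feature_view: str) -> float:
    # Hypothetical endpoint and response shape, for illustration only.
    resp = requests.get(
        f"{TECTON_URL}/api/v1/metrics",
        headers={"Authorization": f"Tecton-key {API_TOKEN}"},
        params={"feature_view": feature_view, "metric": "processed_event_age"},
        timeout=10,
    )
    resp.raise_for_status()
    return float(resp.json()["value"])


while True:
    age = fetch_processed_event_age("user_transaction_features")
    # Forward to your APM system here (e.g. as a gauge) and alert on thresholds.
    print(f"processed_event_age_seconds={age}")
    time.sleep(60)
```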
Processed Event Age
Processed Event Age is the key metric for understanding how up to date your features are.
Processed Event Age measures the difference between when a microbatch completes writing to the online store and the timestamps of the features processed in that microbatch. This metric indicates how long it takes for an event to make it all the way into the feature store. By definition, Processed Event Age includes both upstream processing time and the time taken by Tecton to transform and persist the event.
If your Processed Event Age suddenly increases, then either the stream processing is falling behind, or your upstream data source is outputting stale records. To understand if the stream process is falling behind, look for a correlated increase in microbatch processing latency or input rate.
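To make the definition concrete, here is a small illustration of the measurement (not Tecton's implementation): the age of each event is the gap between when the microbatch finished writing to the online store and the event's own timestamp.

```python
from datetime import datetime, timezone

# Illustration only: compute event ages for one microbatch.
microbatch_write_completed = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
event_timestamps = [
    datetime(2024, 1, 1, 11, 59, 50, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc),
]

ages = [(microbatch_write_completed - ts).total_seconds() for ts in event_timestamps]
print(max(ages))  # 40.0 seconds: upstream delay plus Tecton processing time
```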
Input Rate
Input Rate is the rate of messages read from the stream. It helps identify whether there is a change in the records being output by the upstream data source.
Online Store Write Rate
Online Store Write Rate is the number of records being written to the Online Store as the output of the stream feature pipeline.
The Online Store Write Rate may be lower than the Input Rate due to:
- Filtering logic in the Data Source post-processor or Feature View transformation logic
- Multiple records for the same entity ID arriving in the same microbatch, so that the events are aggregated before write
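As a toy illustration of both effects (not Tecton internals): filtering drops some records and per-entity aggregation collapses others, so fewer writes reach the Online Store than records read from the stream.

```python
# One microbatch of raw input records.
microbatch = [
    {"user_id": "a", "amount": 10.0},
    {"user_id": "a", "amount": 25.0},   # same entity -> aggregated, not a second write
    {"user_id": "b", "amount": -1.0},   # dropped by filtering logic
    {"user_id": "c", "amount": 7.5},
]

# Filtering logic, then one aggregated write per entity ID.
filtered = [e for e in microbatch if e["amount"] > 0]
writes = {}
for event in filtered:
    writes[event["user_id"]] = writes.get(event["user_id"], 0.0) + event["amount"]

print(len(microbatch), "input records ->", len(writes), "online store writes")
# 4 input records -> 2 online store writes
```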
Micro-batch Processing Latency
Micro-batch processing latency shows the time between completed micro-batches.
By default, this number should remain below 30 seconds, since Stream Feature View micro-batches are 30 seconds long. A value above 30 seconds indicates that the stream processing job is under-resourced and will fall behind.
If using continuous processing, then micro-batch latency should be close to 0.
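If you export this metric to your own monitoring system, a simple alert rule keyed to the default 30-second micro-batch interval might look like the following sketch:

```python
# Minimal sketch of an alert rule in your own monitoring system; the threshold
# assumes the default 30-second micro-batch interval described above.
MICROBATCH_INTERVAL_SECONDS = 30


def is_falling_behind(microbatch_latency_seconds: float) -> bool:
    # Latency above the micro-batch interval means batches queue up and the
    # stream job falls progressively further behind.
    return microbatch_latency_seconds > MICROBATCH_INTERVAL_SECONDS


print(is_falling_behind(12.0))   # False: keeping up
print(is_falling_behind(45.0))   # True: under-resourced, will fall behind
```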
Monitor Online Serving Feature Freshness
Feature Freshness measures how up-to-date the stream feature data is. If no new data is coming in on the stream, or the stream feature pipeline is falling behind, then the freshness measurement will increase.
Specifically, Online Serving Feature Freshness is computed from the most recent timestamp written to the Online Store. Because this metric is polled periodically, the reported value may be higher than the true value.
The Online Serving Feature Freshness value for a Stream Feature View can be viewed on the Monitoring tab.
To receive alerts if the freshness value becomes too high, set monitor_freshness=True to enable alerting, and specify the appropriate threshold with expected_feature_freshness. See the SDK reference for more details on these parameters.
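Extending the earlier alert_email sketch, the freshness parameters might be set as follows. As before, this assumes an SDK version that accepts these parameters directly on the decorator (in some versions they are grouped on a MonitoringConfig object), and transactions_stream and user are hypothetical objects defined elsewhere in the feature repository:

```python
from datetime import datetime, timedelta

from tecton import stream_feature_view

from my_feature_repo.data_sources import transactions_stream  # hypothetical
from my_feature_repo.entities import user                     # hypothetical


@stream_feature_view(
    source=transactions_stream,
    entities=[user],
    mode="spark_sql",
    online=True,
    feature_start_time=datetime(2023, 1, 1),
    ttl=timedelta(days=30),
    monitor_freshness=True,                         # enable freshness alerting
    expected_feature_freshness=timedelta(hours=2),  # alert when features are staler than this
    alert_email="ml-oncall@example.com",            # where freshness alerts are delivered
)
def user_transaction_features(transactions):
    return f"""
        SELECT user_id, amount, timestamp
        FROM {transactions}
    """
```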
Monitor Batch Materialization
Batch materialization processes run on the scheduled cadence defined in the Batch or Stream Feature View. Tecton Monitoring helps protect against models operating on out-of-date features by identifying scenarios where batch jobs fail to complete or fail to output correct features.
View job status
To view the status and history of batch jobs, navigate to the Materialization tab for the relevant Feature View.
For currently running jobs, the best way to track job progress is to follow the details link from the jobs table to see the detailed job status. Depending on the compute platform used to run the job, this link may take you to an external page.
Additionally, the Online Store Write Rate chart under the Feature View Monitoring tab shows how many records are written to the Online Store per second. If you know roughly how many records your job needs to output, the write rate gives a rough estimate of how long the job will take to complete.
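For example, as back-of-the-envelope arithmetic (the numbers below are made up):

```python
# Rough ETA from the observed write rate; both inputs are assumptions.
total_records_to_write = 120_000_000    # your estimate of the job's output size
observed_writes_per_second = 25_000     # read off the Online Store Write Rate chart

estimated_seconds = total_records_to_write / observed_writes_per_second
print(f"~{estimated_seconds / 3600:.1f} hours remaining")  # ~1.3 hours
```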
Materialization Failure Alerts
If alert_email is set for the Feature View, Tecton will send email alerts when failures occur for batch materialization processes.
The types of failure alerts are:
- FeatureViewBatchMaterializationFailures: batch materialization jobs have failed 2 or more times.
- FeatureViewTooManyFailures: batch materialization jobs have failed too many times based on the retry policy, and Tecton will no longer make any retry attempts.
Even if the job succeeds, the data output may be incorrect. See how Tecton implements Data Quality Validations to address these scenarios.