Materialize Features
Materialization is an essential part of how Tecton manages the lifecycle of operational ML features. It is the process of precomputing feature data with a feature pipeline and publishing the results to the Online Feature Store, the Offline Feature Store, or both.
The main objective of materialization is to enable quick feature retrieval during training and inference, thereby reducing latencies and improving the efficiency of machine learning applications.
Types of Materialization
Tecton handles backfill and steady-state materialization for batch and stream features based on your Feature View configuration.
Steady-state Materialization
Steady-state materialization is materialization performed on new data as it arrives in real time. It runs continuously on every Feature View that has materialization enabled.
When a Feature View has materialization enabled, Tecton schedules steady-state materialization jobs on an ongoing basis to keep feature values fresh. The frequency of these jobs is controlled by the batch_schedule parameter. If you use Delta for the offline store, Tecton also runs background maintenance tasks on a 7-day schedule. These tasks perform optimize and vacuum operations on your Delta tables to keep file management efficient and maintain performance.
Backfill materialization
Backfill refers to any materialization operations performed on data in the past. There are two Backfill operations.
The initial materialization of a Feature View is referred to as a bootstrap backfill. During a bootstrap materialization, existing raw data is processed into feature values.
When materialization is initially enabled for a Feature View, Tecton performs a bootstrap materialization. The amount of data materialized during a bootstrap is controlled by the feature_start_time parameter.
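The relationship between feature_start_time, batch_schedule, and the two kinds of materialization can be sketched in plain Python. This is a conceptual model only, not Tecton's scheduler: the function names are hypothetical, and it assumes windows are aligned to feature_start_time and are exactly one batch_schedule long. Tecton may group or align jobs differently.

```python
from datetime import datetime, timedelta, timezone


def backfill_windows(feature_start_time, batch_schedule, enabled_at):
    """Return the (start, end) windows that are already complete at `enabled_at`.

    Hypothetical helper: models a bootstrap backfill as one window per
    batch_schedule interval, starting at feature_start_time.
    """
    if batch_schedule <= timedelta(0):
        raise ValueError("batch_schedule must be positive")
    windows = []
    start = feature_start_time
    while start + batch_schedule <= enabled_at:
        windows.append((start, start + batch_schedule))
        start += batch_schedule
    return windows


def next_steady_state_window(feature_start_time, batch_schedule, now):
    """Return the aligned window that contains `now` (the next one to complete).

    Hypothetical helper: models steady-state materialization as processing
    each window once it closes.
    """
    if batch_schedule <= timedelta(0):
        raise ValueError("batch_schedule must be positive")
    elapsed = max(now - feature_start_time, timedelta(0))
    completed = elapsed // batch_schedule  # timedelta // timedelta yields an int
    start = feature_start_time + completed * batch_schedule
    return (start, start + batch_schedule)
```

For example, with a feature_start_time of 2024-01-01, a batch_schedule of one day, and materialization enabled at 06:00 on 2024-01-04, this model yields three backfill windows, and the first steady-state window runs from 2024-01-04 to 2024-01-05.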
Enabling Feature View materialization
Batch and Stream Feature Views can enable materialization to the online and/or offline store by setting online=True and/or offline=True in the Feature View decorator parameters.
On-Demand Feature Views cannot be materialized because they are computed only at request time.
Determining if materialized feature data is being used when reading feature data
When reading feature data using get_features_for_events(), get_features_in_range(), get_online_features(), or the GetFeatures endpoint of the HTTP API, materialized feature data is used if all of the following are true:
- Your feature service is running in a live workspace
- The constituent feature views have the option offline=True (when using get_features_for_events() or get_features_in_range()) or online=True (when using get_online_features() or the GetFeatures endpoint of the HTTP API)
- (Applies to get_features_for_events() and get_features_in_range() only): You omitted the from_source option or set it to False
Using get_online_features() is not recommended in production. It is much slower than the GetFeatures endpoint of the HTTP API and is not designed for production workloads.
When reading feature data using get_features_for_events(), get_features_in_range(), or get_online_features(), materialized feature data is not used if any of the following are true:
- Your feature service is running in a development workspace
- Any of the constituent feature views have the option offline=False (when using get_features_for_events() or get_features_in_range()) or online=False (when using get_online_features() or the GetFeatures endpoint of the HTTP API)
- (Applies to get_features_for_events() and get_features_in_range() only): You specified from_source=True
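The two checklists above express one rule, which can be written as a small predicate. The sketch below is plain Python and does not call the Tecton SDK. The function name, the string labels for each read path, and the dictionary shape used for feature views are all hypothetical and serve only to restate the conditions.

```python
# Read paths that consult the offline store vs. the online store.
OFFLINE_READS = {"get_features_for_events", "get_features_in_range"}
ONLINE_READS = {"get_online_features", "GetFeatures"}


def uses_materialized_data(method, live_workspace, feature_views, from_source=None):
    """Return True if a read would use materialized feature data.

    method:         one of the names in OFFLINE_READS or ONLINE_READS
    live_workspace: True for a live workspace, False for a development one
    feature_views:  list of dicts with boolean "online" and "offline" keys
                    (hypothetical stand-in for Feature View configuration)
    from_source:    None when omitted; only relevant for offline reads
    """
    if method not in OFFLINE_READS | ONLINE_READS:
        raise ValueError(f"unknown read method: {method}")
    if not live_workspace:
        return False  # development workspaces never use materialized data
    if method in OFFLINE_READS:
        if from_source:
            return False  # from_source=True bypasses materialized data
        return all(fv["offline"] for fv in feature_views)
    return all(fv["online"] for fv in feature_views)
```

Because every constituent feature view must have the relevant flag set, a single feature view with offline=False is enough to make an offline read skip materialized data.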
Monitoring
Tecton provides tools to monitor and debug production Feature Views via the Web UI, SDK, and CLI. More information on monitoring is available in Monitoring Materialization.