Migrate to features parameter
features is a new parameter that Tecton introduces to consolidate all types of
features into a single parameter. It makes feature view definitions more
readable, simpler, and easier to iterate on. This document outlines the steps
required to migrate existing feature views to the new features parameter.
General Guidance
- The features parameter will become the mandatory way to define features in a future release, so we strongly recommend migrating to it now.
- Incremental migration is supported: you can migrate the entire repository at once or one object at a time.
- This migration is not tied to the SDK upgrade. You can migrate after upgrading the SDK.
- Avoid making other changes while migrating to the features parameter, to prevent errors.
SDK Reference
See the Attribute and Aggregate pages in the SDK reference.
1. Migrate Entities
Previously, entities contained only the join key names. To use the new
features parameter, each join key must be typed using the Field class.
Example
Before:
user_entity = Entity(name="user", join_keys=["user_id"])
After:
user_entity = Entity(name="user", join_keys=[Field("user_id", String)])
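When many entities need migrating, the step from untyped join key names to typed fields can be checked mechanically. The sketch below is plain Python with no Tecton dependency: FieldSpec and type_join_keys are hypothetical stand-ins (not part of the Tecton SDK), and types are represented as strings such as "String". It only illustrates the rule that every join key needs an explicit type, failing loudly instead of leaving a key untyped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a typed field: a column name plus a type name."""
    name: str
    dtype: str


def type_join_keys(join_keys, dtypes):
    """Turn untyped join key names into typed FieldSpecs.

    dtypes maps column name -> type name (for example, taken from an old schema).
    Raises KeyError listing every join key whose type is unknown.
    """
    missing = [key for key in join_keys if key not in dtypes]
    if missing:
        raise KeyError(f"no type known for join key(s): {missing}")
    return [FieldSpec(key, dtypes[key]) for key in join_keys]


# Old definition: Entity(name="user", join_keys=["user_id"])
typed = type_join_keys(["user_id"], {"user_id": "String", "value": "Int64"})
print(typed)  # [FieldSpec(name='user_id', dtype='String')]
```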
2. Migrate Batch and Stream Feature Views with Aggregations
2.1 Specify Timestamp
Ensure that the timestamp_field is specified in the feature view.
2.2 Remove Existing Schema
If your existing feature view uses the schema parameter, remove it first. If
not, skip this step.
2.3 Replace aggregations with features
Replace the aggregations argument with the features parameter and supply a
list of Aggregate objects.
Note: Aggregate includes all parameters that Aggregation has, plus one
additional parameter, column_dtype, which specifies the type of the column
being aggregated.
Example
Before:
@batch_feature_view(
    # ...
    aggregations=[
        Aggregation(column="value", function="count", time_window=TimeWindow(window_size=timedelta(days=7))),
    ],
    schema=[
        Field("user_id", String),
        Field("value", Int64),
        Field("timestamp", Timestamp),
    ],
)
def feature_view(input):
    return f"""
        SELECT user_id, value, timestamp FROM {input}
    """
After:
@batch_feature_view(
    # ...
    features=[
        Aggregate(
            column="value", column_dtype=Int64, function="count", time_window=TimeWindow(window_size=timedelta(days=7))
        ),
    ],
    timestamp_field="timestamp",
)
def feature_view(input):
    return f"""
        SELECT user_id, value, timestamp FROM {input}
    """
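The only new information an Aggregate needs compared with an Aggregation is column_dtype, and in the example above it comes straight from the type of the aggregated column in the removed schema. A minimal sketch of that lookup, assuming aggregations are described as plain dicts and the schema as (name, type) pairs; this representation, the "7d" window placeholder, and the to_aggregate_specs helper are all hypothetical and are not Tecton objects or APIs.

```python
def to_aggregate_specs(aggregations, schema):
    """Add column_dtype to each aggregation spec, looked up from the old schema.

    aggregations: list of dicts holding the existing Aggregation arguments.
    schema: list of (column_name, type_name) pairs from the removed schema.
    Returns new dicts; the input dicts are not modified.
    """
    dtypes = dict(schema)
    specs = []
    for agg in aggregations:
        column = agg["column"]
        if column not in dtypes:
            raise ValueError(f"aggregated column {column!r} is not in the schema")
        # Keep every existing argument and add the one new parameter.
        specs.append({**agg, "column_dtype": dtypes[column]})
    return specs


old_aggregations = [{"column": "value", "function": "count", "time_window": "7d"}]
old_schema = [("user_id", "String"), ("value", "Int64"), ("timestamp", "Timestamp")]
print(to_aggregate_specs(old_aggregations, old_schema))
# [{'column': 'value', 'function': 'count', 'time_window': '7d', 'column_dtype': 'Int64'}]
```

Raising on an unknown column, rather than guessing a type, mirrors the advice to avoid unrelated changes during the migration: a missing column usually means the schema and the aggregations were already out of sync.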
3. Migrate Batch and Stream Feature Views without Aggregations
3.1 Specify Timestamp
Ensure that the timestamp_field is specified in the feature view.
3.2 Remove Existing Schema
If your existing feature view uses the schema parameter, remove it first. If
not, skip this step.
3.3 Add features
Add the features parameter and supply a list of Attribute objects.
Example
Before:
@batch_feature_view(
    # ...
    schema=[
        Field("user_id", String),
        Field("value", Int64),
        Field("timestamp", Timestamp),
    ],
)
def feature_view(input):
    return f"""
        SELECT user_id, value, timestamp FROM {input}
    """
After:
@batch_feature_view(
    # ...
    features=[
        Attribute(column="value", column_dtype=Int64),
    ],
    timestamp_field="timestamp",
)
def feature_view(input):
    return f"""
        SELECT user_id, value, timestamp FROM {input}
    """
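Note: Attribute includes column and column_dtype, which map directly to the
name and type of a Field in the old schema, if your feature view used one. As
the example shows, the join key (user_id) and the timestamp column are not
listed as Attribute objects; only the feature columns are.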
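The translation in the example above follows a simple rule: every schema field except the join keys and the timestamp column becomes one Attribute with the same name and type. A sketch of that rule in plain Python, where schema_to_attributes is a hypothetical helper and attributes are represented as (column, column_dtype) tuples rather than Tecton Attribute objects.

```python
def schema_to_attributes(schema, join_keys, timestamp_field):
    """Return (column, column_dtype) pairs for the feature columns of an old schema.

    Join keys and the timestamp column are skipped, matching the example
    where user_id and timestamp do not appear in features.
    """
    names = [name for name, _ in schema]
    if timestamp_field not in names:
        raise ValueError(f"timestamp_field {timestamp_field!r} is not in the schema")
    skipped = set(join_keys) | {timestamp_field}
    return [(name, dtype) for name, dtype in schema if name not in skipped]


schema = [("user_id", "String"), ("value", "Int64"), ("timestamp", "Timestamp")]
print(schema_to_attributes(schema, join_keys=["user_id"], timestamp_field="timestamp"))
# [('value', 'Int64')]
```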
4. Migrate Realtime Feature Views
4.1 Remove Existing Schema
Remove the schema parameter from the realtime_feature_view.
4.2 Add features
Add the features parameter with a list of Attribute objects. Attribute
includes column and column_dtype, which can be translated similarly from
Field in the schema.
Example
Before:
@realtime_feature_view(
    # ...
    schema=[
        Field("feature1", String),
        Field("feature2", Int64),
    ],
)
def feature_view(input):
    return input
After:
@realtime_feature_view(
    # ...
    features=[
        Attribute(column="feature1", column_dtype=String),
        Attribute(column="feature2", column_dtype=Int64),
    ],
)
def feature_view(input):
    return input
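Because a realtime feature view has no join key or timestamp column to exclude, each schema Field maps one-to-one to an Attribute, which makes the replacement mechanical enough to generate. The sketch below is a hypothetical render_features helper, not a Tecton tool: it prints the features=[...] argument from (name, type) pairs, and the output still needs to be reviewed and pasted into the decorator by hand.

```python
def render_features(schema, indent="    "):
    """Render a features=[...] argument of Attribute(...) calls as source text.

    schema: list of (column_name, type_name) pairs taken from an old schema.
    Type names are emitted verbatim, so they must match the type classes
    imported in the feature repository (for example String or Int64).
    """
    lines = [f"{indent}features=["]
    for name, dtype in schema:
        lines.append(f'{indent * 2}Attribute(column="{name}", column_dtype={dtype}),')
    lines.append(f"{indent}],")
    return "\n".join(lines)


print(render_features([("feature1", "String"), ("feature2", "Int64")]))
```

For the Before example, this prints the same three-part features=[...] block shown in the After example.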
5. Migrate Feature Tables
5.1 Specify Timestamp
Ensure that the timestamp_field is specified in the feature table.
5.2 Remove Existing Schema
Remove the schema parameter from the FeatureTable.
5.3 Add features
Add the features parameter with a list of Attribute objects. Attribute
includes column and column_dtype, which can be translated similarly from
Field in the schema.
Example
Before:
ft = FeatureTable(
    # ...
    schema=[
        Field("user_id", String),
        Field("feature1", Int64),
        Field("feature2", Int64),
        Field("timestamp", Timestamp),
    ],
)
After:
ft = FeatureTable(
    # ...
    features=[
        Attribute(column="feature1", column_dtype=Int64),
        Attribute(column="feature2", column_dtype=Int64),
    ],
    timestamp_field="timestamp",
)
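Once schema is removed, the feature table needs timestamp_field set explicitly. If the old schema contains exactly one Timestamp-typed field, that field is the natural candidate. A sketch of that check, where migrate_feature_table_schema is a hypothetical helper working on (name, type) pairs, not a Tecton API; it refuses to guess when zero or several Timestamp columns exist.

```python
def migrate_feature_table_schema(schema, join_keys):
    """Split an old FeatureTable schema into a timestamp_field and feature columns.

    schema: list of (column_name, type_name) pairs.
    Returns (timestamp_field, [(column, column_dtype), ...]); the feature list
    excludes the join keys and the timestamp column.
    Raises ValueError unless exactly one column has type "Timestamp".
    """
    timestamps = [name for name, dtype in schema if dtype == "Timestamp"]
    if len(timestamps) != 1:
        raise ValueError(f"expected exactly one Timestamp column, found {timestamps}")
    timestamp_field = timestamps[0]
    skipped = set(join_keys) | {timestamp_field}
    features = [(name, dtype) for name, dtype in schema if name not in skipped]
    return timestamp_field, features


old_schema = [
    ("user_id", "String"),
    ("feature1", "Int64"),
    ("feature2", "Int64"),
    ("timestamp", "Timestamp"),
]
print(migrate_feature_table_schema(old_schema, join_keys=["user_id"]))
# ('timestamp', [('feature1', 'Int64'), ('feature2', 'Int64')])
```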