Migrate to the features parameter
The features parameter is a new parameter that Tecton introduces to consolidate
all types of features into a single parameter. It makes feature view
definitions more readable and easier to iterate on. This document outlines the
steps required to migrate existing feature views to the new features parameter.
General Guidance
- The features parameter will become the mandatory way to define features in a future release, so it is strongly recommended to migrate to it now.
- Incremental migration is supported. You can migrate the entire repository at once or object by object.
- This migration is not tied to the SDK upgrade. You can migrate after upgrading the SDK.
- Avoid making other changes while migrating to the features parameter, to prevent encountering errors.
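Because migration can proceed object by object, it can help to list which definitions in a repository still pass the old parameters. The following is a minimal sketch and not part of the Tecton SDK: find_unmigrated is a hypothetical helper that parses Python source with the standard ast module and reports calls that still use schema or aggregations. It assumes the feature views are referenced by the decorator and class names used in this guide.

```python
import ast

# Keyword arguments that this guide replaces with `features`.
LEGACY_KEYWORDS = {"schema", "aggregations"}
# Callables covered by this guide (assumed to be referenced by these names).
TARGET_CALLS = {
    "batch_feature_view",
    "stream_feature_view",
    "realtime_feature_view",
    "FeatureTable",
}


def _call_name(node: ast.Call) -> str:
    """Return the rightmost name of the called object (`tecton.FeatureTable` -> `FeatureTable`)."""
    func = node.func
    if isinstance(func, ast.Attribute):
        return func.attr
    if isinstance(func, ast.Name):
        return func.id
    return ""


def find_unmigrated(source: str) -> list[tuple[int, str, list[str]]]:
    """Return (line, callable name, legacy keywords) for each call not yet migrated."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and _call_name(node) in TARGET_CALLS:
            legacy = sorted(kw.arg for kw in node.keywords if kw.arg in LEGACY_KEYWORDS)
            if legacy:
                findings.append((node.lineno, _call_name(node), legacy))
    return sorted(findings)


if __name__ == "__main__":
    sample = (
        "@batch_feature_view(schema=[], aggregations=[])\n"
        "def fv(x):\n"
        "    return x\n"
        "\n"
        "ft = FeatureTable(features=[], timestamp_field='timestamp')\n"
    )
    for line, name, legacy in find_unmigrated(sample):
        print(f"line {line}: {name} still uses {', '.join(legacy)}")
```

Running such a check before and after each batch of changes gives a simple progress list for an incremental migration. It only inspects keyword arguments, so definitions built dynamically would not be detected.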
SDK Reference
See Attribute and Aggregate in the SDK reference.
1. Migrate Entity
Previously, entities only contained the join key names. To use the new features
parameter, each join key needs to be typed using the Field class.
Example
Before:
user_entity = Entity(name="user", join_keys=["user_id"])
After:
user_entity = Entity(name="user", join_keys=[Field("user_id", String)])
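When an entity has several join keys, their types can often be read from the schema of a feature view that already uses the entity. The following is a minimal sketch of that lookup. It does not import Tecton: the Field class here is a stand-in dataclass that stores the type as a plain string, and typed_join_keys is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Stand-in for the SDK's Field class: a column name plus a type name."""
    name: str
    dtype: str


def typed_join_keys(join_keys: list[str], schema: list[Field]) -> list[Field]:
    """Pair each untyped join key with the type declared for it in `schema`."""
    dtype_by_name = {field.name: field.dtype for field in schema}
    missing = [key for key in join_keys if key not in dtype_by_name]
    if missing:
        # Fail loudly rather than guess a type for an unknown key.
        raise ValueError(f"no type found for join keys: {missing}")
    return [Field(key, dtype_by_name[key]) for key in join_keys]


# Schema of an existing feature view that uses the entity (illustrative values).
old_schema = [
    Field("user_id", "String"),
    Field("value", "Int64"),
    Field("timestamp", "Timestamp"),
]

if __name__ == "__main__":
    print(typed_join_keys(["user_id"], old_schema))
```

Raising an error for a key that has no declared type keeps the migration from silently assigning a wrong type.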
2. Migrate Batch and Stream Feature View with Aggregation
2.1 Specify Timestamp
Ensure that timestamp_field is specified in the feature view.
2.2 Remove Existing Schema
If your existing feature view uses the schema parameter, remove it first. If
not, skip this step.
2.3 Replace aggregations with features
Replace the aggregations argument with the features parameter and supply a list
of Aggregate objects.
Note: Aggregate includes all parameters that Aggregation has, with an
additional parameter called column_dtype that specifies the type of the column
being aggregated.
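The note above can be sketched as a mechanical conversion: copy every parameter of the old object and add column_dtype, taking the type from the schema that step 2.2 removes. This is a minimal sketch that does not import Tecton. Aggregation and Aggregate are stand-in dataclasses limited to the parameters shown in this guide, the time window is simplified to a plain timedelta, types are plain strings, and to_aggregate is a hypothetical helper.

```python
from dataclasses import dataclass, fields
from datetime import timedelta


@dataclass(frozen=True)
class Aggregation:
    """Stand-in for the legacy Aggregation object."""
    column: str
    function: str
    time_window: timedelta


@dataclass(frozen=True)
class Aggregate:
    """Stand-in for the new Aggregate object: same parameters plus column_dtype."""
    column: str
    column_dtype: str
    function: str
    time_window: timedelta


def to_aggregate(old: Aggregation, schema_dtypes: dict[str, str]) -> Aggregate:
    """Copy every Aggregation parameter and add column_dtype from the old schema."""
    carried_over = {f.name: getattr(old, f.name) for f in fields(old)}
    return Aggregate(column_dtype=schema_dtypes[old.column], **carried_over)


# Column types taken from the schema that is being removed (illustrative values).
schema_dtypes = {"user_id": "String", "value": "Int64", "timestamp": "Timestamp"}
old = Aggregation(column="value", function="count", time_window=timedelta(days=7))

if __name__ == "__main__":
    print(to_aggregate(old, schema_dtypes))
```

If the feature view had no schema, the lookup has nothing to read from and the column type must be supplied by hand.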
Example
Before:
@batch_feature_view(
    # ...
    aggregations=[
        Aggregation(column="value", function="count", time_window=TimeWindow(window_size=timedelta(days=7))),
    ],
    schema=[
        Field("user_id", String),
        Field("value", Int64),
        Field("timestamp", Timestamp),
    ],
)
def feature_view(input):
    return f"""
        SELECT user_id, value, timestamp FROM {input}
        """
After:
@batch_feature_view(
    # ...
    features=[
        Aggregate(
            column="value", column_dtype=Int64, function="count", time_window=TimeWindow(window_size=timedelta(days=7))
        ),
    ],
    timestamp_field="timestamp",
)
def feature_view(input):
    return f"""
        SELECT user_id, value, timestamp FROM {input}
        """
3. Migrate Batch and Stream Feature View without Aggregation
3.1 Specify Timestamp
Ensure that timestamp_field is specified in the feature view.
3.2 Remove Existing Schema
If your existing feature view uses the schema parameter, remove it first. If
not, skip this step.
3.3 Add features
Add the features parameter and supply a list of Attribute objects.
Example
Before:
@batch_feature_view(
    # ...
    schema=[
        Field("user_id", String),
        Field("value", Int64),
        Field("timestamp", Timestamp),
    ],
)
def feature_view(input):
    return f"""
        SELECT user_id, value, timestamp FROM {input}
        """
After:
@batch_feature_view(
    # ...
    features=[
        Attribute(column="value", column_dtype=Int64),
    ],
    timestamp_field="timestamp",
)
def feature_view(input):
    return f"""
        SELECT user_id, value, timestamp FROM {input}
        """
Note: Attribute takes column and column_dtype, which correspond to the name and
type of a Field in the schema, if your feature view used one.
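The translation from a schema to Attribute objects can be sketched as a filter and a rename. Following the example above, only value becomes an Attribute, while user_id (the entity join key) and timestamp (passed as timestamp_field) are left out. This is a minimal sketch that does not import Tecton: Field and Attribute are stand-in dataclasses with types stored as plain strings, and schema_to_attributes is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Stand-in for the SDK's Field class: a column name plus a type name."""
    name: str
    dtype: str


@dataclass(frozen=True)
class Attribute:
    """Stand-in for the SDK's Attribute class, using the parameters shown in this guide."""
    column: str
    column_dtype: str


def schema_to_attributes(
    schema: list[Field], join_keys: list[str], timestamp_field: Optional[str]
) -> list[Attribute]:
    """Turn each schema Field into an Attribute, skipping join keys and the timestamp."""
    excluded = set(join_keys) | {timestamp_field}
    return [
        Attribute(column=field.name, column_dtype=field.dtype)
        for field in schema
        if field.name not in excluded
    ]


# The schema from the "Before" example above, with types written as strings.
old_schema = [
    Field("user_id", "String"),
    Field("value", "Int64"),
    Field("timestamp", "Timestamp"),
]

if __name__ == "__main__":
    print(schema_to_attributes(old_schema, join_keys=["user_id"], timestamp_field="timestamp"))
```

For a schema that contains no join key or timestamp columns, passing an empty list and None converts every Field.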
4. Migrate Realtime Feature View
4.1 Remove Existing Schema
Remove the schema parameter from the realtime_feature_view.
4.2 Add features
Add the features parameter with a list of Attribute objects. Attribute takes
column and column_dtype, which correspond to the name and type of each Field in
the schema.
Example
Before:
@realtime_feature_view(
    # ...
    schema=[
        Field("feature1", String),
        Field("feature2", Int64),
    ],
)
def feature_view(input):
    return input
After:
@realtime_feature_view(
    # ...
    features=[
        Attribute(column="feature1", column_dtype=String),
        Attribute(column="feature2", column_dtype=Int64),
    ],
)
def feature_view(input):
    return input
5. Migrate Feature Table
5.1 Specify Timestamp
Ensure that timestamp_field is specified in the feature table.
5.2 Remove Existing Schema
Remove the schema parameter from the FeatureTable.
5.3 Add features
Add the features parameter with a list of Attribute objects. Attribute takes
column and column_dtype, which correspond to the name and type of each Field in
the schema.
Example
Before:
ft = FeatureTable(
    # ...
    schema=[
        Field("user_id", String),
        Field("feature1", Int64),
        Field("feature2", Int64),
        Field("timestamp", Timestamp),
    ],
)
After:
ft = FeatureTable(
    # ...
    features=[
        Attribute(column="feature1", column_dtype=Int64),
        Attribute(column="feature2", column_dtype=Int64),
    ],
    timestamp_field="timestamp",
)