Aggregation Windows
Tecton's Aggregation Engine lets you create features as aggregations over a column in your Feature View transformation. Aggregations are specified via the aggregations parameter in the decorator of a Batch or Stream Feature View. Here is a quick example:
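from datetime import timedelta

from tecton import Aggregation, stream_feature_view
from tecton.types import Field, Float64, String, Timestamp

# transactions (a Stream Source) and user (an Entity) are assumed to be defined elsewhere.
@stream_feature_view(
    source=transactions,
    entities=[user],
    mode="pandas",
    aggregations=[
        # Mean transaction amount over three trailing windows.
        Aggregation(function="mean", column="amt", time_window=timedelta(days=1), name="1_day_avg"),
        Aggregation(function="mean", column="amt", time_window=timedelta(days=3), name="3_day_avg"),
        Aggregation(function="mean", column="amt", time_window=timedelta(days=7), name="7_day_avg"),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amt", Float64)],
)
def user_transaction_averages(transactions):
    # Select only the columns the aggregations need.
    return transactions[["user_id", "timestamp", "amt"]]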
Every Aggregation has an associated time window to aggregate over, specified via the time_window parameter. Tecton supports three types of time windows, which together allow a great degree of flexibility:
- Time Window: A fixed window length stretching into the past from "now," with an optional offset. For example, "the last 7 days."
- Lifetime Window: An ever-growing window from a specific point in the past, up until "now." For example, "from Jan 1, 2020 until now."
- Time Window Series: A series of windows over a time range relative to "now." For example, "every day in the last week."
The following diagram illustrates some common window configurations.
See the sections below for in-depth explanations and usage examples.
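To make the three window types concrete, the sketch below expresses each one as half-open [start, end) intervals computed from a given "now". The helper names are hypothetical and are not part of the Tecton SDK, and the half-open boundary convention is an assumption made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical helpers, not Tecton SDK functions. Each returns half-open
# [start, end) intervals computed relative to a caller-supplied "now".

def time_window_bounds(now, window_size):
    # "The last N": a fixed-length window ending at now.
    return (now - window_size, now)

def lifetime_window_bounds(now, start_time):
    # "From start_time until now": grows as now advances.
    return (start_time, now)

def time_window_series_bounds(now, series_start, window_size):
    # "Every window_size in the range": consecutive, non-overlapping windows
    # from now + series_start (a negative timedelta) up to now.
    count = int(-series_start / window_size)
    first = now + series_start
    return [(first + i * window_size, first + (i + 1) * window_size) for i in range(count)]

now = datetime(2024, 1, 8)
last_7_days = time_window_bounds(now, timedelta(days=7))
since_2020 = lifetime_window_bounds(now, datetime(2020, 1, 1))
daily_last_week = time_window_series_bounds(now, timedelta(days=-7), timedelta(days=1))
```

Note that only the Lifetime Window's start is fixed; the other two types slide forward as "now" advances.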
You might be asking, "But when exactly is 'now'?" The answer depends on the context.
During online retrieval, "now" is the request time, because inference needs the current feature value.
When retrieving offline features for a historical event, however, "now" is the timestamp of that event. Tecton handles the time travel needed to retrieve the correct historical value as of that time.
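The sketch below illustrates why "now" is effectively a parameter: the same 7-day mean is evaluated once at a request time (online) and once at a historical event timestamp (offline). It is a plain-Python illustration with made-up sample events, not how Tecton implements retrieval, and the window_mean helper is hypothetical.

```python
from datetime import datetime, timedelta

# Made-up raw (timestamp, amt) events for a single user.
events = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 5), 20.0),
    (datetime(2024, 1, 9), 60.0),
]

def window_mean(events, as_of, window_size):
    """Hypothetical helper: mean of amt over [as_of - window_size, as_of), or None if empty."""
    start = as_of - window_size
    values = [amt for ts, amt in events if start <= ts < as_of]
    return sum(values) / len(values) if values else None

# Online: "now" is the request time.
online_value = window_mean(events, datetime(2024, 1, 10), timedelta(days=7))
# Offline: "now" is the timestamp of a historical event, so later events are excluded.
offline_value = window_mean(events, datetime(2024, 1, 6), timedelta(days=7))
```

The offline value ignores the Jan 9 event because it happened after the historical timestamp, which is the point of time travel: no future data leaks into a training example.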
One last note: in Batch Feature Views or Stream Feature Views that use sliding windows, the window does not end at exactly "now," but at the boundary of the most recent aggregation_interval.
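As a rough illustration of this note, the sketch below snaps "now" down to the most recent aggregation_interval boundary before computing a window's bounds. Aligning intervals to the Unix epoch is an assumption made for the example, and the helper names are hypothetical; consult the SDK Reference for Tecton's exact alignment rules.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def last_interval_boundary(now, aggregation_interval):
    # Floor "now" to a multiple of aggregation_interval since the epoch
    # (an assumed alignment, used here only for illustration).
    elapsed = now - EPOCH
    return now - (elapsed % aggregation_interval)

def sliding_window_bounds(now, window_size, aggregation_interval):
    # The window ends at the last interval boundary rather than at "now".
    end = last_interval_boundary(now, aggregation_interval)
    return (end - window_size, end)

now = datetime(2024, 1, 8, 14, 37, tzinfo=timezone.utc)
start, end = sliding_window_bounds(now, timedelta(days=1), timedelta(hours=1))
```

With a 1-hour aggregation_interval, a request at 14:37 sees a 1-day window ending at 14:00.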
Time Window
The TimeWindow class specifies a fixed window length into the past relative to "now," for example "the last 7 days." This is the most common window type.
TimeWindow has two parameters:
Name | Required? | Description |
---|---|---|
window_size | Yes | The size of the window, expressed as a positive timedelta. |
offset | No. Defaults to 0. | The relative end time of the window, expressed as a negative timedelta. |
As shorthand, if you pass a timedelta directly to the Aggregation's time_window parameter, Tecton interprets it as a TimeWindow with no offset. For example, time_window=timedelta(days=7) is equivalent to time_window=TimeWindow(window_size=timedelta(days=7)).
See the SDK Reference for more details.
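The shorthand and the offset semantics can be mirrored in plain Python. WindowSpec and normalize below are hypothetical stand-ins for Tecton's TimeWindow handling, written only to show that a bare timedelta behaves like a window with a zero offset, and that a negative offset shifts the window's end into the past.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class WindowSpec:
    # Hypothetical stand-in for tecton.TimeWindow.
    window_size: timedelta
    offset: timedelta = timedelta(0)

    def bounds(self, now):
        # The window ends at now + offset (offset is zero or negative).
        end = now + self.offset
        return (end - self.window_size, end)

def normalize(time_window):
    # Shorthand: a bare timedelta means a window with no offset.
    if isinstance(time_window, timedelta):
        return WindowSpec(window_size=time_window)
    return time_window

now = datetime(2024, 1, 10)
shorthand = normalize(timedelta(days=7))
explicit = normalize(WindowSpec(window_size=timedelta(days=7)))
three_days_ago = WindowSpec(window_size=timedelta(days=7), offset=timedelta(days=-3))
```

For example, three_days_ago.bounds(now) covers Dec 31, 2023 through Jan 7, 2024: a full week that stops 3 days short of "now."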
Time Window Example
This example leverages the shorthand notation described above.
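from datetime import timedelta

from tecton import Aggregation, TimeWindow, stream_feature_view
from tecton.types import Field, Float64, String, Timestamp

# transactions (a Stream Source) and user (an Entity) are assumed to be defined elsewhere.
@stream_feature_view(
    source=transactions,
    entities=[user],
    mode="pandas",
    aggregations=[
        # Shorthand: equivalent to TimeWindow(window_size=timedelta(days=7)).
        Aggregation(function="mean", column="amt", time_window=timedelta(days=7), name="1_week_avg"),
        # A 7-day window that ends 3 days before "now".
        Aggregation(
            function="mean",
            column="amt",
            time_window=TimeWindow(window_size=timedelta(days=7), offset=timedelta(days=-3)),
            name="1_week_avg_3_days_ago",
        ),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amt", Float64)],
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amt"]]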
Lifetime Window
This capability requires Data Compaction. Compaction and Lifetime Windows are in Private Preview and have limitations that will be resolved in future Tecton releases. See Limitations & Requirements for more details. They are currently available only for Spark-based Feature Views; support for Rift is coming soon.
The LifetimeWindow class specifies an ever-growing window from a specific point in the past up until "now," for example "from Jan 1, 2000 until now."
The start time of a Lifetime Window is set via the lifetime_start_time parameter on the Batch or Stream Feature View, so it is the same for all LifetimeWindows in a single Feature View.
Lifetime Windows require Data Compaction, which is enabled via the compaction_enabled=True parameter on the Batch or Stream Feature View. Compaction ensures efficient computation and retrieval.
See the SDK Reference for more details.
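Conceptually, every lifetime aggregate in a Feature View starts from the same lifetime_start_time and runs up to "now." The sketch below computes a mean and a sum over such a window using made-up events; the helper name is hypothetical, and the sketch says nothing about how compaction works internally.

```python
from datetime import datetime

# Shared by every lifetime aggregation in the Feature View
# (mirrors the lifetime_start_time parameter).
LIFETIME_START_TIME = datetime(2000, 1, 1)

# Made-up (timestamp, amt) events for one user.
events = [
    (datetime(1999, 12, 31), 100.0),  # before the start time: ignored
    (datetime(2005, 6, 1), 10.0),
    (datetime(2023, 3, 15), 30.0),
]

def lifetime_values(events, now, start_time=LIFETIME_START_TIME):
    # Hypothetical helper: amounts in the ever-growing window [start_time, now).
    return [amt for ts, amt in events if start_time <= ts < now]

values = lifetime_values(events, now=datetime(2024, 1, 1))
txn_sum_since_2000 = sum(values)
txn_avg_since_2000 = sum(values) / len(values)
```

Because both aggregates read from the same window, a per-aggregation start time would not fit this model, which matches the single lifetime_start_time parameter.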
Lifetime Window Example
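from datetime import datetime

from tecton import Aggregation, LifetimeWindow, stream_feature_view
from tecton.types import Field, Float64, String, Timestamp

# transactions (a Stream Source) and user (an Entity) are assumed to be defined elsewhere.
@stream_feature_view(
    source=transactions,
    entities=[user],
    mode="pandas",
    aggregations=[
        # Both lifetime aggregations share the lifetime_start_time below.
        Aggregation(function="mean", column="amt", time_window=LifetimeWindow(), name="txn_avg_since_2000"),
        Aggregation(function="sum", column="amt", time_window=LifetimeWindow(), name="txn_sum_since_2000"),
    ],
    compaction_enabled=True,  # required for Lifetime Windows
    lifetime_start_time=datetime(2000, 1, 1),
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amt", Float64)],
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amt"]]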
Time Window Series
This feature is currently available for Spark-based Feature Views. It is also available on Rift if you set tecton.conf.set('DUCKDB_ENABLE_OPTIMIZED_FULL_AGG', False).
The TimeWindowSeries class specifies a series of time windows over a time range relative to "now," for example "every hour in the last week."
The output of a Time Window Series feature is an array with one aggregate value per window in the series, ordered from earliest to latest.
TimeWindowSeries has four parameters:
Name | Required? | Description |
---|---|---|
series_start | Yes | The relative start of the series of windows, represented as a negative timedelta. |
series_end | No. Defaults to 0 (i.e., "now"). | The relative end of the series of windows, represented as a negative timedelta. |
window_size | Yes | The size of each window in the series, represented as a positive timedelta. |
step_size | No. Defaults to window_size. | The interval by which the windows step forward in the series, represented as a positive timedelta. This is primarily useful for expressing a series of overlapping windows. For example, for a series of 3-hour windows as of every hour in the last week, set window_size=timedelta(hours=3) and step_size=timedelta(hours=1). |
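The interaction between window_size and step_size can be sketched as a small generator of window bounds. The function below is hypothetical, not Tecton's implementation: it assumes windows are laid out from series_start forward in increments of step_size until the last window ends at series_end, and it reproduces both the hourly series and the 3-hour-windows-every-hour example from the table.

```python
from datetime import datetime, timedelta

def series_windows(now, series_start, window_size, series_end=timedelta(0), step_size=None):
    # Hypothetical helper. step_size defaults to window_size, which yields
    # back-to-back, non-overlapping windows.
    step = step_size if step_size is not None else window_size
    start, stop = now + series_start, now + series_end
    windows = []
    while start + window_size <= stop:
        windows.append((start, start + window_size))
        start += step
    return windows  # ordered from earliest to latest

now = datetime(2024, 1, 8)
hourly = series_windows(now, timedelta(days=-7), timedelta(hours=1))
overlapping = series_windows(
    now, timedelta(days=-7), timedelta(hours=3), step_size=timedelta(hours=1)
)
```

Under these assumptions, the hourly series has 168 windows, while the overlapping series has 166: a 3-hour window can start at any whole hour from 0 to 165 hours into the 168-hour range.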
The series must line up exactly: series_start coincides with the start of the first window, and series_end coincides with the end of the last window. Tecton validates your configuration to ensure this alignment is possible and reports an error if it is not.
For example, a 3-day series with a 2-day window size is invalid, because sequential non-overlapping 2-day windows cannot exactly fill a 3-day range. The configuration becomes valid with either a 1-day step size or a 4-day series. See the diagram for a visual.
See the SDK Reference for more details.
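One way to read the alignment rule is that the series length, minus one window, must be a whole number of steps. The hypothetical check below encodes that reading (it is not Tecton's validation code) and reproduces the three cases from the 3-day series example above.

```python
from datetime import timedelta

def is_aligned(series_start, window_size, series_end=timedelta(0), step_size=None):
    # Hypothetical check: when windows are laid out from series_start in
    # increments of step_size, the last one must end exactly at series_end.
    step = step_size if step_size is not None else window_size
    span = series_end - series_start  # total length of the series
    if window_size > span:
        return False
    return (span - window_size) % step == timedelta(0)

invalid = is_aligned(timedelta(days=-3), timedelta(days=2))
valid_with_step = is_aligned(timedelta(days=-3), timedelta(days=2), step_size=timedelta(days=1))
valid_with_longer_series = is_aligned(timedelta(days=-4), timedelta(days=2))
```

The 3-day, 2-day-window case fails because one day is left over, while a 1-day step or a 4-day series leaves no remainder.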
Time Window Series Example
The output of this feature is an array of 168 floats (7 days × 24 hours), each holding the transaction average for one hour in the past week, ordered from the earliest hour to the latest.
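from datetime import timedelta

from tecton import Aggregation, TimeWindowSeries, stream_feature_view
from tecton.types import Field, Float64, String, Timestamp

# transactions (a Stream Source) and user (an Entity) are assumed to be defined elsewhere.
@stream_feature_view(
    source=transactions,
    entities=[user],
    mode="pandas",
    aggregations=[
        Aggregation(
            function="mean",
            column="amt",
            # One 1-hour window per hour over the last 7 days (168 windows).
            time_window=TimeWindowSeries(series_start=timedelta(days=-7), window_size=timedelta(hours=1)),
            name="hourly_txn_avg_last_7d",
        ),
    ],
    schema=[Field("user_id", String), Field("timestamp", Timestamp), Field("amt", Float64)],
)
def user_transaction_averages(transactions):
    return transactions[["user_id", "timestamp", "amt"]]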