tecton.TimeWindow
Summary
This class describes a TimeWindow that is applied in an Aggregation within a Batch or Stream Feature View.
Description
Tecton aggregations are applied over a specified time window using the time_window parameter. Use the TimeWindow class to create an aggregation over a fixed window size, as shown in the example below:
from datetime import timedelta

from tecton import batch_feature_view, FilteredSource, Aggregation, TimeWindow

# `transactions` (a data source) and `user` (an entity) are assumed to be defined elsewhere.
@batch_feature_view(
    sources=[FilteredSource(transactions)],
    mode="spark_sql",
    entities=[user],
    aggregation_interval=timedelta(days=1),
    aggregations=[
        # 7-day mean of `amount`, with no offset.
        Aggregation(column="amount", function="mean", time_window=TimeWindow(window_size=timedelta(days=7))),
    ],
)
def user_average_transaction_amount(transactions):
    return f"""
        SELECT user_id, timestamp, amount
        FROM {transactions}
        """
If you pass a datetime.timedelta object directly to the time_window parameter, as in time_window=datetime.timedelta(days=7), it is interpreted as time_window=TimeWindow(window_size=datetime.timedelta(days=7)).
The end time of this window will be the most recent aggregation interval relative to the online request time or offline spine timestamp.
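The window placement described above can be illustrated with plain datetime arithmetic. This is a minimal sketch rather than Tecton's implementation: the helper name window_bounds is hypothetical, and it assumes that aggregation intervals are aligned to the Unix epoch and that the window is half-open (start <= event time < end).

```python
from datetime import datetime, timedelta

# Assumption: aggregation interval boundaries are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def window_bounds(request_time, aggregation_interval, window_size):
    """Return (start, end) of the aggregation window for a request timestamp.

    Hypothetical helper for illustration only; not part of the Tecton SDK.
    """
    # Floor the request time to the most recent aggregation interval boundary.
    completed_intervals = (request_time - EPOCH) // aggregation_interval
    end = EPOCH + completed_intervals * aggregation_interval
    # The window covers `window_size` of data ending at that boundary.
    start = end - window_size
    return start, end


start, end = window_bounds(
    request_time=datetime(2022, 5, 18, 15, 30),
    aggregation_interval=timedelta(days=1),
    window_size=timedelta(days=7),
)
print(start, end)
```

With a one-day aggregation interval, a request at 15:30 on 2022-05-18 is floored to midnight of that day, so the 7-day window ends there and starts at midnight on 2022-05-11.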
Offset Time Windows
The end time of the time window can be shifted using the offset parameter of the TimeWindow class, as shown in the example below. In this example, the window starts 10 days before and ends 3 days before the most recent aggregation interval:
from datetime import timedelta

from tecton import batch_feature_view, FilteredSource, Aggregation, TimeWindow

@batch_feature_view(
    sources=[FilteredSource(transactions)],
    mode="spark_sql",
    entities=[user],
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(
            column="amount",
            function="mean",
            time_window=TimeWindow(window_size=timedelta(days=7), offset=timedelta(days=-3)),
        )
    ],
)
def user_average_transaction_amount(transactions):
    return f"""
        SELECT user_id, timestamp, amount
        FROM {transactions}
        """
The offset parameter must always be negative.
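The arithmetic behind the "-10 days to -3 days" window can be sketched as follows. The helper relative_window is hypothetical and is not a Tecton API; treating a zero offset as "no offset" is an assumption made only for this illustration.

```python
from datetime import timedelta


def relative_window(window_size, offset=timedelta(0)):
    """Return (start, end) relative to the most recent aggregation interval.

    Hypothetical helper for illustration only; not part of the Tecton SDK.
    """
    if window_size <= timedelta(0):
        raise ValueError("window_size must be positive")
    # Assumption: a zero offset means "no offset"; positive offsets are rejected.
    if offset > timedelta(0):
        raise ValueError("offset must be negative")
    end = offset                  # the window end is shifted back by the offset
    start = offset - window_size  # the window start lies window_size before the end
    return start, end


start, end = relative_window(timedelta(days=7), offset=timedelta(days=-3))
print(start.days, end.days)
```

For window_size=timedelta(days=7) and offset=timedelta(days=-3), the start is -3 - 7 = -10 days and the end is -3 days, which matches the window described for the example above.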
Example
Consider the following mock data:
| | user_id | timestamp | value |
|---|---|---|---|
| 0 | user_1 | 2022-05-14 00:00:00 | 1 |
| 1 | user_1 | 2022-05-15 00:00:00 | 3 |
| 2 | user_1 | 2022-05-16 00:00:00 | 6 |
| 3 | user_1 | 2022-05-17 00:00:00 | 11 |
| 4 | user_1 | 2022-05-18 00:00:00 | 23 |
A Feature View can have aggregations with and without an offset.
from datetime import datetime, timedelta

from tecton import Entity, batch_feature_view, Aggregation, TimeWindow

user_entity = Entity(name="user", join_keys=["user_id"])

# `ds` is assumed to be a data source defined elsewhere that contains the mock data above.
@batch_feature_view(
    mode="spark_sql",
    sources=[ds],
    entities=[user_entity],
    aggregation_interval=timedelta(days=1),
    timestamp_field="timestamp",
    offline=True,
    online=True,
    feature_start_time=datetime(2022, 5, 1),
    aggregations=[
        # 2-day sum ending at the most recent aggregation interval.
        Aggregation(column="value", function="sum", time_window=TimeWindow(window_size=timedelta(days=2))),
        # 2-day sum ending 2 days before the most recent aggregation interval.
        Aggregation(
            column="value",
            function="sum",
            time_window=TimeWindow(window_size=timedelta(days=2), offset=timedelta(days=-2)),
        ),
    ],
)
def user_transaction_sums(input_table):
    return f"""
        SELECT user_id, timestamp, value
        FROM {input_table}
        """
When you pass in a spine, each aggregation is computed over its time window, shifted by any offset, relative to each spine timestamp. The example below shows how the aggregations are computed for several spine timestamps.
from datetime import datetime

import pandas as pd

# Spine: one row per (user_id, timestamp) at which to compute feature values.
training_events = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_1", "user_1"],
        "timestamp": [datetime(2022, 5, 15), datetime(2022, 5, 18), datetime(2022, 5, 19), datetime(2022, 5, 20)],
    }
)
df = user_transaction_sums.get_historical_features(training_events).to_pandas()
display(df)  # `display` is available in notebook environments
| | user_id | timestamp | user_transaction_sums__value_sum_2d_1d | user_transaction_sums__value_sum_2d_1d_offset_2d |
|---|---|---|---|---|
| 0 | user_1 | 2022-05-15 00:00:00 | 1 | None |
| 1 | user_1 | 2022-05-18 00:00:00 | 17 | 4 |
| 2 | user_1 | 2022-05-19 00:00:00 | 34 | 9 |
| 3 | user_1 | 2022-05-20 00:00:00 | 23 | 17 |
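The values in this table can be reproduced by hand with a small standalone script. This is a sketch of the window arithmetic only, not of how Tecton computes features: the helper window_sum is hypothetical, and it assumes a half-open window (start <= timestamp < end) and spine timestamps that already fall on an aggregation interval boundary.

```python
from datetime import datetime, timedelta

# Mock data for user_1, copied from the example above.
events = [
    (datetime(2022, 5, 14), 1),
    (datetime(2022, 5, 15), 3),
    (datetime(2022, 5, 16), 6),
    (datetime(2022, 5, 17), 11),
    (datetime(2022, 5, 18), 23),
]


def window_sum(spine_time, window_size, offset=timedelta(0)):
    """Sum values in [end - window_size, end), where end = spine_time + offset.

    Hypothetical helper; assumes spine_time is aligned to the aggregation interval.
    """
    end = spine_time + offset
    start = end - window_size
    values = [value for ts, value in events if start <= ts < end]
    # An empty window yields None rather than 0, as in the table above.
    return sum(values) if values else None


spine = [datetime(2022, 5, 15), datetime(2022, 5, 18), datetime(2022, 5, 19), datetime(2022, 5, 20)]
rows = [
    (t, window_sum(t, timedelta(days=2)), window_sum(t, timedelta(days=2), timedelta(days=-2)))
    for t in spine
]
for row in rows:
    print(row)
```

For the spine timestamp 2022-05-18, the window without an offset covers 2022-05-16 and 2022-05-17 (6 + 11 = 17), while the window with a 2-day offset covers 2022-05-14 and 2022-05-15 (1 + 3 = 4).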
Attributes
The attributes are the same as the __init__ method parameters. See below.
Methods
__init__(...)
Parameters
- window_size (datetime.timedelta) - The size of the window to aggregate over. Example: datetime.timedelta(days=30).
- offset (datetime.timedelta) - The negative offset of the time window's end time relative to the most recent aggregation interval for a given request timestamp. Example: datetime.timedelta(days=-1).