TimeWindow
Summary
This class describes a TimeWindow that is applied in an Aggregation within a Batch or Stream Feature View.
Time Windows are useful for expressing a feature like "user transaction average in the last 7 days."
For an overview of aggregation windows, see Aggregation Windows.
Description
Tecton aggregations are applied over a specified time window using the
time_window parameter. Use the TimeWindow class to create an aggregation
over a fixed window size as shown in the example below:
```python
from datetime import timedelta

from tecton import batch_feature_view, FilteredSource, Aggregation, TimeWindow

# `transactions` (a data source) and `user` (an entity) are assumed to be defined elsewhere.
@batch_feature_view(
    sources=[FilteredSource(transactions)],
    mode="spark_sql",
    entities=[user],
    aggregation_interval=timedelta(days=1),
    aggregations=[Aggregation(column="amount", function="mean", time_window=TimeWindow(window_size=timedelta(days=7)))],
)
def user_average_transaction_amount(transactions):
    # Alias the raw `amt` column so it matches the aggregated column name.
    return f"""
        SELECT user_id, timestamp, amt AS amount
        FROM {transactions}
        """
```
If you pass a datetime.timedelta object directly to the time_window
parameter, as in time_window=datetime.timedelta(days=7), it is inferred
as time_window=TimeWindow(window_size=datetime.timedelta(days=7)).
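To make this inference concrete, here is a minimal plain-Python sketch. SketchTimeWindow and normalize_time_window are hypothetical names used only for illustration; they are not part of the Tecton SDK, and Tecton's own handling may differ in detail.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Union


# Hypothetical stand-in for tecton.TimeWindow, for illustration only.
@dataclass(frozen=True)
class SketchTimeWindow:
    window_size: timedelta
    offset: timedelta = timedelta(0)


def normalize_time_window(value: Union[timedelta, SketchTimeWindow]) -> SketchTimeWindow:
    """Wrap a bare timedelta in a window with no offset; pass windows through unchanged."""
    if isinstance(value, SketchTimeWindow):
        return value
    if isinstance(value, timedelta):
        return SketchTimeWindow(window_size=value)
    raise TypeError(f"unsupported time_window type: {type(value).__name__}")


print(normalize_time_window(timedelta(days=7)))
```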
A TimeWindow without an offset ends at the end of the most recent aggregation
interval, relative to the online request time or to the timestamp in the offline events dataframe.
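The sketch below shows one way to compute those window bounds. It assumes that aggregation intervals are aligned to the Unix epoch and that a window includes its start time but excludes its end time, which is consistent with the offline example further down this page. The function window_bounds is hypothetical and is not Tecton's implementation.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumption: aggregation intervals are aligned to the Unix epoch (naive UTC datetimes).
EPOCH = datetime(1970, 1, 1)


def window_bounds(
    request_time: datetime,
    aggregation_interval: timedelta,
    window_size: timedelta,
) -> Tuple[datetime, datetime]:
    """Return (start, end) for a window without an offset.

    The window is treated as start-inclusive and end-exclusive: [start, end).
    """
    # Snap the request time down to the most recent aggregation-interval boundary.
    end = request_time - (request_time - EPOCH) % aggregation_interval
    # Measure the window size back from that boundary.
    start = end - window_size
    return start, end


start, end = window_bounds(datetime(2022, 5, 18, 13, 30), timedelta(days=1), timedelta(days=7))
print(start, end)  # 2022-05-11 00:00:00 2022-05-18 00:00:00
```

A request at 13:30 on 2022-05-18 with a daily aggregation interval therefore sees a 7-day window ending at midnight that day, not at 13:30.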
Offset Time Windows
You can shift the end time of the window into the past with the offset parameter
of the TimeWindow class, as shown in the example below. In this example, the window
covers the period from 10 days to 3 days before the end of the most recent aggregation interval:
```python
from tecton import batch_feature_view, FilteredSource, Aggregation, TimeWindow
from datetime import timedelta

# `transactions` (a data source) and `user` (an entity) are assumed to be defined elsewhere.
@batch_feature_view(
    sources=[FilteredSource(transactions)],
    mode="spark_sql",
    entities=[user],
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(
            column="amount",
            function="mean",
            # 7-day window whose end is shifted 3 days into the past.
            time_window=TimeWindow(window_size=timedelta(days=7), offset=timedelta(days=-3)),
        )
    ],
)
def user_average_transaction_amount(transactions):
    # Alias the raw `amt` column so it matches the aggregated column name.
    return f"""
        SELECT user_id, timestamp, amt AS amount
        FROM {transactions}
        """
```
The offset parameter must always be negative. If it is omitted, it defaults to 0 and the end time is not shifted.
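The offset arithmetic amounts to a subtraction. The hypothetical helper below, which is not part of Tecton, moves the unshifted end time by the offset and then measures window_size back from the shifted end. With window_size=timedelta(days=7) and offset=timedelta(days=-3), the window spans 10 days to 3 days before the unshifted end, matching the example above.

```python
from datetime import datetime, timedelta
from typing import Tuple


def offset_window_bounds(
    unshifted_end: datetime,
    window_size: timedelta,
    offset: timedelta = timedelta(0),
) -> Tuple[datetime, datetime]:
    """Return (start, end) for a window whose end is moved by a non-positive offset.

    unshifted_end is where the window would end with no offset, i.e. the end of
    the most recent aggregation interval.
    """
    # A negative offset moves the end time into the past; 0 leaves it unchanged.
    end = unshifted_end + offset
    # The window size is always measured back from the shifted end.
    start = end - window_size
    return start, end


t = datetime(2022, 5, 20)
start, end = offset_window_bounds(t, timedelta(days=7), timedelta(days=-3))
print(t - start, t - end)  # 10 days, 0:00:00 3 days, 0:00:00
```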
Example
Consider the following example mock data:
| | user_id | timestamp | value |
|---|---|---|---|
| 0 | user_1 | 2022-05-14 00:00:00 | 1 |
| 1 | user_1 | 2022-05-15 00:00:00 | 3 |
| 2 | user_1 | 2022-05-16 00:00:00 | 6 |
| 3 | user_1 | 2022-05-17 00:00:00 | 11 |
| 4 | user_1 | 2022-05-18 00:00:00 | 23 |
A Feature View can have aggregations with and without an offset.
```python
from datetime import datetime, timedelta

from tecton import Entity, batch_feature_view, Aggregation, TimeWindow

user_entity = Entity(name="user", join_keys=["user_id"])

# `ds` is assumed to be a data source containing the mock data above.
@batch_feature_view(
    mode="spark_sql",
    sources=[ds],
    entities=[user_entity],
    aggregation_interval=timedelta(days=1),
    timestamp_field="timestamp",
    offline=True,
    online=True,
    feature_start_time=datetime(2022, 5, 1),
    aggregations=[
        # 2-day sum ending at the most recent aggregation interval.
        Aggregation(column="value", function="sum", time_window=TimeWindow(window_size=timedelta(days=2))),
        # 2-day sum whose end is shifted 2 days into the past.
        Aggregation(
            column="value",
            function="sum",
            time_window=TimeWindow(window_size=timedelta(days=2), offset=timedelta(days=-2)),
        ),
    ],
)
def user_transaction_sums(input_table):
    return f"""
        SELECT user_id, timestamp, value
        FROM {input_table}
        """
```
During offline retrieval, when you pass in an events dataframe to join
against, each aggregation is computed over its time window (shifted by the
offset, if one is set) relative to each timestamp in the events dataframe.
The examples below show how the aggregations are computed for different
timestamps in the events dataframe.
```python
from datetime import datetime

import pandas as pd

training_events = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_1", "user_1"],
        "timestamp": [datetime(2022, 5, 15), datetime(2022, 5, 18), datetime(2022, 5, 19), datetime(2022, 5, 20)],
    }
)
# Compute both aggregations as of each event timestamp.
df = user_transaction_sums.get_features_for_events(training_events).to_pandas()
display(df)  # notebook helper; use print(df) outside a notebook
```
| | user_id | timestamp | user_transaction_sums__value_sum_2d_1d | user_transaction_sums__value_sum_2d_1d_offset_2d |
|---|---|---|---|---|
| 0 | user_1 | 2022-05-15 00:00:00 | 1 | None |
| 1 | user_1 | 2022-05-18 00:00:00 | 17 | 4 |
| 2 | user_1 | 2022-05-19 00:00:00 | 34 | 9 |
| 3 | user_1 | 2022-05-20 00:00:00 | 23 | 17 |
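Each value in this table can be checked by hand. The sketch below recomputes both columns from the mock data in plain Python, assuming a start-inclusive, end-exclusive window and that every event timestamp already falls on a daily aggregation-interval boundary (true for all timestamps here). windowed_sum is a hypothetical helper, not a Tecton API; it returns None for an empty window to match the table.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# The mock rows from the example data: (timestamp, value) for user_1.
ROWS: List[Tuple[datetime, int]] = [
    (datetime(2022, 5, 14), 1),
    (datetime(2022, 5, 15), 3),
    (datetime(2022, 5, 16), 6),
    (datetime(2022, 5, 17), 11),
    (datetime(2022, 5, 18), 23),
]


def windowed_sum(event_time: datetime, window_size: timedelta,
                 offset: timedelta = timedelta(0)) -> Optional[int]:
    """Sum values in [end - window_size, end), where end = event_time + offset.

    Assumes event_time already sits on an aggregation-interval boundary.
    Returns None when no rows fall inside the window.
    """
    end = event_time + offset
    start = end - window_size
    values = [v for ts, v in ROWS if start <= ts < end]
    return sum(values) if values else None


for day in (15, 18, 19, 20):
    t = datetime(2022, 5, day)
    print(t.date(), windowed_sum(t, timedelta(days=2)),
          windowed_sum(t, timedelta(days=2), timedelta(days=-2)))
# 2022-05-15 1 None
# 2022-05-18 17 4
# 2022-05-19 34 9
# 2022-05-20 23 17
```

For example, the 2022-05-18 row sums the values at 2022-05-16 and 2022-05-17 (6 + 11 = 17) for the unshifted window, and the values at 2022-05-14 and 2022-05-15 (1 + 3 = 4) for the window shifted back by 2 days. The value at 2022-05-18 itself is excluded because the window ends exactly at that timestamp.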
Attributes
The attributes are the same as the __init__ method parameters. See below.
Methods
__init__(...)
Parameters
| Name | Required? | Description |
|---|---|---|
| window_size | Yes | The size of the window, expressed as a positive timedelta. |
| offset | No. Defaults to 0. | The relative end time of the window, expressed as a negative timedelta. |
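The constraints in this table can be expressed as a small validated dataclass. TimeWindowParams is a hypothetical mirror written for illustration, not Tecton's TimeWindow class; Tecton's actual validation messages and edge cases (for example, whether an explicit offset of 0 is accepted) may differ.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class TimeWindowParams:
    """Hypothetical stand-in mirroring the documented parameters (not tecton.TimeWindow)."""

    window_size: timedelta  # required
    offset: timedelta = timedelta(0)  # optional, defaults to 0

    def __post_init__(self) -> None:
        # window_size must be a positive duration.
        if self.window_size <= timedelta(0):
            raise ValueError("window_size must be a positive timedelta")
        # offset may only move the window's end into the past (or leave it unchanged).
        if self.offset > timedelta(0):
            raise ValueError("offset must be a negative timedelta (or the default of 0)")


window = TimeWindowParams(window_size=timedelta(days=7), offset=timedelta(days=-3))
print(window.window_size, window.offset)
```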