TimeWindowSeries
Summary
TimeWindowSeries defines a series of time windows over which an Aggregation is computed in a Batch or Stream Feature View.
Time Window Series are useful for expressing a feature like "user transaction sum for every hour in the last week."
For an overview of aggregation windows, see Aggregation Windows.
Description
Tecton aggregations are applied over the specified time window series using the time_window parameter. Use the TimeWindowSeries class to create an aggregation over a series of time windows, as shown in the example below:
from tecton import batch_feature_view, FilteredSource, Aggregation, TimeWindowSeries
from datetime import timedelta

# `transactions` (a data source) and `user` (an entity) are assumed to be defined elsewhere.
@batch_feature_view(
    sources=[FilteredSource(transactions)],
    mode="spark_sql",
    entities=[user],
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(
            column="value",
            function="sum",
            # Consecutive 1-day windows starting 7 days before the event time.
            time_window=TimeWindowSeries(
                series_start=timedelta(days=-7),
                window_size=timedelta(days=1),
            ),
        )
    ],
)
def user_transaction_sums(transactions):
    return f"""
        SELECT user_id, timestamp, value
        FROM {transactions}
        """
Example
Consider the following example mock data:
| | user_id | timestamp | value |
|---|---|---|---|
| 0 | user_1 | 2022-05-14 00:00:00 | 1 |
| 1 | user_1 | 2022-05-15 00:00:00 | 3 |
| 2 | user_1 | 2022-05-16 00:00:00 | 6 |
| 3 | user_1 | 2022-05-17 00:00:00 | 11 |
| 4 | user_1 | 2022-05-18 00:00:00 | 23 |
During Offline Retrieval, when you pass in an events dataframe to join against, the aggregation is computed over the time window series relative to the timestamps in the input events dataframe. The example below shows how the aggregation is computed for different timestamps in the events dataframe.
import pandas as pd
from datetime import datetime  # the class, so that datetime(2022, 5, 15) works

training_events = pd.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_1", "user_1", "user_1", "user_1"],
        "timestamp": [
            datetime(2022, 5, 15),
            datetime(2022, 5, 18),
            datetime(2022, 5, 19),
            datetime(2022, 5, 20),
            datetime(2022, 5, 24),
            datetime(2022, 5, 26),
        ],
    }
)
# Join the feature view's aggregations against each event timestamp.
df = user_transaction_sums.get_features_for_events(training_events).to_pandas()
display(df)  # notebook helper; use print(df) outside a notebook
| | user_id | timestamp | user_transaction_sums__amt_sum_1d_1d_series_7d_0s_1d |
|---|---|---|---|
| 0 | user_1 | 2022-05-15 00:00:00 | [None, None, None, None, None, None, 1] |
| 1 | user_1 | 2022-05-18 00:00:00 | [None, None, None, 1, 3, 6, 11] |
| 2 | user_1 | 2022-05-19 00:00:00 | [None, None, 1, 3, 6, 11, 23] |
| 3 | user_1 | 2022-05-20 00:00:00 | [None, 1, 3, 6, 11, 23, None] |
| 4 | user_1 | 2022-05-24 00:00:00 | [11, 23, None, None, None, None, None] |
| 5 | user_1 | 2022-05-26 00:00:00 | [None, None, None, None, None, None, None] |
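To make the windowing in the table above concrete, here is a minimal plain-Python sketch that reproduces those lists from the mock data. It does not call Tecton. The helper `series_sums` is hypothetical, and it assumes each window is half-open, `[start, end)`, with the windows tiling the range from `series_start` to `series_end` relative to the event timestamp. That assumption matches the values shown above but is not taken from Tecton's implementation.

```python
from datetime import datetime, timedelta

# Mock data from the example: (timestamp, value) rows for user_1.
rows = [
    (datetime(2022, 5, 14), 1),
    (datetime(2022, 5, 15), 3),
    (datetime(2022, 5, 16), 6),
    (datetime(2022, 5, 17), 11),
    (datetime(2022, 5, 18), 23),
]


def series_sums(event_time, series_start=timedelta(days=-7),
                window_size=timedelta(days=1), series_end=timedelta(0)):
    """Hypothetical helper: sum `value` per window, None for empty windows.

    Assumption: windows are half-open [start, end) and tile the range from
    event_time + series_start up to event_time + series_end.
    """
    results = []
    start = event_time + series_start
    limit = event_time + series_end
    while start + window_size <= limit:
        end = start + window_size
        values = [v for ts, v in rows if start <= ts < end]
        results.append(sum(values) if values else None)
        start = end
    return results


for day in (15, 18, 19, 20, 24, 26):
    event_time = datetime(2022, 5, day)
    print(event_time.date(), series_sums(event_time))
```

Each list is ordered from the oldest window to the most recent one, so a row's value shifts one position to the left for every day the event timestamp advances.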
Attributes
The attributes are the same as the __init__ method parameters. See below.
Methods
__init__(...)
Parameters
Name | Required? | Description |
---|---|---|
series_start | Yes | The relative start of the series of windows, represented as a negative timedelta. |
series_end | No. Defaults to 0 (i.e. "now"). | The relative end of the series of windows, represented as a negative timedelta. |
window_size | Yes | The size of each window in the series, represented as a positive timedelta. |
step_size | No. Defaults to window_size. | The interval by which the time windows step forward in the series, represented as a positive timedelta. This is primarily useful if you want to express a series of overlapping windows. For example, if you want a series of 3 hour windows as of every hour in the last week, you would set window_size=timedelta(hours=3) and step_size=timedelta(hours=1). |
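The interaction between these four parameters can be sketched in plain Python. The helper `window_offsets` below is hypothetical and not part of Tecton. It assumes the first window begins at `series_start`, each later window begins `step_size` after the previous one, and no window extends past `series_end`. Tecton's exact alignment rules may differ, so treat the window counts as illustrative.

```python
from datetime import timedelta


def window_offsets(series_start, window_size,
                   series_end=timedelta(0), step_size=None):
    """Hypothetical helper: (start, end) offsets relative to the event time.

    Assumption: windows start at series_start, advance by step_size
    (window_size when omitted), and never extend past series_end.
    """
    step = step_size if step_size is not None else window_size
    offsets = []
    start = series_start
    while start + window_size <= series_end:
        offsets.append((start, start + window_size))
        start += step
    return offsets


# Non-overlapping daily windows over the last week (step_size defaults to window_size).
daily = window_offsets(timedelta(days=-7), timedelta(days=1))

# Overlapping 3-hour windows that advance by one hour over the last week.
overlapping = window_offsets(
    timedelta(days=-7), timedelta(hours=3), step_size=timedelta(hours=1)
)

print(len(daily))        # 7
print(len(overlapping))  # 166 under the assumption above
print(overlapping[:2])   # consecutive windows share two hours
```

With `step_size` smaller than `window_size`, consecutive windows overlap, so a single row can contribute to several entries of the resulting series.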