tecton.Aggregation
Summary
This class describes a single aggregation that is applied in a batch or stream feature view.
Description
The Aggregation constructor accepts a function input, which can be one of the built-in aggregation functions. You can pass the name of a built-in aggregation function as a string. Nulls are handled as in the corresponding Spark SQL function, Function(column): for example, the sum of all nulls is null and the count of all nulls is 0.
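The null handling described above can be illustrated with a minimal plain-Python sketch. The helper names null_aware_sum and null_aware_count are hypothetical and are not part of the Tecton SDK; they only mimic the Spark SQL behavior of ignoring nulls in sum(column) and count(column).

```python
from typing import Optional, Sequence


def null_aware_sum(values: Sequence[Optional[float]]) -> Optional[float]:
    """Sum the non-null values; return None when there are no non-null values."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


def null_aware_count(values: Sequence[Optional[float]]) -> int:
    """Count the non-null values; an all-null column counts as 0."""
    return sum(1 for v in values if v is not None)


all_nulls = [None, None, None]
mixed = [1.0, None, 2.5]

print(null_aware_sum(all_nulls), null_aware_count(all_nulls))  # None 0
print(null_aware_sum(mixed), null_aware_count(mixed))          # 3.5 2
```

Returning None rather than 0 for an all-null sum keeps "no data" distinguishable from "data that sums to zero", which matters when the feature value is consumed downstream.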
In addition to numeric aggregations, Aggregation supports the last N and last distinct N aggregations, which compute the last N values (duplicates included) and the last N distinct values of the column, ordered by timestamp. Currently only string columns are supported as input to these aggregations, i.e., the resulting feature value will be a list of strings. The values in the list are in ascending order by timestamp. Nulls are not included in the aggregated list.
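The semantics described above can be sketched in plain Python. This is an illustration only, not Tecton's implementation: the (timestamp, value) row shape and the function names last_n and last_distinct_n are hypothetical, and the sketch assumes that a repeated value is placed at its most recent occurrence when computing the distinct list.

```python
from typing import List, Optional, Tuple

# Hypothetical row shape for this sketch: (timestamp, value).
Event = Tuple[int, Optional[str]]


def last_n(events: List[Event], n: int) -> List[str]:
    """Last n non-null values, duplicates kept, ascending by timestamp."""
    ordered = [v for _, v in sorted(events, key=lambda e: e[0]) if v is not None]
    return ordered[-n:] if n > 0 else []


def last_distinct_n(events: List[Event], n: int) -> List[str]:
    """Last n distinct non-null values, ascending by timestamp.

    Assumption: a repeated value is positioned at its most recent occurrence.
    """
    seen = set()
    newest_first: List[str] = []
    # Walk from newest to oldest so the first time we see a value is its latest occurrence.
    for _, value in sorted(events, key=lambda e: e[0], reverse=True):
        if value is not None and value not in seen:
            seen.add(value)
            newest_first.append(value)
    # Keep the n newest distinct values, then flip back to ascending order.
    return newest_first[:n][::-1]


events = [(1, "a"), (2, "b"), (3, None), (4, "c"), (5, "c")]
print(last_n(events, 3))           # ['b', 'c', 'c']
print(last_distinct_n(events, 3))  # ['a', 'b', 'c']
```

In this sample the null at timestamp 3 is dropped from both results, and the duplicate "c" appears twice in the last N list but only once in the last distinct N list.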
Example
You can use these aggregations via the last() and last_distinct() helper functions like this:
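import datetime

from tecton import Aggregation, batch_feature_view
from tecton.aggregation_functions import last_distinct, last, TimeWindow

@batch_feature_view(
    ...
    aggregations=[
        # Last 15 distinct values of my_column over a 7-day window.
        Aggregation(
            column='my_column',
            function=last_distinct(15),
            time_window=TimeWindow(window_size=datetime.timedelta(days=7))),
        # Last 15 values (duplicates included) of my_column over the same window.
        Aggregation(
            column='my_column',
            function=last(15),
            time_window=TimeWindow(window_size=datetime.timedelta(days=7))),
    ],
    ...
)
def my_fv(data_source):
    pass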
Attributes
The attributes are the same as the __init__ method parameters. See below.
Methods
__init__(...)
Parameters
- column (str) – Column name of the feature we are aggregating.
- function (Union[str, <aggregation function>]) – One of the built-in aggregation functions, such as count. See the time-window aggregation functions reference for a list of aggregation functions.
- time_window (TimeWindow) – The window_size and optional offset over which to aggregate. See Time Window Reference for more details on the TimeWindow class.
- name (str) – The name of this feature. Defaults to an autogenerated name, e.g. transaction_count_7d_1d.