Time-Window Aggregation Functions Reference
Time-window aggregation functions are built-in functions that you use by
defining an Aggregation object in a Batch Feature View or a Stream Feature
View.
- Example of using a time-window aggregation function in a Batch Feature View
- Example of using a time-window aggregation function in a Stream Feature View
This page lists the available time-window aggregation functions. Each function
is specified in one of two ways: as a string (for example, function="count"),
or as a function imported from the tecton.aggregation_functions namespace (for
example, first_distinct(n)). Each function's section below includes a usage
example.
count
An aggregation function that returns, for a materialization time window, the
number of row values for a column, per entity value (such as a user_id value).
Null values are excluded.
Input column types
- Tecton on Spark: All types
- Tecton on Snowflake: All types
Output column type
Int64
Usage
To use this aggregation, define an Aggregation object, using
function="count", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="transaction_id", function="count", time_window=timedelta(days=1))
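The behavior described above can be sketched in plain Python. This is only an illustration of what count computes, not Tecton's implementation: the rows, the window_count helper, and the window boundaries (start inclusive, end exclusive) are all assumptions made up for this example.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical rows: (user_id, timestamp, transaction_id).
rows = [
    ("u1", datetime(2024, 1, 1, 9), "t1"),
    ("u1", datetime(2024, 1, 1, 15), None),    # null value, excluded
    ("u1", datetime(2024, 1, 1, 20), "t2"),
    ("u2", datetime(2024, 1, 1, 12), "t3"),
    ("u2", datetime(2023, 12, 30, 12), "t4"),  # outside the window
]


def window_count(rows, window_end, time_window):
    """Count non-null values per entity in [window_end - time_window, window_end)."""
    window_start = window_end - time_window
    counts = Counter()
    for entity, ts, value in rows:
        if window_start <= ts < window_end and value is not None:
            counts[entity] += 1
    return dict(counts)


counts = window_count(rows, datetime(2024, 1, 2), timedelta(days=1))
print(counts)  # {'u1': 2, 'u2': 1}
```

The null transaction_id and the row from two days earlier do not contribute, so u1 counts 2 and u2 counts 1.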
first_distinct(n)
An aggregation function that returns, for a materialization time window, the
first N distinct row values for a column, per entity value (such as a user_id
value).
For example, if the first 2 distinct row values for a column, in the
materialization time window, are 10 and 20, then the function returns
[10,20].
The output sequence is in ascending order based on the timestamp.
Not currently supported with:
- Tecton on Snowflake
- Serverless Feature Retrieval with Athena
Input column types
String
Output column type
Array[String]
Usage
Import this aggregation with
from tecton.aggregation_functions import first_distinct.
Then, define an Aggregation object, using function=first_distinct(n), where
n is an integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature
View.
Example
Aggregation(column="amt", function=first_distinct(2), time_window=timedelta(days=1))
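As a rough illustration of the semantics, the following plain-Python sketch picks the first N distinct values for one entity within one time window. The first_distinct_values helper and the sample rows are hypothetical, and ordering each value by its earliest occurrence is an assumption of this sketch rather than documented Tecton behavior.

```python
from datetime import datetime

# Hypothetical (timestamp, value) rows for one entity inside one time window.
rows = [
    (datetime(2024, 1, 1, 12), "20"),
    (datetime(2024, 1, 1, 9), "10"),
    (datetime(2024, 1, 1, 10), "10"),  # repeated value, skipped
    (datetime(2024, 1, 1, 18), "30"),
]


def first_distinct_values(rows, n):
    """Return the first n distinct values, ordered by ascending timestamp."""
    if not 0 < n <= 1000:
        raise ValueError("n must be an integer > 0 and <= 1000")
    seen = []
    for _, value in sorted(rows, key=lambda row: row[0]):
        if value not in seen:
            seen.append(value)
        if len(seen) == n:
            break
    return seen


result = first_distinct_values(rows, 2)
print(result)  # ['10', '20']
```

If the window holds fewer than n distinct values, the sketch returns all of them.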
first(n)
An aggregation function that returns, for a materialization time window, the
first N row values for a column, per entity value (such as a user_id value).
For example, if the first 2 row values for a column, in the materialization time
window, are 10 and 20, then the function returns [10,20].
The output sequence is in ascending order based on the timestamp.
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
String
Output column type
Array[String]
Usage
Import this aggregation with from tecton.aggregation_functions import first.
Then, define an Aggregation object, using function=first(n), where n is an
integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function=first(2), time_window=timedelta(days=1))
last_distinct(n)
An aggregation function that returns, for a materialization time window, the
last N distinct row values for a column, per entity value (such as a user_id
value).
For example, if the last 2 distinct row values for a column, in the
materialization time window, are 10 and 20, then the function returns
[10,20].
The output sequence is in ascending order based on the timestamp.
Not currently supported with:
- Tecton on Snowflake
- Serverless Feature Retrieval with Athena
Input column types
String
Output column type
Array[String]
Usage
Import this aggregation with
from tecton.aggregation_functions import last_distinct.
Then, define an Aggregation object, using function=last_distinct(n), where
n is an integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature
View.
Example
Aggregation(column="amt", function=last_distinct(2), time_window=timedelta(days=1))
last
An aggregation function that returns, for a materialization time window, the
last row value for a column, per entity value (such as a user_id value).
Not currently supported with:
- Tecton on Snowflake
- Serverless Feature Retrieval with Athena
Input column types
Int64, Int32, Float64, Bool, String
Output column type
Int64, Float64, Bool, String
Usage
To use this aggregation, define an Aggregation object, using
function="last", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="last", time_window=timedelta(days=1))
last(n)
An aggregation function that returns, for a materialization time window, the
last N row values for a column, per entity value (such as a user_id value).
For example, if the last 2 row values for a column, in the materialization time
window, are 10 and 20, then the function returns [10,20].
The output sequence is in ascending order based on the timestamp.
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
String
Output column type
Array[String]
Usage
Import this aggregation with from tecton.aggregation_functions import last.
Then, define an Aggregation object using function=last(n), where n is an
integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function=last(2), time_window=timedelta(days=1))
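The ordering rule (last N values, returned in ascending timestamp order) can be illustrated with a small plain-Python sketch. The last_values helper and the sample rows are hypothetical and exist only for this example; they are not part of the Tecton SDK.

```python
from datetime import datetime

# Hypothetical (timestamp, value) rows for one entity inside one time window,
# deliberately listed out of order.
rows = [
    (datetime(2024, 1, 1, 18), "30"),
    (datetime(2024, 1, 1, 9), "10"),
    (datetime(2024, 1, 1, 12), "20"),
]


def last_values(rows, n):
    """Return the last n values by timestamp, oldest of them first."""
    if not 0 < n <= 1000:
        raise ValueError("n must be an integer > 0 and <= 1000")
    ordered = sorted(rows, key=lambda row: row[0])
    return [value for _, value in ordered[-n:]]


result = last_values(rows, 2)
print(result)  # ['20', '30']
```

The most recent value appears at the end of the list, not the beginning.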
max
An aggregation function that returns, for a materialization time window, the
maximum of the row values for a column, per entity value (such as a user_id
value).
Input column types
Int64, Int32, Float64, String
Output column type
Int64, Float64, String
Usage
To use this aggregation, define an Aggregation object, using function="max",
in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="max", time_window=timedelta(days=1))
mean
An aggregation function that returns, for a materialization time window, the
mean of the row values for a column, per entity value (such as a user_id
value).
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="mean", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="mean", time_window=timedelta(days=1))
min
An aggregation function that returns, for a materialization time window, the
minimum of the row values for a column, per entity value (such as a user_id
value).
Input column types
Int64, Int32, Float64, String
Output column type
Int64, Float64, String
Usage
To use this aggregation, define an Aggregation object, using function="min",
in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="min", time_window=timedelta(days=1))
stddev_pop
An aggregation function that returns, for a materialization time window, the
standard deviation of the row values for a column around the population mean,
per entity value (such as a user_id value).
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="stddev_pop", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="stddev_pop", time_window=timedelta(days=1))
stddev_samp
An aggregation function that returns, for a materialization time window, the
standard deviation of the row values for a column around the sample mean, per
entity value (such as a user_id value).
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="stddev_samp", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="stddev_samp", time_window=timedelta(days=1))
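To see how stddev_pop and stddev_samp differ numerically, the sketch below computes both for a made-up list of amt values using Python's standard statistics module. It illustrates the standard population and sample definitions only; it does not call Tecton, and the sample values are assumptions.

```python
import statistics

# Hypothetical "amt" values for one entity inside one time window.
amounts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population standard deviation: squared deviations are averaged over N.
pop = statistics.pstdev(amounts)

# Sample standard deviation: squared deviations are divided by N - 1.
samp = statistics.stdev(amounts)

print(pop, samp)
```

For the same rows the sample value is always at least as large as the population value, and the gap shrinks as the number of rows in the window grows.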
sum
An aggregation function that returns, for a materialization time window, the sum
of the row values for a column, per entity value (such as a user_id value).
Input column types
Int64, Int32, Float64
Output column type
Int64 or Float64
Usage
To use this aggregation, define an Aggregation object, using function="sum",
in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="sum", time_window=timedelta(days=1))
var_pop
An aggregation function that returns, for a materialization time window, the
variance of the row values for a column around the population mean, per entity
value (such as a user_id value).
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="var_pop", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="var_pop", time_window=timedelta(days=1))
var_samp
An aggregation function that returns, for a materialization time window, the
variance of the row values for a column around the sample mean, per entity value
(such as a user_id value).
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="var_samp", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="var_samp", time_window=timedelta(days=1))
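The difference between the population and sample variance comes down to the divisor. The following sketch writes out both standard formulas in plain Python. The helper names and sample values are hypothetical, and raising an error for fewer than two values is a choice made in this sketch, not a statement about how Tecton handles that case.

```python
def population_variance(values):
    """Mean squared deviation from the mean, divided by N."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Hypothetical "amt" values for one entity inside one time window.
amounts = [1.0, 2.0, 3.0, 4.0]

pop = population_variance(amounts)
samp = sample_variance(amounts)
print(pop, samp)
```

With these four values the sum of squared deviations is 5.0, so the population variance is 5.0 / 4 and the sample variance is 5.0 / 3.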