Time-Window Aggregation Functions Reference
Time-window aggregation functions are built-in functions that you use by
defining an Aggregation object in a Batch Feature View or a Stream Feature View.
- Example of using a time-window aggregation function in a Batch Feature View
- Example of using a time-window aggregation function in a Stream Feature View
This page is a reference for the available time-window aggregation functions.
Each function on this page is either available exclusively under the
tecton.aggregation_functions namespace or can only be specified as a string.
For how to use a function, see the example under that function.
count
An aggregation function that returns, for a materialization time window, the
number of row values for a column, per entity value (such as a user_id value).
Null values are excluded.
Input column types
- Tecton on Spark: All types
- Tecton on Snowflake: All types
Output column type
Int64
Usage
To use this aggregation, define an Aggregation object, using function="count",
in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="transaction_id", function="count", time_window=timedelta(days=1))
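The counting rule described above (per entity, within one time window, nulls excluded) can be modeled in plain Python. This is only an illustrative sketch, not Tecton's implementation: the count_in_window helper, the (entity, timestamp, value) row layout, and the half-open window boundaries are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_in_window(rows, window_end, time_window):
    """Count non-null values per entity in [window_end - time_window, window_end).

    Hypothetical helper: `rows` is an iterable of (entity, timestamp, value).
    """
    window_start = window_end - time_window
    counts = Counter()
    for entity, ts, value in rows:
        # Skip rows outside the window and rows whose value is null (None).
        if window_start <= ts < window_end and value is not None:
            counts[entity] += 1
    return dict(counts)


rows = [
    ("user_1", datetime(2024, 1, 1, 9), "txn_a"),
    ("user_1", datetime(2024, 1, 1, 12), None),        # null value: excluded
    ("user_1", datetime(2024, 1, 1, 18), "txn_b"),
    ("user_2", datetime(2024, 1, 1, 10), "txn_c"),
    ("user_2", datetime(2023, 12, 30, 10), "txn_d"),   # outside the 1-day window
]

result = count_in_window(rows, datetime(2024, 1, 2), timedelta(days=1))
print(result)
```

Here user_1 has three rows in the window but only two non-null values, so the null row does not contribute to the count.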
first_distinct(n)
An aggregation function that returns, for a materialization time window, the
first N distinct row values for a column, per entity value (such as a user_id
value).
For example, if the first 2 distinct row values for a column, in the
materialization time window, are 10 and 20, then the function returns [10,20].
The output sequence is in ascending order based on the timestamp.
Not currently supported with:
- Tecton on Snowflake
- Serverless Feature Retrieval with Athena
Input column types
String
Output column type
Array[String]
Usage
Import this aggregation with from tecton.aggregation_functions import first_distinct.
Then, define an Aggregation object, using function=first_distinct(n), where n is
an integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function=first_distinct(2), time_window=timedelta(days=1))
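To make the "first N distinct values, in timestamp order" behavior concrete, here is a minimal plain-Python model. It is a sketch under assumptions, not Tecton code: first_distinct_values is a hypothetical helper, rows are assumed to be (timestamp, value) pairs for a single entity and window, and a repeated value is assumed to keep the position of its earliest occurrence.

```python
def first_distinct_values(rows, n):
    """Return the first n distinct values, ordered by earliest timestamp.

    Hypothetical helper: `rows` is a list of (timestamp, value) pairs.
    """
    if not 0 < n <= 1000:
        raise ValueError("n must be an integer > 0 and <= 1000")
    distinct = []
    for _, value in sorted(rows, key=lambda row: row[0]):
        if value not in distinct:
            distinct.append(value)      # first time this value is seen
        if len(distinct) == n:
            break                       # collected enough distinct values
    return distinct


# Rows are deliberately out of order; "10" appears twice.
rows = [(3, "20"), (1, "10"), (2, "10"), (4, "30")]
print(first_distinct_values(rows, 2))
```

The duplicate "10" at timestamp 2 is ignored, so the second slot goes to "20".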
first(n)
An aggregation function that returns, for a materialization time window, the
first N row values for a column, per entity value (such as a user_id value).
For example, if the first 2 row values for a column, in the materialization time
window, are 10 and 20, then the function returns [10,20].
The output sequence is in ascending order based on the timestamp.
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
String
Output column type
Array[String]
Usage
Import this aggregation with from tecton.aggregation_functions import first.
Then, define an Aggregation object, using function=first(n), where n is an
integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function=first(2), time_window=timedelta(days=1))
last_distinct(n)
An aggregation function that returns, for a materialization time window, the
last N distinct row values for a column, per entity value (such as a user_id
value).
For example, if the last 2 distinct row values for a column, in the
materialization time window, are 10 and 20, then the function returns [10,20].
The output sequence is in ascending order based on the timestamp.
Not currently supported with:
- Tecton on Snowflake
- Serverless Feature Retrieval with Athena
Input column types
String
Output column type
Array[String]
Usage
Import this aggregation with from tecton.aggregation_functions import last_distinct.
Then, define an Aggregation object, using function=last_distinct(n), where n is
an integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function=last_distinct(2), time_window=timedelta(days=1))
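The following plain-Python sketch models "last N distinct values, returned in ascending timestamp order". It is illustrative only and not Tecton's implementation: last_distinct_values is a hypothetical helper, rows are assumed to be (timestamp, value) pairs for one entity and window, and a repeated value is assumed to be ranked by its most recent occurrence.

```python
def last_distinct_values(rows, n):
    """Return the last n distinct values, oldest first.

    Hypothetical helper: `rows` is a list of (timestamp, value) pairs.
    """
    if not 0 < n <= 1000:
        raise ValueError("n must be an integer > 0 and <= 1000")
    latest_seen = {}
    for ts, value in rows:
        # Assumption: a value that repeats is ranked by its latest timestamp.
        if value not in latest_seen or ts > latest_seen[value]:
            latest_seen[value] = ts
    ordered = sorted(latest_seen, key=latest_seen.get)  # ascending by timestamp
    return ordered[-n:]


rows = [(1, "10"), (2, "20"), (3, "10"), (4, "30")]
print(last_distinct_values(rows, 2))
```

Because "10" reappears at timestamp 3, it ranks after "20" under this assumption, and the two most recent distinct values are "10" and "30".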
last
An aggregation function that returns, for a materialization time window, the
last row value for a column, per entity value (such as a user_id value).
Not currently supported with:
- Tecton on Snowflake
- Serverless Feature Retrieval with Athena
Input column types
Int64, Int32, Float64, Bool, String
Output column type
Int64, Float64, Bool, String
Usage
To use this aggregation, define an Aggregation object, using function="last", in
a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="last", time_window=timedelta(days=1))
last(n)
An aggregation function that returns, for a materialization time window, the
last N row values for a column, per entity value (such as a user_id value).
For example, if the last 2 row values for a column, in the materialization time
window, are 10 and 20, then the function returns [10,20].
The output sequence is in ascending order based on the timestamp.
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
String
Output column type
Array[String]
Usage
Import this aggregation with from tecton.aggregation_functions import last.
Then, define an Aggregation object, using function=last(n), where n is an
integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function=last(2), time_window=timedelta(days=1))
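The string form function="last" and the parameterized form last(n) differ in the shape of their result: the first yields a single value, the second an array of up to N values with duplicates kept. The plain-Python sketch below models that difference. The helpers last_value and last_n_values are hypothetical names used only for illustration, rows are assumed to be (timestamp, value) pairs, and returning None for an empty window is an assumption.

```python
def last_value(rows):
    """Model of function="last": the single most recent value."""
    if not rows:
        return None  # assumption: no rows in the window yields a null
    return max(rows, key=lambda row: row[0])[1]


def last_n_values(rows, n):
    """Model of last(n): up to n most recent values, oldest first."""
    if not 0 < n <= 1000:
        raise ValueError("n must be an integer > 0 and <= 1000")
    ordered = [value for _, value in sorted(rows, key=lambda row: row[0])]
    return ordered[-n:]  # duplicates are kept, unlike last_distinct(n)


rows = [(3, "c"), (1, "a"), (2, "b"), (4, "b")]
print(last_value(rows))
print(last_n_values(rows, 3))
```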
max
An aggregation function that returns, for a materialization time window, the
maximum of the row values for a column, per entity value (such as a user_id
value).
Input column types
Int64, Int32, Float64, String
Output column type
Int64, Float64, String
Usage
To use this aggregation, define an Aggregation object, using function="max", in
a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="max", time_window=timedelta(days=1))
mean
An aggregation function that returns, for a materialization time window, the
mean of the row values for a column, per entity value (such as a user_id value).
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using function="mean", in
a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="mean", time_window=timedelta(days=1))
min
An aggregation function that returns, for a materialization time window, the
minimum of the row values for a column, per entity value (such as a user_id
value).
Input column types
Int64, Int32, Float64, String
Output column type
Int64, Float64, String
Usage
To use this aggregation, define an Aggregation object, using function="min", in
a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="min", time_window=timedelta(days=1))
stddev_pop
An aggregation function that returns, for a materialization time window, the
standard deviation of the row values for a column around the population mean,
per entity value (such as a user_id value).
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="stddev_pop", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="stddev_pop", time_window=timedelta(days=1))
stddev_samp
An aggregation function that returns, for a materialization time window, the
standard deviation of the row values for a column around the sample mean, per
entity value (such as a user_id value).
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="stddev_samp", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="stddev_samp", time_window=timedelta(days=1))
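The practical difference between stddev_pop and stddev_samp is easiest to see on a small data set. The sketch below uses Python's standard statistics module and assumes the two names carry their usual SQL meaning (divide by N for the population form, by N - 1 for the sample form). The amounts list is made-up illustration data, and nothing here calls Tecton.

```python
import statistics

# Hypothetical "amt" values for one entity inside a single time window.
amounts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population standard deviation: squared deviations are divided by N.
stddev_pop = statistics.pstdev(amounts)

# Sample standard deviation: squared deviations are divided by N - 1.
stddev_samp = statistics.stdev(amounts)

print(stddev_pop)
print(stddev_samp)
```

For the same rows, the sample form is always the larger of the two, and the gap narrows as the number of rows in the window grows.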
sum
An aggregation function that returns, for a materialization time window, the sum
of the row values for a column, per entity value (such as a user_id value).
Input column types
Int64, Int32, Float64
Output column type
Int64 or Float64
Usage
To use this aggregation, define an Aggregation object, using function="sum", in
a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="sum", time_window=timedelta(days=1))
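The listing of Int64 or Float64 as the output type for sum suggests that the result type follows the input column type, whereas mean is documented as always producing Float64. The sketch below models that reading in plain Python, with int standing in for Int64 and float for Float64. This mapping is an assumption for illustration, not a statement about Tecton internals, and the sample values are made up.

```python
from statistics import fmean

int_amounts = [3, 5, 10]          # stands in for an Int64 column
float_amounts = [2.5, 4.0, 1.5]   # stands in for a Float64 column

int_sum = sum(int_amounts)        # integer input gives an integer sum
float_sum = sum(float_amounts)    # float input gives a float sum
int_mean = fmean(int_amounts)     # the mean is a float even for integer input

print(type(int_sum).__name__, int_sum)
print(type(float_sum).__name__, float_sum)
print(type(int_mean).__name__, int_mean)
```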
var_pop
An aggregation function that returns, for a materialization time window, the
variance of the row values for a column around the population mean, per entity
value (such as a user_id value).
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using function="var_pop",
in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="var_pop", time_window=timedelta(days=1))
var_samp
An aggregation function that returns, for a materialization time window, the
variance of the row values for a column around the sample mean, per entity value
(such as a user_id value).
Not currently supported with:
- Serverless Feature Retrieval with Athena
Input column types
Int64, Int32, Float64
Output column type
Float64
Usage
To use this aggregation, define an Aggregation object, using
function="var_samp", in a Batch Feature View or a Stream Feature View.
Example
Aggregation(column="amt", function="var_samp", time_window=timedelta(days=1))
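To show where var_pop and var_samp diverge, the sketch below computes both by hand so the only difference, the denominator, is visible. It assumes the usual definitions (divide by N for the population variance, by N - 1 for the sample variance). The variance helper and the amounts list are hypothetical and exist only for this example.

```python
def variance(values, sample):
    """Variance of `values`; divides by N - 1 if `sample` is True, else by N.

    Hypothetical helper. With sample=True it needs at least two values.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (n - 1 if sample else n)


# Made-up "amt" values for one entity inside a single time window.
amounts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

var_pop = variance(amounts, sample=False)
var_samp = variance(amounts, sample=True)

print(var_pop)
print(var_samp)
```

Under these definitions each variance is the square of the corresponding standard deviation, so stddev_pop and stddev_samp can be derived from var_pop and var_samp by taking a square root.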