Version: 0.6

Time-Window Aggregation Functions Reference

Time-window aggregation functions are built-in functions that you apply by defining an Aggregation object in a Batch Feature View or a Stream Feature View.

This page is a reference for the available time-window aggregation functions. Each function is available either only through the tecton.aggregation_functions namespace or only as a string name. For usage details, refer to the example provided under each aggregation function.

count

An aggregation function that returns, for a materialization time window, the number of row values for a column, per entity value (such as a user_id value). Null values are excluded.

Input column types

Tecton on Spark: All types
Tecton on Snowflake: All types

Output column type

Int64

Usage

To use this aggregation, define an Aggregation object, using function="count", in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="transaction_id", function="count", time_window=timedelta(days=1))
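The counting rule described above (per entity, within the window, nulls excluded) can be sketched in plain Python. This is only an illustration of the semantics, not Tecton's implementation: the helper name count_in_window, the row layout, and the half-open window boundary are all assumptions made for this example.

```python
from datetime import datetime, timedelta


def count_in_window(rows, entity_key, column, window_end, time_window):
    """Count non-null `column` values per entity.

    Assumption for this sketch: the window is half-open,
    [window_end - time_window, window_end).
    """
    window_start = window_end - time_window
    counts = {}
    for row in rows:
        if not (window_start <= row["timestamp"] < window_end):
            continue  # row falls outside the time window
        if row[column] is None:
            continue  # null values are excluded from the count
        counts[row[entity_key]] = counts.get(row[entity_key], 0) + 1
    return counts


# Hypothetical sample data.
rows = [
    {"user_id": "u1", "transaction_id": "t1", "timestamp": datetime(2023, 1, 1, 9)},
    {"user_id": "u1", "transaction_id": None, "timestamp": datetime(2023, 1, 1, 10)},
    {"user_id": "u1", "transaction_id": "t3", "timestamp": datetime(2023, 1, 1, 11)},
    {"user_id": "u2", "transaction_id": "t4", "timestamp": datetime(2023, 1, 1, 12)},
    {"user_id": "u2", "transaction_id": "t5", "timestamp": datetime(2022, 12, 25, 12)},
]

result = count_in_window(
    rows, "user_id", "transaction_id", datetime(2023, 1, 2), timedelta(days=1)
)
print(result)  # {'u1': 2, 'u2': 1}
```

Here u1 has three rows in the window but one has a null transaction_id, so the count is 2; the older u2 row falls outside the one-day window.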

first_distinct(n)

An aggregation function that returns, for a materialization time window, the first N distinct row values for a column, per entity value (such as a user_id value).

For example, if the first 2 distinct row values for a column, in the materialization time window, are 10 and 20, then the function returns [10,20].

note

The output sequence is in ascending order based on the timestamp.

Not currently supported with:

Tecton on Snowflake
Serverless Feature Retrieval with Athena

Input column types

String

Output column type

Array[String]

Usage

Import this aggregation with from tecton.aggregation_functions import first_distinct.

Then, define an Aggregation object, using function=first_distinct(n), where n is an integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function=first_distinct(2), time_window=timedelta(days=1))
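To make the "first N distinct values, in ascending timestamp order" behavior concrete, here is a minimal plain-Python sketch for a single entity and a single window. The helper first_distinct_values and the (timestamp, value) tuple layout are hypothetical; the sketch illustrates the described semantics and does not reflect how Tecton computes the result.

```python
def first_distinct_values(events, n):
    """Return the first n distinct values, ordered by ascending timestamp.

    `events` is a list of (timestamp, value) pairs for one entity
    within one time window (a layout assumed for this sketch).
    """
    seen = set()
    out = []
    for _, value in sorted(events, key=lambda e: e[0]):
        if value in seen:
            continue  # later repeats of a value are skipped
        seen.add(value)
        out.append(value)
        if len(out) == n:
            break
    return out


# Hypothetical events; timestamps are plain integers for brevity.
events = [(3, "20"), (1, "10"), (2, "10"), (4, "30")]
print(first_distinct_values(events, 2))  # ['10', '20']
```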

first(n)

An aggregation function that returns, for a materialization time window, the first N row values for a column, per entity value (such as a user_id value).

For example, if the first 2 row values for a column, in the materialization time window, are 10 and 20, then the function returns [10,20].

note

The output sequence is in ascending order based on the timestamp.

Not currently supported with:

Serverless Feature Retrieval with Athena

Input column types

String

Output column type

Array[String]

Usage

Import this aggregation with from tecton.aggregation_functions import first.

Then, define an Aggregation object, using function=first(n), where n is an integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function=first(2), time_window=timedelta(days=1))

last_distinct(n)

An aggregation function that returns, for a materialization time window, the last N distinct row values for a column, per entity value (such as a user_id value).

For example, if the last 2 distinct row values for a column, in the materialization time window, are 10 and 20, then the function returns [10,20].

note

The output sequence is in ascending order based on the timestamp.

Not currently supported with:

Tecton on Snowflake
Serverless Feature Retrieval with Athena

Input column types

String

Output column type

Array[String]

Usage

Import this aggregation with from tecton.aggregation_functions import last_distinct.

Then, define an Aggregation object, using function=last_distinct(n), where n is an integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function=last_distinct(2), time_window=timedelta(days=1))
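The following plain-Python sketch illustrates one reading of "last N distinct values, in ascending timestamp order" for a single entity and window. It assumes that a repeated value is positioned by its most recent occurrence, which this page does not state explicitly. The helper last_distinct_values and the (timestamp, value) layout are hypothetical and are not part of the Tecton API.

```python
def last_distinct_values(events, n):
    """Return the last n distinct values, ordered by ascending timestamp.

    Assumption for this sketch: when a value repeats, only its most
    recent occurrence determines its position.
    """
    seen = set()
    newest_first = []
    # Walk from newest to oldest, keeping the first sighting of each value.
    for _, value in sorted(events, key=lambda e: e[0], reverse=True):
        if value in seen:
            continue
        seen.add(value)
        newest_first.append(value)
        if len(newest_first) == n:
            break
    # Reverse so the output is ascending by timestamp.
    return newest_first[::-1]


# Hypothetical events; timestamps are plain integers for brevity.
events = [(1, "10"), (2, "20"), (3, "10"), (4, "30")]
print(last_distinct_values(events, 2))  # ['10', '30']
```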

last

An aggregation function that returns, for a materialization time window, the last row value for a column, per entity value (such as a user_id value).

Not currently supported with:

Tecton on Snowflake
Serverless Feature Retrieval with Athena

Input column types

Int64, Int32, Float64, Bool, String

Output column type

Int64, Float64, Bool, String

Usage

To use this aggregation, define an Aggregation object, using function="last", in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function="last", time_window=timedelta(days=1))

last(n)

An aggregation function that returns, for a materialization time window, the last N row values for a column, per entity value (such as a user_id value).

For example, if the last 2 row values for a column, in the materialization time window, are 10 and 20, then the function returns [10,20].

note

The output sequence is in ascending order based on the timestamp.

Not currently supported with:

Serverless Feature Retrieval with Athena

Input column types

String

Output column type

Array[String]

Usage

Import this aggregation with from tecton.aggregation_functions import last.

Then, define an Aggregation object using function=last(n), where n is an integer > 0 and <= 1000, in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function=last(2), time_window=timedelta(days=1))
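Unlike last_distinct(n), last(n) keeps repeated values. A minimal plain-Python sketch of that behavior for one entity and one window follows; the helper last_n_values and the (timestamp, value) layout are hypothetical and only illustrate the semantics described above.

```python
def last_n_values(events, n):
    """Return the last n values (duplicates kept), ascending by timestamp.

    `events` is a list of (timestamp, value) pairs for one entity
    within one time window; n is assumed to be > 0.
    """
    ordered = [value for _, value in sorted(events, key=lambda e: e[0])]
    return ordered[-n:]


# Hypothetical events; timestamps are plain integers for brevity.
events = [(1, "10"), (3, "20"), (2, "20"), (4, "20")]
print(last_n_values(events, 2))  # ['20', '20']
```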

max

An aggregation function that returns, for a materialization time window, the maximum of the row values for a column, per entity value (such as a user_id value).

Input column types

Int64, Int32, Float64, String

Output column type

Int64, Float64, String

Usage

To use this aggregation, define an Aggregation object, using function="max", in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function="max", time_window=timedelta(days=1))

mean

An aggregation function that returns, for a materialization time window, the mean of the row values for a column, per entity value (such as a user_id value).

Input column types

Int64, Int32, Float64

Output column type

Float64

Usage

To use this aggregation, define an Aggregation object, using function="mean", in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function="mean", time_window=timedelta(days=1))

min

An aggregation function that returns, for a materialization time window, the minimum of the row values for a column, per entity value (such as a user_id value).

Input column types

Int64, Int32, Float64, String

Output column type

Int64, Float64, String

Usage

To use this aggregation, define an Aggregation object, using function="min", in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function="min", time_window=timedelta(days=1))

stddev_pop

An aggregation function that returns, for a materialization time window, the standard deviation of the row values for a column around the population mean, per entity value (such as a user_id value).

Not currently supported with:

Serverless Feature Retrieval with Athena

Input column types

Int64, Int32, Float64

Output column type

Float64

Usage

To use this aggregation, define an Aggregation object, using function="stddev_pop", in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function="stddev_pop", time_window=timedelta(days=1))

stddev_samp

An aggregation function that returns, for a materialization time window, the standard deviation of the row values for a column around the sample mean, per entity value (such as a user_id value).

Not currently supported with:

Serverless Feature Retrieval with Athena

Input column types

Int64, Int32, Float64

Output column type

Float64

Usage

To use this aggregation, define an Aggregation object, using function="stddev_samp", in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function="stddev_samp", time_window=timedelta(days=1))
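The difference between the population and sample standard deviations can be checked with Python's standard statistics module. The sketch below uses made-up amt values for one entity in one window; it shows the two formulas only and makes no claim about how Tecton computes them internally.

```python
import math
import statistics

# Hypothetical "amt" values for one entity within one time window.
amounts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Population standard deviation: divides the squared deviations by n.
pop = statistics.pstdev(amounts)

# Sample standard deviation: divides the squared deviations by n - 1.
samp = statistics.stdev(amounts)

print(pop)   # 2.0
print(samp)  # roughly 2.138, i.e. sqrt(32 / 7)
```

With at least two distinct values in the window, the sample figure is always larger than the population figure, because the same sum of squared deviations is divided by a smaller number.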

sum

An aggregation function that returns, for a materialization time window, the sum of the row values for a column, per entity value (such as a user_id value).

Input column types

Int64, Int32, Float64

Output column type

Int64 or Float64

Usage

To use this aggregation, define an Aggregation object, using function="sum", in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function="sum", time_window=timedelta(days=1))

var_pop

An aggregation function that returns, for a materialization time window, the variance of the row values for a column around the population mean, per entity value (such as a user_id value).

Not currently supported with:

Serverless Feature Retrieval with Athena

Input column types

Int64, Int32, Float64

Output column type

Float64

Usage

To use this aggregation, define an Aggregation object, using function="var_pop", in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function="var_pop", time_window=timedelta(days=1))

var_samp

An aggregation function that returns, for a materialization time window, the variance of the row values for a column around the sample mean, per entity value (such as a user_id value).

Not currently supported with:

Serverless Feature Retrieval with Athena

Input column types

Int64, Int32, Float64

Output column type

Float64

Usage

To use this aggregation, define an Aggregation object, using function="var_samp", in a Batch Feature View or a Stream Feature View.

Example

Aggregation(column="amt", function="var_samp", time_window=timedelta(days=1))
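The only difference between var_pop and var_samp is the divisor, as the hand-written sketch below shows. The helper variance and the sample values are hypothetical. How Tecton treats windows with fewer than two rows is not stated on this page, so the sketch simply raises an error in that case.

```python
def variance(values, sample):
    """Variance of `values`: divisor n (population) or n - 1 (sample)."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        # Behavior for too-few rows is an assumption of this sketch.
        raise ValueError("not enough values")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (n - 1 if sample else n)


# Hypothetical "amt" values for one entity within one time window.
amounts = [1.0, 2.0, 3.0, 4.0]

var_pop = variance(amounts, sample=False)
var_samp = variance(amounts, sample=True)

print(var_pop)   # 1.25
print(var_samp)  # roughly 1.667, i.e. 5 / 3
```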
