Version: 0.6

Creating Feature 2

In this topic, you will create and test the second feature, user_transaction_counts. This feature calculates the number of transactions per user over the last day, 30 days, and 90 days.

In your local feature repository, open the file features/batch_features/user_transaction_counts.py. In the file, uncomment the following code, which is a definition of the Feature View.

from tecton import batch_feature_view, FilteredSource, Aggregation
from entities import user
from data_sources.transactions import transactions
from datetime import datetime, timedelta


@batch_feature_view(
    sources=[FilteredSource(transactions)],
    entities=[user],
    mode="spark_sql",
    aggregation_interval=timedelta(days=1),
    aggregations=[
        Aggregation(column="transaction_id", function="count", time_window=timedelta(days=1)),
        Aggregation(column="transaction_id", function="count", time_window=timedelta(days=30)),
        Aggregation(column="transaction_id", function="count", time_window=timedelta(days=90)),
    ],
    online=True,
    offline=True,
    feature_start_time=datetime(2021, 1, 1),
    description="User transaction totals over a series of time windows, updated daily.",
    name="user_transaction_counts",
)
def user_transaction_counts(transactions):
    return f"""
        SELECT
            user_id,
            transaction_id,
            timestamp
        FROM
            {transactions}
        """

In your terminal, run tecton apply to apply this Feature View to your workspace.

The Feature View's transformation

The `aggregations` parameter

The @batch_feature_view decorator contains the aggregations parameter. The presence of this parameter indicates that a Feature View uses one or more built-in aggregations. Built-in aggregations are much easier to use than defining the equivalent aggregations on your own.

The aggregations parameter value specifies three Aggregation objects, which define three built-in aggregations. An Aggregation object takes three inputs: the column to aggregate, the function to apply to that column, and a time_window, which is the time period that the aggregation covers.

The transformation function

Unlike the credit_card_issuer transformation function shown previously, the user_transaction_counts transformation function does not implement the transformation logic because its associated Feature View uses built-in aggregations.

The columns in the SELECT statement of the user_transaction_counts transformation function specify inputs to send to the Aggregations, as follows:

Column number in SELECT statement	Description	`SELECT` column value for the `user_transaction_counts` function
1	The column that the `function` in the `Aggregation` groups by. This column identifies the entity. Entities are used as join keys when multiple features are joined together. You will see an example of this in part 2 of the tutorial.	`user_id`
2	The `column` value in the `Aggregation`	`transaction_id`
3	The field name of the timestamp in the external data source	`timestamp`

Internally, the built-in Aggregation with time_window=timedelta(days=30) is translated to an optimized sliding-window aggregation that is functionally equivalent to:

SELECT
    user_id,
    COUNT(transaction_id),
    ${aggregation_window.end_time}
FROM
    {transactions}
WHERE
    timestamp >= ${aggregation_window.start_time}
    AND timestamp < ${aggregation_window.end_time}
GROUP BY user_id

Tecton runs this query for every aggregation window (a 30-day window in this case). Because aggregation_interval is set to 1 day, Tecton runs a job for each day, starting at feature_start_time. The aggregation_window input encapsulates the start and end of each aggregation window.
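To make the windowing concrete, the following plain-Python sketch imitates the query above for a handful of made-up rows. It is an illustration only, not Tecton's implementation: the helper windowed_counts and the sample rows are hypothetical, and the sketch assumes transaction_id is never null, so counting rows matches COUNT(transaction_id).

```python
from collections import Counter
from datetime import datetime, timedelta


def windowed_counts(rows, window_end, time_window):
    """Count rows per user where window_start <= timestamp < window_end.

    Hypothetical helper that mirrors the WHERE and GROUP BY clauses of the
    equivalent query; it is not part of the Tecton SDK.
    """
    window_start = window_end - time_window
    counts = Counter()
    for user_id, _transaction_id, timestamp in rows:
        if window_start <= timestamp < window_end:
            counts[user_id] += 1
    return dict(counts)


# Made-up sample rows: (user_id, transaction_id, timestamp)
rows = [
    ("user_a", "t1", datetime(2022, 1, 1, 9, 0)),
    ("user_a", "t2", datetime(2022, 1, 2, 14, 30)),
    ("user_b", "t3", datetime(2022, 1, 2, 18, 0)),
    ("user_a", "t4", datetime(2022, 1, 3, 8, 15)),
]

feature_start = datetime(2022, 1, 1)
aggregation_interval = timedelta(days=1)

# One evaluation per aggregation_interval, each covering the last 30 days.
results = {}
for step in range(1, 4):
    window_end = feature_start + step * aggregation_interval
    results[window_end] = windowed_counts(rows, window_end, timedelta(days=30))

for window_end, counts in results.items():
    print(window_end, counts)
```

Each window end moves forward by one aggregation_interval, while the window itself stays 30 days long, so the 30-day count for a user can only grow or stay flat until old transactions fall out of the window.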

Feature View output

When the Feature View runs, it outputs each aggregation in the following format.

<column name in the Aggregation>_<function name in the Aggregation>_<time_window value in Aggregation>_<aggregation_interval value>

For example, when the user_transaction_counts Feature View runs, the column name for the 30-day aggregation is transaction_id_count_30d_1d. You will see the output columns for all of the aggregations when you test the Feature View in the next section.
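The naming pattern can be reproduced with a few lines of Python. This is a sketch of the format described above, not a Tecton API: feature_column_name and format_days are hypothetical helpers, and the sketch only handles whole-day durations because those are the only ones this tutorial uses.

```python
from datetime import timedelta


def format_days(delta):
    """Render a positive whole-day timedelta as '<n>d' (hypothetical helper)."""
    if delta.days <= 0 or delta.seconds or delta.microseconds:
        raise ValueError("this sketch only handles positive whole-day durations")
    return f"{delta.days}d"


def feature_column_name(column, function, time_window, aggregation_interval):
    """Build <column>_<function>_<time_window>_<aggregation_interval>."""
    return (
        f"{column}_{function}_"
        f"{format_days(time_window)}_{format_days(aggregation_interval)}"
    )


names = [
    feature_column_name("transaction_id", "count", timedelta(days=d), timedelta(days=1))
    for d in (1, 30, 90)
]
print(names)
# ['transaction_id_count_1d_1d', 'transaction_id_count_30d_1d', 'transaction_id_count_90d_1d']
```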

Test the Feature View

To test the Feature View interactively, follow these steps. Note that a unit test is not shown.

In your notebook, get the Feature View from the workspace:

fv = ws.get_feature_view("user_transaction_counts")

In your notebook, call the run method of the Feature View to get feature data for the timestamp range of 2022-1-1 to 2022-4-10, and display the generated feature values.

offline_features = fv.run(datetime(2022, 1, 1), datetime(2022, 4, 10)).to_spark().limit(10)
offline_features.show()

Sample Output:

user_id	timestamp	transaction_id_count_1d_1d	transaction_id_count_30d_1d	transaction_id_count_90d_1d
user_131340471060	2022-01-02 00:00:00	1	1	1
user_131340471060	2022-01-03 00:00:00	1	2	2
user_131340471060	2022-01-04 00:00:00	1	3	3

Materialization scheduling

The aggregation_interval specifies how often to run the materialization jobs for Feature Views that use built-in aggregations.
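As a rough illustration of that cadence, the sketch below lists the periods that daily jobs would cover, starting at feature_start_time. The generator job_periods and the until argument are hypothetical and exist only for this example; how Tecton actually schedules and batches its jobs is not described here.

```python
from datetime import datetime, timedelta


def job_periods(feature_start_time, aggregation_interval, until):
    """Yield (start, end) for each complete interval before `until`.

    Hypothetical helper for illustration; not part of the Tecton SDK.
    """
    start = feature_start_time
    while start + aggregation_interval <= until:
        yield start, start + aggregation_interval
        start += aggregation_interval


# Values taken from the Feature View definition: daily interval from 2021-01-01.
periods = list(
    job_periods(datetime(2021, 1, 1), timedelta(days=1), datetime(2021, 1, 4))
)
for start, end in periods:
    print(start.date(), "to", end.date())
```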
