# Test Batch Features

## Import libraries and select your workspace
```python
import tecton
import pandas
from datetime import datetime, timedelta

ws = tecton.get_workspace("prod")
```
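If you're not sure which workspaces are available on your cluster, the SDK can list them. A quick check (this assumes you are already logged in to your Tecton cluster):

```python
# List all workspaces on the connected cluster, then pick the one you need.
print(tecton.list_workspaces())
```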
## Load a Batch Feature View
```python
fv = ws.get_feature_view("user_transaction_counts")
fv.summary()
```
## Run a Feature View transformation pipeline
The `BatchFeatureView::run` function can be used to dry-run a Feature View transformation pipeline over a given time range. This is useful for checking the output of your feature transformation logic or for debugging a materialization job.
There is no guarantee that the output data matches the feature values that would be created for this time frame, for example:

- When using incremental backfills, feature data for a given time range may depend on multiple executions of the Feature View transformation pipeline.
- Feature values may depend on scheduling information (e.g. `batch_schedule`, `data_delay`, `feature_start_time`) that doesn't match the `start_time` and `end_time` you provide.
- Aggregations may require more input data than the window you provide with `start_time` and `end_time`.

If you want to produce feature values for a given time range, use `get_historical_features(start_time, end_time)`.
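For example, a minimal sketch of such a call (the same method is shown with real output later on this page):

```python
# Read actual feature values for a one-day range, rather than dry-running
# the transformation pipeline.
feature_df = fv.get_historical_features(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
).to_pandas()
```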
```python
result_dataframe = fv.run(start_time=datetime(2021, 1, 1), end_time=datetime(2021, 1, 2)).to_pandas()
display(result_dataframe)
```
|   | user_id           | signup_timestamp    | credit_card_issuer |
|---|-------------------|---------------------|--------------------|
| 0 | user_600003278485 | 2021-01-01 06:25:57 | other              |
| 1 | user_469998441571 | 2021-01-01 07:16:06 | Visa               |
| 2 | user_502567604689 | 2021-01-01 04:39:10 | Visa               |
| 3 | user_930691958107 | 2021-01-01 10:52:31 | Visa               |
| 4 | user_782510788708 | 2021-01-01 20:15:25 | other              |
## Run with mock sources

Mock input data sources can be passed into the `BatchFeatureView::run` function using the same source names from the Feature View definition.
```python
users_data = pandas.DataFrame(
    {
        "user_id": ["user_1", "user_1", "user_2"],
        "cc_num": ["423456789012", "567890123456", "678901234567"],
        "signup_timestamp": [
            datetime(2022, 1, 1, 2),
            datetime(2022, 1, 1, 4),
            datetime(2022, 1, 1, 3),
        ],
    }
)

result_dataframe = fv.run(
    start_time=datetime(2022, 1, 1),
    end_time=datetime(2022, 1, 2),
    users=users_data,  # `users` is the name of this Feature View input.
).to_pandas()
display(result_dataframe)
```
|   | user_id | signup_timestamp    | credit_card_issuer |
|---|---------|---------------------|--------------------|
| 0 | user_1  | 2022-01-01 02:00:00 | Visa               |
| 1 | user_1  | 2022-01-01 04:00:00 | MasterCard         |
| 2 | user_2  | 2022-01-01 03:00:00 | Discover           |
## Run a Batch Feature View with tiled aggregations

`BatchFeatureView::run` for Feature Views with aggregations is quite similar to the above, with the only difference being that it also supports an `aggregation_level` parameter.

When a Feature View has tiled aggregations, the query operates in three logical steps:

1. The Feature View query is run over the provided time range. The user-defined transformations are applied over the data source.
2. The result of step 1 is aggregated into tiles the size of the `aggregation_interval`.
3. The tiles from step 2 are combined to form the final feature values. The number of tiles that are combined is based on the `time_window` of the aggregation.

To see the output of step 1, use `aggregation_level="disabled"`. For step 2, use `aggregation_level="partial"`. For step 3, use `aggregation_level="full"`, which is the default behavior.

For more details on tiled aggregations, refer to Creating Features that use Time-Windowed Aggregations.
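As a rough reference, a Feature View like `user_transaction_counts` might be defined as follows. This is a hypothetical sketch only: the source (`transactions_batch`), entity (`user`), and the decorator parameters are assumptions chosen to be consistent with the outputs below, not the actual definition in the `prod` workspace.

```python
from datetime import datetime, timedelta

from tecton import Aggregation, batch_feature_view

# Hypothetical sketch: `transactions_batch` (a batch data source) and `user`
# (an Entity) are assumed to be defined elsewhere in the feature repository.
@batch_feature_view(
    sources=[transactions_batch],
    entities=[user],
    mode="spark_sql",
    aggregation_interval=timedelta(days=1),  # the tile size seen at aggregation_level="partial"
    aggregations=[
        Aggregation(column="transaction", function="count", time_window=timedelta(days=1)),
        Aggregation(column="transaction", function="count", time_window=timedelta(days=30)),
        Aggregation(column="transaction", function="count", time_window=timedelta(days=90)),
    ],
    feature_start_time=datetime(2022, 1, 1),
)
def user_transaction_counts(transactions):
    return f"""
        SELECT user_id, 1 AS transaction, timestamp
        FROM {transactions}
    """
```

The examples below run this Feature View once at each `aggregation_level`.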
```python
agg_fv = ws.get_feature_view("user_transaction_counts")

result_dataframe = agg_fv.run(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
    aggregation_level="disabled",
).to_pandas()
display(result_dataframe)
```
|   | user_id           | transaction | timestamp           |
|---|-------------------|-------------|---------------------|
| 0 | user_222506789984 | 1           | 2022-05-01 21:04:38 |
| 1 | user_26990816968  | 1           | 2022-05-01 19:45:14 |
| 2 | user_337750317412 | 1           | 2022-05-01 15:18:48 |
| 3 | user_337750317412 | 1           | 2022-05-01 07:11:31 |
| 4 | user_337750317412 | 1           | 2022-05-01 01:50:51 |
```python
result_dataframe = agg_fv.run(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
    aggregation_level="partial",
).to_pandas()
display(result_dataframe)
```
|   | user_id           | transaction_count_1d | tile_start_time     | tile_end_time       |
|---|-------------------|----------------------|---------------------|---------------------|
| 0 | user_222506789984 | 1                    | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
| 1 | user_26990816968  | 1                    | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
| 2 | user_337750317412 | 4                    | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
| 3 | user_402539845901 | 2                    | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
| 4 | user_461615966685 | 1                    | 2022-05-01 00:00:00 | 2022-05-02 00:00:00 |
```python
end = datetime(2022, 5, 2)

# Note: to get an interesting "full" aggregation, we need to provide adequate input data.
result_dataframe = agg_fv.run(
    start_time=end - timedelta(days=90),
    end_time=end,
    aggregation_level="full",
).to_pandas()
display(result_dataframe)
```
|   | user_id           | timestamp           | transaction_count_1d_1d | transaction_count_30d_1d | transaction_count_90d_1d |
|---|-------------------|---------------------|-------------------------|--------------------------|--------------------------|
| 0 | user_131340471060 | 2022-04-30 00:00:00 | 1                       | 6                        | 22                       |
| 1 | user_131340471060 | 2022-04-23 00:00:00 | 1                       | 6                        | 21                       |
| 2 | user_131340471060 | 2022-04-18 00:00:00 | 1                       | 7                        | 20                       |
| 3 | user_131340471060 | 2022-04-15 00:00:00 | 2                       | 7                        | 19                       |
| 4 | user_131340471060 | 2022-04-08 00:00:00 | 1                       | 6                        | 17                       |
## Get a Range of Feature Values from the Offline Store

`BatchFeatureView::get_historical_features` can read a range of feature values from the offline store between a given `start_time` and `end_time`.

`from_source=True` can be passed in to bypass the offline store and compute features on-the-fly against the raw data source. This is useful for testing the expected output of feature values.

Use `from_source=False` (the default) to see what data is materialized in the offline store.
```python
result_dataframe = fv.get_historical_features(
    start_time=datetime(2022, 5, 1), end_time=datetime(2022, 5, 2)
).to_pandas()
display(result_dataframe)
```
|   | user_id           | timestamp           | transaction_count_1d_1d | transaction_count_30d_1d | transaction_count_90d_1d | _effective_timestamp |
|---|-------------------|---------------------|-------------------------|--------------------------|--------------------------|----------------------|
| 0 | user_205125746682 | 2022-05-01 00:00:00 | 2                       | 8                        | 34                       | 2022-05-01 00:00:00  |
| 1 | user_222506789984 | 2022-05-01 00:00:00 | 1                       | 42                       | 141                      | 2022-05-01 00:00:00  |
| 2 | user_268514844966 | 2022-05-01 00:00:00 | 1                       | 29                       | 66                       | 2022-05-01 00:00:00  |
| 3 | user_394495759023 | 2022-05-01 00:00:00 | 1                       | 21                       | 68                       | 2022-05-01 00:00:00  |
| 4 | user_459842889956 | 2022-05-01 00:00:00 | 1                       | 14                       | 39                       | 2022-05-01 00:00:00  |
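As a sketch, the same range read can also be computed against the raw data source by setting `from_source=True` (assuming the source is reachable from your notebook environment; output is not reproduced here):

```python
# Bypass the offline store and recompute the same range from the raw source.
result_dataframe = fv.get_historical_features(
    start_time=datetime(2022, 5, 1),
    end_time=datetime(2022, 5, 2),
    from_source=True,
).to_pandas()
```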
## Read the Latest Features from the Online Feature Store

For performance reasons, this function should only be used for testing and not in a production environment. To read features online efficiently, see Reading Features for Inference.
```python
fv.get_online_features({"user_id": "user_609904782486"}).to_dict()
```

Out:

```
{
    "transaction_count_1d_1d": 1,
    "transaction_count_30d_1d": 17,
    "transaction_count_90d_1d": 56,
}
```
## Read Historical Features from the Offline Feature Store with Time-Travel

Create a spine DataFrame with the events to look up. For more information on spines, see Selecting Sample Keys and Timestamps.
```python
spine_df = pandas.DataFrame(
    {
        "user_id": ["user_722584453020", "user_461615966685"],
        "timestamp": [datetime(2022, 5, 1, 3, 20, 0), datetime(2022, 6, 6, 2, 30, 0)],
    }
)
display(spine_df)
```
|   | user_id           | timestamp           |
|---|-------------------|---------------------|
| 0 | user_722584453020 | 2022-05-01 03:20:00 |
| 1 | user_461615966685 | 2022-06-06 02:30:00 |
`from_source=True` can be passed in to bypass the offline store and compute features on-the-fly against the raw data source. However, this will be slower than reading feature data that has been materialized to the offline store.
```python
result_dataframe = fv.get_historical_features(spine_df, from_source=True).to_pandas()
display(result_dataframe)
```
|   | user_id           | timestamp           | user_transaction_counts__transaction_count_1d_1d | user_transaction_counts__transaction_count_30d_1d | user_transaction_counts__transaction_count_90d_1d |
|---|-------------------|---------------------|---------------------------------------------------|----------------------------------------------------|----------------------------------------------------|
| 0 | user_461615966685 | 2022-06-06 02:30:00 | 0                                                 | 13                                                 | 40                                                 |
| 1 | user_722584453020 | 2022-05-01 03:20:00 | 0                                                 | 28                                                 | 73                                                 |