Methods for Calling get_historical_features()
The primary way to read training data from Tecton is using get_historical_features(), which can be called via the following methods.
Method 1: Calling <feature service>.get_historical_features()
Using this method, a spine is not passed in to <feature service>.get_historical_features().
Example
import tecton
# Get the feature service
feature_service = tecton.get_workspace("my_workspace").get_feature_service("my_feature_service")
# Get training dataframe from get_historical_features()
training_df = feature_service.get_historical_features().to_spark()
# Show the results, save them somewhere, etc.
training_df.write.parquet("s3://....")
training_df.limit(100).show()
Steps that <feature_service>.get_historical_features() performs
When called, <feature_service>.get_historical_features() performs the following steps:
1. For each feature view in the feature service, Tecton generates feature data by doing the following:
   - (a) Tecton either populates the feature view's materialized feature data from the offline store, or computes the feature data ad-hoc. For details, see Determining if materialized feature data is being used when reading feature data.
     Note that transformations that run during materialization may be computationally or memory intensive; a robustly-provisioned notebook cluster may be required to properly run these.
   - (b) If the ttl parameter is set for the feature view, then feature data for records with timestamps less than ttl days prior to the current date are filtered out from the feature data populated in step (1a).
2. Tecton joins the feature data that was generated in steps (1a) and (1b), and sends the result back to the client.
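The ttl filter in step (1b) can be pictured as a simple cutoff on record timestamps. The following is a minimal plain-Python sketch, not Tecton's implementation: the apply_ttl helper, the record layout, and the choice to keep a record that falls exactly on the cutoff are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def apply_ttl(records, ttl, now):
    """Drop records whose timestamp is earlier than `now - ttl` (hypothetical helper)."""
    cutoff = now - ttl
    # Assumption: a record exactly at the cutoff is kept.
    return [r for r in records if r["timestamp"] >= cutoff]


# Hypothetical feature records for a single feature view.
records = [
    {"user_id": "u1", "timestamp": datetime(2023, 1, 1), "txn_count": 1},
    {"user_id": "u1", "timestamp": datetime(2023, 1, 24), "txn_count": 4},
    {"user_id": "u2", "timestamp": datetime(2023, 1, 30), "txn_count": 2},
]

# A fixed "current date" keeps the example deterministic.
fresh = apply_ttl(records, ttl=timedelta(days=7), now=datetime(2023, 1, 31))
print([r["timestamp"].date().isoformat() for r in fresh])
```

With a 7-day ttl and the fixed date above, only the two most recent records survive the filter.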
Method 2: Calling <feature view>.get_historical_features() with a spine
Using this method, a spine DataFrame is passed to <feature view>.get_historical_features(). The spine contains a list of keys and timestamps to join to the feature view.
Example
import tecton
import pandas as pd
# Get the feature view
feature_view = tecton.get_workspace("my_workspace").get_feature_view("my_feature_view")
# Construct the "spine" dataframe
spine = pd.read_parquet(
    "s3://tecton.ai.public/tutorials/fraud_demo/transactions/data.pq",
    storage_options={"anon": True},
)[["user_id", "timestamp", "amt", "is_fraud"]].head(1000)
# Get training dataframe from get_historical_features()
training_df = feature_view.get_historical_features(spine).to_spark()
# Show the results, save them somewhere, etc.
training_df.write.parquet("s3://....")
training_df.limit(100).show()
Steps that <feature_view>.get_historical_features() performs
When called with a spine, <feature_view>.get_historical_features() generates feature data by doing the following:
- (a) Tecton either populates the feature view's materialized feature data from the offline store, or computes the feature data ad-hoc. For details, see Determining if materialized feature data is being used when reading feature data.
  Note that transformations that run during materialization may be computationally or memory intensive; a robustly-provisioned notebook cluster may be required to properly run these.
- (b) Tecton runs an AS OF join (also known as a point-in-time join) on the spine and the feature data populated in step (a), matching on the join key(s) and the effective timestamp. See this section for details on AS OF joins and effective timestamps.
- (c) If the ttl parameter is set for the feature view, then feature data for records with timestamps less than ttl days prior to the current date are filtered out from the feature data populated in step (b).
- (d) Tecton sends the feature data that was generated in steps (a)-(c) back to the client.
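The AS OF join in step (b) can be illustrated outside of Tecton. The sketch below is a plain-Python approximation under stated assumptions: the as_of_join helper and the sample rows are hypothetical, the raw feature timestamp stands in for the effective timestamp, and a feature row whose timestamp equals the spine timestamp is treated as a match. It is not how Tecton implements the join.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime


def as_of_join(spine_rows, feature_rows, key="user_id", ts="timestamp", value="txn_count"):
    """For each spine row, attach the latest feature value with the same key
    whose timestamp is at or before the spine timestamp (hypothetical helper)."""
    # Group feature rows by join key and sort each group by timestamp.
    grouped = defaultdict(list)
    for row in feature_rows:
        grouped[row[key]].append(row)
    index = {}
    for k, rows in grouped.items():
        rows.sort(key=lambda r: r[ts])
        index[k] = ([r[ts] for r in rows], rows)

    joined = []
    for s in spine_rows:
        times, rows = index.get(s[key], ([], []))
        # Position of the first feature row strictly after the spine timestamp.
        pos = bisect_right(times, s[ts])
        match = rows[pos - 1] if pos > 0 else None
        joined.append({**s, value: match[value] if match else None})
    return joined


# Hypothetical spine: the keys and timestamps to look up.
spine_rows = [
    {"user_id": "u1", "timestamp": datetime(2023, 1, 5)},
    {"user_id": "u1", "timestamp": datetime(2023, 1, 10)},
    {"user_id": "u2", "timestamp": datetime(2023, 1, 7)},
    {"user_id": "u3", "timestamp": datetime(2023, 1, 7)},
]

# Hypothetical feature data for one feature view.
feature_rows = [
    {"user_id": "u1", "timestamp": datetime(2023, 1, 1), "txn_count": 1},
    {"user_id": "u1", "timestamp": datetime(2023, 1, 8), "txn_count": 4},
    {"user_id": "u2", "timestamp": datetime(2023, 1, 6), "txn_count": 2},
    {"user_id": "u2", "timestamp": datetime(2023, 1, 9), "txn_count": 7},
]

joined = as_of_join(spine_rows, feature_rows)
print([row["txn_count"] for row in joined])
```

Each spine row only sees feature values that existed at or before its own timestamp, which is what keeps later feature values from leaking into training data. The u3 row has no feature data, so its value is None.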
Method 3: Calling <feature view>.get_historical_features() with a starting timestamp and an ending timestamp
Using this method, a spine is not passed in to <feature view>.get_historical_features().
Steps that <feature_view>.get_historical_features() performs when using a starting timestamp and an ending timestamp
When called with a starting timestamp and an ending timestamp, <feature_view>.get_historical_features() generates feature data by doing the following:
- (a) Tecton either populates the feature view's materialized feature data from the offline store, or computes the feature data ad-hoc. For details, see Determining if materialized feature data is being used when reading feature data.
  Note that transformations that run during materialization may be computationally or memory intensive; a robustly-provisioned notebook cluster may be required to properly run these.
- (b) If the ttl parameter is set for the feature view, then feature data for records with timestamps less than ttl days prior to the current date are filtered out from the feature data populated in step (a).
- (c) Tecton sends the feature data that was generated in steps (a) and (b) back to the client.
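A natural reading of this method is that the starting and ending timestamps bound the timestamps of the feature data that is returned. The sketch below shows that idea in plain Python. The filter_time_range helper and the sample rows are hypothetical, and treating the range as half-open (start included, end excluded) is an assumption, not documented Tecton behavior.

```python
from datetime import datetime


def filter_time_range(rows, start_time, end_time):
    """Keep rows with start_time <= timestamp < end_time (hypothetical helper)."""
    # Assumption: the range is half-open, so a row exactly at end_time is dropped.
    return [r for r in rows if start_time <= r["timestamp"] < end_time]


# Hypothetical feature data for one feature view.
feature_data = [
    {"user_id": "u1", "timestamp": datetime(2023, 1, 1), "txn_count": 1},
    {"user_id": "u1", "timestamp": datetime(2023, 1, 15), "txn_count": 4},
    {"user_id": "u2", "timestamp": datetime(2023, 2, 1), "txn_count": 2},
]

in_window = filter_time_range(
    feature_data,
    start_time=datetime(2023, 1, 1),
    end_time=datetime(2023, 2, 1),
)
print([r["txn_count"] for r in in_window])
```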
Example
import tecton
# Get the feature view
feature_view = tecton.get_workspace("my_workspace").get_feature_view("my_feature_view")
# Get training dataframe from get_historical_features()
training_df = feature_view.get_historical_features(start_time="<start time>", end_time="<end time>").to_spark()
# Show the results, save them somewhere, etc.
training_df.write.parquet("s3://....")
training_df.limit(100).show()