Feature Development Workflow
In Tecton, features are developed and tested in a notebook and then productionized as code in a Tecton feature repository, optionally through a GitOps workflow that integrates with CI/CD.
This combines fast iteration in a notebook with DevOps best practices for productionization, such as version control, code review, and CI/CD.
A typical development workflow for building a feature and testing it in a training data set looks like this:
1. Create and validate a new feature definition in a notebook
2. Run the feature pipeline interactively to ensure correct feature data
3. Fetch a set of registered features from a workspace and create a new feature set
4. Generate training data to test the new feature in a model
5. Copy the new feature definition into your feature repo
6. Apply your changes to a live production workspace via the Tecton CLI or a CI/CD pipeline
If you do not need to test the feature in a model, then you would skip steps 3 and 4 above.
This page will walk through these steps in detail.
If you have not already done so, install the Tecton CLI on your local machine and in your notebook environment. Also ensure that your notebook has access to the relevant compute for your Feature Views.
1. Create and validate a local feature definition in a notebook
Any Tecton object can be defined and validated in a notebook. We call these definitions "local objects".
Simply write the definition in a notebook cell and call .validate() on the object. Tecton will ensure the definition is correct and run automatic schema validations on feature pipelines.
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, FilteredSource
from tecton.types import Field, String, Timestamp
from datetime import datetime, timedelta

# Local entity keyed on user_id
user = Entity(name="user", join_keys=["user_id"])

# Local batch data source backed by a public Parquet file
user_sign_ups = BatchSource(
    name="user_sign_ups",
    batch_config=FileConfig(
        uri="s3://tecton.ai.public/tutorials/fraud_demo/customers/data.pq",
        file_format="parquet",
        timestamp_field="signup_timestamp",
    ),
)


@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    schema=[Field("user_id", String), Field("signup_timestamp", Timestamp), Field("credit_card_issuer", String)],
)
def user_credit_card_issuer(user_sign_ups):
    # Derive the issuer from the first digit of the card number
    user_sign_ups["credit_card_issuer"] = user_sign_ups["cc_num"].apply(
        lambda x: "AmEx"
        if str(x)[0] == "3"
        else "Visa"
        if str(x)[0] == "4"
        else "MasterCard"
        if str(x)[0] == "5"
        else "Discover"
        if str(x)[0] == "6"
        else "other"
    )
    # Return only the columns declared in the schema
    return user_sign_ups[["user_id", "signup_timestamp", "credit_card_issuer"]]
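Because the issuer logic is plain Python, it can be checked on a few sample values before it is wrapped in a feature view. The following is a minimal sketch and not part of the Tecton API: the helper name issuer_from_cc_num, the lookup table ISSUER_BY_FIRST_DIGIT, and the sample card numbers are assumptions introduced here for illustration. It mirrors the nested conditional above, except that it maps an empty value to "other" instead of raising an error.

```python
# Hypothetical helper (not part of Tecton): maps the first digit of a card
# number to an issuer label, mirroring the nested conditional above.
ISSUER_BY_FIRST_DIGIT = {
    "3": "AmEx",
    "4": "Visa",
    "5": "MasterCard",
    "6": "Discover",
}


def issuer_from_cc_num(cc_num) -> str:
    """Return the issuer label for a card number, or "other" if unrecognized."""
    digits = str(cc_num)
    # Guard against empty values; indexing [0] on an empty string would fail.
    if not digits:
        return "other"
    return ISSUER_BY_FIRST_DIGIT.get(digits[0], "other")


# Quick local check on hand-written sample values (illustrative only).
samples = [4111111111111111, "378282246310005", 5500000000000004, 6011000000000004, 9999]
labels = [issuer_from_cc_num(s) for s in samples]
print(labels)  # ['Visa', 'AmEx', 'MasterCard', 'Discover', 'other']
```

If the helper is defined in the same notebook, the feature view body could call user_sign_ups["cc_num"].apply(issuer_from_cc_num) in place of the lambda. A named function is easier to test in isolation, and a new issuer becomes a one-line change to the lookup table.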
Tecton objects must be validated before they can be queried interactively. You can either validate objects explicitly with .validate() or call tecton.set_validation_mode('auto') once in your notebook so that objects are validated lazily and automatically at the time of use.
user_credit_card_issuer.validate() # or call tecton.set_validation_mode('auto')
Depending on registered objects
Your team's workspace(s) may include registered (a.k.a. applied) data sources, entities, or other objects that you want to build on. During development, local objects (in your notebook) can depend on registered objects fetched from your workspace.
For example:
import tecton
from tecton import Entity, BatchSource, FileConfig, batch_feature_view, FilteredSource
from tecton.types import Field, String, Timestamp
from datetime import datetime, timedelta

# Fetch the workspace
ws = tecton.get_workspace("prod")

# Fetch objects from the workspace
user = ws.get_entity("user")
user_sign_ups = ws.get_data_source("user_sign_ups")


# Use those objects as dependencies
@batch_feature_view(
    sources=[FilteredSource(user_sign_ups)],
    entities=[user],
    mode="pandas",
    batch_schedule=timedelta(days=1),
    schema=[Field("user_id", String), Field("signup_timestamp", Timestamp), Field("credit_card_issuer", String)],
)
def user_credit_card_issuer(user_sign_ups):
    # Derive the issuer from the first digit of the card number
    user_sign_ups["credit_card_issuer"] = user_sign_ups["cc_num"].apply(
        lambda x: "AmEx"
        if str(x)[0] == "3"
        else "Visa"
        if str(x)[0] == "4"
        else "MasterCard"
        if str(x)[0] == "5"
        else "Discover"
        if str(x)[0] == "6"
        else "other"
    )
    # Return only the columns declared in the schema
    return user_sign_ups[["user_id", "signup_timestamp", "credit_card_issuer"]]