Creating Feature 1
In this topic, you will create and test the first feature, `user_credit_card_issuer`. This feature determines the user's credit card issuer, based on the user's credit card number.
In your local feature repository, open the file `features/batch_features/user_credit_card_issuer.py`. In the file, uncomment the following code, which is a definition of the `user_credit_card_issuer` Feature View.
A Feature View defines one or more features, whose values are generated when the Feature View's transformation runs.
The `@batch_feature_view` decorator (included in the following code) indicates that a Batch Feature View is being defined.
from tecton import batch_feature_view, FilteredSource
from entities import user
from data_sources.customers import customers
from datetime import datetime, timedelta


@batch_feature_view(
    sources=[FilteredSource(customers)],
    entities=[user],
    mode="spark_sql",
    online=True,
    offline=True,
    feature_start_time=datetime(2016, 1, 1),
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
    timestamp_field="signup_timestamp",
    description="User credit card issuer derived from the user credit card number.",
)
def user_credit_card_issuer(customers):
    return f"""
        SELECT
            user_id,
            signup_timestamp,
            CASE SUBSTRING(CAST(cc_num AS STRING), 0, 1)
                WHEN '4' THEN 'Visa'
                WHEN '5' THEN 'MasterCard'
                WHEN '6' THEN 'Discover'
                ELSE 'other'
            END as user_credit_card_issuer
        FROM
            {customers}
        """
The Feature View's transformation
A transformation is logic that runs against data retrieved from one or more external data sources. The `user_credit_card_issuer` Feature View's transformation is defined in the `user_credit_card_issuer` function that follows the `@batch_feature_view` decorator.
The name of a Feature View is the name of its transformation function. You refer to a Feature View by name when using the Tecton interactive Python classes to read feature data.
The SELECT statement
The `SELECT` statement runs against every record in the table or file in the external data source.
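To see the per-record behavior in isolation, the following minimal sketch runs an analogous query against an in-memory SQLite table. SQLite stands in for Spark SQL here, so `SUBSTR(..., 1, 1)` and `CAST(... AS TEXT)` replace the Spark functions used in the Feature View. The rows in the table are made-up sample data, not values from the tutorial's data source.

```python
import sqlite3

# In-memory stand-in for the external data source (made-up sample rows).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (user_id TEXT, signup_timestamp TEXT, cc_num INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("user_a", "2022-03-01 10:00:00", 4111111111111111),
        ("user_b", "2022-03-02 11:00:00", 5500000000000004),
        ("user_c", "2022-03-03 12:00:00", 6011000000000004),
        ("user_d", "2022-03-04 13:00:00", 340000000000009),
    ],
)

# SQLite analogue of the transformation query: the CASE expression is
# evaluated once per row, using the first digit of the card number.
query = """
    SELECT
        user_id,
        signup_timestamp,
        CASE SUBSTR(CAST(cc_num AS TEXT), 1, 1)
            WHEN '4' THEN 'Visa'
            WHEN '5' THEN 'MasterCard'
            WHEN '6' THEN 'Discover'
            ELSE 'other'
        END AS user_credit_card_issuer
    FROM customers
    ORDER BY user_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each input row yields exactly one output row, and any card number that does not start with 4, 5, or 6 falls through to `'other'`.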
Columns in the SELECT statement
- A column for each entity in the Feature View. This Feature View has one entity, identified by the `user_id` column. Entities are used as join keys when multiple features are joined together. You will see an example of this in part 2 of the tutorial.
- The timestamp column. This is needed because the Feature View retrieves historical values from the external data source in order to generate feature values.
- A column for each feature in the Feature View. This Feature View has one feature, `user_credit_card_issuer`.
The columns can be in any order.
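As a quick way to check these column requirements, the sketch below compares a query's output columns against the expected set. The helper `missing_columns` and its parameters are hypothetical, written only for illustration; they are not part of Tecton.

```python
def missing_columns(output_columns, join_keys, timestamp_field, feature_names):
    """Return the required columns that are absent from output_columns."""
    required = list(join_keys) + [timestamp_field] + list(feature_names)
    present = set(output_columns)
    return [name for name in required if name not in present]


# Columns produced by the SELECT statement, listed in a different order
# to show that column order does not affect the check.
columns = ["user_credit_card_issuer", "signup_timestamp", "user_id"]

complete = missing_columns(
    columns, ["user_id"], "signup_timestamp", ["user_credit_card_issuer"]
)
incomplete = missing_columns(
    ["user_id"], ["user_id"], "signup_timestamp", ["user_credit_card_issuer"]
)
print(complete)
print(incomplete)
```

The first call reports nothing missing even though the columns are reordered; the second reports the timestamp and feature columns as missing.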
The FROM clause
The `FROM` clause contains `{customers}`, which refers to the `customers` data source specified in the `sources` parameter. This parameter lists the one or more data sources that the Feature View uses. In this case, there is only one data source. The `customers` definition (defined earlier in `data_sources/customers.py`) references the external source, a file in an S3 bucket, that the transformation query in the Feature View runs against.
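The placeholder mechanism is ordinary Python f-string interpolation: the function parameter and the `{customers}` placeholder share a name, so whatever is passed in is substituted into the query text. The sketch below illustrates this with a shortened query and a plain string; the value `"customers_table"` is a hypothetical stand-in, since what Tecton actually supplies for the parameter at run time is not covered in this topic.

```python
def user_credit_card_issuer(customers):
    # The parameter name matches the placeholder inside the f-string,
    # so the argument is interpolated into the FROM clause.
    return f"""
        SELECT user_id, signup_timestamp
        FROM {customers}
    """


# Hypothetical stand-in for the value supplied at run time.
query = user_credit_card_issuer("customers_table")
print(query)
```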
Further information on transformations
For more information on transformations, see Transformations.
Applying the Feature View
In your terminal, run `tecton apply` to apply the code that you uncommented in `features/batch_features/user_credit_card_issuer.py` (above) to the workspace that you created and selected in the setup.
Testing the Feature View
You can test a Feature View in two ways:
- Test interactively by calling `<feature view>.run()` with a timestamp range. An example is shown in the next section.
- Write a unit test, which is a repeatable test that calls `<feature view>.run()`. For more information, see Unit Testing.
You can also test a feature view by calling `<feature view>.get_historical_features()`, which is more flexible than `<feature view>.run()`. For more information, see the batch feature view `get_historical_features()` reference.
Running an interactive test
Get the feature view from the workspace `ws` that you defined in the setup.
fv = ws.get_feature_view("user_credit_card_issuer")
Call the `run()` method of the feature view to get feature data for the timestamp range of `2022-01-01` to `2022-04-01`, and display the generated feature values.
(Here, the timestamp range is set arbitrarily. When testing your own Feature Views, set the start and end times to cover the range you want to test.)
from datetime import datetime

offline_features = fv.run(datetime(2022, 1, 1), datetime(2022, 4, 1)).to_spark().limit(10)
offline_features.show()
Sample output:
user_id | signup_timestamp | user_credit_card_issuer |
---|---|---|
user_460877961787 | 2022-03-09 03:33:09 | Visa |
user_504831693 | 2022-03-12 20:11:22 | Visa |
user_609904782486 | 2022-03-23 13:57:48 | Visa |
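As a sanity check on the sample output, the following sketch confirms that each returned `signup_timestamp` falls inside the requested range. It uses plain Python rather than Tecton, and it assumes a range that includes the start time and excludes the end time; how `run()` treats the range boundaries is not stated in this topic.

```python
from datetime import datetime

# Rows copied from the sample output above.
rows = [
    ("user_460877961787", datetime(2022, 3, 9, 3, 33, 9), "Visa"),
    ("user_504831693", datetime(2022, 3, 12, 20, 11, 22), "Visa"),
    ("user_609904782486", datetime(2022, 3, 23, 13, 57, 48), "Visa"),
]

start_time = datetime(2022, 1, 1)
end_time = datetime(2022, 4, 1)

# Assumption: the start is inclusive and the end is exclusive.
in_range = [row for row in rows if start_time <= row[1] < end_time]
print(f"{len(in_range)} of {len(rows)} rows fall within the range")
```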