New Features
Notebook-Driven Development
Notebook-driven development speeds up iteration on Tecton feature pipelines by enabling you to author and test them directly in your notebook. With notebook-driven development, you can:
- Define any Tecton Object, such as Entities, Data Sources, Feature Views, and Feature Services in a notebook
- Immediately validate and interact with Tecton Objects, for example by previewing data
- Create training datasets by combining features already created in a Workspace with new features directly authored in your notebook
To get started with notebook-driven development, see the Tecton Development Workflow.
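For instance, here is a minimal sketch of defining, validating, and previewing a Data Source interactively (the database and table names are illustrative):

```python
import tecton

# Illustrative names: assumes a Hive table "demo.credit_scores" exists.
credit_scores_batch = tecton.BatchDataSource(
    name="credit_scores_batch",
    batch_config=tecton.HiveConfig(
        database="demo",
        table="credit_scores",
        timestamp_field="timestamp",
    ),
)

credit_scores_batch.validate()  # validate immediately, without `tecton apply`

# Interact with the validated object, e.g. preview a few rows.
credit_scores_batch.get_dataframe().to_pandas().head()
```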
First-n, First-n Distinct, and Last-n Aggregation Functions
You can now use Tecton’s aggregation functions to do `first(n)`, `first_distinct(n)`, and `last(n)` aggregations, in addition to the existing `last_distinct(n)` aggregation.

This family of aggregations is especially powerful when combined with On-Demand Feature Views. For example, to create a feature that captures whether a user is making repeated transactions, you could use the `last(n)` function on their prior transaction amounts and compare it to the current transaction value.

Note that as part of this change, the default feature names for the existing `last_distinct(n)` aggregations have changed. Please see the upgrade guide for details.
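The snippet below is a minimal sketch of the repeated-transactions example, assuming a `transactions_stream` source and a `user` entity already exist in your feature repository (exact parameters may vary):

```python
from datetime import timedelta

from tecton import Aggregation, FilteredSource, stream_feature_view
from tecton.aggregation_functions import last

# Sketch: collect a user's last 10 transaction amounts over a 1-hour window.
# `transactions_stream` and `user` are assumed to be defined elsewhere.
@stream_feature_view(
    source=FilteredSource(transactions_stream),
    entities=[user],
    mode="spark_sql",
    aggregation_interval=timedelta(minutes=10),
    aggregations=[
        Aggregation(column="amount", function=last(10), time_window=timedelta(hours=1)),
    ],
    online=True,
)
def last_transaction_amounts(transactions):
    return f"SELECT user_id, amount, timestamp FROM {transactions}"
```

An On-Demand Feature View could then compare the current transaction's amount against this list to flag repeats.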
Faster data ingestion for Stream Feature Views
Tecton’s Continuous Processing Mode is now available for all Stream Feature Views; previously, the option was only available when using built-in aggregations.

By using Continuous Processing Mode for Stream Feature Views without aggregations, typical feature ingestion time improves from 1 minute to single-digit seconds.

To get started with Continuous Processing Mode, see Stream Processing Mode.
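For illustration, here is a minimal sketch of enabling continuous mode on a Stream Feature View without aggregations (the `transactions_stream` source and `user` entity are assumptions):

```python
from datetime import datetime, timedelta

from tecton import StreamProcessingMode, stream_feature_view

# Sketch: a Stream Feature View without aggregations running in continuous mode.
@stream_feature_view(
    source=transactions_stream,  # assumed stream source
    entities=[user],             # assumed entity
    mode="spark_sql",
    stream_processing_mode=StreamProcessingMode.CONTINUOUS,
    online=True,
    feature_start_time=datetime(2023, 1, 1),
    ttl=timedelta(days=30),
)
def latest_transaction(transactions):
    return f"SELECT user_id, amount, timestamp FROM {transactions}"
```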
Tecton Access Control and Service Account CLI Commands
The new `tecton access-control` and `tecton service-account` commands provide new options for managing Tecton Access Controls through the CLI. You can now view, assign, and remove roles directly from your terminal.

For example, you can use the new commands to create a new Service Account and grant it the ability to request features from our `prod` workspace.
```bash
$ tecton service-account create \
    --name "sample-service-account" \
    --description "An example for the release notes"

Save this API Key - you will not be able to get it again.
API Key: <Your-api-key>
Service Account ID: <Your-Service-Account-ID>

$ tecton access-control assign-role --role consumer \
    --workspace <Your-workspace> \
    --service-account <Your-Service-Account-ID>

Successfully updated role.

$ tecton access-control get-roles \
    --service-account <Your-Service-Account-ID>

Workspace          Role
================================
<Your-workspace>   consumer
```
To get started, see the command details with `tecton access-control --help` and `tecton service-account --help`.
Query Debugging Tools
Tecton 0.6 brings new explainability and debugging capabilities to the feature development process. For any interactive query that produces a Tecton DataFrame, you can print a query tree using `.explain()` and step through it to inspect data and diagnose slow queries or queries that return unexpected data.

```python
feature_service.get_historical_features(training_events).explain()
```

For more information, check out Debugging Queries.
Stream Ingest API (Private Preview)
Tecton’s new Stream Ingest API in 0.6 makes it easy to publish real-time data to the Feature Store from any stream or microservice via a simple HTTP API call. Tecton makes ingested data available both for online serving and for offline training data generation. The Stream Ingest API is fully compatible with Tecton’s aggregations framework, which means Tecton can even calculate aggregations on top of ingested real-time data. For example, a microservice could ingest raw transactions into Tecton using the Stream Ingest API, and an ML application could afterward retrieve the 1-minute aggregate transaction count for a given credit card from Tecton.
Contact us to learn more or participate in the Private Preview.
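Since the API is in Private Preview, the exact contract may change; the sketch below only illustrates the general shape of an ingestion call, and the endpoint URL, payload fields, and source name are all assumptions:

```python
import requests

# Hypothetical endpoint, payload shape, and push-source name, for illustration only.
response = requests.post(
    "https://<your-cluster>.tecton.ai/ingest",  # assumed URL
    headers={"Authorization": "Tecton-key <Your-api-key>"},
    json={
        "workspace_name": "prod",
        "records": {
            "transactions_push_source": [  # assumed source name
                {"record": {"user_id": "u123", "amount": 42.0,
                            "timestamp": "2023-01-01T00:00:00Z"}}
            ]
        },
    },
)
response.raise_for_status()  # on success, the record is available for serving
```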
Changes, enhancements and resolved issues
New DBR and EMR Supported Versions
Tecton 0.6 extends support for new Databricks Runtime and EMR versions. The lists below show supported versions and the defaults for Tecton 0.5 and 0.6.
Supported Databricks Runtimes:
- 9.1.x-scala2.12 (Tecton 0.5 default)
- 10.4.x-scala2.12 (Tecton 0.6 default)
- 11.3.x-scala2.12
Supported EMR Versions:
- emr-6.5.0 (Tecton 0.5 default)
- emr-6.7.0 (Tecton 0.6 default)
- emr-6.9.0
Unit Testing Interface Improvements
Tecton has made a few minor changes to the methods used for running unit tests:

- `FeatureView.run()` has been renamed to `FeatureView.test_run()`. The new name helps differentiate between the method for unit testing and the method for interactive execution in notebook environments.
- `start_time` and `end_time` are now required parameters for `BatchFeatureView`/`StreamFeatureView` `run()` and `test_run()`. The former default behaviors for start/end time led to a lot of customer confusion.
- `FeatureView.test_run()` does not have a `spark` parameter for specifying the Spark session. By default, `FeatureView.test_run()` will use the Tecton-defined Spark session. You can override the Spark session with `tecton.set_tecton_spark_session()`.
- Some internal changes were made to ensure the unit testing code path appropriately reflects the production code path. It’s possible some minor changes in behavior will cause tests to fail.
See the Unit Testing guide for more details on how to write unit tests with Tecton 0.6.
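For example, a minimal sketch of a unit test using the renamed method (the import path and column name are hypothetical):

```python
from datetime import datetime

from fraud.features.my_fv import my_fv  # hypothetical import path


def test_my_fv():
    # start_time and end_time are now required parameters.
    output = my_fv.test_run(
        start_time=datetime(2022, 5, 1),
        end_time=datetime(2022, 5, 2),
    )
    assert "user_id" in output.to_pandas().columns  # hypothetical column
```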
prevent_destroy parameter
Previously, you could set a tag with the `prevent_destroy` key to help mitigate the risk that erroneous changes impact production feature pipelines. This functionality is now a top-level parameter for Feature Views and Feature Services to make the option more discoverable.
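For example, a minimal sketch of the new parameter on a Feature View (the source and entity are assumed from your repository):

```python
from datetime import timedelta

from tecton import FilteredSource, batch_feature_view

# Sketch: prevent_destroy as a top-level parameter guards this Feature View
# against accidental destructive changes applied from the CLI.
@batch_feature_view(
    sources=[FilteredSource(datasource)],  # assumed source
    entities=[customer],                   # assumed entity
    mode="spark_sql",
    prevent_destroy=True,
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def my_protected_fv(source):
    ...
```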
tecton.get_current_workspace()
When defining Tecton Objects, it can be helpful to configure conditional logic based on the Workspace they are applied to. For example, you may want to use On-Demand instances for materialization jobs in your production workspaces to improve job reliability, and Spot instances in your staging environment to reduce costs.

The `get_current_workspace()` method provides a convenient way to implement this conditional logic.
```python
from datetime import datetime, timedelta

from tecton import BatchDataSource, FilteredSource, SnowflakeConfig, batch_feature_view, get_current_workspace

# use prod warehouse only in the prod environment
warehouse = "prod" if get_current_workspace() == "prod" else "dev"

datasource = BatchDataSource(
    name="mytable",
    batch_config=SnowflakeConfig(warehouse=warehouse, table="mytable", timestamp_field="timestamp"),
)

# save costs by materializing farther back only in the prod environment
start_time = datetime(2020, 1, 1) if get_current_workspace() == "prod" else datetime(2023, 1, 1)


@batch_feature_view(
    sources=[FilteredSource(datasource)],
    entities=[customer],  # `customer` entity defined elsewhere
    mode="spark_sql",
    online=True,
    feature_start_time=start_time,
    batch_schedule=timedelta(days=1),
    ttl=timedelta(days=3650),
)
def my_fv(source):
    ...
```
tecton.who_am_i() and tecton.set_credentials()
The new `tecton.who_am_i()` method provides a convenient way to inspect which Tecton credentials you’re using in a notebook environment. Equivalently, you can use the `tecton whoami` command in a CLI environment.

The `tecton.set_credentials()` method for setting session-level credentials in a notebook has a new `tecton_url` argument. This argument can be helpful if you have multiple Tecton instances in your organization.

Finally, `tecton.test_credentials()` is a convenience method that asserts you have valid credentials; it is useful in a notebook environment.
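A minimal sketch of these helpers in a notebook (the key and URL are placeholders):

```python
import tecton

# Placeholders: substitute your own API key and instance URL.
tecton.set_credentials(
    tecton_api_key="<Your-api-key>",
    tecton_url="https://<your-cluster>.tecton.ai",
)

tecton.test_credentials()  # raises an error if the credentials are invalid
tecton.who_am_i()          # prints details about the current principal
```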
Sunsetting Python 3.7 support
Starting in 0.6, the Tecton SDK and CLI no longer run in Python 3.7 environments. The Tecton SDK and CLI retain compatibility with Python 3.8 and Python 3.9.
The Tecton CLI is also compatible with Python 3.10 and Python 3.11. While the Tecton SDK is likely to work on Python 3.10 and Python 3.11 as well, it has not been tested.
Upgrading to 0.6
Follow this upgrade guide to upgrade to 0.6. The guide outlines all breaking and non-breaking changes.