Version: 0.7

Serverless Feature Retrieval from any Python environment

Public Preview

This feature is currently in Public Preview.

This feature has the following limitations:

Your Athena workgroup must be configured to use Athena Engine 3.
You cannot use the Athena connector to compute features with the `get_historical_features()` method’s `from_source` parameter set to `True`. Features can only be read from the offline store.
The Athena connector is not compatible with Tecton on Snowflake.
Athena supported aggregations are sum, min, max, count, mean, stddev_samp, stddev_pop, var_samp, var_pop

If you have questions or want to share feedback, please file a feature request.

Summary

When using the Tecton SDK with Databricks or EMR, the Tecton SDK leverages the existing Spark context to retrieve features from the offline store. As a result, the SDK methods for offline feature retrieval only work in Python environments that have access to an interactive Spark context (such as Databricks or AWS EMR notebooks).

As an alternative, users can use the Tecton SDK with Athena to access features from Tecton’s offline store in any Python environment that has access to AWS (e.g. your local laptop, a Jupyter notebook, Kubeflow pipelines etc.).

Note that this doesn't apply to those using Tecton on Snowflake as it has no dependency on a Spark context. Tecton on Snowflake users can use the .get_historical_features() method as documented here.

How the SDK works with Athena

The following diagram shows how Athena fits into Tecton’s Architecture:

Architecture Diagram of Athena in Tecton

When you retrieve features from the offline store, the SDK executes the following steps:

Tecton registers tables in AWS Glue for Feature Views and Feature Tables that the user attempts to read data from. Those tables point at the Feature View’s Offline Store location on S3.
Tecton’s SDK builds and executes Athena SQL queries to fetch historical feature data from the S3-based Tecton Offline Store.
Athena writes the query result to S3.
Tecton’s SDK reads the result from S3 and returns a pandas DataFrame.

note

Additionally, if you retrieve features using a pandas DataFrame spine, Tecton will upload the spine to S3 and register an Athena table for it.

Supported Tecton SDK operations

Athena is used for the following SDK operations:

FeatureView.get_historical_features()
FeatureTable.get_historical_features()
FeatureService.get_historical_features()

Otherwise, the Tecton SDK uses a Spark context for other operations that read data, such as DataSource.get_dataframe().

Installation

Your Athena workgroup must be configured to use Athena Engine 3.

To install the Athena connector in your notebook, run:

pip install 'tecton[athena]'

To enable the Athena connector, set SQL_DIALECT and ALPHA_ATHENA_COMPUTE_ENABLED as follows:

import tecton

tecton.conf.set("SQL_DIALECT", "athena")
tecton.conf.set("ALPHA_ATHENA_COMPUTE_ENABLED", False)

Athena session configuration

The following configuration can be set on the Athena Session. All the configurations are optional, except for the workgroup which must be configured to use Athena Engine 3 and specified below:

import tecton_athena

session = tecton_athena.athena_session.get_session()
config = session.config

config.boto3_session = ...  # Optional: Boto3 session
config.workgroup = ...  # Required: Athena workgroup which must be configured to use Athena Engine 3.
config.encryption = ...  # Optional: Valid values: [None, 'SSE_S3', 'SSE_KMS']. Notice: 'CSE_KMS' is not supported.
config.kms_key = ...  # Optional: For SSE-KMS, this is the KMS key ARN or ID.
config.database = ...  # Optional: Name of the Database in the Catalog
config.s3_path = ...  # Optional: S3 Location where spines will be uploaded and Athena output will be stored
config.spine_temp_table_name = ...  # Optional: Specifies the temp table name in the catalog used for Tecton spines

Notes regarding some of these configuration options:

s3_path: (Defaults to a newly created bucket in the user’s region). S3 Path that Athena writes its results to, and that Tecton uploads spines to. Must start with s3://.

database: (Defaults to default). Name of the AWS Glue catalog Tecton registers Athena tables in (both for spines and for Feature Views).

spine_temp_table_name: (Defaults to the empty string). Prefix for the Glue table the SDK registers when a spine, as required by some get_historical_features calls, is uploaded to S3 and registered in S3. This is useful if multiple users use the SDK at the same time.

Required permissions

The environment in which the SDK is used must have the following permissions to AWS services:

S3:
- Read and write access to the S3 path specified in ATHENA_S3_PATH
- Read access to the S3 bucket Tecton uses for the Offline Store
Glue Catalog:
- Read and write access to the catalog specified in ATHENA_DATABASE, including
  - glue:CreateTable
  - glue:DeleteTable
Athena
- Full read access, including
  - athena:StartQueryExecution
  - athena:GetQueryExecution

Authentication

The authentication methods that are available for authenticating your environment with AWS are described here.

On-Demand Feature Views

If you use a Feature Service that depends on On-Demand Feature Views, they will be executed locally directly in the SDK’s client environment.

Serverless Feature Retrieval from any Python environment

Summary​

How the SDK works with Athena​

Supported Tecton SDK operations​

Installation​

Athena session configuration​

Required permissions​

Authentication​

On-Demand Feature Views​

Was this page helpful?