Configuring Offline Store Access per Workspace
For Tecton on Databricks or Tecton on EMR deployments, offline materialized features are stored in S3.
This guide explains how to limit a Notebook Cluster's access to feature data from specific workspaces.
Offline Store Paths
Feature data in the offline store is organized by subdirectory. For workspaces created after November 7, 2022, the data for Feature Views in a workspace is written to a subdirectory named after that workspace. These subdirectories can be secured with separate IAM policies.
Creating per-Workspace Policies
Workspace subdirectories can be used to grant more fine-grained read access to materialized features. The following example shows how to modify the policy in a Notebook instance profile so that it scopes access to the materialized features in a specific workspace.
{
  "Sid": "S3ReadOnly${YOUR_WORKSPACE_NAME}",
  "Effect": "Allow",
  "Action": [
    "s3:Get*",
    "s3:List*"
  ],
  "Resource": [
    "arn:aws:s3:::tecton-${YOUR_DEPLOYMENT_NAME}/offline-store/ws/${YOUR_WORKSPACE_NAME}/*"
  ]
}
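If you manage many workspaces, it can help to generate these statements rather than edit them by hand. The following is a minimal sketch, not an official Tecton utility: the function name `workspace_read_statement` and the names `acme` and `fraud-prod` are hypothetical, and the path layout is taken from the example policy in this guide. It uses the IAM key `Action`, appends `/*` so that the statement matches objects under the workspace prefix, and strips non-alphanumeric characters from the `Sid`, since IAM only accepts letters and digits there.

```python
import json
import re


def workspace_read_statement(deployment: str, workspace: str) -> dict:
    """Build a read-only IAM statement for one workspace's offline store prefix."""
    # IAM Sid values may only contain ASCII letters and digits, so strip the rest.
    sid_suffix = re.sub(r"[^0-9A-Za-z]", "", workspace)
    # Path layout copied from the example policy in this guide.
    prefix = f"arn:aws:s3:::tecton-{deployment}/offline-store/ws/{workspace}"
    return {
        "Sid": f"S3ReadOnly{sid_suffix}",
        "Effect": "Allow",
        "Action": ["s3:Get*", "s3:List*"],
        # The trailing /* makes the statement match objects under the prefix.
        "Resource": [f"{prefix}/*"],
    }


if __name__ == "__main__":
    # "acme" and "fraud-prod" are placeholder deployment and workspace names.
    print(json.dumps(workspace_read_statement("acme", "fraud-prod"), indent=2))
```

The returned dictionary is a single statement. Add it to the `Statement` list of the instance profile's policy document before attaching the policy.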
Overriding Subdirectories
You can override the subdirectory for a Feature View if you want to set up a different organizational structure for your offline store.
The following example shows how to override the subdirectory path used for a Feature View.
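from tecton import ParquetConfig, batch_feature_view

@batch_feature_view(
    ...
    # Write this Feature View's offline data to a custom subdirectory.
    offline_store=ParquetConfig(
        subdirectory_override='${YOUR_CUSTOMIZED_SUBDIRECTORY}'),
    ...
)
def my_fv(data_source):
    pass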
To check the exact path for your Feature View, call the FeatureView.summary() method and look for the Offline Materialized Data Location item.
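Once you have the location string, you may want to confirm that it falls under the workspace prefix covered by a per-workspace policy, particularly for a Feature View that uses a subdirectory override. The sketch below is illustrative only: `is_under_workspace` is a hypothetical helper, the path layout is taken from the example policy in this guide, and it assumes the reported location is an S3 URI such as `s3://bucket/key`.

```python
from urllib.parse import urlparse


def is_under_workspace(location: str, deployment: str, workspace: str) -> bool:
    """Return True if an S3 URI lies under the workspace's offline store prefix."""
    parsed = urlparse(location)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    # Layout assumed from this guide: offline-store/ws/<workspace>/...
    expected_prefix = f"offline-store/ws/{workspace}/"
    # Appending "/" to the key stops "ws/prod-2" from matching workspace "prod".
    return bucket == f"tecton-{deployment}" and (key + "/").startswith(expected_prefix)


if __name__ == "__main__":
    # Placeholder location; substitute the value reported by summary().
    print(is_under_workspace(
        "s3://tecton-acme/offline-store/ws/prod/my_fv", "acme", "prod"))
```

If the check returns False, the Feature View's data is stored outside that workspace's prefix, so a policy scoped to the prefix would not grant access to it.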
Migrating existing Workspaces and Feature Views
If your workspace was created before November 7, 2022, and you want to adopt this subdirectory structure for existing workspaces and feature views, please reach out to Tecton Support to initiate the process.
Note that materialization and historical feature retrieval might be unavailable while we migrate your data.
Expect the following steps during the migration process:
- Pause offline materialization on all your feature views. You can do this by setting the offline=False parameter and then running tecton apply.
- Tecton will migrate existing data to the workspace subdirectory.
- Re-enable materialization for your feature views. You can do this by setting offline=True and then running tecton apply.