
Connect to S3

To run feature pipelines based on data in S3, Tecton needs the appropriate S3 permissions to connect to your S3 bucket. The following guide shows how to configure these permissions and validate that Tecton is able to connect to your data source.

Adding a Bucket Policy

To grant Tecton access to your S3 data source, configure an S3 bucket policy that gives your Tecton Account's AWS role read-only access to the data source, as in the following example.

Contact Tecton Support if you do not know your TECTON_ACCOUNT_ARN.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TectonS3DataSourceGet",
      "Effect": "Allow",
      "Principal": {
        "AWS": "{TECTON_ACCOUNT_ARN}"
      },
      "Action": "s3:GetObject",
      "Resource": "{YOUR_S3_DATA_SOURCE_ARN}/*"
    },
    {
      "Sid": "TectonS3DataSourceList",
      "Effect": "Allow",
      "Principal": {
        "AWS": "{TECTON_ACCOUNT_ARN}"
      },
      "Action": "s3:ListBucket",
      "Resource": "{YOUR_S3_DATA_SOURCE_ARN}"
    }
  ]
}
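
You can paste this policy into the bucket's Permissions tab in the AWS console. If you manage bucket configuration programmatically, the following sketch applies the same policy with boto3; the bucket name and role ARN are placeholders you must replace with your own values. Note that put_bucket_policy replaces any existing bucket policy, so merge these statements into your current policy if the bucket already has one.

import json

import boto3

# Placeholders: substitute your bucket name and your Tecton Account's role ARN.
BUCKET_NAME = "your-s3-data-source-bucket"
TECTON_ACCOUNT_ARN = "arn:aws:iam::123456789012:role/your-tecton-account-role"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TectonS3DataSourceGet",
            "Effect": "Allow",
            "Principal": {"AWS": TECTON_ACCOUNT_ARN},
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}/*",
        },
        {
            "Sid": "TectonS3DataSourceList",
            "Effect": "Allow",
            "Principal": {"AWS": TECTON_ACCOUNT_ARN},
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET_NAME}",
        },
    ],
}

# Overwrites any existing bucket policy; merge statements first if needed.
boto3.client("s3").put_bucket_policy(Bucket=BUCKET_NAME, Policy=json.dumps(policy))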

Testing an S3 Data Source

To validate that Tecton can read your S3 data source, create a Tecton Data Source definition and test that you can read data from the Data Source.

The FileConfig and pandas_batch_config interfaces can both be used to read from S3. FileConfig provides a simple interface for reading Parquet, JSON, or CSV data in S3, while pandas_batch_config gives you full control over how the data is read when you need more flexibility (see the sketch after the FileConfig example below).

The following example shows how to use FileConfig to test reading from S3 in a notebook.

import tecton
from tecton import BatchSource, FileConfig

# Follow the prompt to complete your Tecton Account sign-in
tecton.login("https://<your-account>.tecton.ai")

# Declare a BatchSource backed by a Parquet file in S3
test_ds = BatchSource(
    name="test_ds",
    batch_config=FileConfig(
        uri="s3://path-to-your-data/data.pq",
        file_format="parquet",
        timestamp_field="timestamp",
    ),
)
test_ds.validate()

# Read sample data
test_ds.get_dataframe().to_pandas().head(10)
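
If FileConfig does not fit your data, for example because it needs custom parsing or filtering, you can read it yourself with pandas_batch_config. The following is a minimal sketch, assuming your data is a Parquet file that pandas can read directly from S3 (which requires the s3fs package); the path and source name are placeholders.

from tecton import BatchSource, pandas_batch_config

# Define a custom reader; Tecton calls this function to load the data.
@pandas_batch_config()
def custom_ds_fn():
    # Import inside the function so it stays self-contained when serialized.
    import pandas as pd

    # Placeholder path; reading s3:// URLs with pandas requires s3fs.
    return pd.read_parquet("s3://path-to-your-data/data.pq")

custom_ds = BatchSource(name="custom_ds", batch_config=custom_ds_fn)
custom_ds.validate()

# Read sample data, as with the FileConfig source above
custom_ds.get_dataframe().to_pandas().head(10)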
