Connect to S3
To run feature pipelines on data in S3, Tecton needs permission to read from your S3 bucket. This guide shows how to configure these permissions and validate that Tecton can connect to your data source.
Adding a Bucket Policy
To grant Tecton access to your S3 data source, configure an S3 bucket policy that gives your Tecton Account's AWS role read-only access to the data source, as in the following example.
Contact Tecton Support if you do not know your TECTON_ACCOUNT_ARN.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "TectonS3DataSourceGet",
      "Effect": "Allow",
      "Principal": {
        "AWS": "{TECTON_ACCOUNT_ARN}"
      },
      "Action": "s3:GetObject",
      "Resource": "{YOUR_S3_DATA_SOURCE_ARN}/*"
    },
    {
      "Sid": "TectonS3DataSourceList",
      "Effect": "Allow",
      "Principal": {
        "AWS": "{TECTON_ACCOUNT_ARN}"
      },
      "Action": "s3:ListBucket",
      "Resource": "{YOUR_S3_DATA_SOURCE_ARN}"
    }
  ]
}
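If you prefer to apply this policy programmatically rather than through the AWS console, a boto3 call such as the following can attach it. This is a minimal sketch: the bucket name is a placeholder, the policy is assumed to be saved locally as policy.json, and the credentials used must allow s3:PutBucketPolicy on the bucket.

import boto3

# Load the bucket policy shown above (saved locally as policy.json)
with open("policy.json") as f:
    policy = f.read()

# Attach the policy to the data source bucket; replace the bucket name
s3 = boto3.client("s3")
s3.put_bucket_policy(Bucket="your-s3-data-source-bucket", Policy=policy)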
Testing an S3 Data Source
To validate that Tecton can read your S3 data source, create a Tecton Data Source definition and test that you can read data from the Data Source.
The FileConfig and pandas_batch_config interfaces can both be used to read from S3. FileConfig provides a simple interface for reading Parquet, JSON, or CSV data on S3; pandas_batch_config provides more flexibility if needed.
The following example shows how to use FileConfig to test reading from S3 in a notebook.
import tecton
from tecton import BatchSource, FileConfig

# Follow the prompt to complete your Tecton Account sign-in
tecton.login("https://<your-account>.tecton.ai")

# Declare a BatchSource backed by a Parquet file on S3
test_ds = BatchSource(
    name="test_ds",
    batch_config=FileConfig(uri="s3://path-to-your-data/data.pq", file_format="parquet", timestamp_field="timestamp"),
)
test_ds.validate()

# Read sample data
test_ds.get_dataframe().to_pandas().head(10)
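A pandas_batch_config data source can be tested the same way when FileConfig is not flexible enough. The following is a minimal sketch, not the only form the interface supports: it assumes the pandas and s3fs packages are available in the notebook environment so pandas can read directly from S3, and the S3 path is a placeholder.

from tecton import BatchSource, pandas_batch_config

@pandas_batch_config()
def custom_s3_source():
    import pandas as pd

    # Reading s3:// paths with pandas requires s3fs (or another
    # fsspec-compatible filesystem) to be installed
    return pd.read_parquet("s3://path-to-your-data/data.pq")

custom_ds = BatchSource(name="custom_ds", batch_config=custom_s3_source)
custom_ds.validate()

# Read sample data, as in the FileConfig example above
custom_ds.get_dataframe().to_pandas().head(10)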