Data Sources
Tecton can connect to practically any physical batch or stream source of data (e.g. S3, GCS, Snowflake, Redshift, Kafka, Kinesis). To learn how to onboard your existing physical sources to Tecton, see the onboarding guide.
This section explains how to use an onboarded physical source of data with a FeatureView. In Tecton's framework, a Data Source is a logical object that defines a raw source of data your FeatureViews can use as input. A Data Source carries typical metadata (such as a name, an owner, or tags). Batch and stream Data Sources also reference your onboarded physical source of data.
Here's an example of a logical BatchSource, named fraud_users_batch, which references the physical raw Hive table fraud_users in the Hive database fraud:
from tecton import HiveConfig, BatchSource

fraud_users_batch = BatchSource(
    name="fraud_users_batch",
    batch_config=HiveConfig(database="fraud", table="fraud_users"),
)
Tecton supports the following Data Source concepts:
BatchSource: References a physical batch source of raw data, such as a Hive table, a data warehouse table, or a file. Used as an input for a BatchFeatureView.

StreamSource: References a physical stream source (such as a Kafka topic or a Kinesis stream). It also references a physical batch source that contains the stream's historical event log, used for backfills. Used as an input for a StreamFeatureView.

PushSource: Defines the expected schema for records sent to the Stream Ingest API. It can optionally also reference a physical batch source containing a historical event log of the data, used for backfills. Used as an input for a StreamFeatureView.

RequestSource: Defines the expected schema for request context data that is optionally sent to an OnDemandFeatureView at inference time.