tecton.PushSource
PushSource has been deprecated and will be fully removed in upcoming SDK releases. Please use StreamSource with a PushConfig instead.
Refer to Stream Data Sources with Stream Ingest API for more information.
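For migration, the deprecated example below maps onto a StreamSource with a PushConfig roughly as follows. This is a minimal sketch assuming the current SDK's StreamSource and PushConfig signatures; check the Stream Ingest API documentation for the exact parameters.

from tecton import StreamSource, PushConfig, HiveConfig
from tecton.types import Field, Int64, String, Timestamp

input_schema = [
    Field(name="user_id", dtype=String),
    Field(name="event_timestamp", dtype=Timestamp),
    Field(name="clicked", dtype=Int64),
]

# Equivalent of the PushSource example below, expressed as a StreamSource
# whose stream_config is a PushConfig (Stream Ingest API).
click_event_source = StreamSource(
    name="click_event_source",
    schema=input_schema,
    stream_config=PushConfig(),
    batch_config=HiveConfig(
        database="demo_ads",
        table="impressions_batch",
    ),
    description="Sample Stream Source for click events",
)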
Summary
A Push Source is used to configure the Tecton Stream Ingest API for use with a
Stream Feature View.
A Push Source may also contain an optional batch_config for efficiently
backfilling historical feature values.
Example

from tecton import HiveConfig, PushSource
from tecton.types import Field, Int64, String, Timestamp

# Declare a schema for the Push Source
input_schema = [
    Field(name="user_id", dtype=String),
    Field(name="event_timestamp", dtype=Timestamp),
    Field(name="clicked", dtype=Int64),
]

# Declare a Push Source with a name, schema, and batch_config.
# See the API documentation for BatchConfig.
click_event_source = PushSource(
    name="click_event_source",
    schema=input_schema,
    batch_config=HiveConfig(
        database="demo_ads",
        table="impressions_batch",
    ),
    description="Sample Push Source for click events",
)
Attributes

| Name | Data Type | Description |
|---|---|---|
| created_at | Optional[datetime.datetime] | The time that this Tecton object was created or last updated. |
| data_delay | Optional[datetime.timedelta] | Returns the duration that materialization jobs wait after the batch_schedule before starting, typically to ensure that all data has landed. |
| defined_in | Optional[str] | The repo filename where this object was declared. |
| description | Optional[str] | Returns the description of the Tecton object. |
| id | str | Returns the unique id of the Tecton object. |
| info | | |
| name | str | Returns the name of the Tecton object. |
| owner | Optional[str] | Returns the owner of the Tecton object. |
| tags | Dict[str, str] | Returns the tags of the Tecton object. |
| workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. |
| options | Optional[Dict[str, str]] | A map of additional push source data source options. |
Methods

| Name | Description |
|---|---|
| __init__(...) | Creates a new Push Source. |
| get_columns() | Returns the column names of the data source's push schema. |
| get_dataframe(...) | Returns the data in this Data Source as a Tecton DataFrame. |
| summary() | Displays a human-readable summary of this Data Source. |
| validate() | Validate this Tecton object and its dependencies (if any). |
__init__(...)

Creates a new Push Source.

Parameters

- name (str) – A unique name of the DataSource.
- description (Optional[str]) – A human-readable description. (Default: None)
- tags (Optional[Dict[str, str]]) – Tags associated with this Tecton Object (key-value pairs of arbitrary metadata). (Default: None)
- owner (Optional[str]) – Owner name (typically the email of the primary maintainer). (Default: None)
- prevent_destroy (bool) – If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must first be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes, such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a recreate of the tagged object; e.g., if prevent_destroy is set on a Feature Service, that will also prevent deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. (Default: False)
- schema (List[Field]) – A schema for the Push Source.
- batch_config (Union[FileConfig, HiveConfig, RedshiftConfig, SnowflakeConfig, SparkBatchConfig, None]) – An optional BatchConfig object containing the configuration of the Batch Data Source that backs this Tecton Push Source. The Batch Source's schema must contain a superset of all the columns defined in the Push Source schema. (Default: None)
get_columns()

Returns the column names of the data source's push schema.
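For example, with the click_event_source declared in the Example section (the returned ordering shown here is an assumption of this sketch):

# Column names come from the push schema, not the backing batch source.
click_event_source.get_columns()
# e.g. ["user_id", "event_timestamp", "clicked"]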
get_dataframe(...)

Returns the data in this Data Source as a Tecton DataFrame.

Parameters

- start_time (Optional[datetime]) – The interval start time from when we want to retrieve source data. If no timezone is specified, will default to using UTC. Can only be defined if apply_translator is True. (Default: None)
- end_time (Optional[datetime]) – The interval end time until when we want to retrieve source data. If no timezone is specified, will default to using UTC. Can only be defined if apply_translator is True. (Default: None)
- apply_translator (bool) – If True, the transformation specified by post_processor will be applied to the dataframe for the data source. apply_translator is not applicable to batch sources configured with spark_batch_config because it does not have a post_processor. (Default: None)

Returns

A Tecton DataFrame containing the data source's raw or translated source data.

Raises

TectonValidationError – If apply_translator is False, but start_time or end_time filters are passed in.
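A minimal usage sketch, assuming the click_event_source from the Example section has been validated or applied and that its backing batch source contains data for the requested window:

from datetime import datetime, timezone

# Pull one day of translated source data and materialize it to Pandas.
df = click_event_source.get_dataframe(
    start_time=datetime(2023, 5, 1, tzinfo=timezone.utc),
    end_time=datetime(2023, 5, 2, tzinfo=timezone.utc),
    apply_translator=True,
).to_pandas()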
summary()

Displays a human-readable summary of this Data Source.
validate()

Validate this Tecton object and its dependencies (if any).

Validation performs most of the same checks and operations as tecton plan:

- Check for invalid object configurations, e.g. setting conflicting fields.
- For Data Sources and Feature Views, test query code and derive schemas, e.g. test that a Data Source's specified s3 path exists or that a Feature View's SQL code executes and produces supported feature data types.

Objects already applied to Tecton do not need to be re-validated on retrieval (e.g. my_workspace.get_feature_view('my_fv')) since they have already been validated during tecton plan.

Locally defined objects (e.g. my_ds = BatchSource(name="my_ds", ...)) may need to be validated before some of their methods can be called (e.g. my_feature_view.get_historical_features()), as shown below.
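A minimal sketch, assuming click_event_source from the Example section is defined locally (e.g. in a notebook) rather than retrieved from a workspace:

# Explicit validation runs the plan-style checks described above and
# derives schemas; afterwards, data-access methods can be called.
click_event_source.validate()
columns = click_event_source.get_columns()
df = click_event_source.get_dataframe().to_pandas()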