tecton.PushSource
Summary
A Push Source is used to configure the Tecton Stream Ingest API for use with a Stream Feature View.
A Push Source may also contain an optional batch_config for efficiently backfilling historical feature values.
Example
```python
from tecton import HiveConfig, PushSource
from tecton.types import Field, Int64, String, Timestamp

# Declare a schema for the Push Source
input_schema = [
    Field(name="user_id", dtype=String),
    Field(name="event_timestamp", dtype=Timestamp),
    Field(name="clicked", dtype=Int64),
]

# Declare a Push Source with a name, a schema, and a batch_config
# See the batch config API documentation (e.g. HiveConfig) for details
click_event_source = PushSource(
    name="click_event_source",
    schema=input_schema,
    batch_config=HiveConfig(
        database="demo_ads",
        table="impressions_batch",
    ),
    description="Sample Push Source for click events",
)
```
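To illustrate what the schema means for incoming events, the sketch below checks whether one event record (a plain dict) lines up with the field names of `input_schema` from the example. It assumes that each pushed record should supply exactly the schema's fields with values of the matching type, and that `event_timestamp` is declared as a Timestamp. `SCHEMA` and `record_matches_schema` are hypothetical, use plain Python types as stand-ins for `tecton.types`, and are not how Tecton validates records internally.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for input_schema: field name -> expected Python type.
# Assumes event_timestamp is declared with the Timestamp type.
SCHEMA = {
    "user_id": str,
    "event_timestamp": datetime,
    "clicked": int,
}


def record_matches_schema(record: dict, schema: dict) -> bool:
    """Return True if the record has exactly the schema's fields with matching types."""
    if set(record) != set(schema):
        return False
    for name, expected in schema.items():
        value = record[name]
        # bool is a subclass of int, so reject it explicitly for integer fields.
        if expected is int and isinstance(value, bool):
            return False
        if not isinstance(value, expected):
            return False
    return True


event = {
    "user_id": "user_123",
    "event_timestamp": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
    "clicked": 1,
}
print(record_matches_schema(event, SCHEMA))  # True
```

Requiring an exact field match is a deliberately strict choice for the sketch; a real ingestion path may handle missing or extra fields differently.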
Attributes
Name | Data Type | Description |
---|---|---|
created_at | Optional[datetime.datetime] | The time that this Tecton object was created or last updated. |
data_delay | Optional[datetime.timedelta] | Returns the duration that materialization jobs wait after the batch_schedule before starting, typically to ensure that all data has landed. |
defined_in | Optional[str] | The repo filename where this object was declared. |
description | Optional[str] | Returns the description of the Tecton object. |
id | str | Returns the unique id of the Tecton object. |
info | | |
name | str | Returns the name of the Tecton object. |
owner | Optional[str] | Returns the owner of the Tecton object. |
tags | Dict[str,str] | Returns the tags of the Tecton object. |
workspace | Optional[str] | Returns the workspace that this Tecton object belongs to. |
options | Optional[Dict[str, str]] | A map of additional data source options for this Push Source. |
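As a rough illustration of `data_delay`, the sketch below computes when a materialization job would start, assuming the job is first scheduled according to `batch_schedule` and then waits `data_delay` before running. The helper `job_start_time` and the daily schedule are hypothetical; Tecton's actual scheduling may differ in detail.

```python
from datetime import datetime, timedelta, timezone


def job_start_time(scheduled_time: datetime, data_delay: timedelta) -> datetime:
    """Start of a materialization job that waits data_delay after its scheduled time."""
    return scheduled_time + data_delay


# Hypothetical daily batch_schedule: the job covering 2024-01-01 is scheduled
# at midnight UTC on 2024-01-02, then waits a 2-hour data_delay.
scheduled = datetime(2024, 1, 2, tzinfo=timezone.utc)
print(job_start_time(scheduled, timedelta(hours=2)))  # 2024-01-02 02:00:00+00:00
```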
Methods
Name | Description |
---|---|
__init__(...) | Creates a new Push Source. |
get_columns() | Returns the column names of the data source’s push schema. |
get_dataframe(...) | Returns the data in this Data Source as a Tecton DataFrame. |
summary() | Displays a human readable summary of this Data Source. |
validate() | Validate this Tecton object and its dependencies (if any). |
__init__(...)
Creates a new Push Source.
Parameters
- name (str) – A unique name for the Data Source.
- description (Optional[str]) – A human-readable description. (Default: None)
- tags (Optional[Dict[str, str]]) – Tags associated with this Tecton object (key-value pairs of arbitrary metadata). (Default: None)
- owner (Optional[str]) – Owner name (typically the email of the primary maintainer). (Default: None)
- prevent_destroy (bool) – If True, this Tecton object will be blocked from being deleted or re-created (i.e. a destructive update) during tecton plan/apply. To remove or update this object, prevent_destroy must first be set to False via the same tecton apply or a separate tecton apply. prevent_destroy can be used to prevent accidental changes such as inadvertently deleting a Feature Service used in production or recreating a Feature View that triggers expensive rematerialization jobs. prevent_destroy also blocks changes to dependent Tecton objects that would trigger a re-create of the tagged object; e.g. if prevent_destroy is set on a Feature Service, it also prevents deletions or re-creates of Feature Views used in that service. prevent_destroy is only enforced in live (i.e. non-dev) workspaces. (Default: False)
- schema (List[Field]) – A schema for the Push Source.
- batch_config (Union[FileConfig, HiveConfig, RedshiftConfig, SnowflakeConfig, SparkBatchConfig, None]) – An optional BatchConfig object containing the configuration of the Batch Data Source that backs this Push Source. The Batch Source’s schema must contain a superset of all the columns defined in the Push Source schema. (Default: None)
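The superset requirement on `batch_config` amounts to a set comparison over column names. In the sketch below the column lists are hypothetical (`impression_id` is an invented extra column) and `missing_push_columns` is an illustrative helper; in practice the Push Source side could come from `get_columns()` and the batch side from the backing table's schema.

```python
def missing_push_columns(push_columns, batch_columns):
    """Return Push Source columns absent from the batch source, sorted for stable output."""
    return sorted(set(push_columns) - set(batch_columns))


push_columns = ["user_id", "event_timestamp", "clicked"]
# Hypothetical batch table: it may carry extra columns, but must include every push column.
batch_columns = ["user_id", "event_timestamp", "clicked", "impression_id"]

print(missing_push_columns(push_columns, batch_columns))  # []
print(set(batch_columns) >= set(push_columns))  # True
```

Returning the missing names, rather than a bare boolean, makes it easier to see which columns would break the requirement.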
get_columns()
Returns the column names of the data source’s push schema.
get_dataframe(...)
Returns the data in this Data Source as a Tecton DataFrame.
Parameters
- start_time (Optional[datetime]) – The interval start time from when we want to retrieve source data. If no timezone is specified, UTC is used. Can only be defined if apply_translator is True. (Default: None)
- end_time (Optional[datetime]) – The interval end time until when we want to retrieve source data. If no timezone is specified, UTC is used. Can only be defined if apply_translator is True. (Default: None)
- apply_translator (bool) – If True, the transformation specified by post_processor will be applied to the dataframe for the data source. apply_translator is not applicable to batch sources configured with spark_batch_config because it does not have a post_processor. (Default: None)
Returns
A Tecton DataFrame containing the data source’s raw or translated source data.
Raises
TectonValidationError – If apply_translator is False but start_time or end_time filters are passed in.
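The documented rules for the time filters can be summarized in plain Python. This is a hypothetical sketch, not Tecton's implementation: `ValidationError` stands in for `TectonValidationError`, and `normalize_time_filters` is an invented helper that applies two rules from the parameter list (filters only with apply_translator set to True, and naive datetimes treated as UTC).

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


class ValidationError(Exception):
    """Stand-in for TectonValidationError."""


def _as_utc(ts: Optional[datetime]) -> Optional[datetime]:
    # Naive datetimes are interpreted as UTC; timezone-aware ones are left as given.
    if ts is not None and ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts


def normalize_time_filters(
    start_time: Optional[datetime],
    end_time: Optional[datetime],
    apply_translator: bool,
) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Reject time filters without apply_translator, then default naive times to UTC."""
    if not apply_translator and (start_time is not None or end_time is not None):
        raise ValidationError("start_time/end_time require apply_translator=True")
    return _as_utc(start_time), _as_utc(end_time)


start, end = normalize_time_filters(datetime(2024, 1, 1), datetime(2024, 1, 2), True)
print(start.isoformat())  # 2024-01-01T00:00:00+00:00
```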
summary()
Displays a human readable summary of this Data Source.
validate()
Validate this Tecton object and its dependencies (if any).
Validation performs most of the same checks and operations as tecton plan:
- Check for invalid object configurations, e.g. setting conflicting fields.
- For Data Sources and Feature Views, test query code and derive schemas, e.g. test that a Data Source’s specified S3 path exists or that a Feature View’s SQL code executes and produces supported feature data types.
Objects already applied to Tecton do not need to be re-validated on retrieval (e.g. my_workspace.get_feature_view('my_fv')) since they have already been validated during tecton plan.
Locally defined objects (e.g. my_ds = BatchSource(name="my_ds", ...)) may need to be validated before some of their methods can be called (e.g. my_feature_view.get_historical_features()).