Using the Aggregation Engine
Tecton's Feature Platform comes with an Aggregation Engine, which optimizes both the feature engineering experience and the production performance of aggregation features.
Aggregation features are derived by calculating metrics over a window of time, such as averages, counts, max/min values, etc.
Using Tecton's Aggregation Engine is optional. If you prefer to write your own custom ETL pipelines, please review how write custom batch aggregation features
Quick Example​
from datetime import datetime, timedelta
from tecton import StreamFeatureView, Entity, Aggregation, StreamSource, PushConfig
from tecton.types import String, Timestamp, Float64, Field
from tecton import TimeWindow
# Define the entity for this feature
user = Entity(name="User", join_keys=["user_id"])
# Define a Streaming Push Source
input_schema = [
Field(name="user_id", dtype=String),
Field(name="timestamp", dtype=Timestamp),
Field(name="transaction_amount", dtype=Float64),
]
transactions_source = StreamSource(
name="transactions_source",
stream_config=PushConfig(log_offline=True),
schema=input_schema,
)
# Time window aggregation feature
transaction_aggregations = StreamFeatureView(
name="transaction_aggregations",
source=transactions_source,
entities=[user],
online=True,
offline=True,
feature_start_time=datetime(2018, 1, 1),
aggregations=[
Aggregation(
column="transaction_amount", function="sum", time_window=TimeWindow(window_size=timedelta(days=30))
),
Aggregation(
column="transaction_amount", function="sum", time_window=TimeWindow(window_size=timedelta(days=180))
),
Aggregation(
column="transaction_amount", function="count", time_window=TimeWindow(window_size=timedelta(days=180))
),
Aggregation(
column="transaction_amount", function="last", time_window=TimeWindow(window_size=timedelta(days=180))
),
Aggregation(
column="transaction_amount", function="max", time_window=TimeWindow(window_size=timedelta(days=365 * 4))
),
Aggregation(
column="transaction_amount", function="min", time_window=TimeWindow(window_size=timedelta(days=365 * 4))
),
],
schema=input_schema,
)
This simple streaming example calculates:
- The trailing 30d transaction amount sum
- The trailing 180d transaction amount sum
- The trailing 180d transaction count
- The last transaction amount over the past 180 days
- The max transaction amount in the past 4 years
- The min transaction amount in the past 4 years
Benefits of the Aggregation Engine​
If you define an aggregation feature using Tecton's Aggregation
concept, you
get the following benefits:
Simplicity
- Simple UX: As the example above shows, defining these online/offline-consistent, highly-performant and fresh features requires only a few lines of code
- Preprocessing Flexibility: You can aggregate raw data directly (see example above), or aggregate raw data that you filter / transform using standard Python or SQL
- Backfill Support: Streaming and Batch Features can be backfilled from batch data sources
- Data Source Agnostic: Aggregations are seamlessly supported across Batch, Stream or Push Data Sources
Performance & Efficiency
- Lifetime Window Support: You can aggregate a limited time window, or the entire life-time of data
- Low Latency Serving: Tecton optimizes the stored data to provide ultra-low latency serving, even for large time windows
- Streaming Memory Efficiency: Streaming aggregation features manage their state in Tecton's online store. As a result, they have a very limited memory footprint - even for arbitrarily large time windows. As a result, OOMs that you commonly run into with industry-standard streaming processors are a thing of the past.
- Online Storage Efficiency: Tecton's aggregation engine optimizes the data stored in the online store, significantly reducing the infrastructure cost
- Minimal Backfills: Backfills are done intelligently - Tecton only writes relevant data of the recent past to the online store.
Accuracy
- High Freshness: Time Window aggregations can be anchored to the "present" time, at which the feature data is requested. This allows you to fetch aggregation features that are fresh as of a few milliseconds
- Time Travel Compatible: When generating training data, Tecton's time travel capability ensures that you avoid online/offline skew
- Correctness Guarantees: Writing time window aggregations that can be executed online and offline consistently is hard and getting it wrong is easy. If you use Tecton aggregations, online/offline consistency and accuracy are guaranteed.