Understanding Online Performance and Costs of Aggregation Features
Architecture Background
The data pipeline that Tecton manages for aggregation features consists of the following steps:
- Filter + Projection:
- This is the user-defined SQL / Python transformation that’s specified in a Batch or Stream FeatureView.
- Optional Partial Aggregations:
- When Tecton aggregations with sliding windows are configured, Tecton will compute partial aggregation values for each aggregation interval time window (configured by the aggregation_interval parameter for a Feature View).
- Online Store:
- When using sliding windows, Tecton will write partial aggregations to the online store.
- If instead Tecton aggregations with continuous windows are configured, then Tecton will write all projected events directly to the online store.
- Note: Continuous windows can only be configured with Stream Feature Views.
- Read-Time Aggregation:
- At feature request time, Tecton will build the final feature vector by aggregating over all relevant partial aggregations or events in real time.
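As an illustration of the pipeline above, the following is a minimal, self-contained Python sketch contrasting the sliding-window path (partial aggregations per interval, combined at read time) with the continuous-window path (raw events, aggregated at read time). It is not Tecton's implementation or API: the Event type, the function names, the (sum, count) row layout, and the assumption that request_time and window are multiples of aggregation_interval are all hypothetical simplifications.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical projected event (output of the filter + projection step).
    key: str
    timestamp: int  # seconds; unit chosen for illustration only
    value: float


def build_partial_aggregations(events, aggregation_interval):
    """Sliding windows: collapse events into one (sum, count) row per key per interval."""
    tiles = defaultdict(lambda: [0.0, 0])
    for e in events:
        interval_start = e.timestamp - e.timestamp % aggregation_interval
        tile = tiles[(e.key, interval_start)]
        tile[0] += e.value
        tile[1] += 1
    return {k: (s, c) for k, (s, c) in tiles.items()}


def read_sliding(tiles, key, request_time, window, aggregation_interval):
    """Read-time aggregation over partial aggregations fully inside the window.

    Assumes request_time and window are multiples of aggregation_interval.
    Returns (sum, count, rows_read).
    """
    window_start = request_time - window
    total, count, rows = 0.0, 0, 0
    for (k, start), (s, c) in tiles.items():
        if k == key and window_start <= start and start + aggregation_interval <= request_time:
            total += s
            count += c
            rows += 1
    return total, count, rows


def read_continuous(events, key, request_time, window):
    """Continuous windows: aggregate raw events in [request_time - window, request_time).

    Each stored event is one row, so rows_read equals the event count.
    """
    window_start = request_time - window
    total, count, rows = 0.0, 0, 0
    for e in events:
        if e.key == key and window_start <= e.timestamp < request_time:
            total += e.value
            count += 1
            rows += 1
    return total, count, rows


events = [Event("u1", t, v) for t, v in [(0, 1), (30, 2), (70, 3), (130, 4), (200, 5)]]
tiles = build_partial_aggregations(events, aggregation_interval=60)
print(read_sliding(tiles, "u1", request_time=180, window=180, aggregation_interval=60))  # (10.0, 4, 3)
print(read_continuous(events, "u1", request_time=180, window=180))  # (10.0, 4, 4)
```

Both paths produce the same sum and count here; what differs is how many rows are combined at read time (3 partial aggregations versus 4 raw events), which is the quantity that drives request-time cost.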
Note: The same behavior applies to the offline path that Tecton manages. The only difference is that the data is stored in an offline store.
Performance at Feature Request Time
As a result of the read-time aggregation step mentioned above, feature retrieval latencies will vary depending on the number of rows (partial aggregations or events) being read from the online store and aggregated at the time of the request.
The factors that influence the number of rows in the online store will differ based on whether the window is sliding or continuous.
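As a rough, back-of-the-envelope illustration of why those factors differ, the sketch below estimates rows read per request under a simplified model: a sliding window stores at most one partial-aggregation row per aggregation_interval per key, while a continuous window stores one row per projected event. The function names and the model itself are hypothetical assumptions for illustration, not a description of Tecton's actual storage layout.

```python
import math


def sliding_window_rows(window_seconds: int, aggregation_interval_seconds: int) -> int:
    """Assumed model: at most one partial-aggregation row per interval in the window.

    Intervals with no events produce no row, so this is an upper bound.
    """
    return math.ceil(window_seconds / aggregation_interval_seconds)


def continuous_window_rows(events_per_hour: float, window_seconds: int) -> int:
    """Assumed model: every projected event in the window is one row."""
    return math.ceil(events_per_hour * window_seconds / 3600)


WEEK = 7 * 24 * 3600
HOUR = 3600

# Sliding: bounded by window / aggregation_interval, independent of event volume.
print(sliding_window_rows(WEEK, HOUR))  # 168
print(sliding_window_rows(WEEK, 60))    # 10080 (a finer interval means more rows)

# Continuous: scales with event volume, e.g. 600 events/hour over a 1-hour window.
print(continuous_window_rows(600, HOUR))  # 600
```

Under this model, sliding-window read cost is governed by the ratio of window length to aggregation_interval, while continuous-window read cost is governed by how many events a key produces within the window.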