Cache Features for Real-time Inference
This feature is currently in Private Preview.
- Must be enabled by Tecton support. SDK version must be 0.7.7+.
Tecton Feature Serving Cache reduces both the cost and the latency of real-time inference for high-scale use cases. The cache is managed by Tecton and can be configured by users.
Which features should use the Cache?
- High Traffic, Low Cardinality Key Reads: Ideal for use cases with high traffic where the same keys are repeatedly read, thereby reducing the cost and response times for a request.
- Large Aggregation Intervals: Suitable for features with large aggregation intervals that require extended computation times.
- On-Demand Feature Views With Input Features: When an On-Demand Feature View depends on a large number of Feature Views, caching those inputs can reduce the On-Demand Feature View's execution time.
Which features should NOT use the Cache?
- Low Duplication Traffic: The cache should not be used if the inbound feature server traffic does not have high duplication.
- Low Tolerance for Staleness: The cache should not be used if the requested features have a low tolerance for staleness (i.e., they cannot tolerate a maximum staleness of at least 60 seconds). An example is a Stream Feature View that uses continuous mode streaming.
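One rough way to judge whether traffic has enough duplication is to replay a sample of request logs through a simulated time-to-live cache and look at the resulting hit rate. The sketch below is only an illustration under stated assumptions: `estimate_hit_rate` is a hypothetical helper, the input is assumed to be `(timestamp_seconds, entity_key)` pairs, and the expiry model (an entry expires a fixed time after it is written, and hits do not refresh it) is a simplification rather than a description of how the Tecton cache behaves internally.

```python
from typing import Hashable, Iterable, Tuple


def estimate_hit_rate(
    requests: Iterable[Tuple[float, Hashable]],
    max_age_seconds: float,
) -> float:
    """Replay (timestamp_seconds, entity_key) pairs through a simple TTL cache.

    Assumed model: an entry is written on a miss and expires
    max_age_seconds after that write; hits do not refresh it.
    """
    written_at = {}  # entity key -> time the cached entry was written
    hits = 0
    total = 0
    for ts, key in sorted(requests, key=lambda r: r[0]):
        total += 1
        cached_ts = written_at.get(key)
        if cached_ts is not None and ts - cached_ts < max_age_seconds:
            hits += 1
        else:
            # Miss: the value would be read from the store and re-cached.
            written_at[key] = ts
    return hits / total if total else 0.0


sample = [(0, "a"), (10, "a"), (20, "b"), (70, "a"), (75, "a")]
short_ttl = estimate_hit_rate(sample, max_age_seconds=60)
long_ttl = estimate_hit_rate(sample, max_age_seconds=3600)
```

On this sample, the longer max age yields the higher hit rate, at the price of serving older values. A low simulated hit rate at the largest staleness you can tolerate suggests the traffic is a poor fit for the cache.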
Cache Data Model
The cache is managed by Tecton. No data is persisted, and no data is cached for more than 24 hours.
Tecton caches data at the entity key level, so multiple Feature Views that share the same entity keys are cached under the same cache key. Cached values are isolated at the workspace level. This design provides the following advantages:
- Fewer primary keys in the cache, which improves performance.
- Retrieving different Feature Views with the same entity join keys will be more performant.
- Cached values from Feature Views are shared across different Feature Services with the same entity keys and will fetch/store the same values.
Note: Customers should try to reduce the ratio of join key combinations to Feature Views to maximize performance.
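To make the data model concrete, here is a toy in-memory sketch of entity-key-level caching. Everything in it is hypothetical: `make_cache_key`, `EntityKeyedCache`, and the key layout (workspace name plus sorted join key pairs) are illustrative choices, not Tecton's actual cache key format.

```python
from typing import Any, Dict, Mapping, Optional, Tuple

CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def make_cache_key(workspace: str, join_keys: Mapping[str, Any]) -> CacheKey:
    """Hypothetical key: workspace plus the sorted join key name/value pairs."""
    return (workspace, tuple(sorted((k, str(v)) for k, v in join_keys.items())))


class EntityKeyedCache:
    """Toy model: one entry per entity key, shared by many Feature Views."""

    def __init__(self) -> None:
        self._entries: Dict[CacheKey, Dict[str, Dict[str, Any]]] = {}

    def put(self, workspace: str, join_keys: Mapping[str, Any],
            feature_view: str, features: Mapping[str, Any]) -> None:
        entry = self._entries.setdefault(make_cache_key(workspace, join_keys), {})
        entry[feature_view] = dict(features)

    def get(self, workspace: str, join_keys: Mapping[str, Any],
            feature_view: str) -> Optional[Dict[str, Any]]:
        entry = self._entries.get(make_cache_key(workspace, join_keys), {})
        return entry.get(feature_view)

    def __len__(self) -> int:
        # Number of distinct cache keys, not number of Feature Views.
        return len(self._entries)


cache = EntityKeyedCache()
cache.put("prod", {"user_id": 42}, "user_clicks", {"clicks_7d": 10})
cache.put("prod", {"user_id": 42}, "user_spend", {"spend_30d": 99.5})
cache.put("staging", {"user_id": 42}, "user_clicks", {"clicks_7d": 3})
```

In this model the two `prod` Feature Views share a single cache key because they share the same join keys, while the `staging` entry is kept apart. Fewer distinct join key combinations per Feature View means fewer keys to look up, which is the ratio the note above recommends keeping low.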
Using the Cache
You can enable caching by setting an option on both the Feature View and the Feature Service that serves it, as shown in the snippet below:
Note: These options only take effect if you have enabled caching for your account by talking to Tecton support.
# Arguments are case-sensitive and accept only strings as input. The maximum value is one day, and the minimum value is 60 seconds.
from tecton import batch_feature_view, FeatureService


@batch_feature_view(options={"BETA_CACHE_MAX_AGE_SECONDS": "3600"})
def my_cached_feature_view():
    return


fs = FeatureService(
    feature_views=[my_cached_feature_view, ...],
    name="cached_feature_service",
    online_serving_enabled=True,
    options={"BETA_USE_CACHED_FEATURES": "true"},
)
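Because the option values must be strings within a fixed range, it can help to build them through a small validating helper. The sketch below is an assumption-laden convenience, not part of the Tecton SDK: `cache_max_age_option` and the two constants are hypothetical names, and only the option key and the 60 second to one day range come from this page.

```python
MIN_CACHE_MAX_AGE_SECONDS = 60
MAX_CACHE_MAX_AGE_SECONDS = 24 * 60 * 60  # one day


def cache_max_age_option(seconds: int) -> dict:
    """Return an options dict for a cached Feature View (hypothetical helper)."""
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise TypeError("seconds must be an int")
    if not MIN_CACHE_MAX_AGE_SECONDS <= seconds <= MAX_CACHE_MAX_AGE_SECONDS:
        raise ValueError(
            f"seconds must be between {MIN_CACHE_MAX_AGE_SECONDS} "
            f"and {MAX_CACHE_MAX_AGE_SECONDS} inclusive, got {seconds}"
        )
    # The option accepts only strings, so convert the validated integer.
    return {"BETA_CACHE_MAX_AGE_SECONDS": str(seconds)}


one_hour = cache_max_age_option(3600)
```

The returned dict could then be passed as the `options` argument of the Feature View decorator, which catches out-of-range or non-string values before the definition is applied.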
- The BETA_CACHE_MAX_AGE_SECONDS field in the options parameter determines the maximum number of seconds a feature will be cached before it becomes stale. This value must be between 60 seconds and 1 day, inclusive.
  - Increasing the max age will increase your overall cache hit rate, but it also means that your data will remain in the cache for longer.
- The BETA_USE_CACHED_FEATURES field in the options parameter determines whether the Feature Service will attempt to retrieve a cached value from cached Feature Views. If a Feature View with cache options set is part of a Feature Service with caching disabled, then that Feature View will not retrieve cached values.
- You can verify that a value is being pulled from the cache by adding the include_serving_status=true metadata option in your request to the feature server. See metadata options.
  - The server response metadata will include a status field that indicates whether the value was retrieved from the cache or not.
// First Request Response
{
  "metadata": {
    "features": [
      "my_feature_view.feature": {
        "status": "PRESENT"
      },
      ...]
  }
}
// Second Request Response
{
  "metadata": {
    "features": [
      "my_feature_view.feature": {
        "status": "CACHED"
      },
      ...]
  }
}
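The check above can be scripted. The following sketch builds a request body with the serving status option turned on and tallies the statuses in a response. It rests on several assumptions that should be verified against the feature server API reference: the request field names (`params`, `workspace_name`, `feature_service_name`, `join_key_map`, `metadata_options`) and the response shape (a `metadata.features` list of objects that each carry a `status`) are not confirmed by this page, whose response examples are abbreviated. `build_request_body` and `count_statuses` are hypothetical helper names.

```python
from collections import Counter
from typing import Any, Dict


def build_request_body(
    workspace: str, feature_service: str, join_keys: Dict[str, Any]
) -> Dict[str, Any]:
    """Assumed request shape; check field names against the API reference."""
    return {
        "params": {
            "workspace_name": workspace,
            "feature_service_name": feature_service,
            "join_key_map": join_keys,
            "metadata_options": {"include_serving_status": True},
        }
    }


def count_statuses(response: Dict[str, Any]) -> Counter:
    """Tally statuses, assuming metadata.features is a list of objects."""
    features = response.get("metadata", {}).get("features", [])
    return Counter(f.get("status", "UNKNOWN") for f in features)


body = build_request_body("prod", "cached_feature_service", {"user_id": "42"})

# Hand-written sample response in the assumed shape, not real server output.
sample_response = {
    "metadata": {
        "features": [
            {"name": "my_feature_view.feature", "status": "CACHED"},
            {"name": "other_feature_view.feature", "status": "PRESENT"},
        ]
    }
}
statuses = count_statuses(sample_response)
```

Sending the same request twice and comparing the two tallies is a quick way to confirm that the second response is served from the cache.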
How does it work?
- When the feature server receives a request, it first checks the cache for the requested features.
- If any of the requested features are not in the cache, the feature server will query the underlying online store for the missing features.
- The feature server will then store the retrieved features in the cache and return the requested features to the client.
- Subsequent requests will then attempt to retrieve the cached value.
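The steps above follow a read-through (cache-aside) pattern, which can be sketched in a few lines. This is a minimal in-memory illustration, not the feature server's implementation: `ReadThroughCache`, `fetch_from_store`, and the injectable `clock` are hypothetical, and only the PRESENT and CACHED status values are taken from this page.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class ReadThroughCache:
    """Check the cache first, fetch misses from the store, then cache them."""

    def __init__(
        self,
        fetch_from_store: Callable[[str, List[str]], Dict[str, Any]],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch_from_store
        self._max_age = max_age_seconds
        self._clock = clock
        # (entity_key, feature_name) -> (value, time written)
        self._entries: Dict[Tuple[str, str], Tuple[Any, float]] = {}

    def get_features(
        self, entity_key: str, features: List[str]
    ) -> Tuple[Dict[str, Any], Dict[str, str]]:
        now = self._clock()
        values: Dict[str, Any] = {}
        statuses: Dict[str, str] = {}
        missing: List[str] = []

        # Step 1: check the cache for each requested feature.
        for name in features:
            hit = self._entries.get((entity_key, name))
            if hit is not None and now - hit[1] < self._max_age:
                values[name] = hit[0]
                statuses[name] = "CACHED"
            else:
                missing.append(name)

        # Step 2: query the online store only for the missing features.
        if missing:
            fetched = self._fetch(entity_key, missing)
            for name in missing:
                value = fetched.get(name)
                # Step 3: store the retrieved value so later requests hit.
                self._entries[(entity_key, name)] = (value, now)
                values[name] = value
                statuses[name] = "PRESENT"

        return values, statuses
```

A first call for a given entity key returns every feature as PRESENT, and a second call within the max age returns them as CACHED without touching the store again.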
Limitations
- The maximum size of cached data allowed is 100GB per Tecton Account, with a cap of 100,000 QPS.
  - Contact Tecton support for guidance on cache sizing and workload requirements.
- The online_serving_index parameter is not supported for cached Feature Views.
- The results of an On-Demand Feature View are not cached, but the Feature Views that it depends on can be.
  - The metadata status of an On-Demand Feature View will not reflect that its underlying Feature View data has been cached.
  - You can instead verify that dependent Feature Views are cached by retrieving SLO metadata and checking the storeResponseSizeBytes value. If the underlying Feature View is cached, you should see storeResponseSizeBytes decrease.
  - This functionality may be extended at a later time.
- Effective times will be omitted from the response if the feature is cached.
Running in Production
See Caching in Production for guidelines on running in production and ballpark numbers on cost savings.