Cache Features for Real-time Inference
This feature is currently in Private Preview.
- Must be enabled by Tecton support.
Tecton Feature Serving Cache reduces both cost and latency of real-time inference for high-scale use-cases. The Tecton Feature Serving Cache will be managed by Tecton and can be configured by users.
Which features should use the Cache?
- High Traffic, Low Cardinality Key Reads: Ideal for use cases with high traffic where the same keys are repeatedly read, thereby reducing the cost and response times for a request.
- Large Aggregation Intervals: Suitable for features with large aggregation intervals that require extended computation times.
- On-Demand Feature Views With Input Features: When an On-Demand Feature View depends on a large number of Feature Views, caching those inputs can reduce the execution time of the On-Demand Feature View.
Which features should NOT use the Cache?
- Low Duplication Traffic: The cache should not be used if the inbound feature server traffic does not have high duplication.
- Low Tolerance for Staleness: The cache should not be used if the requested features have a low tolerance for staleness (i.e., they cannot tolerate a maximum staleness of at least 60 seconds). An example is a Stream Feature View that uses continuous mode streaming.
Cache Data Model
The cache is managed by Tecton, with no data being persisted or being cached for more than 24 hours.
Tecton caches data at the entity key level, so multiple Feature Views that share the same entity keys are cached under the same cache key. Cached values are isolated at the workspace level. This design provides the following advantages:
- Fewer primary keys in the cache, which improves performance.
- Retrieving different Feature Views with the same entity join keys will be more performant.
- Cached values from Feature Views are shared across different Feature Services with the same entity keys and will fetch/store the same values.
Note: Customers should try to reduce the ratio of join key combinations to Feature Views to maximize performance.
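The following is a minimal sketch of the idea behind entity-key-level caching, not Tecton's actual implementation. The key layout, the make_cache_key and put helpers, and the example Feature View names are all hypothetical; the internal cache key format is not documented here.

```python
from collections import defaultdict


def make_cache_key(workspace: str, join_keys: dict) -> tuple:
    """Hypothetical cache key: the workspace plus the sorted join key pairs."""
    return (workspace, tuple(sorted(join_keys.items())))


# cache key -> {feature view name: cached feature values}
cache = defaultdict(dict)


def put(workspace: str, join_keys: dict, feature_view: str, values: dict) -> None:
    """Store one Feature View's values under the shared entity-level key."""
    cache[make_cache_key(workspace, join_keys)][feature_view] = values


# Two Feature Views with the same join keys share a single cache entry.
put("prod", {"user_id": "u1"}, "user_clicks", {"clicks_7d": 12})
put("prod", {"user_id": "u1"}, "user_spend", {"spend_30d": 40.5})
# A different join key combination creates a new entry.
put("prod", {"user_id": "u1", "merchant_id": "m1"}, "user_merchant", {"txn_count": 3})
# The same join keys in another workspace are isolated.
put("staging", {"user_id": "u1"}, "user_clicks", {"clicks_7d": 0})

print(len(cache))  # number of distinct cache keys
```

In this sketch, four writes produce only three cache keys, which illustrates why fewer distinct join key combinations per Feature View means fewer primary keys in the cache.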
Using the Cache
You can enable caching by adding a cache option to a Feature View and a flag to the Feature Service that contains it, as shown in the snippet below:
Note: These options only take effect if you have enabled caching for your account by talking to Tecton support.
# Arguments are case-sensitive and accept only strings as input. The maximum value is one day, and the minimum value is 60 seconds.
from tecton import CacheConfig, batch_feature_view, FeatureService

# Cache feature values from this Feature View for up to one hour.
cache_config = CacheConfig(max_age_seconds=3600)


@batch_feature_view(cache_config=cache_config)
def my_cached_feature_view():
    return


# The Feature Service must also opt in to reading cached values.
fs = FeatureService(
    feature_views=[my_cached_feature_view, ...],
    name="cached_feature_service",
    online_serving_enabled=True,
    enable_online_caching=True,
)
- The max_age_seconds parameter in the CacheConfig determines the maximum number of seconds a feature will be cached before it becomes stale. This value must be between 60 seconds and 1 day inclusive.
  - Increasing the max age will increase your overall cache hit rate, but it also means that your data will remain in the cache for longer.
- The enable_online_caching parameter determines whether the Feature Service will attempt to retrieve a cached value from cached Feature Views. If a Feature View with cache options set is part of a Feature Service with caching disabled, then that Feature View will not retrieve cached values.
- You can verify that a value is being pulled from the cache by adding the include_serving_status=true metadata option in your request to the feature server. See metadata options.
  - The server response metadata will include a status field that indicates whether the value was retrieved from the cache or not.
// First Request Response
{
  "metadata": {
    "features": [
      "my_feature_view.feature": {
        "status": "PRESENT"
      },
      ...]
  }
}

// Second Request Response
{
  "metadata": {
    "features": [
      "my_feature_view.feature": {
        "status": "CACHED"
      },
      ...]
  }
}
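As a rough illustration, the status values can be tallied to estimate how many features in a response were served from the cache. This sketch assumes the feature metadata has already been parsed into a dictionary that maps each feature name to an object with a status field; that shape and the cache_hit_ratio helper are assumptions made for illustration, so adapt them to the actual response structure.

```python
from collections import Counter


def cache_hit_ratio(feature_metadata: dict) -> float:
    """Return the fraction of features whose status is CACHED.

    feature_metadata is assumed to map feature names to dicts
    such as {"status": "CACHED"} (a hypothetical parsed shape).
    """
    statuses = Counter(entry.get("status") for entry in feature_metadata.values())
    total = sum(statuses.values())
    return statuses["CACHED"] / total if total else 0.0


first_response = {"my_feature_view.feature": {"status": "PRESENT"}}
second_response = {"my_feature_view.feature": {"status": "CACHED"}}

print(cache_hit_ratio(first_response))
print(cache_hit_ratio(second_response))
```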
Skipping the cache
You can add request options to skip the cache entirely. There are two options in your feature request to control skipping the write and/or read operations to the cache.
$ curl -X POST https://<your_cluster>.tecton.ai/api/v1/feature-service/get-features \
    -H "Authorization: Tecton-key $TECTON_API_KEY" -d \
'{
  "params": {
    "workspace_name": "prod",
    "feature_service_name": "cache_service",
    ...
    "requestOptions": {
      "readFromCache": false,
      "writeToCache": false
    }
  }
}'
Options
- readFromCache: Defaults to true. If set to false, the feature server will not read from the cache and will instead recompute the feature value. This does not affect whether or not the feature server will write to your cache.
- writeToCache: Defaults to true. If set to false, the feature server will not write to the cache after computing a feature value. This does not affect whether the feature server will read an already cached value for the corresponding request.
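If you build the request body in Python, the json module serializes Python booleans to the lowercase true and false that JSON requires. The sketch below only constructs the body and does not send it. The build_request_body helper is hypothetical, and the sketch omits the other request parameters that are elided in the example request.

```python
import json


def build_request_body(
    workspace_name: str,
    feature_service_name: str,
    read_from_cache: bool = True,
    write_to_cache: bool = True,
) -> str:
    """Build a JSON request body with the two cache request options."""
    params = {
        "workspace_name": workspace_name,
        "feature_service_name": feature_service_name,
        # Other request parameters are omitted from this sketch.
        "requestOptions": {
            "readFromCache": read_from_cache,
            "writeToCache": write_to_cache,
        },
    }
    return json.dumps({"params": params})


# Skip the cache read but still refresh the cached value.
body = build_request_body("prod", "cache_service", read_from_cache=False)
print(body)
```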
How does it work?
- When the feature server receives a request, it first checks the cache for the requested features.
- If any of the requested features are not in the cache, the feature server will query the underlying online store for the missing features.
- The feature server will then store the retrieved features in the cache and return the requested features to the client.
- Subsequent requests will then attempt to retrieve the cached value.
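The steps above follow a read-through caching pattern, which can be sketched as follows. This is a simplified, in-memory illustration and not the feature server's implementation: the ReadThroughCache class, the dictionary standing in for the online store, and the injectable clock are all assumptions. The status strings and the 60 second to 1 day bounds on the max age come from this page.

```python
import time


class ReadThroughCache:
    """Toy read-through cache with a max age, for illustration only."""

    def __init__(self, online_store: dict, max_age_seconds: int, clock=time.monotonic):
        # The max age must be between 60 seconds and 1 day inclusive.
        if not 60 <= max_age_seconds <= 86400:
            raise ValueError("max_age_seconds must be between 60 and 86400")
        self._store = online_store  # stand-in for the underlying online store
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, time the value was stored)

    def get(self, key, read_from_cache: bool = True, write_to_cache: bool = True):
        now = self._clock()
        if read_from_cache:
            hit = self._entries.get(key)
            if hit is not None and now - hit[1] < self._max_age:
                return hit[0], "CACHED"
        # Cache miss, stale entry, or read skipped: query the online store.
        value = self._store[key]
        if write_to_cache:
            self._entries[key] = (value, now)
        return value, "PRESENT"


store = {("user_id", "u1"): {"clicks_7d": 12}}
cache = ReadThroughCache(store, max_age_seconds=3600)
first = cache.get(("user_id", "u1"))   # served from the online store
second = cache.get(("user_id", "u1"))  # served from the cache
print(first[1], second[1])
```

The read_from_cache and write_to_cache arguments mirror the request options described earlier: skipping the read does not prevent the write, and skipping the write does not prevent the read.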
Limitations
- The maximum size of cached data allowed will be 100GB per Tecton Account with a cap of 100,000 QPS.
  - Contact Tecton support for guidance on cache sizing and workload requirements.
- The online_serving_index parameter is not supported for cached Feature Views.
- The results of an On-Demand Feature View are not cached, but the Feature Views that it depends on can be.
  - The metadata status of an On-Demand Feature View will not reflect that its underlying Feature View data has been cached.
  - You can instead verify that dependent Feature Views are cached by retrieving SLO metadata and checking storeResponseSizeBytes. If the underlying Feature View is cached, you should see storeResponseSizeBytes decrease.
  - This functionality may be extended at a later time.
- Effective times will be omitted from the response if the feature is cached.
- Different subsets of a Feature View are cached separately. For example, if a Feature View fv contains features f1, f2, and f3, and one Feature Service uses fv[f1, f2] while another uses fv[f2, f3], then each of these subsets will be cached separately.
  - An On-Demand Feature View that uses a Feature View as input will always implicitly use all of the features of that Feature View, even if the Feature View is using subsets of those inputs.
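To make the subset limitation concrete, the sketch below models each cached subset as its own entry. How subsets are actually keyed internally is not documented here, so the subset_cache_key helper and its key layout are purely hypothetical.

```python
def subset_cache_key(feature_view: str, features: list) -> tuple:
    """Hypothetical key: the Feature View name plus the sorted feature subset."""
    return (feature_view, tuple(sorted(features)))


# Two Feature Services that use different subsets of the same Feature View.
service_a = subset_cache_key("fv", ["f1", "f2"])
service_b = subset_cache_key("fv", ["f2", "f3"])
# An On-Demand Feature View input implicitly uses every feature of fv.
odfv_input = subset_cache_key("fv", ["f1", "f2", "f3"])

distinct_entries = {service_a, service_b, odfv_input}
print(len(distinct_entries))  # each subset occupies its own entry
```

Under this model, f2 is stored in all three entries, which is one reason to align the subsets used across Feature Services where practical.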
Running in Production
See Caching in Production for guidelines on running in production and ballpark numbers on cost savings.