Production SLOs
This page details the Service Level Objective (SLO) that Tecton commits to upholding for the Feature Server. The SLO covers both Reliability and Latency, based on the Service Level Indicators (SLIs) described below. Tecton measures the indicators on the server side and evaluates compliance with the SLO over a 30-day rolling period.
Reliability
The SLI for reliability is the percentage of requests that do not return a server error (HTTP 5xx). Client errors (HTTP 4xx) do not count as errors for this indicator. Requests that time out on the server side and return HTTP 504 count against the reliability SLI only if they are SLO-Eligible, as defined in the Latency section.
The objective for this indicator is 99.95%.
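As a rough illustration of the reliability rules above, the following Python sketch computes the indicator from a list of request records. The `Request` type, its `slo_eligible` flag, and the helper names are hypothetical and are not part of any Tecton API. The sketch also assumes that non-eligible 504 responses stay in the denominator as non-errors, which the text above does not spell out.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass
class Request:
    """Hypothetical request record; field names are illustrative only."""
    status: int         # HTTP status code returned by the Feature Server
    slo_eligible: bool  # whether the request meets the SLO-Eligible definition


def counts_against_reliability(req: Request) -> bool:
    """Return True if the request counts as an error for the reliability SLI."""
    if req.status == 504:
        # Server-side timeouts only count when the request is SLO-Eligible.
        return req.slo_eligible
    # Other 5xx responses are server errors; 4xx client errors are not.
    return 500 <= req.status <= 599


def reliability_sli(requests: List[Request]) -> float:
    """Percentage of requests that do not count as server errors."""
    if not requests:
        return 100.0
    bad = sum(counts_against_reliability(r) for r in requests)
    return 100.0 * (len(requests) - bad) / len(requests)


def error_budget(total_requests: int, objective_pct: str = "99.95") -> int:
    """Number of requests that may fail while still meeting the objective.

    The objective is passed as a string and handled as an exact fraction
    to avoid floating-point rounding.
    """
    allowed_fraction = (100 - Fraction(objective_pct)) / 100
    return int(total_requests * allowed_fraction)
```

Under these assumptions, a service that receives 10,000,000 requests in a 30-day period can return up to `error_budget(10_000_000)`, or 5,000, qualifying server errors and still meet the 99.95% objective.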
Latency
The SLI for latency is the percentage of SLO-Eligible requests which complete in 100 ms or less. The objective for this indicator is 99%.
An SLO-Eligible request is a request that meets the following conditions:
- It does not require processing more than 2 MiB of items returned from the Online Store.
- If Redis is the Online Store, its reads from Redis take less than 25 ms.
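The eligibility conditions and the latency indicator can be sketched in Python as follows. This is a minimal illustration, not Tecton's implementation: the `RequestStats` type, its field names, and the convention that `redis_read_ms` is `None` when the Online Store is not Redis are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_ONLINE_STORE_BYTES = 2 * 1024 * 1024  # 2 MiB
MAX_REDIS_READ_MS = 25.0                  # Redis reads must take less than this
LATENCY_THRESHOLD_MS = 100.0              # requests must complete in this time or less


@dataclass
class RequestStats:
    """Hypothetical per-request measurements; field names are illustrative."""
    online_store_bytes: int                # size of items returned from the Online Store
    latency_ms: float                      # latency measured for the SLO
    redis_read_ms: Optional[float] = None  # None when the Online Store is not Redis


def is_slo_eligible(stats: RequestStats) -> bool:
    """Apply the two eligibility conditions from the list above."""
    if stats.online_store_bytes > MAX_ONLINE_STORE_BYTES:
        return False
    if stats.redis_read_ms is not None and stats.redis_read_ms >= MAX_REDIS_READ_MS:
        return False
    return True


def latency_sli(all_stats: List[RequestStats]) -> Optional[float]:
    """Percentage of SLO-Eligible requests completing within the threshold.

    Returns None when no request is eligible, since the percentage is undefined.
    """
    eligible = [s for s in all_stats if is_slo_eligible(s)]
    if not eligible:
        return None
    fast = sum(s.latency_ms <= LATENCY_THRESHOLD_MS for s in eligible)
    return 100.0 * fast / len(eligible)
```

Requests that fail either condition are dropped from both the numerator and the denominator, so a slow request that read 3 MiB from the Online Store does not lower the indicator.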
For Feature Services with On-Demand Feature Views, Tecton provides an SLO on feature serving time minus the On-Demand Feature View execution time. On-Demand Feature View execution times are not included in the latency SLO since they execute arbitrary user-defined Python code.
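As a worked example of this adjustment, the short Python sketch below subtracts the On-Demand Feature View execution time from the total serving time before comparing against the 100 ms threshold. The function and parameter names are hypothetical, and clamping the result at zero is an assumption added for safety.

```python
LATENCY_THRESHOLD_MS = 100.0  # latency threshold from the SLI definition


def slo_latency_ms(total_serving_ms: float, odfv_execution_ms: float) -> float:
    """Serving time with On-Demand Feature View execution time excluded."""
    # Clamp at zero in case of measurement skew (an assumption of this sketch).
    return max(0.0, total_serving_ms - odfv_execution_ms)


def meets_latency_threshold(total_serving_ms: float, odfv_execution_ms: float) -> bool:
    """True if the adjusted latency is within the 100 ms threshold."""
    return slo_latency_ms(total_serving_ms, odfv_execution_ms) <= LATENCY_THRESHOLD_MS
```

For instance, a request that takes 140 ms in total, of which 60 ms is spent executing On-Demand Feature View code, has an adjusted latency of 80 ms and is within the threshold.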
SLO Indicators
Web UI
Monitoring for both the latency and reliability SLOs is available in the Web UI under the monitoring tab for each Feature Service. In addition to the rate of SLO violations, the tab shows the rate of requests to the service that are not SLO-Eligible.