Incorrect Offline Retrieval Results
Overview and scope
This troubleshooting article covers how to diagnose incorrect feature values returned from a get_features_for_events() or get_features_in_range() call. Some troubleshooting steps in this article depend on whether your Offline Retrieval call uses pre-computed feature data. Refer to Offline Retrieval Methods to determine whether this is your case.
Resolution
It is often difficult for Tecton Support Engineers to directly troubleshoot incorrect Offline Retrieval results as we typically do not have access to your notebooks or raw data to debug your issues. We therefore provide the following list of possible causes that you can check:
Naive timezone conversions
- Symptom: Feature values are off by one day, but otherwise correct.
- Explanation: Tecton uses UTC as its internal time zone. If you pass in timestamps without a time zone identifier, either into your feature views from your data sources or in your events dataframe, Tecton assumes they are already in UTC. This is a problem if the timestamps were actually recorded in a local time zone other than UTC.
- Resolution: Ensure that every timestamp you pass in carries a time zone.
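The off-by-one-day symptom can be reproduced with plain Python datetimes. The sketch below is illustrative only: the year, the 7-hour offset, and the helper name to_utc are assumptions made for the example and are not part of the Tecton SDK.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Assumption for this example: the raw timestamps were recorded in a
# local zone that is 7 hours behind UTC.
LOCAL_TZ = timezone(timedelta(hours=-7))


def to_utc(naive_local: datetime, local_tz: tzinfo = LOCAL_TZ) -> datetime:
    """Attach the intended local zone to a naive timestamp, then convert to UTC."""
    if naive_local.tzinfo is not None:
        raise ValueError("expected a naive timestamp")
    return naive_local.replace(tzinfo=local_tz).astimezone(timezone.utc)


naive = datetime(2023, 7, 2, 20, 0)  # 8 PM local time on July 2

# What you get if the naive value is simply assumed to be UTC already.
as_if_utc = naive.replace(tzinfo=timezone.utc)

# What the timestamp actually is once the local zone is taken into account.
actual_utc = to_utc(naive)

print(as_if_utc.date())   # 2023-07-02
print(actual_utc.date())  # 2023-07-03
```

The two interpretations fall on different UTC days, which is enough to select a different day of feature data. In a pandas events dataframe, the equivalent fix is usually to localize the timestamp column to its real zone and then convert it to UTC before calling the retrieval method.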
data_delay used
- Symptom: Feature values are off by one or more days, but otherwise correct.
- Explanation:
  - If you specify a data_delay in your data sources, you are telling Tecton to wait that amount of time after a materialization job would normally run before running it. For example, with a data_delay of 2 hours and a batch_schedule of 1 day, Tecton runs materialization jobs every day at 02:00 UTC instead of 00:00 UTC.
  - Tecton tries to minimize skew between training (e.g. get_features_for_events output) and inference (HTTP API output). As a result, if you pass in a timestamp of July 3 01:00 UTC in the above example, Tecton returns features computed from July 1 00:00-23:59 instead of July 2 00:00-23:59, because the July 2 materialization job does not run until July 3 02:00 UTC.
- Resolution: Either accept Tecton’s behavior or add data_delay to the timestamps in your get_features_for_events() events dataframe.
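The arithmetic in the example above can be checked with a few lines of Python. This is a simplified model of the behavior described in this section, not Tecton's implementation: the function name latest_available_window, the year, and the treatment of a timestamp that falls exactly on the job run time are all assumptions.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def latest_available_window(event_ts, batch_schedule, data_delay):
    """Return (start, end) of the newest batch window whose job has run by event_ts.

    Simplified model: the job for a window runs at window end + data_delay,
    and windows are aligned to multiples of batch_schedule since the epoch.
    """
    periods = (event_ts - data_delay - EPOCH) // batch_schedule
    end = EPOCH + periods * batch_schedule
    return end - batch_schedule, end


batch_schedule = timedelta(days=1)
data_delay = timedelta(hours=2)
event_ts = datetime(2023, 7, 3, 1, 0, tzinfo=timezone.utc)  # July 3 01:00 UTC

# The July 2 job has not run yet at 01:00, so the July 1 window is used.
as_passed = latest_available_window(event_ts, batch_schedule, data_delay)

# Adding data_delay to the event timestamp moves it past the 02:00 job run.
shifted = latest_available_window(event_ts + data_delay, batch_schedule, data_delay)

print(as_passed[0].date(), "->", as_passed[1].date())  # 2023-07-01 -> 2023-07-02
print(shifted[0].date(), "->", shifted[1].date())      # 2023-07-02 -> 2023-07-03
```

Shifting the events timestamps by data_delay is what the resolution above refers to: the same event now resolves to the July 2 window.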
Late-arriving data
- Scope: Any version of the Tecton SDK
- Symptom: Offline Retrieval returns different values with from_source set to True vs. False when using built-in (tiled) aggregations.
- Explanation background: When you use a built-in aggregation via the aggregations= parameter in batch or streaming feature views, Tecton computes a “tile” for each batch_schedule interval of time and rolls the tiles up at serving time (via the Offline Retrieval methods or the HTTP API). For example, if your batch_schedule is 1 day and you are computing the count of transactions over 7 days, Tecton stores 1-day counts and, at request time, returns the sum of the 7 “tiles” of 1-day counts.
- Explanation: Because Tecton creates tiles, data that arrives outside its tile window is not included when Tecton writes that tile, and the tile is what from_source=False reads. For example, if data with a timestamp of July 20 is written on July 21, Tecton won't include that data in the July 20 tile. However, when you run with from_source=True, Tecton pulls the latest version of the data from your data source, so the late data is included.
- Resolutions:
  - Correct your late-arriving data issue upstream.
  - Accept the (presumably small) differences between from_source=True and from_source=False, knowing that the False version is the one that minimizes training/serving skew.
  - Use a custom aggregation that re-computes the entire aggregation on each run (for example, every day) instead of rolling up historical tiles.
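The difference between the two code paths can be illustrated with a toy simulation. Everything in this sketch is hypothetical: the rows, the year, the function names, and the assumption that each daily tile is written at midnight UTC at the end of its day (data_delay is ignored). It mimics the behavior described above and does not call the Tecton SDK.

```python
from datetime import date, datetime, time, timedelta

# Hypothetical source rows as (event_time, arrival_time) pairs.
rows = [
    (datetime(2023, 7, 19, 10, 0), datetime(2023, 7, 19, 10, 5)),
    (datetime(2023, 7, 20, 9, 0), datetime(2023, 7, 20, 9, 1)),
    (datetime(2023, 7, 20, 22, 0), datetime(2023, 7, 21, 6, 0)),  # late: written July 21
]


def materialized_tiles(rows):
    """Daily counts as each day's job saw them when it ran at the end of that day."""
    tiles = {}
    for event_time, arrival_time in rows:
        day = event_time.date()
        job_run = datetime.combine(day, time.min) + timedelta(days=1)
        if arrival_time <= job_run:  # rows that arrive later miss the tile
            tiles[day] = tiles.get(day, 0) + 1
    return tiles


def count_from_tiles(tiles, as_of, window_days=7):
    """Roll up stored daily tiles, analogous to from_source=False."""
    days = [as_of - timedelta(days=i) for i in range(1, window_days + 1)]
    return sum(tiles.get(d, 0) for d in days)


def count_from_source(rows, as_of, window_days=7):
    """Recompute from the current source data, analogous to from_source=True."""
    start = as_of - timedelta(days=window_days)
    return sum(1 for event_time, _ in rows if start <= event_time.date() < as_of)


as_of = date(2023, 7, 22)
print(count_from_tiles(materialized_tiles(rows), as_of))  # 2
print(count_from_source(rows, as_of))                     # 3
```

The row written on July 21 is missing from the July 20 tile, so the tiled count is lower by one, while the recomputation from source picks it up.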