Test Data Sources
Data Sources can be tested in your notebook environment. Use the Tecton SDK to get the workspace where your Data Source is defined.
import tecton
ws = tecton.get_workspace("my_workspace")
Then get the Data Source.
data_source = ws.get_data_source("users_batch")
Verify that Tecton can connect to and read data from the batch source
Set the start and end times that you will use to filter records from the batch source.
from datetime import datetime, timedelta

end = datetime.now()
start = end - timedelta(days=30)
Call the get_dataframe method of data_source to get data from the batch source, filtered by start and end:
batch_data_from_tecton = data_source.get_dataframe(start_time=start, end_time=end).to_pandas().head(10)
display(batch_data_from_tecton)
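As a quick sanity check, you can confirm that the time filter was applied by inspecting the timestamp bounds of the returned records. The following is a minimal sketch in pandas, assuming a hypothetical timestamp column named signup_timestamp (substitute the actual timestamp column of your source):

# Hypothetical column name; replace with your source's timestamp column.
ts = batch_data_from_tecton["signup_timestamp"]
print(ts.min(), ts.max())
# Every sampled record should fall within the requested window.
assert ts.between(start, end).all()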
Note that although data_source points to a stream source, data_source.get_dataframe() reads data from the batch source.
Verify that Tecton can connect to and read data from the stream source
This section is only applicable to Spark stream sources: Kinesis, Kafka, and Spark Data Source Functions.
Call the start_stream_preview method on data_source to write incoming records from the data source to the TEMP_TABLE_TRANSLATED table. Set apply_translator=True to run the post processor function.
The following command will continuously read data from the stream source, so it should only be run for a short period of time.
data_source.start_stream_preview(
table_name="TEMP_TABLE_TRANSLATED",
apply_translator=True,
option_overrides={"initialPosition": "earliest"},
)
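For context, the post processor that apply_translator=True runs is the function attached to the stream source's configuration, which maps raw stream messages onto the batch source's schema. The following is a minimal sketch of such a function under assumed conditions: a JSON message payload with user_id, amount, and timestamp fields (these names and types are illustrative, not taken from this example):

from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

# Hypothetical message layout; replace with your stream's actual schema.
payload_schema = StructType([
    StructField("user_id", StringType(), False),
    StructField("amount", StringType(), False),
    StructField("timestamp", StringType(), False),
])

def translate_stream(df):
    # Parse the raw message bytes into typed columns matching the batch schema.
    return (
        df.selectExpr("CAST(value AS STRING) AS value")
        .select(from_json(col("value"), payload_schema).alias("payload"))
        .select(
            col("payload.user_id").alias("user_id"),
            col("payload.amount").cast("double").alias("amount"),
            col("payload.timestamp").cast("timestamp").alias("timestamp"),
        )
    )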
Query the data in the table and display the output:
spark.sql("SELECT * FROM TEMP_TABLE_TRANSLATED LIMIT 10").show()
If no data is returned, wait a short period of time for the stream to ingest records and then run the query again.
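Once you have verified the data, stop the preview so it does not keep consuming from the stream. If start_stream_preview returns the underlying Spark StreamingQuery in your SDK version (check the SDK reference), you can call .stop() on it directly; otherwise you can stop all active queries on the session with the standard Spark API:

# Stop every streaming query still active in this Spark session.
for query in spark.streams.active:
    query.stop()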