Log reference data

Verta can monitor data drift by comparing a model's production data distribution against a reference set. The reference set can be a training, test or validation dataset.
Reference data is uploaded to the Verta system as a dataset and linked to a specific registered model version. You do not need to upload your entire training set; a statistically representative sample that mirrors the training distribution is sufficient.
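One way to produce such a sample is stratified sampling, which preserves the proportion of each class from the full training set. The sketch below is illustrative only and uses the Python standard library; the helper names `stratified_sample` and `write_reference_csv`, and the choice of label column, are assumptions and not part of the Verta client.

```python
import csv
import random
from collections import defaultdict


def stratified_sample(rows, label_key, fraction, seed=0):
    """Return roughly `fraction` of rows, sampled separately within each label group."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_key]].append(row)
    sample = []
    for label in sorted(groups):
        members = groups[label]
        # Keep at least one row per group so rare classes are not dropped.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def write_reference_csv(rows, path):
    """Write the sampled rows to a CSV file that can be uploaded as reference data."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```

For example, after reading the training file with `csv.DictReader`, calling `stratified_sample(rows, "income", 0.1)` (where `"income"` is a hypothetical label column) keeps about 10% of the rows in each class, and `write_reference_csv` saves the result for upload.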
The following example creates a dataset version from a CSV file and logs it as reference data on a model version:
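from verta.dataset import Path
# Assumes `client` is a connected verta.Client and `model_version` is the registered model version to monitor.
dataset = client.get_or_create_dataset("census-dataset")
dataset_version = dataset.create_version(Path(["census-train.csv"], enable_mdb_versioning=True))
model_version.log_dataset_version(key='reference', dataset_version=dataset_version)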
The uploaded reference data is tracked in the Dataset tab and linked to the registered model version in the Catalog.
Note: Each time you release a new model version, you can refresh the reference data. The Verta monitoring system automatically picks up the reference data from the latest model version running on the endpoint.