Log reference data

Verta can monitor data drift by comparing a model's production data distribution against a reference set. The reference set can be a training, test or validation dataset.

Reference data is uploaded to Verta as a dataset and linked to a specific Registered Model Version. You do not need to upload your entire training set; a statistically representative sample that mirrors the training distribution is sufficient.
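As a rough illustration of preparing such a sample, the sketch below draws a seeded simple random subset of rows from a CSV file using only the Python standard library. The helper names (`sample_reference_rows`, `write_reference_csv`) and the output file name are hypothetical and not part of the Verta client. Simple random sampling is an assumption here: whether the result is representative enough depends on your data, and a stratified sample may be a better fit for imbalanced datasets.

```python
import csv
import random


def sample_reference_rows(rows, fraction, seed=0):
    """Return a seeded simple random sample of `rows` (hypothetical helper)."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not rows:
        return []
    # Keep at least one row so the reference file is never empty.
    k = max(1, round(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)


def write_reference_csv(src_path, dst_path, fraction, seed=0):
    """Copy the header plus a random subset of data rows from src to dst.

    Returns the number of data rows written.
    """
    with open(src_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes the first line is a header row
        rows = list(reader)
    sampled = sample_reference_rows(rows, fraction, seed)
    with open(dst_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(sampled)
    return len(sampled)
```

For example, `write_reference_csv("census-train.csv", "census-reference.csv", 0.1)` would write roughly 10% of the training rows to a new file (the output name is illustrative), which could then be uploaded in place of the full training file.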

The following example creates a dataset version from a CSV file and logs it as the reference data for a model version:

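from verta.dataset import Path
from verta.dataset.entities import Dataset

# Assumes `client` (a verta Client) and `model_version` (a registered model version) already exist.
dataset: Dataset = client.get_or_create_dataset("census-dataset")
dataset_version = dataset.create_version(Path(["census-train.csv"], enable_mdb_versioning=True))

model_version.log_dataset_version(key='reference', dataset_version=dataset_version)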

The uploaded reference data is tracked in the Dataset tab and linked to the registered model version in the Catalog.

Note: Each time you release a new model version, you can refresh the reference data. The Verta monitoring system automatically uses the reference data from the latest model version running on the endpoint.
