Log reference data
Verta can monitor data drift by comparing the distribution of a model's production data against a reference set. The reference set can be a training, test, or validation dataset.
Reference data can be uploaded as an artifact in your Registered Model Version. You do not need to upload your entire training set; a statistically representative sample that mirrors the training distribution is sufficient.
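The following example loads a training dataset, splits it into features and labels, and logs both as reference data:
import pandas as pd

df_train_reference = pd.read_csv(train_data_filename)
# Features are every column except the last; labels are the last column
X_train_reference = df_train_reference.iloc[:, :-1]
Y_train_reference = df_train_reference.iloc[:, -1]
# Attach the reference data to the registered model version
model_version.log_reference_data(X_train_reference, Y_train_reference)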
The uploaded reference data is stored as an artifact in the registered model version.
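Because the full training set is not required, one way to prepare a smaller reference set is to draw a random sample that preserves the label proportions. The sketch below is an illustration only: the helper `sample_reference`, its parameters, and the synthetic DataFrame are hypothetical and not part of the Verta client, and a 10% stratified sample is an assumed choice that may not be representative enough for every dataset.

```python
import pandas as pd


def sample_reference(df, frac=0.1, label_col=None, seed=42):
    """Return a random sample of df (hypothetical helper, not a Verta API).

    If label_col is given, sample the same fraction from each label value
    so the class balance of the sample matches the full dataset.
    """
    if label_col is None:
        return df.sample(frac=frac, random_state=seed)
    return df.groupby(label_col, group_keys=False).sample(
        frac=frac, random_state=seed
    )


# Synthetic stand-in for a training set: 80 rows of class 0, 20 of class 1
df_train = pd.DataFrame({"f1": range(100), "label": [0] * 80 + [1] * 20})

df_reference = sample_reference(df_train, frac=0.1, label_col="label")

# Split into features and labels, as in the example above
X_reference = df_reference.drop(columns="label")
Y_reference = df_reference["label"]

# With a registered model version in hand, the sample would then be logged:
# model_version.log_reference_data(X_reference, Y_reference)
```

Stratifying by label keeps rare classes from disappearing in a small sample. For continuous targets or heavily skewed features, a different sampling scheme may be more appropriate.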
Note: With every release, as you update your model with a new model version, the reference data can be refreshed. The Verta monitoring system automatically picks up the reference data from the latest model version that the endpoint is running.