Log reference data
Verta can monitor data drift by comparing the model's production distribution against a reference set. The reference set can be a training, test, or validation dataset.
Reference data can be uploaded as an artifact in your Registered Model Version. You do not need to upload your entire training set; a statistically representative sample that mirrors the training distribution is enough.
The following example loads a reference dataset from a CSV file and logs it to a model version:
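import pandas as pd
# Load the reference dataset; train_data_filename is the path to your CSV file.
df_train_reference = pd.read_csv(train_data_filename)
# Features are every column except the last; the label is the last column.
X_train_reference = df_train_reference.iloc[:, :-1]
Y_train_reference = df_train_reference.iloc[:, -1]
# model_version is an existing registered model version object.
model_version.log_reference_data(X_train_reference, Y_train_reference)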
The uploaded reference data is stored as an artifact in the registered model version.
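Because only a representative sample is needed, one possible way to build it is to draw a label-stratified random sample with pandas, so that class proportions in the sample match the full training set. This is a minimal sketch, not a Verta-prescribed method: the helper name sample_reference, the toy data, and the 25% fraction are all hypothetical choices made for illustration.

```python
import pandas as pd


def sample_reference(df: pd.DataFrame, label_col: str, frac: float, seed: int = 0) -> pd.DataFrame:
    """Draw a label-stratified random sample (hypothetical helper)."""
    # Sampling the same fraction from every label group keeps the class
    # proportions of the sample equal to those of the full dataset.
    return df.groupby(label_col).sample(frac=frac, random_state=seed)


# Toy data standing in for a real training set (illustrative values only).
df_train = pd.DataFrame({
    "feature_a": range(100),
    "feature_b": [i % 7 for i in range(100)],
    "label": [0] * 80 + [1] * 20,
})

# Keep 25% of the rows of each class.
df_reference = sample_reference(df_train, label_col="label", frac=0.25)

# Split into features and label the same way as in the logging example.
X_reference = df_reference.iloc[:, :-1]
Y_reference = df_reference.iloc[:, -1]
```

The resulting X_reference and Y_reference can then be passed to log_reference_data in place of the full training set. Whether a given sample size is large enough depends on your data, so it is worth checking the sampled feature distributions against the full set before logging.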
Note: Each time you release a new model version, you can refresh the reference data. The Verta monitoring system automatically picks up the reference data from the latest model version that the endpoint is running.
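To make the idea of comparing a production distribution against a reference set more concrete, the sketch below computes the Population Stability Index (PSI) for a single numeric feature. PSI is one commonly used drift metric; it is shown here only as an illustration and is an assumption, not a description of how Verta computes drift. The function name, the bin count, and the sample values are hypothetical.

```python
import math


def population_stability_index(reference, production, bins=10):
    """PSI between two samples of one numeric feature (illustrative only)."""
    # Bin edges are derived from the reference sample.
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range fall into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor each proportion at a small value so the logarithm is defined.
        return [max(c / len(values), 1e-6) for c in counts]

    ref_p = proportions(reference)
    prod_p = proportions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref_p, prod_p))


# Hypothetical feature values: a reference sample and a shifted production sample.
reference_values = [i / 100 for i in range(100)]
shifted_values = [v + 0.5 for v in reference_values]

psi_same = population_stability_index(reference_values, reference_values)
psi_shifted = population_stability_index(reference_values, shifted_values)
```

Identical samples give a PSI of zero, and the value grows as the production sample moves away from the reference sample, which is the kind of signal a drift monitor watches for.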
