For live endpoints deployed on the Verta inference service, model monitoring is fully automated (currently available for tabular data). This capability is deeply integrated with our deployment and registry modules.
Get started with automated monitoring by following these steps:
You need to log reference data (training data) when you register a model. The logged reference data is used for drift detection and to compare live vs. reference data distributions. Additionally, you will be able to track and visualize feature distributions for your training data on the Registered Model Version detail page in the web UI. Learn more about how to log and visualize training data distributions.
This is how you log reference data via the client:
```python
# register a model
registered_model = client.get_or_create_registered_model(name="monitoring-demo")

# create a model version
from verta.environment import Python
from verta.utils import ModelAPI

model_version = registered_model.create_standard_model_from_sklearn(
    model,
    environment=Python(requirements=["scikit-learn"]),
    model_api=ModelAPI(X_train, Y_train),
    name="v1",
)

# profile training data
model_version.log_training_data_profile(X_train, Y_train)
```
Verta uploads profiles of your training data to facilitate downstream monitoring. Individual data points are not uploaded; the client only passes along the numerical and categorical distributions of the columns in your data.
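To illustrate what such a profile contains (a hypothetical sketch, not Verta's actual profiling code), a column-level distribution might be computed roughly like this:

```python
import numpy as np

def profile_column(values, bins=10):
    """Summarize one column as a distribution rather than raw data points.

    Numerical columns become histograms; categorical columns become
    value counts. Missing values are tallied separately.
    """
    values = np.asarray(values)
    missing = int(np.sum(values != values)) if values.dtype.kind == "f" else 0
    if values.dtype.kind in ("i", "f"):  # numerical column
        clean = values[values == values]  # drop NaNs (NaN != NaN)
        counts, edges = np.histogram(clean, bins=bins)
        return {"type": "numerical", "bin_edges": edges.tolist(),
                "counts": counts.tolist(), "missing": missing}
    # categorical column: frequency of each distinct value
    uniques, counts = np.unique(values, return_counts=True)
    return {"type": "categorical",
            "counts": dict(zip(uniques.tolist(), counts.tolist())),
            "missing": missing}

profile = profile_column([1.0, 2.0, 2.5, float("nan"), 3.0], bins=4)
```

Only aggregates like `bin_edges`, `counts`, and the missing-value tally would leave the client, never the individual rows.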
View the reference data on the Registered Model Version detail page.
When a model is deployed, Verta creates a live endpoint and automatically creates a monitored entity that is directly connected to the endpoint (a 1:1 mapping).
Deploy a model using the client (you can also deploy via the web UI):
```python
# deploy a model
endpoint = client.create_endpoint("monitoring-demo")
endpoint.update(model_version, wait=True)
```
A monitored entity is automatically logged and takes the name of the endpoint. You can access the monitored entity from the web UI under Operations > Monitoring.
Statistical summaries (histograms, missing values, etc.) are automatically defined using information from the endpoint and the reference data.
The system creates a range of histogram distributions, missing-value summaries, and more. You can visit the "Data Metrics" page to view, query, and filter all the logged summaries.
You can navigate to a summary's detailed view by clicking the summary chart title on the Data Metrics page.
Drift alerts are automatically configured with pre-defined thresholds (users can also update the system-defined thresholds).
The manage alerts page lists all the configured alerts. Click on an alert configuration to review alert details and reference data, and to update the alert threshold.
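As an intuition for how a drift alert can compare live data against a reference distribution (a hypothetical sketch; Verta's actual drift statistic and default thresholds may differ), consider the Population Stability Index over two histograms with the same bins:

```python
import math

def population_stability_index(ref_counts, live_counts, eps=1e-6):
    """PSI between two histograms sharing the same bins.

    PSI = sum((p_live - p_ref) * ln(p_live / p_ref)); larger values
    indicate more drift. A common rule of thumb alerts above ~0.2.
    """
    ref_total, live_total = sum(ref_counts), sum(live_counts)
    psi = 0.0
    for r, l in zip(ref_counts, live_counts):
        p_ref = max(r / ref_total, eps)    # eps avoids log(0) on empty bins
        p_live = max(l / live_total, eps)
        psi += (p_live - p_ref) * math.log(p_live / p_ref)
    return psi

DRIFT_THRESHOLD = 0.2  # hypothetical default; real thresholds are configurable

reference = [50, 30, 15, 5]       # reference (training) histogram
live_similar = [48, 32, 14, 6]    # live data close to training: no alert
live_shifted = [10, 15, 30, 45]   # live data heavily shifted: alert fires

assert population_stability_index(reference, live_similar) < DRIFT_THRESHOLD
assert population_stability_index(reference, live_shifted) > DRIFT_THRESHOLD
```

Raising the threshold makes alerts less sensitive; lowering it catches smaller shifts sooner.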
As you start running live predictions, you can query, aggregate, and visualize various summary statistics, get alerted on drift, and perform root cause analysis.
The "Dashboard" page in the web UI shows all the alerted summaries. You can also access the "Active Alerts" page to review the list of all active alerts and resolve them.
Each alerted summary has a detail view that shows the start and end times of alerts, a time-series view of the summary, the reference sample, and differences in distribution, so you can drill down and perform analysis.
In order to monitor and view high volumes of live data, the system may downsample. This occurs in real time, as the data comes in. The downsampling logic is fully configurable for each deployment; for example, we can configure a limit of 10 samples per worker per second. So if you are seeing lower data resolution than expected, it could be because of downsampling.
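The per-worker rate limit described above can be sketched as follows (an illustrative model, not Verta's internal implementation): each worker keeps at most a fixed number of samples per one-second window and drops the rest.

```python
import time

class PerSecondSampler:
    """Keep at most `limit` samples per one-second window (per worker).

    A hypothetical sketch of real-time downsampling: once the limit for
    the current window is reached, further samples are dropped until the
    window rolls over.
    """
    def __init__(self, limit, clock=time.monotonic):
        self.limit = limit
        self.clock = clock          # injectable clock for testing
        self.window_start = clock()
        self.count = 0

    def accept(self, sample):
        now = self.clock()
        if now - self.window_start >= 1.0:  # new one-second window
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True   # sample kept for monitoring
        return False      # sample dropped (downsampled)

# With a limit of 10/sec, only 10 of 25 samples in one window survive.
fake_time = [0.0]
sampler = PerSecondSampler(limit=10, clock=lambda: fake_time[0])
kept = sum(sampler.accept(i) for i in range(25))
```

Under this model, resolution loss is proportional to traffic above the configured limit, which is why bursty endpoints show it most.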
If a particular summary has an active alert, the charts show the aggregated samples that are alerting, with the alert window highlighted in red. Above the aggregated samples, bell icons represent the different alert states:
Red bell icon - Alert is ongoing
Blue bell icon - Current alert has ended
Green bell icon - The alert has been resolved
Given below are links to a few end-to-end notebook examples that showcase how to deploy models and enable automated monitoring.