Alerts

This is an alpha feature.

Verta Model Monitoring alerts let you automatically detect data drift, data quality issues, and anomalous performance degradation. You set up customizable alert rules for data drift, model performance, and other available metrics, and the platform notifies users when an alert threshold is crossed. Real-time alerts surface potential issues and pinpoint problem areas for deeper analysis and resolution.

Alert types

There are two alert types:

Drift alerts

When a model is deployed in Verta, the system automatically creates drift alert rules for all input features and predictions, using system-defined thresholds.

Drift alert rules help track feature drift and model prediction drift from reference data. Learn how to provide your reference data here.

Drift is computed by measuring how far the distribution of the model’s production values has moved from a reference distribution. You can select the drift detection algorithm (cosine distance or KL divergence) when creating the alert rule.

Drift alerts can be configured for both features and model predictions.
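To make the two drift detection algorithms concrete, here is a minimal sketch that compares a production histogram against a reference histogram. It is not Verta's implementation: the function names, the epsilon smoothing, and the choice of KL direction (production relative to reference) are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def normalize(counts: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Turn histogram counts into a probability distribution.

    A tiny epsilon is added to every bin (an assumption of this sketch)
    so that empty bins cannot cause log(0) or a division by zero.
    """
    smoothed = [c + eps for c in counts]
    total = sum(smoothed)
    return [c / total for c in smoothed]


def cosine_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the two histograms point the same way."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats; here p is production and q is the reference."""
    return sum(a * math.log(a / b) for a, b in zip(p, q))


# Hypothetical binned values for one feature.
reference = normalize([50, 30, 20])
production = normalize([20, 30, 50])

print(round(cosine_distance(production, reference), 3))  # 0.237
print(round(kl_divergence(production, reference), 3))    # 0.275
```

Both measures are 0 when production matches the reference and grow as the distributions diverge, which is why a simple "greater than threshold" condition is a natural fit for drift alerts.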

Metric alerts (e.g. model performance)

Metric alert rules help detect anomalies and degradation in various metrics, including model performance metrics (e.g. accuracy, precision, recall, F1, MAE, MSE, true positives, false negatives).

The available model performance metrics depend on the model type provided by the user.
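As an illustration of why the metric set depends on the model type, the sketch below computes a few of the metrics named above for a binary classifier and for a regressor. The function names and the convention of returning 0.0 for an undefined ratio are assumptions of this example, not part of the product.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                           positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary classifier."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn

    # Undefined ratios (zero denominator) are reported as 0.0 in this sketch.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def regression_metrics(y_true: Sequence[float],
                       y_pred: Sequence[float]) -> Dict[str, float]:
    """Mean absolute error and mean squared error for a regressor."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    return {"mae": sum(abs(e) for e in errors) / n,
            "mse": sum(e * e for e in errors) / n}


print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
print(regression_metrics([3.0, 5.0], [2.0, 7.0]))  # {'mae': 1.5, 'mse': 2.5}
```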

Alert rules

Alert rules are fully configurable through the user interface.

Input fields in the alert rule

  • Alert name - A descriptive name of the alert rule. The name will be displayed in the alert list view.

  • Alert type - The system supports two alert types: drift alerts and metric alerts.

  • Alert condition - The alert condition lets you choose the threshold for alerting. You can choose a single threshold value and an operator (<, >, <= etc.). The alert is triggered if the threshold condition is met. You can also opt to be alerted when the data is within or outside a selected range.

  • Aggregation window - The time window over which the data is aggregated before it is evaluated against the alert condition. You can select the aggregation window from 3 options: 5 minutes, 1 hour and 1 day. Select the aggregation window based on your expected model throughput. For example, if your production model has sparse traffic, choose a larger aggregation window.

  • Evaluation frequency - The duration for which the alert condition must be true before an alert fires. If you specify 5 minutes, the condition must be true for 5 minutes before the alert fires. You can select the evaluation frequency from 3 options: 5 minutes, 1 hour and 1 day.
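The interaction between the aggregation window and the evaluation frequency can be sketched as follows. This is an illustrative model only: the helper names are hypothetical, the mean is assumed as the aggregation function, and the source does not specify how the platform aligns windows or aggregates values.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Callable, List, Tuple

Sample = Tuple[datetime, float]
EPOCH = datetime(1970, 1, 1)


def aggregate(samples: List[Sample], window: timedelta) -> List[Sample]:
    """Group samples into fixed windows and average each window.

    Returns (window_start, mean_value) pairs in chronological order.
    """
    buckets = {}
    for ts, value in samples:
        index = (ts - EPOCH) // window  # whole windows since the epoch
        buckets.setdefault(EPOCH + index * window, []).append(value)
    return sorted((start, mean(values)) for start, values in buckets.items())


def condition_held_for(aggregated: List[Sample], window: timedelta,
                       duration: timedelta, now: datetime,
                       is_breaching: Callable[[float], bool]) -> bool:
    """True if every window overlapping the last `duration` breaches.

    With no data in that period the function returns False.
    """
    cutoff = now - duration
    recent = [value for start, value in aggregated
              if start + window > cutoff and start < now]
    return bool(recent) and all(is_breaching(v) for v in recent)


# One sample per minute: quiet for 5 minutes, then elevated for 5 minutes.
start = datetime(2024, 1, 1, 12, 0)
samples = [(start + timedelta(minutes=i), 0.1 if i < 5 else 0.5)
           for i in range(10)]
five_min = timedelta(minutes=5)
aggregated = aggregate(samples, five_min)
now = start + timedelta(minutes=10)

above = lambda v: v > 0.3
print(condition_held_for(aggregated, five_min, five_min, now, above))            # True
print(condition_held_for(aggregated, five_min, timedelta(hours=1), now, above))  # False
```

A larger aggregation window averages over more samples, which is why it suits sparse traffic: a single outlying prediction has less influence on the aggregated value.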

Alert status

An alert rule can be in one of 4 states:

  • Alert - The alert condition was met during the most recent evaluation window and the metric is actively alerting.

  • OK - The alert condition was not met during the most recent evaluation window and the metric is normal.

  • Pause - The alert rule has been paused by the user and stays disabled. Paused alert rules are not evaluated until the user resumes them.

  • No data - The alert rule's metric received no data during the evaluation period.
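The four states above can be read as a small decision procedure. The sketch below is one way to express it; the enum, the function name, and the order of the checks (paused first, then missing data, then the condition) are assumptions for illustration rather than documented behavior.

```python
from enum import Enum
from typing import Callable, Optional


class AlertStatus(Enum):
    ALERT = "Alert"
    OK = "OK"
    PAUSE = "Pause"
    NO_DATA = "No data"


def resolve_status(paused: bool,
                   latest_value: Optional[float],
                   condition_met: Callable[[float], bool]) -> AlertStatus:
    """Derive a rule's status for the most recent evaluation window."""
    if paused:
        # Paused rules are not evaluated at all.
        return AlertStatus.PAUSE
    if latest_value is None:
        # Nothing arrived during the evaluation period.
        return AlertStatus.NO_DATA
    return AlertStatus.ALERT if condition_met(latest_value) else AlertStatus.OK


drift_too_high = lambda value: value > 0.3  # hypothetical alert condition

print(resolve_status(False, 0.45, drift_too_high).value)  # Alert
print(resolve_status(False, 0.10, drift_too_high).value)  # OK
print(resolve_status(False, None, drift_too_high).value)  # No data
print(resolve_status(True, 0.45, drift_too_high).value)   # Pause
```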

Changing alert status

For an actively alerting metric, if the data returns to a normal state during the next evaluation window, the system automatically resolves the alert and the status changes from Alert to OK.

Users also have the option to change the status from Alert to OK manually. The bulk update feature lets you take action on multiple rows at once.

You can also disable specific alert rules by changing the status to Pause if you no longer want the rule to be evaluated and alerted on.

View active alerts and alert rules

All alert rules and any active alerts can be accessed from the Alerts tab of a model monitoring dashboard.

All the alert rules are sorted alphabetically by alert name. You can search, sort and filter based on various columns in the alert list view.

Actively alerting rules appear at the top of the list.

Create alert rules

  • Go to the Alerts tab and click Add Rule to create a new alert rule. Enter a unique alert name along with the metric you want to be alerted on.

  • If you are configuring a drift alert, select the feature or prediction class along with the drift detection algorithm.

  • If you are configuring a metric alert, select the metric name and, if applicable, the output class of the computed metric.

  • Choose the alert condition threshold(s) and operators (>, <, >=, <=, !=, =, between, not between).

  • Choose the aggregation window (5 minutes, 1 hour, 1 day) and the evaluation frequency (5 minutes, 1 hour, 1 day).

  • Click the Create button to create the alert rule.
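The fields collected in the steps above can be modeled as a small data structure. The sketch below is a hypothetical representation, not the Verta client API: the class name, field names, default window values, and the treatment of "between" as inclusive of both bounds are all assumptions.

```python
import operator
from dataclasses import dataclass
from typing import Optional

# Single-threshold operators listed in the steps above.
COMPARATORS = {
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
    "!=": operator.ne, "=": operator.eq,
}
RANGE_OPERATORS = ("between", "not between")
TIME_OPTIONS = ("5 minutes", "1 hour", "1 day")


@dataclass(frozen=True)
class AlertRule:
    name: str
    metric: str
    op: str
    threshold: float
    upper_threshold: Optional[float] = None  # only used by range operators
    aggregation_window: str = "1 hour"       # default is an assumption
    evaluation_frequency: str = "1 hour"     # default is an assumption

    def __post_init__(self) -> None:
        if self.op not in COMPARATORS and self.op not in RANGE_OPERATORS:
            raise ValueError(f"unsupported operator: {self.op!r}")
        if self.op in RANGE_OPERATORS and self.upper_threshold is None:
            raise ValueError("range operators need an upper threshold")
        for option in (self.aggregation_window, self.evaluation_frequency):
            if option not in TIME_OPTIONS:
                raise ValueError(f"unsupported time option: {option!r}")

    def is_met(self, value: float) -> bool:
        """Evaluate the alert condition against one aggregated value."""
        if self.op in RANGE_OPERATORS:
            inside = self.threshold <= value <= self.upper_threshold
            return inside if self.op == "between" else not inside
        return COMPARATORS[self.op](value, self.threshold)


drift_rule = AlertRule(name="age drift", metric="cosine distance",
                       op=">", threshold=0.3, aggregation_window="1 day")
range_rule = AlertRule(name="accuracy band", metric="accuracy",
                       op="not between", threshold=0.8, upper_threshold=1.0)

print(drift_rule.is_met(0.45))  # True
print(range_rule.is_met(0.75))  # True: 0.75 falls outside [0.8, 1.0]
```

Validating the operator and time options at construction keeps an invalid rule from ever being evaluated.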

Modify alert rules

Modify alert rules individually or make bulk updates. You can modify the following fields:

  • Alert condition and threshold value

  • Aggregation window

  • Evaluation frequency

  • Alert status (Alert -> OK, Alert -> Pause, OK -> Pause)

You can also delete an alert rule.

Alert history

Alert history gives you a log of all changes to the alert rule, organized by timestamp. You can access the alert history for every rule from the "Action" button. Alert history includes the following:

  • Any updates to the alert condition (e.g. changes to threshold value, aggregation window etc.)

  • Any changes to alert status (e.g. OK -> Alert, Alert -> OK, OK -> Pause etc.)
