Model Data Logging and Querying
Please contact Verta at help@verta.ai to set up model data logging in your system.
Overview
In this guide we will focus on the Model Data Logging functionality: how to log model inference data and intermediate data for downstream monitoring, querying, and debugging.
Verta’s Model Data Logging capability allows users to log arbitrary key-value pairs, with the key being a string and the value being any JSON-compatible object, during model predictions and have those logs stored in a data-lake compatible format.
The benefits of having a data logging and query workflow include:
Production debugging of prediction requests and any intermediate data with robust search capabilities.
Model pipeline visibility - Log model inference, as well as data from pre and post-processing steps to gain visibility across your model inference pipeline.
Monitoring - Feed the data back to a monitoring system or a data-pipeline solution as needed.
How It Works
The verta.runtime.log() Python client API allows users to log model input, features, intermediate data, prediction information, or any complex data that is JSON compatible as key-value pairs.
Note that runtime.log() must always be called within the scope of a model's required predict() method.
Each time a prediction request is sent to the server, the model's predict() function is wrapped inside an instance of a context manager class. Once the predict() function has completed, any logs collected therein are written to storage (currently AWS S3). Writing of the logs to storage occurs after prediction results are returned, to minimize added latency.
See the section below on Local Testing and Debugging for instructions on imitating the above behavior for local development.
The following data points are captured and included by default, and thus it is not necessary to add them into your logs:
verta_prediction_id: This is a unique identifier for the prediction request. This can be a custom value provided via the prediction_id argument passed to the predict() or predict_with_id() functions, or a random UUID by default.
endpoint_id: This is the unique ID number associated with the endpoint called.
time of request: Year, month, and day the prediction request occurred.
Best Practices
Keep it simple!
While it is possible to log any JSON-compatible object, we recommend keeping your logs as simple as possible with individual key-value pairs. This will make it easier to write queries and analyze your logs without complex parsing of deeply nested data structures.
Don't break your schema!
Avoid changing the data type for any key-value pairs in your existing logs, as it could break any existing table schemas.
Don't share keys across models unless they are the same data type.
For any standard set of logs to be applied to all or most of your models, we recommend coordinating that effort with a single template to ensure consistency across your organization.
For more info on the default S3 paths and partitioning, see the Schema and Table Generation section below.
Examples
Given below are some examples of model classes that log data during predictions.
Example 1. Basic Model
In this example, we demonstrate basic calls to runtime.log() within the predict() function of a model that simply alters and returns a string. Note that the call to runtime.log() inside the make_loud() function is valid only because make_loud() is called within the scope of predict().
Assuming your application makes a prediction against the above model like so:
Output:
"ROAR!!!"
The expected logs for this prediction would be:
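The original log sample is not reproduced here. Assuming the model logged the keys model_input and model_output (as in the local-testing example later in this guide), the stored log document would resemble the following; the ID values are placeholders:

```json
{
  "verta_prediction_id": "<random UUID>",
  "endpoint_id": "<endpoint ID>",
  "model_input": "roar",
  "model_output": "ROAR!!!"
}
```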
Note: The prediction_id value is randomly generated. See Custom Prediction IDs below for information on using your own custom ID.
Example 2: Basic Model With Batch Input
Note that because logs are collected within the context of a single prediction request, repeated calls to runtime.log(key, value) using the same key within the scope of predict() will raise an error to prevent overwriting existing data. In this example, logs for the whole batch are aggregated first, then logged once under a single key.
Assuming your application makes a prediction against the above model like so:
Output:
["ROAR!!!", "HOWL!!!"]
The expected logs for this prediction would be:
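The original log sample is not reproduced here. Assuming the batch is logged once under the model_outputs key as described above, the stored log document would resemble the following; the ID values are placeholders:

```json
{
  "verta_prediction_id": "<random UUID>",
  "endpoint_id": "<endpoint ID>",
  "model_outputs": ["ROAR!!!", "HOWL!!!"]
}
```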
In this example, model_outputs would be a single column in the resulting table, and querying for individual examples may require flattening or un-nesting the data.
Custom Prediction IDs
When making predictions against a deployed model in Verta via the predict() or predict_with_id() functions, it is possible to set a custom value for the prediction ID via the prediction_id argument. This value is included by default in any logging data stored, making it another dimension by which you can query log data later.
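As a minimal sketch of how a custom ID flows into the stored logs: the DeployedModelStub class below is hypothetical (the real Verta client's predict() and predict_with_id() accept the prediction_id argument as described above; this stub only imitates that behavior so the snippet runs without the client installed).

```python
import uuid

# Hypothetical stand-in for a deployed-model client.
class DeployedModelStub:
    def predict(self, text, prediction_id=None):
        # A random UUID is used when no custom ID is supplied.
        pid = prediction_id or str(uuid.uuid4())
        # The stored logs are tagged with this ID as verta_prediction_id,
        # making it queryable later.
        return {"verta_prediction_id": pid, "output": text.upper() + "!!!"}

response = DeployedModelStub().predict("roar", prediction_id="order-42")
print(response["verta_prediction_id"])
```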
Local Testing and Debugging
When the predict() function of a model is called, it is wrapped inside a context manager class by default. It is this context manager that collects all the logs created when calls to runtime.log() are made. When the prediction request is complete, this context manager is closed and the aggregated logs are stored in an instance attribute. The value of that attribute gets written to storage.
While adding logging to your models, you can easily imitate this behavior in order to inspect your logs during development: simply wrap your calls to predict() or predict_with_id() inside an instance of verta.runtime.context(), as in the example below.
This example demonstrates how to view and debug model logs locally.
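The original code sample is not reproduced in this page. The sketch below is self-contained: _Context and log() are hypothetical stand-ins mimicking the documented behavior of verta.runtime.context() and verta.runtime.log(), so the snippet runs without the Verta client installed. With verta installed, you would wrap the predict() call in verta.runtime.context() the same way and inspect the collected logs afterward.

```python
# Hypothetical stand-in for verta.runtime.context().
class _Context:
    """Collects key-value logs for the duration of one prediction."""
    _active = None

    def __enter__(self):
        self._logs = {}
        _Context._active = self
        return self

    def __exit__(self, *exc):
        # The real context manager writes the aggregated logs to S3 here.
        _Context._active = None
        return False

    def logs(self):
        return dict(self._logs)

# Hypothetical stand-in for verta.runtime.log().
def log(key, value):
    if _Context._active is None:
        raise RuntimeError("log() must be called within a prediction context")
    _Context._active._logs[key] = value

class EchoModel:
    def predict(self, text):
        log("model_input", text)
        output = text.upper() + "!!!"
        log("model_output", output)
        return output

# Wrap the prediction in a context and inspect the collected logs locally.
with _Context() as ctx:
    EchoModel().predict("yell")
print(ctx.logs())
```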
Output:
{'model_input': 'yell', 'model_output': 'YELL!!!'}
Querying Results
This section describes the process for configuring Athena tables to enable complex querying of your model logging data.
Requirements
AWS Permissions:
S3
Athena
Glue
IAM:CreateRole
Existing model logs in S3 (at least one model has calls to verta.runtime.log() in the predict() function, so data exists in the S3 bucket described below).
Values passed to verta.runtime.log() must be JSON serializable or they will not be queryable by Athena. See the section on testing and debugging above for more info.
Useful Documentation
Verta Python Client:
AWS Docs:
Setup Process
When model logs are uploaded to S3, log files are written to the following S3 path:
s3://vertaai-model-data-logging/logs/date=YYYY-MM-DD/endpoint_id=XXX/<prediction_id>.json
Amazon Athena Database
You will need an Athena database to hold your new tables. If you already have one configured, skip to the next section.
Access the AWS Glue console, and select the “Databases” option in the navigation pane on the left. Click on “Add Database” in the top right corner.
Give your DB a name and optional description, then click “Create Database”.
Schema and Table Generation
Use AWS Glue to auto-generate schemas and tables for your existing logs.
In the AWS Glue console, click on "Crawlers" in the navigation pane on the left.
Click on "Create crawler" in the top right corner.
Give the crawler a name and add any optional tags or descriptions, then click "Next".
On the "Choose data sources and classifiers" menu, leave "Not yet" selected for "Is your data already mapped to Glue tables?". Under "Data Sources", click "Add a data source".
Leave the defaults for “Data source” (S3) and “Location of S3 data” (“In this account”). Click “Next”.
Under “S3 path”, enter
s3://vertaai-model-data-logging/logs/
or click "Browse S3". The final entry should have a trailing slash. Leave "Crawl all sub-folders" selected.
Click “Add an S3 data source” at the bottom.
After returning to the "Choose data sources and classifiers" menu, click "Next" (no custom classifiers are needed).
In the “Configure security settings” menu, you will need to select an existing IAM role or create a new one based on your organization’s requirements. We recommend the default option of “Create an IAM role” to ensure that the crawler has the appropriate permissions for the relevant buckets. Inadequate permissions will result in empty tables and empty query results.
On the “Set output and scheduling” menu, select the database created above.
Leave "Table name prefix" blank for a default table name of verta_ai_model_logs.
For "Maximum table threshold", enter 1.
Under "S3 schema grouping", select the option "Create a single schema for each S3 path" and leave "Table level" blank.
For the "When the crawler detects schema change..." option, select "Add new columns only".
Select "Update all new and existing partitions with metadata from the table".
For "How should AWS Glue handle deleted objects in the data store?", select "Ignore the change and don't update the table in the data catalog".
Select "Create partition indexes automatically".
For the "Crawler Schedule", it is recommended that you select Daily or Hourly. For Daily, set the "Start hour" to one hour before the start of your typical work day.
Finally, review all the configuration options and click "Create crawler".
Once the crawler has been created, you should wind up on the menu page for the new crawler. Click the “Run crawler” button in the top right corner. Depending on the volume of logging data this may take several minutes to an hour. Monitor the status of the crawler in the "Crawler runs" section at the bottom. When the crawler is run for the first time, the column for “Table changes” should indicate that a new table was created.
Athena Search
Query model logs via Athena.
If prompted, set up an S3 bucket for query results; this is required if one is not already configured.
In the Query Editor, select the Database you configured for Verta model logs.
You should now see the new table under the “Tables” heading.
Example Queries
Generic query to make sure the table is working:
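As a sketch, assuming the default table name from the setup above:

```sql
SELECT * FROM verta_ai_model_logs LIMIT 10;
```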
Query to get all logs for a specific endpoint and prediction:
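A sketch; the endpoint and prediction IDs are placeholder values, endpoint_id is the partition column derived from the S3 path, and its type (string vs. integer) depends on what the crawler inferred:

```sql
SELECT *
FROM verta_ai_model_logs
WHERE endpoint_id = '123'
  AND verta_prediction_id = '<your prediction ID>';
```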
Query to get all logs for a specific timeframe:
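A sketch using the date partition column derived from the S3 path; the column is double-quoted to avoid keyword conflicts in Athena, and the comparison assumes the crawler typed the partition as a string:

```sql
SELECT *
FROM verta_ai_model_logs
WHERE "date" BETWEEN '2024-01-01' AND '2024-01-31';
```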
Troubleshooting
If the Glue crawler does not have the correct set of permissions, the resulting table will be empty. Ensure that the individual performing the setup has the ability to create IAM roles, and allow the Glue UI to create the role as described in the setup process above.
If you are adding logging to a new or existing model for the first time, you will not be able to query data from that model until after the Glue crawler created above has run. You can run it manually, or wait until after the next scheduled run.
The crawler creates a new table partition for each day of logged data. If your crawler runs less frequently than daily, you will not see data for any of the days since the most recent run until after the next run.
If you encounter a HIVE_PARTITION_SCHEMA_MISMATCH error for a mismatch between two different columns, follow this AWS guide to repair the table by dropping and recreating the affected partitions.
If you encounter a HIVE_BAD_DATA error, it is possible that the data type for an existing log column has been altered, thus breaking the schema. You will need to delete the underlying S3 data for the day affected, resulting in some data loss. For this reason, we recommend you avoid changing existing logs in a way that uses the same key (column) with a different data type.