Model Data Logging and Querying

Please contact Verta at help@verta.ai to set up model data logging in your system.

Overview

This guide covers Verta's Model Data Logging functionality: how to log model inference data and intermediate data for downstream monitoring, querying, and debugging.

Verta’s Model Data Logging capability allows users to log arbitrary key-value pairs during model predictions (each key a string, each value any JSON-compatible object) and have those logs stored in a data-lake-compatible format.

Quick Start Guide

The benefits of having a data logging and query workflow include:

  • Production debugging of prediction requests and any intermediate data with robust search capabilities.

  • Model pipeline visibility - Log model inference, as well as data from pre and post-processing steps to gain visibility across your model inference pipeline.

  • Monitoring - Feed the data back to a monitoring system or a data-pipeline solution as needed.

How It Works

The verta.runtime.log() Python client API allows users to log model input, features, intermediate data, prediction information, or any complex data that is JSON compatible as key-value pairs.

Note that runtime.log() must always be called within the scope of a model's required predict() method.

Each time a prediction request is sent to the server, the model's predict() function is wrapped inside an instance of a context manager class. Once the predict() function has completed, any logs collected within it are written to storage (currently AWS S3). Logs are written after prediction results are returned in order to minimize added latency.
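As a rough illustration of this flow (this is a simplified sketch, not Verta's actual internals; all names here are invented), the server-side wrapping behaves roughly like:

```python
import contextvars

# Holds the logs for the prediction currently being served.
_logs = contextvars.ContextVar("prediction_logs")

class _PredictionContext:
    """Simplified stand-in for the context manager described above."""
    def __enter__(self):
        self._token = _logs.set({})
        return self

    def __exit__(self, *exc_info):
        self.collected = _logs.get()  # aggregated logs survive on the instance
        _logs.reset(self._token)
        return False

def log(key, value):
    """Simplified stand-in for verta.runtime.log()."""
    _logs.get()[key] = value

def handle_request(predict, payload):
    with _PredictionContext() as ctx:
        result = predict(payload)
    # The result is returned first; ctx.collected is written to S3 afterward.
    return result, ctx.collected

def predict(text):
    log("model_input", text)
    return text.upper()

result, collected = handle_request(predict, "roar")
# result == "ROAR", collected == {"model_input": "roar"}
```

This is why runtime.log() only works inside predict(): outside the request context, there is no active log store to write into.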

See the section below on Local Testing and Debugging for instructions on imitating the above behavior for local development.

The following data points are captured and included by default, so you do not need to add them to your logs yourself:

  • verta_prediction_id: This is a unique identifier for the prediction request. This can be a custom value provided via the prediction_id argument passed to the predict() or predict_with_id() functions, or a random UUID by default.

  • endpoint_id: This is the unique ID number associated with the endpoint called.

  • time of request: Year, month, and day the prediction request occurred.

Best Practices

Keep it simple!

  • While it is possible to log any JSON-compatible object, we recommend keeping your logs as simple as possible with individual key-value pairs. This will make it easier to write queries and analyze your logs without complex parsing of deeply nested data structures.
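For example (with hypothetical feature names), individual top-level keys are simpler to query than one nested object:

```python
import json

features = {"age": 42, "income": 55000.0, "score": 0.87}

# Preferred: one simple key-value pair per data point;
# each key becomes its own queryable column.
flat_logs = {f"feature_{name}": value for name, value in features.items()}

# Harder to query: everything nested under a single key,
# which becomes one deeply structured column.
nested_logs = {"features": features}

print(json.dumps(flat_logs, sort_keys=True))
# {"feature_age": 42, "feature_income": 55000.0, "feature_score": 0.87}
```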

Don't break your schema!

  • Avoid changing the data type for any key-value pairs in your existing logs, as it could break any existing table schemas.

  • Don't share keys across models unless they are the same data type.

  • For any standard set of logs to be applied to all or most of your models, we recommend coordinating that effort with a single template to ensure consistency across your organization.

  • For more information on the default S3 paths and partitioning, see the Schema and Table Generation section below.
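To illustrate why types matter (using a hypothetical model_score key): JSON distinguishes integer and floating-point encodings, and the inferred column type follows suit, so casting before logging keeps the column stable:

```python
import json

# The same key logged as an int in one prediction...
print(json.dumps({"model_score": 1}))    # {"model_score": 1}
# ...and as a float in another yields a different inferred type.
print(json.dumps({"model_score": 1.0}))  # {"model_score": 1.0}

# Casting consistently before logging avoids the mismatch:
score = 1  # may arrive as int or float upstream
stable_log = {"model_score": float(score)}
assert json.dumps(stable_log) == '{"model_score": 1.0}'
```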


Examples

Given below are some examples of model classes that log data during predictions.

Example 1. Basic Model

In this example, we demonstrate basic calls to runtime.log() within the predict() function of a model that simply alters and returns a string. Note that the call to runtime.log() inside the make_loud() function is valid only because make_loud() is called within the scope of predict().

from verta.registry import VertaModelBase, verify_io
from verta import runtime

class LoudEcho(VertaModelBase):
    """ Takes a string and makes it LOUD!!! """
    def __init__(self, artifacts):
        pass
            
    @verify_io
    def predict(self, input: str) -> str:
        runtime.log('model_input', input)  # log pre-processing data
        echo: str = self.make_loud(input)
        return echo

    def make_loud(self, input: str):
        loud_echo: str = input.upper() + '!!!'
        runtime.log('model_output', loud_echo)  # log post-processing data
        return loud_echo

Assuming your application makes a prediction against the above model like so:

import verta

client = verta.Client()                                               # create client connection
endpoint = client.get_or_create_endpoint('loud_echo_endpoint_name')   # fetch relevant endpoint
deployed_model = endpoint.get_deployed_model()                        # fetch model
deployed_model.predict('roar')                                        # make a prediction

Output:

"ROAR!!!"

The expected logs for this prediction would be:

{
    "verta_prediction_id": "1b15c5e1-64e4-4a22-9ff4-dc598eee565e",
    "model_input": "roar",
    "model_output": "ROAR!!!"
}

Note: The verta_prediction_id value shown is randomly generated. See Custom Prediction IDs below for information on using your own custom ID.

Example 2: Basic Model With Batch Input

Note that because logs are collected within the context of a single prediction request, repeated calls to runtime.log(key, value) using the same key within the scope of predict() will raise an error to prevent existing data from being overwritten. In this example, logs for the whole batch are aggregated first and then logged once under a single key.

from verta.registry import VertaModelBase, verify_io
from verta import runtime
from typing import Any, Dict, List

class LoudEcho(VertaModelBase):
    """ Takes a list of strings and makes each LOUD!!! """
    def __init__(self, artifacts):
        pass
            
    @verify_io
    def predict(self, inputs: List[str]) -> List[str]:
        outputs: List[str] = []
        output_logs: List[Dict[str, Any]] = []
        for input in inputs:
            echo: str = input.upper() + '!!!'
            outputs.append(echo)
            output_logs.append({'input': input, 'output': echo})
        runtime.log('model_outputs', output_logs)  # write aggregated logs once under a single key
        return outputs

Assuming your application makes a prediction against the above model like so:

deployed_model.predict(['roar', 'howl'])  # abbreviated process from Example 1.
# In this case, since prediction_id is not explicitly provided, a random UUID is generated.

Output:

["ROAR!!!", "HOWL!!!"]

The expected logs for this prediction would be:

{
  "verta_prediction_id": "b9d586ed-8b54-43c9-a76d-54222e93738f",
  "model_outputs": [
    {
      "input": "roar",
      "output": "ROAR!!!"
    },
    {
      "input": "howl",
      "output": "HOWL!!!"
    }
  ]
}

In this example, model_outputs would be a single column in the resulting table, and querying for individual examples may require flattening or un-nesting the data.


Custom Prediction IDs

When making predictions against a deployed model in Verta via the predict() or predict_with_id() functions, it is possible to set a custom value for the prediction ID via the prediction_id argument. This value is included by default in any logging data stored, making it another dimension by which you can query log data later.


Local Testing and Debugging

When the predict() function of a model is called, it is wrapped inside a context manager class by default. It is this context manager that collects all the logs created when calls to runtime.log() are made. When the prediction request is complete, this context manager is closed and the aggregated logs are stored in an instance attribute. The value of that attribute gets written to storage.

While adding logging to your models, you can imitate this behavior during development by wrapping your calls to predict() or predict_with_id() inside an instance of verta.runtime.context(), as in the example below.

This example demonstrates how to view and debug model logs locally.

from verta.registry import VertaModelBase, verify_io
from verta import runtime

# Given a model class like this one:
class LoudEcho(VertaModelBase):
    """ Takes a string and makes it LOUD!!! """
    def __init__(self, artifacts=None):
        pass
            
    @verify_io
    def predict(self, input: str) -> str:
        runtime.log('model_input', input)
        echo: str = input.upper() + '!!!'
        runtime.log('model_output', echo)
        return echo

# In order to inspect the logs generated by a prediction, nest 
# a direct call to predict() inside a verta.runtime.context() to 
# mimic the expected behavior of the model once deployed.
# `validate=True` ensures the value is JSON serializable.
with runtime.context(validate=True) as ctx:
    result = LoudEcho().predict('yell')
print(ctx.logs())

Output:

{'model_input': 'yell', 'model_output': 'YELL!!!'}


Querying Results

This section describes the process for configuring Athena tables to enable complex querying of your model logging data.

Requirements

  • AWS Permissions:

    • S3

    • Athena

    • Glue

    • IAM:CreateRole

  • Existing model logs in S3 (At least one model has calls to verta.runtime.log() in the predict() function so data exists in the S3 bucket described below.)

  • Values passed to verta.runtime.log() must be JSON serializable or they will not be queryable by Athena. See section on testing and debugging above for more info.
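A quick way to verify that a value is JSON serializable before logging it (the helper name here is illustrative; the client's runtime.context(validate=True) performs a similar check):

```python
import json

def is_json_serializable(value) -> bool:
    """Return True if `value` survives json.dumps without error."""
    try:
        json.dumps(value)
        return True
    except (TypeError, ValueError):
        return False

assert is_json_serializable({"score": 0.87, "tags": ["a", "b"]})
assert not is_json_serializable({1, 2, 3})   # sets are not JSON types
assert not is_json_serializable(object())    # arbitrary objects are not either
```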


Setup Process

When model logs are uploaded to S3, log files are written to the following S3 bucket:

s3://vertaai-model-data-logging/logs/date=YYYY-MM-DD/endpoint_id=XXX/<prediction_id>.json
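As a sketch of how that layout breaks down (the helper function is hypothetical, purely to illustrate the date= and endpoint_id= partition keys that Athena uses later):

```python
from datetime import date

def log_object_key(endpoint_id: str, prediction_id: str, day: date) -> str:
    """Build the partitioned object key under the logging bucket."""
    return (
        f"logs/date={day.isoformat()}"
        f"/endpoint_id={endpoint_id}"
        f"/{prediction_id}.json"
    )

key = log_object_key("123", "1b15c5e1-64e4-4a22-9ff4-dc598eee565e", date(2023, 1, 15))
print(key)
# logs/date=2023-01-15/endpoint_id=123/1b15c5e1-64e4-4a22-9ff4-dc598eee565e.json
```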

Amazon Athena Database

You will need an Athena database to hold your new tables. If you already have one configured, skip to the next section.

  1. Access the AWS Glue console, and select the “Databases” option in the navigation pane on the left. Click on “Add Database” in the top right corner.

  2. Give your DB a name and optional description, then click “Create Database”.

Schema and Table Generation

Use AWS Glue to auto-generate schemas and tables for your existing logs.

  1. In the AWS Glue console, click on "Crawlers" in the navigation pane on the left.

  2. Click on "Create crawler" in the top right corner.

  3. Give the crawler a name and add any optional tags or descriptions, then click "Next".

  4. On the "Choose data sources and classifiers" menu, leave "Not yet" selected for "Is your data already mapped to Glue tables?". Under "Data Sources", click "Add a data source".

  5. Leave the defaults for “Data source” (S3) and “Location of S3 data” (“In this account”). Click “Next”.

  6. Under “S3 path”, enter s3://vertaai-model-data-logging/logs/ or click "Browse S3". The final entry should have a trailing slash. Leave "Crawl all sub-folders" selected.

  7. Click “Add an S3 data source” at the bottom.

  8. After returning to the "Choose data sources and classifiers" menu, click "Next". (No custom classifiers are needed.)

  9. In the “Configure security settings” menu, select an existing IAM role or create a new one based on your organization’s requirements. We recommend the default option of “Create an IAM role” to ensure that the crawler has the appropriate permissions for the relevant buckets. Inadequate permissions will result in empty tables and empty query results.

  10. On the “Set output and scheduling” menu, select the database created above.

  11. Leave "Table name prefix" blank for a default table name of verta_ai_model_logs.

  12. For "Maximum table threshold", enter 1.

  13. Under "S3 schema grouping", select the option "Create a single schema for each S3 path" and leave "Table level" blank.

  14. For the "When the crawler detects schema change..." option, select "Add new columns only".

  15. Select "Update all new and existing partitions with metadata from the table".

  16. For "How should AWS Glue handle deleted objects in the data store?", select "Ignore the change and don't update the table in the data catalog".

  17. Select "Create partition indexes automatically".

  18. For the "Crawler Schedule", we recommend selecting Daily or Hourly. For Daily, set the "Start hour" to one hour before the start of your typical work day.

  19. Finally, review all the configuration options and click “Create crawler”.

  20. Once the crawler has been created, you should land on the menu page for the new crawler. Click the “Run crawler” button in the top right corner. Depending on the volume of logging data, this may take from several minutes to an hour. Monitor the status of the crawler in the "Crawler runs" section at the bottom. When the crawler runs for the first time, the “Table changes” column should indicate that a new table was created.

Query Model Logs via Athena

  1. If prompted, set up an S3 bucket for query results; this is required if one is not already configured.

  2. In the Query Editor, select the Database you configured for Verta model logs.

  3. You should now see the new table under the “Tables” heading.

Example Queries

Generic query to make sure the table is working:

SELECT *
FROM verta_ai_model_logs
LIMIT 100

Query to get all logs for a specific endpoint and prediction:

SELECT *
FROM verta_ai_model_logs
WHERE endpoint_id = '123'
    AND verta_prediction_id = 'my_prediction_id_here'
-- endpoint_id can be found in the top left corner of the specific endpoint page in the Verta Web App.

Query to get all logs for a specific timeframe:

SELECT *
FROM verta_ai_model_logs
WHERE from_iso8601_date("date") BETWEEN DATE('2023-01-01') AND DATE('2023-01-31')
-- "date" is quoted because it is a reserved word in Athena's query engine.
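If you logged aggregated batch data under a single key as in Example 2 above, querying individual items requires un-nesting. Assuming the crawler inferred model_outputs as an array of structs with input and output fields, a query along these lines (Athena/Presto UNNEST syntax) would flatten one row per item:

```sql
SELECT verta_prediction_id, t.input, t.output
FROM verta_ai_model_logs
CROSS JOIN UNNEST(model_outputs) AS t(input, output)
WHERE endpoint_id = '123'
```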

Troubleshooting

  • If the Glue crawler does not have the correct set of permissions, the resulting table will be empty. Ensure that the individual performing the setup has the ability to create IAM roles, and allow the Glue console to create the role as described in the setup process above.

  • If you are adding logging to a new or existing model for the first time, you will not be able to query data from that model until the Glue crawler created above has run. You can run it manually, or wait for the next scheduled run.

  • The crawler creates a new table partition for each day of the year. If your crawler runs less frequently than daily, you will not see data for any of the days since the most recent run until after the next run.

  • If you encounter a HIVE_PARTITION_SCHEMA_MISMATCH error for a mismatch between two different columns, follow this AWS guide to repair the table by dropping and recreating the affected partitions.

  • If you encounter a HIVE_BAD_DATA error, it is possible that the data type for an existing log column has been altered, thus breaking the schema. You will need to delete the underlying S3 data for the day affected, resulting in some data loss. For this reason, we recommend you avoid changing existing logs in a way that uses the same key (column) with a different data type.
