Model Schema Specification and Validation with Pydantic

Introduction

The model schema feature allows you to specify a schema for the input and output data of your deployed model. In addition, you can optionally enforce validation of input and output when predictions are made against the model.

This is an improvement on our previous ModelAPI Schema in a few ways:

  1. Standardization. Model Schema uses OpenAPI standards, which are widely used.

  2. Flexibility. Model Schema allows any OpenAPI schema to be used, including nested schemas, whereas the legacy ModelAPI was built from flattened training dataset columns.

  3. Simplicity. Model Schema integrates seamlessly with Pydantic, which makes it easy to create schemas and the corresponding JSON objects.

  4. Validation. Model Schema allows you to enforce validation of input and output data, which makes it easier to catch errors and debug issues.

The legacy ModelAPI Schema and its type constraints are still required to use monitoring on the Verta platform. You can use Model Schema in combination with ModelAPI if you want both monitoring and Model Schema's validation benefits.

To use the model schema feature, you'll need to follow these steps:

  1. Create a model with the VertaModelBase.predict() method defined.

    1. If validation is desired, apply the @validate_schema decorator to the model's predict() method.

  2. Log the model schema, as an OpenAPI-compatible JSON schema, to the model version with the RegisteredModelVersion.log_schema() method. We highly recommend using Pydantic to keep this manageable; this guide is written with Pydantic in mind.

  3. Deploy the model to an endpoint.

  4. Predict as usual.

The model schema feature requires the 2023_07 release of the Verta platform.

This documentation provides a detailed guide on how best to use Model Schema in combination with Pydantic. We're using Pydantic 1.10 here, but the platform supports Pydantic 2 as well. Let's get started!

Prerequisites

Before you can use the model schema feature, you must meet the following prerequisites:

  • Verta Python library (version 0.24.0 or higher): pip install "verta>=0.24.0".

  • Pydantic (we're using version 1.10 in this guide): pip install "pydantic==1.10".

Getting Started

To use the model schema feature, follow these steps:

  1. Create the input and output classes, which should subclass Pydantic's BaseModel. The output class is optional; the input class is required to use the feature. Due to an incompatibility between Pydantic and cloudpickle, these classes must be defined in separate files and imported into your primary script. For example:

    input.py:

    from pydantic import BaseModel
    
    class Input(BaseModel):
        a: int
        b: str

    output.py:

    from pydantic import BaseModel
    
    class Output(BaseModel):
        c: int
        d: str
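
    For reference, calling Input.schema() on the class above yields an OpenAPI-compatible JSON schema; with Pydantic 1.10, the result looks roughly like this:

    >>> from input import Input
    >>> Input.schema()
    {'title': 'Input',
     'type': 'object',
     'properties': {'a': {'title': 'A', 'type': 'integer'},
                    'b': {'title': 'B', 'type': 'string'}},
     'required': ['a', 'b']}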
  2. Back in your primary script, import the necessary libraries:

    from verta import Client
    from verta.environment import Python
    from verta.registry import VertaModelBase, validate_schema
    from input import Input
    from output import Output
  3. Create a Model class that subclasses Verta's VertaModelBase and defines the predict() method. predict() accepts and returns dictionaries, which are easily generated from Pydantic objects. For example:

    class Model(VertaModelBase):
        def __init__(self, artifacts=None):
            pass
    
        @validate_schema  # enables validation of prediction input and output against the schema
        def predict(self, input):
            input = Input(**input)  # convert input dictionary to Pydantic object
            # replace this with your own prediction logic
            output = Output(c=17, d="goodbye")
            return output.dict()  # convert Pydantic object back to dictionary

    In the predict() method, you should replace the example code with your own prediction logic that uses your trained model to make predictions on the input data.

  4. Create a new model version, log your schemas, and deploy the model version to an endpoint. Providing your input and output class files to code_dependencies enables you to access the classes within the live model's predict() method. For example:

    client = Client()
    
    model_ver = client.get_or_create_registered_model("My Model").create_standard_model(
        Model,
        code_dependencies=["input.py", "output.py"],
        environment=Python(requirements=["pydantic"]),
    )
    model_ver.log_schema(input=Input.schema(), output=Output.schema())  # output is optional
    endpoint = client.get_or_create_endpoint("my-model")
    endpoint.update(model_ver, wait=True)

    In this step, you should replace "My Model" with the name of your registered model, and "my-model" with the name of the endpoint to which you want to deploy the model. Then customize the requirements parameter of the Python environment to include any other required dependencies.

You may get a warning if you're not using the @verify_io decorator. If you've logged input and output schemas and are using @validate_schema, this warning can safely be ignored, since input and output are already required to be JSON-compatible. That said, it is perfectly safe to use both @validate_schema and @verify_io together.
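
If you do stack both decorators, a minimal sketch might look like this (the decorator ordering shown is illustrative):

    from verta.registry import VertaModelBase, validate_schema, verify_io

    class Model(VertaModelBase):
        def __init__(self, artifacts=None):
            pass

        @verify_io        # checks that input and output are JSON-serializable
        @validate_schema  # checks input and output against the logged schemas
        def predict(self, input):
            input = Input(**input)
            output = Output(c=17, d="goodbye")
            return output.dict()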

  5. Make a prediction:

    deployed_model = endpoint.get_deployed_model()
    input = Input(a=5, b="hello")
    output = deployed_model.predict(input.dict())

    Since your model's predict() method was decorated with @validate_schema, the prediction input will be validated against the input schema and the output will be validated against the output schema. If either is invalid, an error will be raised. If an output schema was not provided, only the prediction input will be validated.
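
    For instance, the hypothetical payload below violates the input schema ("a" is not an integer and "b" is missing entirely), so the prediction raises a validation error instead of reaching your prediction logic:

    # this payload does not conform to the Input schema, so @validate_schema rejects it
    deployed_model.predict({"a": "not an int"})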

If you want to use the model schema feature but don't want to use Pydantic, you can provide an OpenAPI-compatible JSON schema directly to the log_schema() method. However, we highly recommend using Pydantic.
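
For example, here is a sketch of a hand-written schema equivalent to the Pydantic-generated one used above; adjust the fields to match your own model:

    model_ver.log_schema(
        input={
            "title": "Input",
            "type": "object",
            "properties": {
                "a": {"type": "integer"},
                "b": {"type": "string"},
            },
            "required": ["a", "b"],
        },
    )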

Model Schema is also available for experiment runs and works in the same way. However, for new use cases, we recommend using model versions over experiment runs whenever possible.
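
As a sketch, assuming the usual client setup for experiment runs, the call mirrors the model version example above:

    proj = client.set_project("My Project")
    expt = client.set_experiment("My Experiment")
    run = client.set_experiment_run()
    run.log_schema(input=Input.schema(), output=Output.schema())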
