# Model Schema Specification and Validation with Pydantic

## Introduction
The model schema feature allows you to specify a schema for the input and output data of your deployed model. In addition, you can optionally enforce validation of input and output when predictions are made against the model.
This is an improvement on our previous ModelAPI Schema in a few ways:

- **Standardization.** Model Schema uses OpenAPI standards, which are widely used.
- **Flexibility.** Model Schema allows any OpenAPI schema to be used, including nested schemas, whereas the legacy ModelAPI was built from flattened training dataset columns.
- **Simplicity.** Model Schema integrates seamlessly with Pydantic, which makes it easy to create schemas and corresponding JSON objects.
- **Validation.** Model Schema allows you to enforce validation of input and output data, which makes it easier to catch errors and debug issues.
The legacy ModelAPI Schema and its type constraints are still required to use monitoring on the Verta platform. You can use Model Schema in combination with ModelAPI if you want monitoring and Model Schema's validation benefits.
To use the model schema feature, you'll need to follow these steps:

1. Create a model with the `VertaModelBase.predict()` method defined. If validation is desired, include the `@validate_schema` decorator on the model's `predict()` method.
2. Log the model schema, as an OpenAPI-compatible JSON schema, to the model version with the `RegisteredModelVersion.log_schema()` method. We highly recommend using Pydantic to make this manageable, and wrote this guide with Pydantic in mind.
3. Deploy the model to an endpoint.
4. Predict as usual.
The model schema feature requires the `2023_07` release of the Verta platform.
This documentation provides a detailed guide on how best to use Model Schema in combination with Pydantic. We're using Pydantic 1.10 here, but the platform supports Pydantic 2.0 as well. Let's get started!
## Prerequisites
Before you can use the model schema feature, you must meet the following prerequisites:

- Verta Python library (version 0.24.0 or higher): `pip install "verta>=0.24.0"`
- Pydantic (we're using version 1.10 in this guide): `pip install "pydantic==1.10"`
## Getting Started
To use the model schema feature, follow these steps:
First, create the input and output classes, which should subclass `BaseModel` from Pydantic. The output class is optional; the input class is required to use the feature. Due to a constraint between Pydantic and cloudpickle, these classes must be defined in separate files, such as `input.py` and `output.py`, and imported into your primary script.
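As a sketch, `input.py` and `output.py` might look like the following. The field names (`a`, `b`, `c`, `prediction`) are illustrative assumptions, not something the feature requires:

```python
# The two files are shown together here for brevity; in your project, keep
# each class in its own module so it can be imported into your primary script.

# --- input.py ---
from pydantic import BaseModel

class Input(BaseModel):
    a: int      # illustrative field names -- use your model's real inputs
    b: float
    c: str

# --- output.py ---
class Output(BaseModel):
    prediction: float   # illustrative output field
```

A useful property of these classes: `Input.schema()` yields the OpenAPI-compatible JSON schema you'll log later, and `Input(...).dict()` yields the plain dictionary you'll pass to `predict()`.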
Back in your primary script, import the necessary libraries.
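A sketch of those imports, assuming the Pydantic classes live in `input.py` and `output.py` as described above:

```python
from verta import Client
from verta.environment import Python
from verta.registry import VertaModelBase, validate_schema, verify_io

# Import the schema classes from their separate files (required by the
# Pydantic/cloudpickle constraint described above)
from input import Input
from output import Output
```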
Next, create a Model class that subclasses Verta's `VertaModelBase` and defines the `predict()` method. `predict()` accepts and returns dictionaries, which are easily generated from Pydantic objects. In the `predict()` method, you should replace any example code with your own prediction logic that uses your trained model to make predictions on the input data.

Then create a new model version, log your schemas, and deploy the model version to an endpoint. Providing your input and output class files to `code_dependencies` enables you to access the classes within the live model's `predict()`
method. In this step, you should replace "My Model" with the name of your registered model, and "my-model" with the name of the endpoint to which you want to deploy the model. Then customize the `requirements` parameter of the Python environment to include any other required dependencies.
You may get a warning if you're not using the `@verify_io` decorator. If you've specified input and output schemas and are using `@validate_schema`, this warning can safely be ignored, since input and output will already be required to be JSON-compatible. It is also safe to use both `@validate_schema` and `@verify_io` together.
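Putting the versioning, schema logging, and deployment step together, a hedged sketch might look like the following. It assumes a `VertaModelBase` subclass named `MyModel` defined earlier in the primary script, and the illustrative Pydantic classes in `input.py`/`output.py`:

```python
from verta import Client
from verta.environment import Python

from input import Input    # Pydantic classes from their separate files
from output import Output

client = Client()  # assumes VERTA_HOST / VERTA_EMAIL / VERTA_DEV_KEY are set

registered_model = client.get_or_create_registered_model(name="My Model")
model_version = registered_model.create_standard_model(
    MyModel,  # your VertaModelBase subclass, with @validate_schema on predict()
    environment=Python(requirements=["pydantic==1.10"]),  # add your own deps here
    code_dependencies=["input.py", "output.py"],
)

# Pydantic's .schema() emits an OpenAPI-compatible JSON schema as a dict
model_version.log_schema(input=Input.schema(), output=Output.schema())

endpoint = client.get_or_create_endpoint("my-model")
endpoint.update(model_version, wait=True)
```

Replace "My Model" and "my-model" with your own registered model and endpoint names, as noted above.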
Finally, make a prediction as usual. Since your model's `predict()` method was decorated with `@validate_schema`, the prediction input will be validated against the input schema, and the output will be validated against the output schema. If either is invalid, an error will be raised. If an output schema was not provided, only the prediction input will be validated.
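A prediction call against the deployed endpoint might then be sketched as follows (field values are illustrative, matching the assumed `Input` class):

```python
from verta import Client
from input import Input

client = Client()
deployed_model = client.get_or_create_endpoint("my-model").get_deployed_model()

# Build a schema-conforming payload from the Pydantic class, then predict;
# the input and output are validated against the logged schemas server-side
payload = Input(a=1, b=2.5, c="hello").dict()
result = deployed_model.predict(payload)
```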
If you want to use the model schema feature but don't want to use Pydantic, you can provide an OpenAPI-compatible JSON schema directly to the `log_schema()` method. However, we highly recommend using Pydantic.
Model Schema is also available for experiment runs and works in the same way. However, for new use cases, we recommend using model versions over experiment runs whenever possible.