The most challenging yet crucial step in operationalizing models is deployment and release. Because ML frameworks and libraries are so diverse, and because ML development tooling and software delivery systems share few common standards, releasing a model into a product can take many months.
One of Verta's key innovations is a model deployment and release system that works seamlessly with models built in over a dozen frameworks and languages and integrates with state-of-the-art DevOps and software delivery systems.
A Verta endpoint is a containerized microservice that serves a model and lets you make real-time predictions through a REST API or the client libraries. An endpoint can be deployed through the client, the web app, or the CLI.
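As a rough illustration of what a real-time prediction call over REST can look like, the sketch below builds a JSON POST request using only the Python standard library. The URL, the helper name `build_prediction_request`, and the payload shape are placeholders chosen for illustration, not Verta's documented API; authentication is omitted, and the input format depends on your model.

```python
import json
import urllib.request


def build_prediction_request(endpoint_url, inputs):
    """Build (but do not send) a JSON POST request carrying model inputs."""
    body = json.dumps(inputs).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL: the real one comes from your own deployment.
request = build_prediction_request(
    "https://verta.example.com/predict/my-endpoint",
    [[1.0, 2.0, 3.0]],
)

# Sending the request is left commented out because the URL is a placeholder:
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```

Building the request separately from sending it makes it easy to inspect the payload before pointing it at a live endpoint.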
An endpoint in Verta can be deployed from a Registered Model Version. You can use Verta Model Registry to manage different model versions and deploy endpoints. An endpoint can also be directly deployed from an Experiment Run.
Models can be packaged in a particular format (e.g., Docker container, Spark UDF, Python package) and pushed for deployment to any infrastructure. Deployments can target Spark, Kafka, UDFs, or any custom format. All APIs are exposed, so your CI/CD system can drive any part of the process.
Quick deploy is a one-click deploy option where the Verta platform applies default configurations and chooses an endpoint pathname for you. This is recommended for quick iterations. You can quickly deploy an endpoint and then update the configurations later.
Configurable deploy lets you select all configurations before deploying an endpoint. Supported advanced configurations include autoscaling, environment variables, and compute resources.
Verta endpoints provide easy access to build details and run logs. The build ID and model details are appended each time the endpoint is updated with a new release.
Autoscaling capabilities allow the system to adapt to changing query loads. Verta models are always warm, i.e., at least one copy of a deployed model is always running and can serve requests instantly. You can define your scale-up and scale-down replica configuration and choose to trigger autoscaling based on memory utilization, CPU utilization, or the number of requests per worker.
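To make the idea of metric-driven scaling concrete, here is a minimal sketch of a proportional scaling rule of the kind used by the Kubernetes Horizontal Pod Autoscaler. It is an assumption for illustration, not a description of Verta's internal algorithm; the function name and the default replica bounds are hypothetical. The lower bound of one replica reflects the "always warm" guarantee described above.

```python
import math


def desired_replicas(current_replicas, observed, target,
                     min_replicas=1, max_replicas=10):
    """Scale the replica count in proportion to observed / target load.

    `observed` and `target` are in the same unit, e.g. CPU utilization
    percent, memory utilization percent, or requests per worker.
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if min_replicas < 1:
        raise ValueError("min_replicas must be at least 1 to keep the model warm")
    raw = math.ceil(current_replicas * observed / target)
    # Clamp to the configured bounds so the service never scales to zero
    # and never exceeds the allowed maximum.
    return max(min_replicas, min(max_replicas, raw))


scale_up = desired_replicas(2, observed=90, target=60)     # load above target
scale_down = desired_replicas(4, observed=10, target=60)   # load well below target
idle = desired_replicas(3, observed=0, target=60)          # no load at all
capped = desired_replicas(5, observed=300, target=60)      # load far above target
```

With these inputs the rule scales from 2 to 3 replicas under heavier load, shrinks to a single replica when load drops, and never goes below one or above the configured maximum.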
The rollout strategy lets you define how you want to deploy or update your service. You can choose direct or canary rollouts for endpoints.
Direct rollout - With the direct rollout, all your workloads are instantly switched to the new model version.
Canary rollout - Canary deploys update versions by progressively switching traffic from one version of the instances to another. This offers a safe deployment choice with a faster rollback option if the release fails. You can configure the following canary parameters:
Interval: The time duration for each phase of the rollout (30sec, 1m, 15m, 1h, etc.).
Steps: The percentage of production workload to be switched in each phase.
Rules: One or more conditions that trigger an automatic rollback of the release. For example, the maximum error percentage exceeds a defined threshold, or the maximum average latency exceeds the acceptable limit.
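The three canary parameters above can be sketched as follows. This is a simplified model for illustration only: the class, function, and metric key names (`CanaryPlan`, `canary_phases`, `should_roll_back`, `error_percent`, `avg_latency_ms`) are hypothetical and do not come from the Verta client.

```python
from dataclasses import dataclass


@dataclass
class CanaryPlan:
    interval_seconds: int  # Interval: duration of each rollout phase
    step_percent: float    # Steps: share of traffic switched in each phase


def canary_phases(plan):
    """Return the cumulative traffic percentage on the new version per phase."""
    if not 0 < plan.step_percent <= 100:
        raise ValueError("step_percent must be in (0, 100]")
    phases = []
    shifted = 0.0
    while shifted < 100:
        shifted = min(100.0, shifted + plan.step_percent)
        phases.append(shifted)
    return phases


def should_roll_back(metrics, max_error_percent, max_avg_latency_ms):
    """Rules: roll back if any observed metric exceeds its threshold."""
    return (
        metrics["error_percent"] > max_error_percent
        or metrics["avg_latency_ms"] > max_avg_latency_ms
    )


plan = CanaryPlan(interval_seconds=60, step_percent=25)
phases = canary_phases(plan)
total_seconds = len(phases) * plan.interval_seconds

healthy = {"error_percent": 1.0, "avg_latency_ms": 120}
failing = {"error_percent": 7.5, "avg_latency_ms": 120}
```

In this model, a 25% step with a one-minute interval moves all traffic to the new version in four phases, and the rollback check would be evaluated against live metrics during each phase.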
Once an endpoint is live, you can perform different levels of updates.
Make a dynamic update to the endpoint using the same build by changing autoscaling parameters or enabling/disabling autoscaling.
Make a build update using the same image by updating the compute resource allocation or environment variables.
Perform a completely new rollout using a different image and configuring all the deployment parameters from scratch.
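One way to reason about the three update levels is to compare the current and proposed endpoint configurations and pick the most disruptive change. The sketch below assumes a hypothetical configuration dictionary with `image`, `resources`, `env`, and `autoscaling` keys; these names are illustrative and are not Verta's configuration schema.

```python
def classify_update(current, proposed):
    """Return the update level implied by the difference between two configs."""
    if proposed["image"] != current["image"]:
        return "new rollout"      # different image: redeploy from scratch
    if (proposed["resources"] != current["resources"]
            or proposed["env"] != current["env"]):
        return "build update"     # same image, new resources or env vars
    if proposed["autoscaling"] != current["autoscaling"]:
        return "dynamic update"   # same build, autoscaling changes only
    return "no change"


current = {
    "image": "model:v1",
    "resources": {"cpu": 1, "memory": "512Mi"},
    "env": {"LOG_LEVEL": "info"},
    "autoscaling": {"enabled": True, "max_replicas": 4},
}

autoscaling_only = dict(current, autoscaling={"enabled": False})
more_cpu = dict(current, resources={"cpu": 2, "memory": "512Mi"})
new_image = dict(current, image="model:v2")

levels = [
    classify_update(current, autoscaling_only),
    classify_update(current, more_cpu),
    classify_update(current, new_image),
]
```

Checking the image first and autoscaling last means a change that touches several settings is always classified at the highest level it requires.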