Endpoint autoscaling
Through an endpoint update, you can configure the deployment's autoscaling behavior: upper and lower bounds on the number of replicas, the rate of scaling, and the metrics that trigger it.

Using the client

Endpoint.update() provides a parameter for configuring the endpoint's autoscaling behavior. It can be used alongside any update strategy.
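from verta.endpoint.update import DirectUpdateStrategy

# `endpoint` and `model_version` are assumed to already exist; `autoscaling` is an Autoscaling object.
endpoint.update(model_version, DirectUpdateStrategy(), autoscaling=autoscaling)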
The autoscaling parameter takes an Autoscaling object, which sets upper and lower bounds on the number of replicas running the model. An Autoscaling object must also have at least one metric associated with it; the metric sets the threshold that triggers a scale-up.
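from verta.deployment.autoscaling import Autoscaling
from verta.deployment.autoscaling.metrics import CpuUtilizationTarget

autoscaling = Autoscaling(max_replicas=4, min_scale=0.5)
# Attach a CPU utilization metric with a 75% target (0.75), matching the description below.
autoscaling.add_metric(CpuUtilizationTarget(0.75))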
Here, CPU utilization exceeding 75% will lead to more replicas being created. For the full list of available metrics, see the autoscaling-metrics API documentation.
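To make the interaction between a utilization target and the replica bounds concrete, here is a minimal, self-contained sketch. It does not use the Verta client and is not Verta's scaling algorithm, which this page does not describe. It applies the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler, ceil(current × observed / target), and clamps the result to the configured bounds. The function name desired_replicas and the default min_replicas=1 are assumptions made for illustration; target=0.75 and max_replicas=4 mirror the values used above.

```python
import math


def desired_replicas(current, cpu_utilization, target=0.75,
                     min_replicas=1, max_replicas=4):
    """Illustrative only: proportional scaling toward a utilization target.

    Uses the Kubernetes HPA rule ceil(current * observed / target), then
    clamps the result to [min_replicas, max_replicas]. Not Verta's code.
    """
    if current < 1:
        # Assumption: with nothing running, start from the lower bound
        # (and never fewer than one replica).
        return max(min_replicas, 1)
    raw = math.ceil(current * cpu_utilization / target)
    return max(min_replicas, min(max_replicas, raw))


print(desired_replicas(2, 0.90))  # above the 75% target: scales up
print(desired_replicas(2, 0.75))  # exactly on target: unchanged
print(desired_replicas(4, 0.99))  # proportional rule asks for more, capped at max_replicas
```

In this simplified model the metric decides the direction and size of a scaling step, while the replica bounds only limit the outcome. Scale factors such as min_scale are deliberately left out, since their exact semantics are not covered here.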