Endpoint.update() provides an autoscaling parameter for configuring the endpoint's autoscaling behavior. It can be used alongside any update strategy. autoscaling takes an Autoscaling object, which sets the upper and lower bounds for the number of replicas running the model. An Autoscaling object must also have at least one metric associated with it, which sets a threshold for triggering a scale-up. For details, see the autoscaling-metrics API documentation.
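The relationship between replica bounds and metric thresholds can be sketched in plain Python. This is an illustrative model only, not the library's API: the Metric and Autoscaling classes below, their field names (min_replicas, max_replicas, metrics, threshold), the desired_replicas helper, and the "cpu_utilization" metric name are all hypothetical stand-ins, and the one-replica-at-a-time scale-up rule is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Metric:
    # Hypothetical stand-in: a named metric and the value above which
    # a scale-up is triggered.
    name: str
    threshold: float


@dataclass
class Autoscaling:
    # Hypothetical stand-in for an autoscaling configuration object.
    min_replicas: int
    max_replicas: int
    metrics: list[Metric] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 <= self.min_replicas <= self.max_replicas:
            raise ValueError("need 0 <= min_replicas <= max_replicas")
        if not self.metrics:
            raise ValueError("at least one metric is required")

    def desired_replicas(self, current: int, observed: dict[str, float]) -> int:
        # Assumed rule: add one replica if any metric exceeds its threshold,
        # then clamp the result to the configured bounds.
        over = any(observed.get(m.name, 0.0) > m.threshold for m in self.metrics)
        target = current + 1 if over else current
        return max(self.min_replicas, min(self.max_replicas, target))


policy = Autoscaling(
    min_replicas=1,
    max_replicas=3,
    metrics=[Metric("cpu_utilization", 0.8)],
)

print(policy.desired_replicas(2, {"cpu_utilization": 0.9}))  # 3: threshold exceeded
print(policy.desired_replicas(3, {"cpu_utilization": 0.9}))  # 3: capped at the upper bound
print(policy.desired_replicas(2, {"cpu_utilization": 0.5}))  # 2: below threshold
```

The sketch shows why both pieces are needed: the metric decides when to add a replica, while the bounds decide how far scaling is allowed to go. The real Autoscaling object may use different names and scaling rules, so check the API documentation before relying on any of these details.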