Exporting metrics for an endpoint
Each endpoint publishes a set of metrics suitable for monitoring its behavior and performance.
Supported Integrations
Datadog
The Verta platform supports exporting endpoint metrics as Custom Metrics to Datadog with the labels indicated below. For more information about using custom metrics within Datadog, please consult the Datadog documentation.
CloudWatch
Endpoint metrics can also be exported to Amazon CloudWatch as Metrics. The labels listed for each metric below are attached to the CloudWatch Metric as Dimensions. Please see the CloudWatch documentation for information about using Metrics in CloudWatch.
Note: A metric is not exported to CloudWatch until its scenario has occurred at least once. For example, a dashboard cannot be built on api_throughput_5xx_by_endpoint until the endpoint has returned at least one 5xx error.
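Because labels become Dimensions, a CloudWatch query for one of these metrics has to supply a matching set of label values. The following is a minimal sketch of that mapping in Python. It only builds the `Name`/`Value` list shape that CloudWatch uses for dimensions. The helper name `labels_to_dimensions` and the sample label value are hypothetical, and the namespace under which the metrics are published is not covered here.

```python
def labels_to_dimensions(labels):
    """Convert a label mapping into the Name/Value list CloudWatch uses for Dimensions."""
    # Sort by label name so the output order is deterministic.
    return [{"Name": name, "Value": str(value)} for name, value in sorted(labels.items())]


# Hypothetical label value, for illustration only.
dimensions = labels_to_dimensions({"endpoint_path": "/some-path"})
```

A list like this could be passed as the `Dimensions` argument of a CloudWatch client call, such as `get_metric_statistics` in boto3, together with whatever namespace your deployment uses.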
Metric Types
Endpoint State
Name | Description | Min Value | Max Value | Labels |
---|---|---|---|---|
state_up | indicates worker is available to service requests | 0 | 1 | worker, model_name, model_version, endpoint_path |
state_pending | indicates worker is in 'pending' state | 0 | 1 | worker, model_name, model_version, endpoint_path |
state_allocated | indicates worker is in 'running' state | 0 | 1 | worker, model_name, model_version, endpoint_path |
state_restart_count | count of the number of worker restarts | 0 | ∞ | worker, model_name, model_version, endpoint_path |
workers_up_by_endpoint | number of workers currently up for an endpoint | 0 | ∞ | model_name, model_version, endpoint_path |
When an endpoint is being updated, its worker has the following state transition:
Endpoint Utilization
Name | Description | Min Value | Max Value | Time Range | Labels |
---|---|---|---|---|---|
api_throughput | rate of requests made to a worker (requests/second) | 0 | ∞ | 2m | worker, model_name, model_version, endpoint_path |
api_latency_avg | average request latency for a worker (seconds) | 0 | ∞ | 2m | worker, model_name, model_version, endpoint_path |
api_latency_p99 | 99th percentile upper bound for request latency for a worker (seconds) | 0 | ∞ | 2m | worker, model_name, model_version, endpoint_path |
api_throughput_by_endpoint_by_code | rate of requests made to an endpoint, across workers (requests/second) | 0 | ∞ | 2m | endpoint_path, code |
api_throughput_by_endpoint | rate of requests made to an endpoint, across workers and codes (requests/second) | 0 | ∞ | 2m | endpoint_path |
api_throughput_2xx_by_endpoint | rate of requests with 2xx codes made to an endpoint, across workers (requests/second) | 0 | ∞ | 2m | endpoint_path |
api_throughput_3xx_by_endpoint | rate of requests with 3xx codes made to an endpoint, across workers (requests/second) | 0 | ∞ | 2m | endpoint_path |
api_throughput_4xx_by_endpoint | rate of requests with 4xx codes made to an endpoint, across workers (requests/second) | 0 | ∞ | 2m | endpoint_path |
api_throughput_5xx_by_endpoint | rate of requests with 5xx codes made to an endpoint, across workers (requests/second) | 0 | ∞ | 2m | endpoint_path |
api_throughput_not_2xx_by_endpoint | rate of requests with non-2xx codes made to an endpoint, across workers (requests/second) | 0 | ∞ | 2m | endpoint_path |
api_latency_p99_by_endpoint | 99th percentile upper bound for request latency for an endpoint, across workers (seconds) | 0 | ∞ | 2m | endpoint_path |
api_latency_p50_by_endpoint | 50th percentile upper bound for request latency for an endpoint, across workers (seconds) | 0 | ∞ | 2m | endpoint_path |
api_increase_2xx_by_endpoint | count of requests with 2xx codes made to an endpoint, across workers (requests) | 0 | ∞ | 2m | endpoint_path |
api_increase_3xx_by_endpoint | count of requests with 3xx codes made to an endpoint, across workers (requests) | 0 | ∞ | 2m | endpoint_path |
api_increase_4xx_by_endpoint | count of requests with 4xx codes made to an endpoint, across workers (requests) | 0 | ∞ | 2m | endpoint_path |
api_increase_5xx_by_endpoint | count of requests with 5xx codes made to an endpoint, across workers (requests) | 0 | ∞ | 2m | endpoint_path |
api_increase_not_2xx_by_endpoint | count of requests with non-2xx codes made to an endpoint, across workers (requests) | 0 | ∞ | 2m | endpoint_path |
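The per-code throughput metrics can be combined into derived values such as an error ratio. The sketch below assumes the latest metric values have already been fetched into a plain dictionary keyed by `(metric name, endpoint_path)`. That layout, the function name `error_ratio`, and the sample numbers are all hypothetical. A missing 5xx series is treated as zero, which matches the case where no 5xx error has occurred yet.

```python
def error_ratio(samples, endpoint_path):
    """Return the fraction of an endpoint's request rate that produced 5xx codes."""
    total = samples.get(("api_throughput_by_endpoint", endpoint_path), 0.0)
    # A series that has never been emitted is treated as a rate of zero.
    errors = samples.get(("api_throughput_5xx_by_endpoint", endpoint_path), 0.0)
    if total <= 0:
        return 0.0  # No traffic, so there is no meaningful ratio to report.
    return errors / total


# Hypothetical sample values in requests/second.
samples = {
    ("api_throughput_by_endpoint", "/some-path"): 40.0,
    ("api_throughput_5xx_by_endpoint", "/some-path"): 2.0,
}
ratio = error_ratio(samples, "/some-path")
```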
Endpoint Resources
Name | Description | Min Value | Max Value | Time Range | Labels |
---|---|---|---|---|---|
resources_cpu | [deprecated] | 0 | ∞ | 2m | worker, model_name, model_version, endpoint_path |
resources_cpu_cores | amount of cpu utilized by worker (cores) | 0 | ∞ | 2m | worker, model_name, model_version, endpoint_path |
resources_cpu_ratio | amount of cpu utilized by worker (ratio) | 0 | 1 | 2m | worker, model_name, model_version, endpoint_path |
resources_memory | [deprecated] | 0 | ∞ | NA | worker, model_name, model_version, endpoint_path |
resources_memory_bytes | amount of memory utilized by worker (bytes) | 0 | ∞ | NA | worker, model_name, model_version, endpoint_path |
resources_memory_ratio | amount of memory utilized by worker (ratio) | 0 | 1 | NA | worker, model_name, model_version, endpoint_path |
resources_rx_bytes | amount of received network traffic by worker (bytes/sec) | 0 | ∞ | 2m | worker, model_name, model_version, endpoint_path |
resources_tx_bytes | amount of transmitted network traffic by worker (bytes/sec) | 0 | ∞ | 2m | worker, model_name, model_version, endpoint_path |
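Since the two ratio metrics are bounded between 0 and 1, they lend themselves to simple threshold checks. As a rough sketch, the following flags workers whose CPU or memory ratio is at or above a threshold. The reading layout (dictionaries with `metric`, `worker`, and `value` keys), the function name `workers_over_threshold`, the default of 0.9, and the worker names are assumptions for illustration, not part of the platform.

```python
def workers_over_threshold(readings, threshold=0.9):
    """Return sorted worker names whose CPU or memory ratio is at or above threshold."""
    ratio_metrics = ("resources_cpu_ratio", "resources_memory_ratio")
    flagged = set()
    for reading in readings:
        if reading["metric"] in ratio_metrics and reading["value"] >= threshold:
            flagged.add(reading["worker"])
    return sorted(flagged)


# Hypothetical readings, one per (metric, worker) pair.
readings = [
    {"metric": "resources_cpu_ratio", "worker": "worker-a", "value": 0.95},
    {"metric": "resources_memory_ratio", "worker": "worker-a", "value": 0.40},
    {"metric": "resources_memory_ratio", "worker": "worker-b", "value": 0.55},
]
flagged = workers_over_threshold(readings)
```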
Label Definitions
worker
: name of the worker for the endpoint
model_name
: name of the registered model configured in the endpoint
model_version
: version of the registered model configured in the endpoint
endpoint_path
: the URI path suffix configured for the endpoint
code
: the HTTP response status code returned for the request