Exporting metrics for an endpoint

Each endpoint publishes a set of metrics for monitoring its state, utilization, and resource consumption.

Supported Integrations

Datadog

The Verta platform supports exporting endpoint metrics as Custom Metrics to Datadog with the labels indicated below. For more information about using custom metrics within Datadog, please consult the Datadog documentation.

CloudWatch

Endpoint metrics can also be exported to Amazon CloudWatch as Metrics. The labels listed for each metric below are attached to the CloudWatch Metric as Dimensions. See the CloudWatch documentation for information about using Metrics in CloudWatch.

Note: A metric is not exported to CloudWatch until its scenario has occurred at least once. For example, a dashboard cannot be built on api_throughput_5xx_by_endpoint until the endpoint has returned at least one 5xx error.
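
Because a metric that has never fired does not exist in CloudWatch, a consumer may want to treat an empty result as zero rather than as a failure. The following is a minimal Python sketch of that idea, not part of the Verta platform. The helper name latest_value_or_zero is hypothetical, and the input shape (a list of dicts with a Timestamp and a statistic such as Average) is assumed to mirror the Datapoints list that the CloudWatch GetMetricStatistics call returns.

```python
from typing import Any, Mapping, Sequence


def latest_value_or_zero(
    datapoints: Sequence[Mapping[str, Any]], stat: str = "Average"
) -> float:
    """Return the newest value of `stat`, or 0.0 when no datapoints exist.

    An empty sequence is what a consumer would see for a metric that has
    not been exported yet (assumption: the caller passes the Datapoints
    list from a CloudWatch query unchanged).
    """
    if not datapoints:
        return 0.0
    # Pick the datapoint with the most recent timestamp.
    newest = max(datapoints, key=lambda point: point["Timestamp"])
    return float(newest[stat])
```

With this helper, a dashboard or script reading api_throughput_5xx_by_endpoint would report 0.0 until the first 5xx error has been recorded.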

Metric Types

Endpoint State

Name	Description	Max Values	Labels
state_up	indicates worker is available to service requests	1	worker, model_name, model_version, endpoint_path
state_pending	indicates worker is in 'pending' state	1	worker, model_name, model_version, endpoint_path
state_allocated	indicates worker is in 'running' state	1	worker, model_name, model_version, endpoint_path
state_restart_count	count of the number of worker restarts	∞	worker, model_name, model_version, endpoint_path
workers_up_by_endpoint	number of workers currently up for an endpoint	∞	model_name, model_version, endpoint_path

When an endpoint is updated, each of its workers goes through the following state transitions:

pending (waiting for resources) -> allocated (waiting to start) -> up (running)
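
The transition above can be read back from the three state metrics. The sketch below assumes each of state_pending, state_allocated, and state_up is reported as 0 or 1 for a worker (consistent with the maximum value of 1 in the table) and resolves overlaps in favor of the most advanced state. The function name worker_phase and the "unknown" fallback are illustrative assumptions.

```python
def worker_phase(state_pending: int, state_allocated: int, state_up: int) -> str:
    """Map the three per-worker state gauges to a single phase name.

    Assumption: each gauge is 0 or 1. If more than one is set, the most
    advanced phase in pending -> allocated -> up wins.
    """
    if state_up:
        return "up"
    if state_allocated:
        return "allocated"
    if state_pending:
        return "pending"
    # None of the gauges is set, e.g. the worker has not reported yet.
    return "unknown"


# Example: a worker that has resources but has not started serving yet.
phase = worker_phase(state_pending=0, state_allocated=1, state_up=0)
```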

Endpoint Utilization

Name	Description	Max Values	Time Range	Labels
api_throughput	rate of requests made to a worker (requests/second)	∞	2m	worker, model_name, model_version, endpoint_path
api_latency_avg	average request latency for a worker (seconds)	∞	2m	worker, model_name, model_version, endpoint_path
api_latency_p99	99th percentile upper bound for request latency for a worker (seconds)	∞	2m	worker, model_name, model_version, endpoint_path
api_throughput_by_endpoint_by_code	rate of requests made to an endpoint per response code, across workers (requests/second)	∞	2m	endpoint_path, code
api_throughput_by_endpoint	rate of requests made to an endpoint, across workers and codes (requests/second)	∞	2m	endpoint_path
api_throughput_2xx_by_endpoint	rate of requests with 2xx codes made to an endpoint, across workers (requests/second)	∞	2m	endpoint_path
api_throughput_3xx_by_endpoint	rate of requests with 3xx codes made to an endpoint, across workers (requests/second)	∞	2m	endpoint_path
api_throughput_4xx_by_endpoint	rate of requests with 4xx codes made to an endpoint, across workers (requests/second)	∞	2m	endpoint_path
api_throughput_5xx_by_endpoint	rate of requests with 5xx codes made to an endpoint, across workers (requests/second)	∞	2m	endpoint_path
api_throughput_not_2xx_by_endpoint	rate of requests with non-2xx codes made to an endpoint, across workers (requests/second)	∞	2m	endpoint_path
api_latency_p99_by_endpoint	99th percentile upper bound for request latency for an endpoint, across workers (seconds)	∞	2m	endpoint_path
api_latency_p50_by_endpoint	50th percentile upper bound for request latency for an endpoint, across workers (seconds)	∞	2m	endpoint_path
api_increase_2xx_by_endpoint	count of requests with 2xx codes made to an endpoint over the time range, across workers (requests)	∞	2m	endpoint_path
api_increase_3xx_by_endpoint	count of requests with 3xx codes made to an endpoint over the time range, across workers (requests)	∞	2m	endpoint_path
api_increase_4xx_by_endpoint	count of requests with 4xx codes made to an endpoint over the time range, across workers (requests)	∞	2m	endpoint_path
api_increase_5xx_by_endpoint	count of requests with 5xx codes made to an endpoint over the time range, across workers (requests)	∞	2m	endpoint_path
api_increase_not_2xx_by_endpoint	count of requests with non-2xx codes made to an endpoint over the time range, across workers (requests)	∞	2m	endpoint_path
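
The per-endpoint throughput metrics can be combined into derived values such as an error ratio. The sketch below is illustrative only: the function names are hypothetical, and it assumes the values passed in are samples of api_throughput_5xx_by_endpoint and api_throughput_by_endpoint taken at the same time for the same endpoint_path. It also shows how a per-second rate relates to an approximate request count over the 2m time range.

```python
TIME_RANGE_SECONDS = 120  # the 2m time range listed in the table above


def error_ratio(throughput_5xx: float, throughput_total: float) -> float:
    """Fraction of requests that returned a 5xx code.

    Returns 0.0 when there is no traffic, which avoids dividing by zero.
    """
    if throughput_total <= 0:
        return 0.0
    return throughput_5xx / throughput_total


def approx_request_count(
    rate_per_second: float, window_seconds: int = TIME_RANGE_SECONDS
) -> float:
    """Approximate number of requests implied by an average rate over a window."""
    return rate_per_second * window_seconds
```

For example, an endpoint serving 10 requests/second with 0.5 of them per second failing has an error ratio of 0.05, and a steady 2 requests/second corresponds to roughly 240 requests over the 2m window.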

Endpoint Resources

Name	Description	Max Values	Time Range	Labels
resources_cpu	[deprecated]	∞	2m	worker, model_name, model_version, endpoint_path
resources_cpu_cores	amount of CPU utilized by worker (cores)	∞	2m	worker, model_name, model_version, endpoint_path
resources_cpu_ratio	amount of CPU utilized by worker (ratio)	1	2m	worker, model_name, model_version, endpoint_path
resources_memory	[deprecated]	∞	NA	worker, model_name, model_version, endpoint_path
resources_memory_bytes	amount of memory utilized by worker (bytes)	∞	NA	worker, model_name, model_version, endpoint_path
resources_memory_ratio	amount of memory utilized by worker (ratio)	1	NA	worker, model_name, model_version, endpoint_path
resources_rx_bytes	rate of network traffic received by worker (bytes/second)	∞	2m	worker, model_name, model_version, endpoint_path
resources_tx_bytes	rate of network traffic transmitted by worker (bytes/second)	∞	2m	worker, model_name, model_version, endpoint_path
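
The ratio metrics lend themselves to simple saturation checks. The sketch below assumes that samples of resources_cpu_ratio or resources_memory_ratio have already been collected into a mapping keyed by the worker label, with values between 0 and 1. The function name saturated_workers and the 0.9 default threshold are illustrative choices, not platform defaults.

```python
from typing import List, Mapping


def saturated_workers(
    ratio_by_worker: Mapping[str, float], threshold: float = 0.9
) -> List[str]:
    """Return the names of workers whose ratio is at or above the threshold.

    Names are sorted so the result is stable across runs.
    """
    return sorted(
        worker for worker, ratio in ratio_by_worker.items() if ratio >= threshold
    )
```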

Label Definitions

worker: name of the worker for the endpoint
model_name: name of the registered model configured in the endpoint
model_version: version of the registered model configured in the endpoint
endpoint_path: the URI path suffix configured for the endpoint
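
When metrics are exported to CloudWatch, these labels appear as Dimensions, which CloudWatch represents as a list of Name/Value pairs. The sketch below shows one way to build that list from a set of labels when querying a metric. The helper name to_dimensions and the example label values are hypothetical.

```python
from typing import Dict, List, Mapping


def to_dimensions(labels: Mapping[str, str]) -> List[Dict[str, str]]:
    """Convert metric labels into CloudWatch-style Name/Value dimension pairs.

    Labels are sorted by name so the output order is deterministic.
    """
    return [
        {"Name": name, "Value": value} for name, value in sorted(labels.items())
    ]


# Example labels for a per-endpoint metric (values are made up).
dimensions = to_dimensions({"endpoint_path": "/some-path"})
```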

Last updated 8 months ago