Exporting metrics for an endpoint

Each endpoint publishes a set of metrics suitable for monitoring its behavior and performance.

Supported Integrations

Datadog

The Verta platform supports exporting endpoint metrics as Custom Metrics to Datadog with the labels indicated below. For more information about using custom metrics within Datadog, please consult the Datadog documentation.
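As a rough illustration of how the labels might be used once the metrics reach Datadog, the sketch below assembles a Datadog metric query string that filters and groups by tag. It assumes the metric appears in Datadog under its bare name (for example `api_latency_p99`) and that the labels appear as tags with the same names; the naming in your account may differ. `build_query` is a hypothetical helper, not part of Verta or Datadog, and the endpoint path is a placeholder.

```python
def build_query(metric, aggregator="avg", filters=None, group_by=None):
    """Assemble a Datadog-style metric query string.

    Hypothetical helper: assumes the metric name and tag names match
    the metric and label names listed in this document.
    """
    filters = filters or {}
    # Tag filters take the form key:value, comma-separated; "*" means no filter.
    scope = ",".join(f"{k}:{v}" for k, v in sorted(filters.items())) or "*"
    query = f"{aggregator}:{metric}{{{scope}}}"
    if group_by:
        # Grouping by a tag yields one series per tag value.
        query += " by {" + ",".join(group_by) + "}"
    return query


query = build_query(
    "api_latency_p99",
    filters={"endpoint_path": "/example-path"},  # placeholder value
    group_by=["worker"],
)
print(query)  # avg:api_latency_p99{endpoint_path:/example-path} by {worker}
```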

CloudWatch

Endpoint metrics can also be exported to Amazon CloudWatch as Metrics. The labels listed for each metric below are attached to the CloudWatch Metrics as Dimensions. Please see the CloudWatch documentation for information about using Metrics in CloudWatch.
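To make the label-to-Dimension mapping concrete, here is a minimal sketch that builds the request parameters for CloudWatch's GetMetricStatistics operation from a set of labels. The `namespace` argument is an assumption: this page does not say which CloudWatch namespace the metrics are exported under, so check your own account. `cloudwatch_request` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

LABELS = ("worker", "model_name", "model_version", "endpoint_path")


def cloudwatch_request(metric_name, labels, namespace, minutes=15):
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    `namespace` is an assumption: this document does not state which
    namespace the metrics are exported under.
    """
    missing = [name for name in LABELS if name not in labels]
    if missing:
        raise ValueError(f"missing labels: {missing}")
    end = datetime.now(timezone.utc)
    return {
        "Namespace": namespace,
        "MetricName": metric_name,
        # Each label becomes one CloudWatch Dimension.
        "Dimensions": [{"Name": n, "Value": str(labels[n])} for n in LABELS],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 60,  # one datapoint per minute
        "Statistics": ["Average"],
    }
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as `boto3.client("cloudwatch").get_metric_statistics(**request)`.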

Metric Types

Endpoint State

| Name | Description | Min Value | Max Value | Labels |
| --- | --- | --- | --- | --- |
| state_up | indicates worker is available to service requests | 0 | 1 | worker, model_name, model_version, endpoint_path |
| state_pending | indicates worker is in 'pending' state | 0 | 1 | worker, model_name, model_version, endpoint_path |
| state_allocated | indicates worker is in 'running' state | 0 | 1 | worker, model_name, model_version, endpoint_path |
| state_restart_count | count of the number of worker restarts | 0 | | worker, model_name, model_version, endpoint_path |

When an endpoint is being updated, its worker goes through the following state transitions:

pending (waiting for resources) -> allocated (waiting to start) -> up (running)
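The transition above can be sketched as a small helper that collapses the three 0/1 state metrics into a single phase name. This is only an illustration: it assumes that when more than one state metric reads 1, the most advanced phase is the one that matters, which this page does not confirm. `worker_phase` and `is_forward` are hypothetical names.

```python
ORDER = ("pending", "allocated", "up")


def worker_phase(state_pending, state_allocated, state_up):
    """Map the three 0/1 state gauges to a single phase name.

    Assumption: if several gauges are 1 at once, the most advanced
    phase wins. Returns None when all gauges are 0.
    """
    gauges = {"pending": state_pending, "allocated": state_allocated, "up": state_up}
    for phase in reversed(ORDER):
        if gauges[phase] == 1:
            return phase
    return None


def is_forward(previous, current):
    """True if `current` is the same phase as `previous` or a later one."""
    return ORDER.index(current) >= ORDER.index(previous)
```

A monitor built on this could flag a worker whose phase moves backwards, for example from `up` to `pending`, alongside a rising `state_restart_count`.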

Endpoint Utilization

| Name | Description | Min Value | Max Value | Labels |
| --- | --- | --- | --- | --- |
| api_throughput | number of requests made to a worker (requests/second) | 0 | | worker, model_name, model_version, endpoint_path |
| api_latency_avg | average request latency for a worker (seconds) | 0 | | worker, model_name, model_version, endpoint_path |
| api_latency_p99 | 99th percentile upper bound for request latency for a worker (seconds) | 0 | | worker, model_name, model_version, endpoint_path |
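One way to combine `api_throughput` and `api_latency_avg` is Little's law, a general queueing identity: the average number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. The sketch below applies it to a single worker. It is an interpretation aid rather than a metric Verta publishes, and `estimated_in_flight` is a hypothetical name.

```python
def estimated_in_flight(api_throughput, api_latency_avg):
    """Estimate average concurrent requests on a worker via Little's law.

    api_throughput is in requests/second, api_latency_avg in seconds.
    """
    if api_throughput < 0 or api_latency_avg < 0:
        raise ValueError("both metrics have a minimum value of 0")
    return api_throughput * api_latency_avg


# 40 requests/second at 250 ms average latency
print(estimated_in_flight(40.0, 0.25))  # 10.0
```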

Endpoint Resources

| Name | Description | Min Value | Max Value | Labels |
| --- | --- | --- | --- | --- |
| resources_cpu | [deprecated] | 0 | | worker, model_name, model_version, endpoint_path |
| resources_cpu_cores | amount of cpu utilized by worker (cores) | 0 | | worker, model_name, model_version, endpoint_path |
| resources_cpu_ratio | amount of cpu utilized by worker (ratio) | 0 | 1 | worker, model_name, model_version, endpoint_path |
| resources_memory | [deprecated] | 0 | | worker, model_name, model_version, endpoint_path |
| resources_memory_bytes | amount of memory utilized by worker (bytes) | 0 | | worker, model_name, model_version, endpoint_path |
| resources_memory_ratio | amount of memory utilized by worker (ratio) | 0 | 1 | worker, model_name, model_version, endpoint_path |
| resources_rx_bytes | amount of received network traffic by worker (bytes/sec) | 0 | | worker, model_name, model_version, endpoint_path |
| resources_tx_bytes | amount of transmitted network traffic by worker (bytes/sec) | 0 | | worker, model_name, model_version, endpoint_path |
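The minimum and maximum values listed for the metrics on this page can double as a sanity check on exported data. The sketch below encodes those bounds and reports any metric whose value falls outside them. The deprecated `resources_cpu` and `resources_memory` metrics are left out, and `out_of_range` is a hypothetical helper, not part of the Verta platform.

```python
# (min, max) per metric, copied from the tables in this document.
# None means no maximum is listed.
BOUNDS = {
    "state_up": (0, 1),
    "state_pending": (0, 1),
    "state_allocated": (0, 1),
    "state_restart_count": (0, None),
    "api_throughput": (0, None),
    "api_latency_avg": (0, None),
    "api_latency_p99": (0, None),
    "resources_cpu_cores": (0, None),
    "resources_cpu_ratio": (0, 1),
    "resources_memory_bytes": (0, None),
    "resources_memory_ratio": (0, 1),
    "resources_rx_bytes": (0, None),
    "resources_tx_bytes": (0, None),
}


def out_of_range(samples):
    """Return the sorted names of metrics whose values violate BOUNDS.

    `samples` maps metric name to its latest value; names that are
    not in BOUNDS are ignored.
    """
    violations = []
    for name, value in samples.items():
        if name not in BOUNDS:
            continue
        low, high = BOUNDS[name]
        if value < low or (high is not None and value > high):
            violations.append(name)
    return sorted(violations)
```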

Label Definitions

  • worker: name of the worker for the endpoint

  • model_name: name of the registered model configured in the endpoint

  • model_version: version of the registered model configured in the endpoint

  • endpoint_path: the URI path suffix configured for the endpoint
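Because every metric carries the same four labels, a sample can be modeled as a label set plus a value. The sketch below is one possible representation, with an example that sums per-worker `api_throughput` values for each endpoint. The `Labels` class and `throughput_by_endpoint` function are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Labels:
    """The four labels attached to every endpoint metric."""
    worker: str
    model_name: str
    model_version: str
    endpoint_path: str


def throughput_by_endpoint(samples):
    """Sum per-worker api_throughput values for each endpoint_path.

    `samples` is an iterable of (Labels, value) pairs.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels.endpoint_path] += value
    return dict(totals)
```

Keeping the label set frozen makes it hashable, so it can also serve directly as a dictionary key when grouping by worker or model version.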
