It seems like metrics.yaml doesn't apply to my TorchServe service


I'm running TorchServe in WSL2. There are three issues with the metrics:

  1. Even if the metrics_config parameter in ts.config points to a non-existent file, everything works without any problems. It looks like the parameter has no effect in my case.
  2. Even if I comment out or remove some metrics (ts_metrics or model_metrics), I can still see them in ts_metrics.log or model_metrics.log.
  3. I can't get any of the ts metrics via the metrics API. Only the ts_queue_latency_microseconds and ts_inference_requests_total metrics show up.
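One quick sanity check for issue 1 is to parse ts.config yourself and verify that the metrics_config path actually resolves relative to the directory TorchServe is started from. This is a rough sketch in stdlib Python, assuming the file names used above:

```python
from pathlib import Path

def read_ts_config(path):
    """Parse simple key=value lines from a TorchServe properties file.
    Backslash continuation lines (like the models block above) are not
    followed; this is only a rough check, not a full properties parser."""
    props = {}
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

if Path("ts.config").exists():
    props = read_ts_config("ts.config")
    cfg = props.get("metrics_config")
    # A relative path here is resolved against the current working directory,
    # which may not be what TorchServe uses at startup.
    if cfg and not Path(cfg).exists():
        print("metrics_config points to a missing file:", cfg)
```

If the file is missing, TorchServe silently falls back to its built-in metrics definitions rather than failing, which would explain the behaviour described in issues 1 and 2.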

ts.config:

models={\
  "doc_model": {\
    "1.0": {\
        "defaultVersion": true,\
        "minWorkers": 1,\
        "maxWorkers": 1,\
        "batchSize": 1\
    }\
  }\
}
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
metrics_mode=prometheus
metrics_config=./metrics.yaml

number_of_netty_threads=32
job_queue_size=1000
model_store=/home/model-server/model-store
workflow_store=/home/model-server/wf-store

metrics.yaml:

dimensions:
  - &model_name "ModelName"
  - &worker_name "WorkerName"
  - &level "Level"
  - &device_id "DeviceId"
  - &hostname "Hostname"

ts_metrics:
  counter:
    # - name: Requests2XX
    #   unit: Count
    #   dimensions: [*level, *hostname]
    - name: Requests4XX
      unit: Count
      dimensions: [*level, *hostname]
    # - name: Requests5XX
    #   unit: Count
    #   dimensions: [*level, *hostname, *model_name]
    - name: ts_inference_requests_total
      unit: Count
      dimensions: [*level, "model_name", "model_version", "hostname"]
    - name: ts_inference_latency_microseconds
      unit: Microseconds
      dimensions: ["model_name", "model_version", "hostname"]
    - name: ts_queue_latency_microseconds
      unit: Microseconds
      dimensions: ["model_name", "model_version", "hostname"]

  histogram:
    - name: NameOfHistogramMetric
      unit: ms
      dimensions: [*model_name, *level]

  gauge:
    - name: QueueTime
      unit: Milliseconds
      dimensions: [*level, *hostname]
    - name: WorkerThreadTime
      unit: Milliseconds
      dimensions: [*level, *hostname]
    - name: WorkerLoadTime
      unit: Milliseconds
      dimensions: [*worker_name, *level, *hostname]
    - name: CPUUtilization
      unit: Percent
      dimensions: [*level, *hostname]
    - name: MemoryUsed
      unit: Megabytes
      dimensions: [*level, *hostname]
    - name: MemoryAvailable
      unit: Megabytes
      dimensions: [*level, *hostname]
    - name: MemoryUtilization
      unit: Percent
      dimensions: [*level, *hostname]
    - name: DiskUsage
      unit: Gigabytes
      dimensions: [*level, *hostname]
    - name: DiskUtilization
      unit: Percent
      dimensions: [*level, *hostname]
    - name: DiskAvailable
      unit: Gigabytes
      dimensions: [*level, *hostname]
    - name: GPUMemoryUtilization
      unit: Percent
      dimensions: [*level, *device_id, *hostname]
    - name: GPUMemoryUsed
      unit: Megabytes
      dimensions: [*level, *device_id, *hostname]
    - name: GPUUtilization
      unit: Percent
      dimensions: [*level, *device_id, *hostname]

model_metrics:
  # Dimension "Hostname" is automatically added for model metrics in the backend
  gauge:
    - name: HandlerTime
      unit: ms
      dimensions: [*model_name, *level]
    - name: PredictionTime
      unit: ms
      dimensions: [*model_name, *level]
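Note that the dimensions lists above mix YAML aliases (*level, which expands to the anchored value "Level") with plain string literals ("model_name"). The difference is easy to see with a small sketch, assuming PyYAML is installed (TorchServe itself uses it to read metrics.yaml):

```python
import yaml  # PyYAML (assumed available)

# A trimmed-down fragment in the same shape as the metrics.yaml above.
snippet = """
dimensions:
  - &model_name "ModelName"
  - &level "Level"
ts_metrics:
  counter:
    - name: ts_inference_requests_total
      dimensions: [*level, "model_name"]
"""
doc = yaml.safe_load(snippet)
dims = doc["ts_metrics"]["counter"][0]["dimensions"]
# The alias expands to its anchor's value; the quoted string stays literal.
print(dims)
```

So `*level` and `"Level"` load as the same string, but `"model_name"` is just the literal text model_name, not the "ModelName" anchor value.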

metrics API output:

# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="57984a8f-c19a-4c93-b9cd-cc7eb8b1fa55",model_name="doc_model",model_version="default",} 4925586.4
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="57984a8f-c19a-4c93-b9cd-cc7eb8b1fa55",model_name="doc_model",model_version="default",} 3.0
# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="57984a8f-c19a-4c93-b9cd-cc7eb8b1fa55",model_name="doc_model",model_version="default",} 291.4
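The metric family names present in a payload like the one above can be pulled out with a few lines of stdlib Python, which makes it easy to diff against what metrics.yaml declares. A sketch, using a sample trimmed from the output shown:

```python
def metric_families(prometheus_text):
    """Collect metric family names from Prometheus exposition text
    by reading the '# TYPE <name> <type>' comment lines."""
    names = set()
    for line in prometheus_text.splitlines():
        if line.startswith("# TYPE "):
            names.add(line.split()[2])
    return names

sample = """\
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{model_name="doc_model",} 4925586.4
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{model_name="doc_model",} 3.0
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{model_name="doc_model",} 291.4
"""
print(sorted(metric_families(sample)))
```

In practice you would feed it the body of a GET against the metrics address (port 8082 in the ts.config above) instead of the hardcoded sample.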

Here is how I build the .mar:

torch-model-archiver \
  --model-name doc_model \
  --version 1.0 \
  --serialized-file model/pytorch_model.bin \
  --handler ./src/transformers_vectorizer_handler.py \
  --extra-files "./model/config.json,./tokenizer" \
  -f
mkdir -p model_store && mv doc_model.mar model_store/

And how I start TS:

torchserve \
  --start \
  --model-store model_store \
  --models doc_model=doc_model.mar \
  --ncs \
  --ts-config ./ts.config

1 Answer

Answered by feeeper:

The problems mentioned in the question went away after I updated torchserve and torch-model-archiver to version 0.9.0.
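A quick way to confirm which versions are actually installed before and after the upgrade, using only the standard library (returns None for a package that is absent):

```python
from importlib import metadata

def installed_version(package):
    """Return the installed distribution version, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Both distributions ship separately, so check each one.
for pkg in ("torchserve", "torch-model-archiver"):
    print(pkg, installed_version(pkg))
```

If either line prints None or a version below 0.9.0, upgrading both packages together (e.g. with pip) should reproduce the fix described above.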