Deploying models with TensorFlow Serving with model paths mapped in the Dockerfile


I'm trying to deploy two models as REST endpoints using TensorFlow Serving (TFX). I'm having trouble mapping the directories containing the SavedModel artefacts to the TFX model paths.

Contrary to all tutorials and SO questions I've seen, I'm specifying the mapping within a Dockerfile based on the base TFX image, by copying the model directories from an AWS S3 bucket into the model paths. This is because my end goal is to push the image to Docker Hub and then run the container in Kubernetes as a service. So it's important that I can do the mapping in the Dockerfile, rather than via command-line arguments.

I've had partial success with the following spec:

FROM tensorflow/serving:latest

RUN apt-get update && \
    apt-get install -y --no-install-recommends gcc awscli

# create directories for models
RUN mkdir -p /models/model

# copy SavedModels from S3 to image
RUN aws s3 cp s3://model_dir/model1/ /models/model/1/ --recursive
RUN aws s3 cp s3://model_dir/model2/ /models/model/2/ --recursive
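For completeness, I build and run the image roughly like this (the image tag is arbitrary; 8500 and 8501 are the default gRPC and REST ports, which also show up in the log below):

docker build -t tfserving-two-models .
docker run -p 8500:8500 -p 8501:8501 tfserving-two-models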

After I build the image and run the container, it seems that model2 is being served, as shown in the log:

2023-11-01 21:36:46.065739: I tensorflow_serving/model_servers/server.cc:74] Building single TensorFlow model file config:  model_name: model model_base_path: /models/model
2023-11-01 21:36:46.065968: I tensorflow_serving/model_servers/server_core.cc:465] Adding/updating models.
2023-11-01 21:36:46.065988: I tensorflow_serving/model_servers/server_core.cc:594]  (Re-)adding model: model
2023-11-01 21:36:46.238609: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: model version: 2}
2023-11-01 21:36:46.238657: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: model version: 2}
2023-11-01 21:36:46.238671: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: model version: 2}
2023-11-01 21:36:46.238747: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:45] Reading SavedModel from: /models/model/2
2023-11-01 21:36:46.273840: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:91] Reading meta graph with tags { serve }
2023-11-01 21:36:46.273879: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:132] Reading SavedModel debug info (if present) from: /models/model/2
2023-11-01 21:36:46.273964: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-01 21:36:46.370149: I external/org_tensorflow/tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:375] MLIR V1 optimization pass is not enabled
2023-11-01 21:36:46.389262: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:231] Restoring SavedModel bundle.
2023-11-01 21:36:46.564435: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:215] Running initialization op on SavedModel bundle at path: /models/model/2
2023-11-01 21:36:46.674443: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:314] SavedModel load for tags { serve }; Status: success: OK. Took 435694 microseconds.
2023-11-01 21:36:46.683500: I tensorflow_serving/servables/tensorflow/saved_model_warmup_util.cc:62] No warmup data file found at /models/model/2/assets.extra/tf_serving_warmup_requests
2023-11-01 21:36:46.759637: I tensorflow_serving/core/loader_harness.cc:95] Successfully loaded servable version {name: model version: 2}
2023-11-01 21:36:46.765908: I tensorflow_serving/model_servers/server_core.cc:486] Finished adding/updating models
2023-11-01 21:36:46.765967: I tensorflow_serving/model_servers/server.cc:118] Using InsecureServerCredentials
2023-11-01 21:36:46.765982: I tensorflow_serving/model_servers/server.cc:383] Profiler service is enabled
2023-11-01 21:36:46.767306: I tensorflow_serving/model_servers/server.cc:409] Running gRPC ModelServer at 0.0.0.0:8500 ...
[warn] getaddrinfo: address family for nodename not supported
2023-11-01 21:36:46.772334: I tensorflow_serving/model_servers/server.cc:430] Exporting HTTP/REST API at:localhost:8501 ...
[evhttp_server.cc : 245] NET_LOG: Entering the event loop ...

Why is only version 2 served, but not version 1?
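In case it's relevant, my understanding is that the loaded versions can be checked via the model status endpoint of the REST API (shown here only for the default model name):

curl http://localhost:8501/v1/models/model

which should return the version status for whatever is loaded under the name model.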

Then, when I try to get predictions from the model, the only model path that works is this:

curl http://localhost:8501/v1/models/model:predict -d '{"instances": [1.0, 1.5, 2.0, 2.5, 3.0]}'

If I try to use http://localhost:8501/v1/models/model/2:predict instead, I get "error": "Malformed request: POST /v1/models/model/2:predict", and the same happens if I use 1.
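If I read the REST API correctly, a specific version is addressed with a versions/<n> path segment rather than a bare number, i.e. something like:

curl http://localhost:8501/v1/models/model/versions/2:predict -d '{"instances": [1.0, 1.5, 2.0, 2.5, 3.0]}'

but that only selects a version under the single model name model.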

So which model is actually being served?

The thing is, I thought the integer leaf directories indicate model versions, not model names. But when I try mapping the artefacts to /models/model/model1/, /models/model1/, /models/models/model1/1/, or models/model1/1/, serving fails:

2023-11-01 21:20:07.332818: E tensorflow_serving/sources/storage_path/file_system_storage_path_source.cc:353] FileSystemStoragePathSource encountered a filesystem access error: Could not find base path /models/model for servable model with error NOT_FOUND: /models/model not found

I thought I could substitute model with my own model name, but the server seems to expect a directory explicitly called model, and then it only serves one model anyway.
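For concreteness, one of the failing variants (the /models/model1/ one, with the same S3 source paths as above) means the copy section of the Dockerfile becomes:

RUN mkdir -p /models/model1 /models/model2

RUN aws s3 cp s3://model_dir/model1/ /models/model1/ --recursive
RUN aws s3 cp s3://model_dir/model2/ /models/model2/ --recursive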

So what would be the correct mapping in my Dockerfile to get both models served with distinct names and model versions?
