Serve Keras Model with Tensorflow Serving On AWS Sagemaker
In Part I of the series, we converted a Keras model into the Tensorflow servable `saved_model` format, then served and tested the model locally using `tensorflow_model_server`. Now we will put it in a Docker container and launch it into outer space: AWS Sagemaker. The process will be propelled by lots of Bash scripts and config files, and a successful launch will be confirmed by verification tests written in Python.
Contain It
Inferencing on AWS Sagemaker uses two endpoints on port 8080: `/invocations`, which invokes the model, and `/ping`, which reports the health status of the endpoint. Tensorflow Serving, however, listens on port 8500 for gRPC and port 8501 for its REST API. To bridge the gap, we use NGINX to proxy requests arriving on the external port 8080 to the internal REST port 8501. Here's our `nginx.conf` to be copied onto our Docker image.
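The exact contents depend on your model name and ports; a minimal sketch, assuming the model is served under the name `model` on Tensorflow Serving's default REST port 8501, looks like this:

```nginx
events {
    # use nginx defaults for event processing
}

http {
    server {
        # SageMaker sends all traffic to the container on port 8080
        listen 8080 deferred;

        # Health check: proxy /ping to the Tensorflow Serving model-status endpoint
        location /ping {
            proxy_pass http://localhost:8501/v1/models/model;
        }

        # Inference: proxy /invocations to the Tensorflow Serving REST predict endpoint
        location /invocations {
            proxy_pass http://localhost:8501/v1/models/model:predict;
        }
    }
}
```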
With that, we can construct a `Dockerfile` that extends the reference `tensorflow/serving` image, installs `nginx` and `git`, copies our model directory and `nginx.conf` into the image, and starts the NGINX server alongside the `tensorflow_model_server` command. At runtime, the Docker container will run `tensorflow_model_server` on `localhost:8501` and proxy the REST API port to the external port 8080, as specified in the `nginx.conf` file above.
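A sketch of such a `Dockerfile`, assuming the exported `saved_model` lives in a local directory named `model/` and is served under the model name `model` (adjust the paths and names to your project):

```dockerfile
# Extend the reference Tensorflow Serving image
FROM tensorflow/serving

# Install nginx and git on top of the base image
RUN apt-get update && \
    apt-get install -y --no-install-recommends nginx git && \
    rm -rf /var/lib/apt/lists/*

# Copy the exported model directory and the nginx config into the image
COPY model /models/model
COPY nginx.conf /etc/nginx/nginx.conf

# SageMaker talks to the container on port 8080, which nginx proxies
EXPOSE 8080

# Start nginx (it daemonizes by default), then run tensorflow_model_server
# with its REST API on localhost:8501, matching nginx.conf above
ENTRYPOINT ["/bin/sh", "-c", "nginx && tensorflow_model_server --rest_api_port=8501 --model_name=model --model_base_path=/models/model"]
```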
Now we can build the Docker image from the command line:
docker build -t <<name your image here>> .
After building the image, it can be run using this command:
docker run --rm -p 8080:8080 <<name your image here>>
As in the first part of the tutorial, it would be prudent to write test code to verify that the model inside the Docker container behaves the same way as the local model.
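For example, a quick smoke test along these lines, using the `requests` library and a placeholder `[1.0, 2.0, 3.0]` payload (swap in inputs shaped for your model), lets you compare the container's responses against the local model:

```python
# Smoke test for the locally running container
# (started with: docker run --rm -p 8080:8080 <image>)
import requests

BASE_URL = "http://localhost:8080"

# Health check: should return HTTP 200 with the model status
ping = requests.get(f"{BASE_URL}/ping")
print("ping:", ping.status_code)

# Inference: same request body format as Tensorflow Serving's REST API
payload = {"instances": [1.0, 2.0, 3.0]}
response = requests.post(f"{BASE_URL}/invocations", json=payload)
print("invocations:", response.status_code, response.json())
```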
Push It
In order to deploy a Docker image on AWS Sagemaker, ECS, EKS, etc., we have to store it in AWS Elastic Container Registry (ECR). We have previously written a post on how to push Docker images to ECR. If you have the AWS CLI installed, we have created a convenience script to automate the process of pushing a local Docker image to ECR.
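The script boils down to something like the following, assuming a recent AWS CLI and Docker are installed and configured; the image name and region are placeholders:

```bash
#!/bin/bash
set -e

# Placeholders: use your own image name and region
IMAGE_NAME=<<name your image here>>
REGION=us-west-2
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
ECR_URI="${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com/${IMAGE_NAME}"

# Create the ECR repository if it does not exist yet
aws ecr describe-repositories --repository-names "${IMAGE_NAME}" --region "${REGION}" \
    || aws ecr create-repository --repository-name "${IMAGE_NAME}" --region "${REGION}"

# Authenticate Docker against ECR, then tag and push the local image
aws ecr get-login-password --region "${REGION}" \
    | docker login --username AWS --password-stdin "${ACCOUNT}.dkr.ecr.${REGION}.amazonaws.com"
docker tag "${IMAGE_NAME}:latest" "${ECR_URI}:latest"
docker push "${ECR_URI}:latest"
```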
Serve It
In order for AWS Sagemaker to deploy a model successfully, we need to create an `AmazonSageMaker-ExecutionRole` in the IAM console and attach both the `AmazonSageMakerFullAccess` and `AmazonS3FullAccess` managed policies to it. The role allows the container to have full access to Sagemaker for inferencing and to S3 for storing model artifacts.
After that’s done, here’s a bash script to create a Sagemaker model using the Docker image we pushed to ECR.
Now all there’s left is for us to create the actual Sagemaker inference endpoint. Here’s a bash script to create the actual endpoint. We will be using the el cheapo ml.c4.large
in the example. As of 2019-12-20, it costs $0.14 per hour on US-West2
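A sketch with placeholder names; the model name must match the model created in the previous step:

```bash
#!/bin/bash
set -e

MODEL_NAME=keras-tf-serving-model
ENDPOINT_CONFIG_NAME=keras-tf-serving-endpoint-config
ENDPOINT_NAME=keras-tf-serving-endpoint

# Endpoint config: a single ml.c4.large instance serving the model
aws sagemaker create-endpoint-config \
    --endpoint-config-name "${ENDPOINT_CONFIG_NAME}" \
    --production-variants "VariantName=AllTraffic,ModelName=${MODEL_NAME},InitialInstanceCount=1,InstanceType=ml.c4.large"

# Create the endpoint and wait until it is InService (this takes several minutes)
aws sagemaker create-endpoint \
    --endpoint-name "${ENDPOINT_NAME}" \
    --endpoint-config-name "${ENDPOINT_CONFIG_NAME}"
aws sagemaker wait endpoint-in-service --endpoint-name "${ENDPOINT_NAME}"
```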
Your model is now deployed on AWS Sagemaker Inference and online! You can quickly verify this from the AWS Sagemaker console -> Inference -> Endpoints.
Use It
WARNING: AWS Sagemaker Inference has a JSON payload size limit of 5MB. However, a typical 100kB image, once converted to an image array string, will be ~5.5MB. AWS Sagemaker will elegantly throw a `ConnectionResetError: [Errno 104] Connection reset by peer` and tell you to go away. Due to this payload limit, Sagemaker or Elastic Inference is not, at the moment, a suitable choice for deploying anything that involves images or other large inputs without a preprocessing step. At the same time, bundling the preprocessing steps into the service tends to slow it down and limits the potential of either the Sagemaker Inference API or Tensorflow Serving for batch inferencing.
There are two options to send test requests to the newly airborne endpoint. First, there's the always useful `aws` CLI. The following bash script invokes the endpoint with a bogus payload of `[1.0, 2.0, 3.0]`.
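A sketch, assuming the placeholder endpoint name from the previous step; with AWS CLI v2 you may also need `--cli-binary-format raw-in-base64-out` so the JSON body is passed through as-is:

```bash
#!/bin/bash
ENDPOINT_NAME=keras-tf-serving-endpoint  # placeholder: your endpoint name

# Invoke the endpoint with the bogus payload; the prediction is written to output.json
aws sagemaker-runtime invoke-endpoint \
    --endpoint-name "${ENDPOINT_NAME}" \
    --content-type application/json \
    --body '{"instances": [1.0, 2.0, 3.0]}' \
    output.json

cat output.json
```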
We can also invoke the endpoint programmatically, for example from Python via the `boto3` library or from Node via the AWS JavaScript SDK. Since Python is the weapon of choice for machine learning engineers, here's a short snippet showing how to run the same bogus request as above using Python.
import boto3

ENDPOINT_NAME = 'keras-tf-serving-endpoint'  # placeholder: your endpoint name
client = boto3.client('sagemaker-runtime')

predict_request = '{"instances": [1.0, 2.0, 3.0]}'  # body is already a JSON string
response = client.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                  ContentType='application/json',
                                  Body=predict_request)
prediction = response['Body'].read()
That’s it. We have shown how to convert Keras models into Tensorflow servable format in our last post. And in this post, we have demonstrated a quick workflow to package the model into Docker, push it to AWS ECR, and create the model on Sagemaker, and deploy it using Sagemaker Inference. As a reference, we have implemented this architecture in our Memefly Project REST API Endpoint.
To cite this content, please use:
@article{leehanchung,
    author       = {Lee, Hanchung},
    title        = {Serve Keras Model with Tensorflow Serving On AWS Sagemaker},
    year         = {2019},
    howpublished = {\url{https://leehanchung.github.io}},
    url          = {https://leehanchung.github.io/blogs/2019/12/20/TFServing-Sagemaker/}
}