
How to Deploy Segmentation Models with TensorFlow Serving


Google’s TensorFlow has emerged as the most popular deep learning framework of our time. TensorFlow makes it easy to design and train machine learning models, many of which are released as research prototypes in the TensorFlow models repository. Using TensorFlow, even individuals and hobbyists can experiment with some of the most sophisticated models currently known in computer vision, natural language processing, and other domains.


TensorFlow also makes it easy to take the next step and deploy these models in a production environment. TensorFlow Serving is a robust, high-performance system able to manage and serve multiple versions of multiple models, using either CPU or GPU hardware. TensorFlow and TensorFlow Serving are, however, rather opinionated: to extract the best performance from them, we have to be able to transform our research prototypes into their preferred idiom. While TensorFlow offers documentation covering relatively simple, straightforward models, it can still be challenging to productionize more complex models, particularly ones built on other researchers’ code.


Let’s take a recent, cutting-edge computer vision model, the DeepLab V3+ convolutional neural network for semantic segmentation, and see what it takes to deploy it with TensorFlow Serving.




Semantic segmentation is a sophisticated task in computer vision. A semantic segmentation system takes an input image as a rectangular array of pixel values (typically RGB), performs some mathematical alchemy on it, and then assigns to each pixel a semantic class label. That is, for each pixel in the image, the system predicts what kind of object that pixel belongs to, from a predefined list of object classes the system has been trained to recognize. As with many tasks in computer vision, the leading semantic segmentation systems available today are built on deep convolutional neural networks, where the “mathematical alchemy” involved consists of successive layers, perhaps hundreds deep, built out of operations like convolution, pooling, and normalization. The parameters in these layered operations are derived from a set of fully-labeled training data via an optimization technique like backpropagation with stochastic gradient descent. The DeepLab family of neural networks has been one of the top-performing semantic segmentation systems of recent years on many of the standard benchmark challenges.


DeepLab V3+ is the most recent variant, with the researchers’ own implementation available in TensorFlow. It comes complete with a zoo of pretrained models, code to train models on your own custom datasets and classes, and code to export a trained model checkpoint into a frozen graph for some basic inference. However, if we want to run one of these DeepLab models in TensorFlow Serving, we have to export it into the SavedModel format, not a frozen graph.


There are a number of other tutorials on exporting deep computer vision models for TensorFlow Serving, although none, as far as we can find, specific to this implementation of DeepLab. Gaurav Kaila posted a helpful article on exporting TensorFlow’s object detection framework, which can predict bounding boxes and instance-level segmentation masks. More recently, Thalles Santos Silva posted a tutorial on TensorFlow Serving using their own implementation of DeepLab V3 (not V3+), which includes a lot of useful background detail on the nuts and bolts of TensorFlow Serving.


Mighty AI has been experimenting with the DeepLab V3+ code to train segmentation models on our own proprietary automotive datasets. In the remainder of this post, we will walk through the custom export and client code required to load these models into TensorFlow Serving and deploy them performantly at scale. Before we get deep into the technical details, let’s look at an example of what DeepLab can do: here’s a street scene we shot here in Seattle, and the semantic segmentation output by a DeepLab model running in TensorFlow Serving, where we have mapped the integer class labels to RGB color values for ease of visual inspection.
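The label-to-color mapping mentioned above can be a simple NumPy palette lookup. Here is a minimal sketch; the class ids and colors are made up for illustration, not DeepLab's actual label set:

```python
import numpy as np

# Hypothetical palette: class id -> RGB color (illustrative values only).
palette = np.array([
    [0, 0, 0],        # 0: background
    [128, 64, 128],   # 1: road
    [220, 20, 60],    # 2: person
], dtype=np.uint8)

labels = np.array([[0, 1],
                   [1, 2]])          # a tiny predicted label map
color_image = palette[labels]        # fancy indexing: shape (2, 2, 3), uint8
print(color_image.shape, color_image.dtype)
```

Writing `color_image` out as a PNG then gives the kind of visualization shown in the figure.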



Exporting DeepLab V3+

In principle, exporting a trained checkpoint into a SavedModel is straightforward. We can use tensorflow.train.Saver to restore the data from a checkpoint into a TensorFlow session, and then tensorflow.saved_model.builder.SavedModelBuilder to export from that session into SavedModel format, with just one extra ingredient. That extra ingredient is a SignatureDef to keep track of the input and output tensors of the model in question. A SignatureDef requires three pieces of data:

  1. a dictionary of inputs, mapping string names to TensorInfo objects built from all the necessary input placeholder tensors for your model;
  2. a dictionary of outputs, mapping string names to TensorInfo objects built from all the desired output tensors of your model;
  3. and a method name, one of three methods defined as signature constants: for other models, the CLASSIFY or REGRESS methods may be appropriate, but for our purposes we will use the more flexible PREDICT method, which gives us the most control over the inputs and outputs.
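Concretely, these three pieces come together as follows. This is a minimal sketch with a dummy one-op "model" standing in for DeepLab, written against the TF 1.x APIs (via tf.compat.v1, so it also runs under TensorFlow 2); the tensor names "image" and "segmentation_map" are our own choices, not fixed by TensorFlow:

```python
import os
import tempfile

import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

graph = tf.Graph()
with graph.as_default():
    # Stand-ins for the real DeepLab tensors: a uint8 image placeholder in,
    # and (here) a dummy all-zeros int64 label map out.
    input_image = tf.placeholder(tf.uint8, [1, None, None, 3])
    semantic_predictions = tf.zeros(tf.shape(input_image)[:3], dtype=tf.int64)

export_path = os.path.join(tempfile.mkdtemp(), "1")
with tf.Session(graph=graph) as sess:
    builder = tf.saved_model.builder.SavedModelBuilder(export_path)
    # The SignatureDef: input dict, output dict, and the PREDICT method name.
    signature = tf.saved_model.signature_def_utils.build_signature_def(
        inputs={"image": tf.saved_model.utils.build_tensor_info(input_image)},
        outputs={"segmentation_map": tf.saved_model.utils.build_tensor_info(
            semantic_predictions)},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME)
    builder.add_meta_graph_and_variables(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            tf.saved_model.signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
                signature})
    builder.save()

pb_written = os.path.exists(os.path.join(export_path, "saved_model.pb"))
print(pb_written)
```

The real export script replaces the dummy output with the DeepLab prediction pipeline built below, and restores the trained weights into the session before saving.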


All the subtlety comes in defining these input and output tensors. We must lay out a recipe, in TensorFlow operations, for computing the output tensors from the input tensors. Here for the first time we must consider the details of the model we’re trying to export. Much of what we need can be gleaned from the authors’ code to export to frozen inference graph format: let us see how to adapt this to our needs.


First, we expect there to be only one input: a single RGB image, of arbitrary height and width. We will represent this with a “placeholder” image tensor, of shape [1, None, None, 3].

  • 1 is the batch size: we process a single image at a time.
  • The first “None” is the height: by leaving this as “None”, we can fill it with an image of any height later.
  • The second “None” is the width, similarly.
  • Finally, 3 is the number of channels per image.


Furthermore, we will use the usual convention of representing images with unsigned 8-bit integers.


   input_image = tf.placeholder(tf.uint8, [1, None, None, 3])
   original_image_size = tf.shape(input_image)[1:3]
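To see what this placeholder expects, here is the same layout in NumPy, which is also what a TensorFlow Serving client will use to feed it (the example dimensions are arbitrary):

```python
import numpy as np

# A single 480x640 RGB image in the uint8 convention described above.
image = np.zeros((480, 640, 3), dtype=np.uint8)   # height x width x channels

# Add the leading batch dimension to match the [1, None, None, 3] placeholder.
batched = np.expand_dims(image, axis=0)
print(batched.shape)  # (1, 480, 640, 3)
```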


Before we can pass an image through a DeepLab model, we must first perform a little pre-processing. This requires resizing the image to fit within a receptive field of a size that is fixed during the export process, casting to 32-bit floating point format, and subtracting a mean pixel value derived from the training dataset. In the authors’ original export code, this receptive field size is set by command line arguments handled using TensorFlow’s flags system. We must resize the image so that the height is no more than FLAGS.crop_size[0], and so that the width is no more than FLAGS.crop_size[1]. Let us perform this resizing ourselves, before we start applying the DeepLab preprocessing utilities:


   height = tf.cast(original_image_size[0], tf.float64)
   width = tf.cast(original_image_size[1], tf.float64)

   # Squeeze the dimension in axis=0 since preprocess_image_and_label assumes
   # image to be 3-D.
   image = tf.squeeze(input_image, axis=0)

   # Resize the image so that height <= FLAGS.crop_size[0]
   # and width <= FLAGS.crop_size[1].
   height_ratio = FLAGS.crop_size[0] / height
   width_ratio = FLAGS.crop_size[1] / width
   resize_ratio = tf.minimum(height_ratio, width_ratio)
   target_height = tf.to_int32(tf.floor(resize_ratio * height))
   target_width = tf.to_int32(tf.floor(resize_ratio * width))
   target_size = (target_height, target_width)
   image = tf.image.resize_images(
       image, target_size,
       method=tf.image.ResizeMethod.BILINEAR,
       align_corners=True)

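As a sanity check, the aspect-preserving resize logic above can be expressed in plain Python, independent of the TensorFlow graph (crop_size here stands in for FLAGS.crop_size; the example dimensions are arbitrary):

```python
import math

def target_dimensions(height, width, crop_size):
    """Scale (height, width) to fit within (crop_size[0], crop_size[1]),
    preserving aspect ratio, mirroring the graph operations above."""
    height_ratio = crop_size[0] / height
    width_ratio = crop_size[1] / width
    resize_ratio = min(height_ratio, width_ratio)
    return (int(math.floor(resize_ratio * height)),
            int(math.floor(resize_ratio * width)))

# A 1080x1920 image shrunk to fit a 513x513 receptive field.
print(target_dimensions(1080, 1920, (513, 513)))  # (288, 513)
```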
Now we can apply the authors’ preprocessing:


   resized_image, image, _ = input_preprocess.preprocess_image_and_label(
       image,
       label=None,
       crop_height=FLAGS.crop_size[0],
       crop_width=FLAGS.crop_size[1],
       is_training=False,
       model_variant=FLAGS.model_variant)

Since we have already shrunk our image down to fit within the receptive field, we don’t need to worry about the flags like min_resize_value, max_resize_value, or resize_factor. We will, however, need to be careful to set the model_variant flag as appropriate to our model! This DeepLab implementation supports a number of different convolutional network architectures for its feature extractor. These model_variant values are given in the DeepLab team’s model zoo for their pretrained networks. If you perform transfer learning with your own dataset on top of one of these models, the model_variant will remain the same (e.g. “xception_65”).


Now that we’ve resized and normalized our image, we can pass it through the network. This version of DeepLab allows us to run inference in both single- and multi-scale modes, fixed at export time by the FLAGS.inference_scales parameter: the default, [1.0], performs single-scale inference. Multi-scale inference will take somewhat longer to run but will generally produce smoother, more accurate output. Again, we can largely re-use the authors’ export code for this step; note that this requires us to manually pass in the total number of classes the model was trained to predict, as FLAGS.num_classes.


   resized_image_size = tf.shape(resized_image)[:2]

   # Expand the dimension in axis=0, since the following operations assume the
   # image to be 4-D.
   image = tf.expand_dims(image, 0)

   model_options = common.ModelOptions(
       outputs_to_num_classes={common.OUTPUT_TYPE: FLAGS.num_classes},
       crop_size=FLAGS.crop_size,
       atrous_rates=FLAGS.atrous_rates,
       output_stride=FLAGS.output_stride)

   if tuple(FLAGS.inference_scales) == (1.0,):
       tf.logging.info("Exported model performs single-scale inference.")
       predictions = model.predict_labels(
           image,
           model_options=model_options,
           image_pyramid=FLAGS.image_pyramid)
   else:
       tf.logging.info("Exported model performs multi-scale inference.")
       predictions = model.predict_labels_multi_scale(
           image,
           model_options=model_options,
           eval_scales=FLAGS.inference_scales,
           add_flipped_images=FLAGS.add_flipped_images)

   predictions = tf.cast(predictions[common.OUTPUT_TYPE], tf.float32)

   # Crop the valid regions from the predictions.
   semantic_predictions = tf.slice(
       predictions,
       [0, 0, 0],
       [1, resized_image_size[0], resized_image_size[1]])

   # Resize back the prediction to the desired output size.
   def _resize_label(label, label_size):
       # Expand dimension of label to [1, height, width, 1] for
       # resize operation.
       label = tf.expand_dims(label, 3)
       resized_label = tf.image.resize_images(
           label,
           label_size,
           method=tf.image.ResizeMethod.NEAREST_NEIGHBOR,
           align_corners=True)
       return tf.cast(tf.squeeze(resized_label, 3), tf.int64)

   semantic_predictions = _resize_label(
       semantic_predictions, original_image_size)
   semantic_predictions = tf.identity(
       semantic_predictions, name="semantic_predictions")

After this final resizing operation, we should end up with an output tensor of class labels that has been resized (using the nearest neighbor method, to preserve the exact class labels as far as possible) to the original image dimensions.
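To see why nearest-neighbor matters here, consider what bilinear interpolation would do to integer class ids: it would blend neighboring ids into meaningless in-between values. A small NumPy sketch of nearest-neighbor label resizing (illustrative only, not the serving code):

```python
import numpy as np

def resize_labels_nearest(labels, out_h, out_w):
    """Nearest-neighbor upsampling of a 2-D integer label map."""
    in_h, in_w = labels.shape
    # For each output pixel, pick the nearest source row/column index.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return labels[rows[:, None], cols]

labels = np.array([[0, 7], [7, 21]], dtype=np.int64)
resized = resize_labels_nearest(labels, 4, 4)
# Every output value is one of the original class ids: 0, 7, or 21.
print(sorted(set(resized.ravel().tolist())))  # [0, 7, 21]
```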


Also note that we are casting the data type of this output to tf.int64. This is different from the original authors’ export code, which uses tf.int32. We have found it essential to cast integer outputs to int64 to get results back from TensorFlow Serving when using the official prebuilt Docker images. These images are by far the easiest way to start running TensorFlow Serving; building it from source can be quite challenging.


We have now defined a complete pipeline of tensor operations from a single input image tensor to an output semantic prediction tensor. We can now define a SignatureDef and invoke the SavedModelBuilder to complete the export.


Let’s package this all together into a single Python script. One thing to note: to properly import code from the TensorFlow models repository, we should ensure that we have cloned this repo from GitHub and added it to our PYTHONPATH environment variable. Finally, TensorFlow Serving expects its SavedModel files to live in a directory structure like <model name>/<version number>, for versioning of models. We will enforce this by passing in flags export_dir and model_version from the command line, and exporting our SavedModel under the path <export_dir>/<model_version>.
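This versioned layout can be sketched with pathlib; the model name and version here are placeholders:

```python
from pathlib import Path
import tempfile

export_dir = Path(tempfile.mkdtemp()) / "my_deeplab_model"
model_version = 1

# TensorFlow Serving scans <model name>/<version number> directories and, by
# default, serves the highest version number it finds.
saved_model_path = export_dir / str(model_version)
saved_model_path.mkdir(parents=True)

# The SavedModelBuilder should be pointed at saved_model_path; after export it
# will contain saved_model.pb and a variables/ subdirectory.
print(saved_model_path.name)  # "1"
```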




Export Script


See the complete export script as a GitHub gist.


Running Exported Models in TensorFlow Serving


Suppose we have exported a trained DeepLab model using this code under a path like /data/my_deeplab_model/1.




Running this with TensorFlow Serving in Docker is now as easy as:


   docker run -p 8500:8500 -v /data/my_deeplab_model:/models/my_deeplab_model \

   -e MODEL_NAME=my_deeplab_model tensorflow/serving:1.12.0    


To write a Python client for our exported model, we will need to install the tensorflow-serving-api Python package. To read in input images, and write the output segmentation maps to lossless PNG format, we will also install the imageio package.


Client Script


See a complete client script as a GitHub gist.



TensorFlow Serving makes it easy to deploy cutting-edge computer vision models at scale in the cloud. Any model that you can write in TensorFlow, you can serve in a production environment with a minimal amount of custom code. We hope this tutorial helps you close the gap between prototyping and production.