Serving Machine Learning models in Google Cloud

Part 4: Deploying NVIDIA Triton Inference Server into AI Platform

Image Credit NASA/JPL/USGS

Before we let loose on a couple of examples, here is some background information that you may find helpful.

Model artifacts in AI Platform are organized in a model/model version hierarchy, looking something like:
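As a rough illustration of that hierarchy, the sketch below builds the fully qualified resource name of a model version. The project, model, and version IDs are hypothetical placeholders, not values from this guide.

```python
def version_resource_name(project_id: str, model_name: str, version_name: str) -> str:
    """Return the resource name of a model version nested under its model."""
    return f"projects/{project_id}/models/{model_name}/versions/{version_name}"


# Hypothetical IDs, used only for illustration.
for version in ("v1", "v2"):
    print(version_resource_name("my-project", "simple", version))
```

One model can hold several versions, which is why the version name always appears beneath a model name.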

To create a model version, you first need to create the model in which the model version will reside.
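A minimal sketch of this step, assuming the gcloud CLI is installed and authenticated. The helper only assembles the command line; the model name and region shown are hypothetical.

```python
import shlex
from typing import List


def create_model_command(model_name: str, region: str) -> List[str]:
    """Assemble the gcloud invocation that creates an (empty) model."""
    return [
        "gcloud", "ai-platform", "models", "create", model_name,
        f"--region={region}",
    ]


# Hypothetical values; print the command instead of running it.
print(shlex.join(create_model_command("simple", "us-central1")))
```

To actually run it, the list could be passed to subprocess.run(..., check=True).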

Then you can create the model version:
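The version-creation step might look like the sketch below. It assumes the gcloud beta command group and passes the version settings through a config file named config_simple.yaml; the version, model, and region values are hypothetical.

```python
import shlex
from typing import List


def create_version_command(version_name: str, model_name: str,
                           region: str, config_path: str) -> List[str]:
    """Assemble the gcloud invocation that creates a model version."""
    return [
        "gcloud", "beta", "ai-platform", "versions", "create", version_name,
        "--model", model_name,
        "--region", region,
        "--config", config_path,
    ]


# Hypothetical values; print the command instead of running it.
print(shlex.join(
    create_version_command("v2", "simple", "us-central1", "config_simple.yaml")
))
```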

Here, config_simple.yaml is the configuration file for the model version.

You will need to provide the following parameters:

  • VERSION_NAME: the name of this model version, such as v2.
  • PATH_TO_MODEL_ARTIFACTS: the location from which AI Platform copies the model artifacts into this model version.
  • REGION: the region where the container image is located.
  • PROJECT_ID: the ID of the project where the container image is located.
  • REPO: the repository where the container image is located.
  • IMAGE:VERSION: the image name and version tag of the container.
  • MODEL_NAME: must match the name of a model in the model artifacts. This guide sets up a single model, and this parameter tells Triton which model to run prediction requests on.
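To show how these placeholders might fit together, here is a small sketch. It assumes the image lives in Artifact Registry (hence the REGION-docker.pkg.dev host) and that the prediction and health routes point at Triton's v2 HTTP endpoints for MODEL_NAME. Both are assumptions, and all values are hypothetical.

```python
def container_image_uri(region: str, project_id: str, repo: str,
                        image_and_version: str) -> str:
    """Build an Artifact Registry image URI (assumed registry layout)."""
    return f"{region}-docker.pkg.dev/{project_id}/{repo}/{image_and_version}"


def triton_routes(model_name: str) -> dict:
    """Map AI Platform routes onto Triton's v2 HTTP endpoints (assumed mapping)."""
    return {
        "predict": f"/v2/models/{model_name}/infer",  # inference endpoint
        "health": f"/v2/models/{model_name}",         # model metadata endpoint
    }


# Hypothetical values, used only for illustration.
print(container_image_uri("us-central1", "my-project", "my-repo", "tritonserver:latest"))
print(triton_routes("simple"))
```

Because the model name is baked into the routes, it has to match a model directory in the artifacts exactly.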

Other model version parameters:

  • Triton listens on port 8000 for HTTP requests by default. The container port setting tells AI Platform which port to use when communicating with Triton.
  • In this example, we will use an n1-standard-4 machine and one nvidia-tesla-t4 GPU.
  • AIP_STORAGE_URI is an AI Platform-provided environment variable. It is the location to which model artifacts are copied when this model version is created; the content comes from deploymentUri. If your model server needs a location from which to retrieve model artifacts, use AIP_STORAGE_URI, not deploymentUri.
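The settings above could be collected as in the following sketch. The field names (container, ports, machineType, acceleratorConfig) follow my understanding of the AI Platform Version resource and should be treated as assumptions to check against the current API reference, as should the tritonserver arguments.

```python
import json


def other_version_parameters(port: int = 8000) -> dict:
    """Collect port, hardware, and model-repository settings (assumed field names)."""
    return {
        "container": {
            # $(AIP_STORAGE_URI) is left for AI Platform to expand at container
            # start, so Triton reads the copied artifacts rather than deploymentUri.
            "args": ["tritonserver", "--model-repository=$(AIP_STORAGE_URI)"],
            "ports": [{"containerPort": port}],
        },
        "machineType": "n1-standard-4",
        "acceleratorConfig": {"count": 1, "type": "NVIDIA_TESLA_T4"},
    }


print(json.dumps(other_version_parameters(), indent=2))
```

The output is printed as JSON only for inspection; the same keys would go into config_simple.yaml in YAML form.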

What’s next

Google Cloud Platform has many other tutorials covering a wide range of topics. Try them out here.

Written by

Solutions Architect at Google Cloud focusing on ML Inference, MLOps, and Distributed Training.
