Serving Machine Learning models in Google Cloud

Part 4: Deploying NVIDIA Triton Inference Server into AI Platform

Kevin Tsai
2 min read · Oct 29, 2020
Image Credit NASA/JPL/USGS

Before we dive into a couple of examples, here is some background information that you may find helpful.

Model artifacts in AI Platform are organized in a model/model version hierarchy, looking something like:

model: customer_propensity
  version: v01
  version: v02
  version: v03
model: inventory_forecast
  version: v02
  version: v03
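
If you want to inspect this hierarchy in your own project, the gcloud CLI can list models and the versions under them. A quick sketch, using the model name and region from the examples in this guide:

# List models deployed to the regional endpoint
gcloud ai-platform models list --region us-central1

# List the versions under a given model
gcloud ai-platform versions list --model customer_propensity --region us-central1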

To create a model version, you first need to create the model where the model version will reside.

gcloud ai-platform models create customer_propensity --region us-central1

Then you can create the model version:

gcloud beta ai-platform versions create v02 \
--model customer_propensity \
--accelerator count=1,type=nvidia-tesla-t4 \
--config config_simple.yaml \
--region us-central1
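
Version creation can take several minutes while AI Platform provisions nodes and copies the model artifacts. A quick way to check progress, assuming the same model, version name, and region as above, is to describe the version and look at its state:

gcloud ai-platform versions describe v02 \
--model customer_propensity \
--region us-central1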

The --config flag points to config_simple.yaml, which contains:

autoScaling:
  minNodes: 1
container:
  args:
  - tritonserver
  - --model-repository=$(AIP_STORAGE_URI)
  env: []
  image: $REGION-docker.pkg.dev/$PROJECT_ID/$REPO/$IMAGE:VERSION
  ports:
  - containerPort: 8000
deploymentUri: $PATH_TO_MODEL_ARTIFACTS
machineType: n1-standard-4
routes:
  health: /v2/models/$MODEL_NAME
  predict: /v2/models/$MODEL_NAME/infer

You will need to provide the following parameters:

  • VERSION_NAME: this is the name of this Model Version, such as v02 in the command above.
  • PATH_TO_MODEL_ARTIFACTS: AI Platform will look to this location to copy the model artifacts into this model version.
  • REGION: region where the container image is located.
  • PROJECT_ID: this is the Project ID where the container image is located.
  • REPO: repository where the container image is located.
  • IMAGE:VERSION: this is the image and version of the container.
  • MODEL_NAME: this must match the name of a model in the model artifacts. In this guide we are setting up a single model, and this tells Triton which model to run prediction requests against.
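
To make the substitutions concrete, here is a hypothetical filled-in config_simple.yaml. The project ID, repository, image tag, bucket path, and model name are illustrative placeholders only; AIP_STORAGE_URI stays as-is because AI Platform resolves it at runtime:

# Hypothetical values; substitute your own project, repository, image, bucket, and model name.
autoScaling:
  minNodes: 1
container:
  args:
  - tritonserver
  - --model-repository=$(AIP_STORAGE_URI)
  env: []
  image: us-central1-docker.pkg.dev/my-project/my-repo/tritonserver:20.10-py3
  ports:
  - containerPort: 8000
deploymentUri: gs://my-bucket/triton/model_repository/
machineType: n1-standard-4
routes:
  health: /v2/models/customer_propensity
  predict: /v2/models/customer_propensity/infer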

Other model version parameters:

  • Triton listens on port 8000 for HTTP requests by default. The containerPort setting tells AI Platform which port to use when communicating with Triton.
  • In this example, we use an n1-standard-4 machine type and one nvidia-tesla-t4 GPU.
  • AIP_STORAGE_URI is an AI Platform-provided environment variable. It points to the location where the model artifacts are copied during model version creation; the content comes from deploymentUri. If your model server needs a location to retrieve model artifacts from, use AIP_STORAGE_URI, not deploymentUri.
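
Once the version is ready, AI Platform forwards prediction requests to the predict route defined above, so the request body follows Triton's v2 HTTP inference protocol. Here is a rough sketch of a request against the regional endpoint; the tensor name, shape, datatype, and data are placeholders for whatever your model actually expects:

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
-d '{"inputs": [{"name": "input__0", "shape": [1, 4], "datatype": "FP32", "data": [5.1, 3.5, 1.4, 0.2]}]}' \
https://us-central1-ml.googleapis.com/v1/projects/$PROJECT_ID/models/customer_propensity/versions/v02:predict

If the request succeeds, Triton returns its response in the same v2 protocol format, with an outputs array containing the prediction.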

What’s next

Google Cloud Platform has many other tutorials covering a wide range of topics. Try them out here.

Written by Kevin Tsai

Applied AI Engineering at Google
