Getting started with TensorRT

WML CE 1.7.0 includes TensorRT. TensorRT is a C++ library provided by NVIDIA that focuses on running pre-trained networks quickly and efficiently for inferencing. Full technical details on TensorRT can be found in the NVIDIA TensorRT Developer Guide.

Installing TensorRT

Support for TensorRT in PyTorch is enabled by default in WML CE. Therefore, TensorRT is installed as a prerequisite when PyTorch is installed.

For detailed instructions to install PyTorch, see Installing the MLDL frameworks.

TensorRT is also available as a standalone package in WML CE. However, those installation details are not covered in this section.

Validate TensorRT installation

You can validate the installation of TensorRT alongside PyTorch, Caffe2, and ONNX by running the following commands:

(my-py3-env) $ python
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:18:58)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import caffe2
>>> import tensorrt
>>>
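
If the imports succeed, you can optionally confirm which versions were picked up and that PyTorch can see a GPU. This is a minimal check run from the same environment, using only the standard version attributes these packages expose:

import torch
import tensorrt

# Report the imported versions and confirm that PyTorch detects a CUDA GPU.
print("PyTorch:", torch.__version__)
print("TensorRT:", tensorrt.__version__)
print("CUDA available:", torch.cuda.is_available())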

If desired, extended validation of the Caffe2, ONNX and TensorRT features found in PyTorch can be accessed using the caffe2-test script.

The extended tests can be executed as follows:

caffe2-test -t trt/test_trt.py

The tests will take a few minutes to complete.

TensorFlow with NVIDIA TensorRT (TF-TRT)

NVIDIA TensorRT is a platform for high-performance deep learning inference. Trained models can be optimized with TensorRT by replacing TensorRT-compatible subgraphs with a single TRTEngineOp that is used to build a TensorRT engine. TensorRT can also calibrate for lower precision (FP16 and INT8) with a minimal loss of accuracy. After a model is optimized with TensorRT, the TensorFlow workflow is still used for inferencing, including TensorFlow Serving.

A saved model can be optimized for TensorRT with the following Python snippet:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# input_saved_model_dir and output_saved_model_dir are placeholder paths for
# the original SavedModel and the optimized output, respectively.
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir)
converter.convert()
converter.save(output_saved_model_dir)
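
If lower precision is wanted, the conversion parameters can be overridden before calling convert(). The sketch below requests FP16 engines; it assumes the TrtConversionParams and TrtPrecisionMode interface of the TF-TRT converter in the TensorFlow level shipped with WML CE, and it reuses the placeholder directory names from the snippet above:

from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Request FP16 TensorRT engines by overriding the default conversion parameters.
params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(
    precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=input_saved_model_dir,
    conversion_params=params)
converter.convert()
converter.save(output_saved_model_dir)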

TensorRT is enabled in the tensorflow-gpu and tensorflow-serving packages.
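
After the optimized SavedModel has been written out, inference follows the normal TensorFlow workflow mentioned above. A minimal sketch, assuming the model exposes the default serving_default signature and reusing the placeholder output directory; input_batch stands in for whatever input the model expects:

import tensorflow as tf

# Load the TensorRT-optimized SavedModel and run it through its serving
# signature; input_batch is a placeholder for a NumPy array or tf.Tensor.
loaded = tf.saved_model.load(output_saved_model_dir)
infer = loaded.signatures["serving_default"]
outputs = infer(tf.constant(input_batch))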

For additional information on TF-TRT, see the official NVIDIA documentation.

Code Samples for TensorRT

The sample code provided by NVIDIA can be installed as a separate package.

Installing TensorRT sample code

Install the TensorRT samples into the same virtual environment as PyTorch:

conda install tensorrt-samples

If you plan to run the Python sample code, you also need to install PyCUDA:

pip install pycuda

After the installation of the samples has completed, an assortment of C++ and Python-based samples will be located in the $CONDA_PREFIX/samples/tensorrt/samples directory. Users can optionally set an environment variable to point to the TensorRT samples install location. Setting the variable TRT_SAMPLE_ROOT enables the examples to find the default data location $CONDA_PREFIX/samples/tensorrt/samples/data without passing the -d parameter to the sample.

Run this command to set the environment variable:
export TRT_SAMPLE_ROOT=$CONDA_PREFIX/samples/tensorrt/

C++ Samples

Every C++ sample includes a README.md file. Refer to the $CONDA_PREFIX/samples/tensorrt/samples/<sample-name>/README.md file for detailed information about how the sample works, sample code, and step-by-step instructions about how to run and verify its output.

In addition to the README files, there are online descriptions of each C++ sample on the NVIDIA website.

The tensorrt-samples package includes pre-compiled binaries for each of the C++ examples. These executables are installed along with the package and can be run directly from the command line.

Running the C++ Samples

For best results, run the samples from the tensorrt/samples root directory.

Example: Activate the conda environment and run the sample_mnist executable:

conda activate my-py3-env
export TRT_SAMPLE_ROOT=$CONDA_PREFIX/samples/tensorrt
cd $TRT_SAMPLE_ROOT
sample_mnist

Compiling the C++ Samples

The source code for each of the samples can be found under the TensorRT samples root directory.

cd $TRT_SAMPLE_ROOT/samples

The required compiler and makefile adjustments have been made to enable the samples to be easily compiled in the WML CE environment.

To compile a sample, cd to the sample source location and run make:

cd $TRT_SAMPLE_ROOT/samples/<sample-name>
make -j 20

To compile all the samples, cd to the samples source location and run make:

cd $TRT_SAMPLE_ROOT/samples
make -j 20

Additional information for working with the C++ API

For additional information, refer to this document from NVIDIA: Working With TensorRT Using The C++ API.

Python Samples

Every Python sample includes a README.md file. Refer to the $CONDA_PREFIX/samples/tensorrt/python/<sample-name>/README.md file for detailed information about how the sample works, sample code, and step-by-step instructions on how to run and verify its output.

In addition to the README files, online descriptions of the Python samples can be found on the NVIDIA website under Python Samples.

For additional information on working with the Python API, refer to Using The Python API.
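
As a quick orientation before working through the samples, the snippet below sketches the core build flow of the TensorRT Python API: parse an ONNX model into a network definition and build an engine from it. This is a minimal sketch assuming the TensorRT 6/7-era Python bindings shipped with WML CE; model.onnx is a placeholder path:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Parse an ONNX model into a TensorRT network definition, then build an engine.
explicit_batch = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
with trt.Builder(TRT_LOGGER) as builder, \
        builder.create_network(explicit_batch) as network, \
        trt.OnnxParser(network, TRT_LOGGER) as parser:
    builder.max_workspace_size = 1 << 28  # workspace memory for tactic selection
    with open("model.onnx", "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
    engine = builder.build_cuda_engine(network)

Running the resulting engine also requires allocating input and output buffers on the GPU, which is why PyCUDA is listed as a prerequisite for the Python samples above.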

Known Issues

The WML CE team is aware of the following issues:

In TensorFlow, when converting a saved model for use with TensorRT, the following warning is issued because TensorFlow internally calls the TensorRT optimizer for certain objects unnecessarily. This warning can be ignored:

`W tensorflow/core/framework/op_kernel.cc:1676] OP_REQUIRES failed at trt_engine_resource_ops.cc:183 : 
Not found: Container TF-TRT does not exist. (Could not find resource: TF-TRT/TRTEngineOp_0)`