Tesla P4 And P40 GPUs Boost Deep Learning Inference Performance With INT8, TensorRT Support


Nvidia continues to bet on deep learning GPUs with the release of two new “inference” GPUs, the Tesla P4 and the Tesla P40. The pair are the 16nm FinFET direct successors to the Tesla M4 and M40, with much improved performance and support for 8-bit integer (INT8) operations.

Deep learning consists of two steps: training and inference. Training can take billions of TeraFLOPs (a total count of floating-point operations, not a per-second rate) to achieve an expected result, spread over a matter of days even when using GPUs. Inference, which is the running of trained models against new data, can take billions of FLOPs, and it can be done in real time.


Unlike the Pascal-based Tesla P100, which comes with support for the already quite low 16-bit (FP16) precision, the two new GPUs bring support for the even lower 8-bit INT8 precision. This is because researchers have discovered that you don’t need especially high precision for deep learning training.

The expected results will appear significantly faster if the hardware processes twice as many values at half the precision. Because inference operates on already-trained models, even less precision is needed than for training, which is why Nvidia’s new cards now have support for INT8 operations.
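To make the precision trade-off concrete, here is a minimal sketch of symmetric INT8 quantization in plain Python. It is only an illustration of the general idea, not Nvidia's or TensorRT's actual method; the function names (`quantize_int8`, `int8_dot`) and the sample values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integer codes in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(a_codes, a_scale, b_codes, b_scale):
    """Dot product done with integer multiply-accumulates, rescaled once at the end."""
    acc = sum(x * y for x, y in zip(a_codes, b_codes))
    return acc * a_scale * b_scale


# Hypothetical weights and activations, chosen only for illustration.
weights = [0.5, -1.0, 0.25, 0.75]
activations = [1.0, 2.0, -0.5, 0.1]

w_codes, w_scale = quantize_int8(weights)
a_codes, a_scale = quantize_int8(activations)

exact = sum(w * a for w, a in zip(weights, activations))
approx = int8_dot(w_codes, w_scale, a_codes, a_scale)

print(f"FP result:   {exact:.4f}")
print(f"INT8 result: {approx:.4f}")
```

The quantized result lands close to the full-precision one even though each value is stored in a single byte, which is the property that makes lower-precision inference attractive.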

Tesla P4

The Tesla P4 is the lower-end GPU of the two that were announced, and it’s targeted at scale-out servers that need highly efficient GPUs. Each Tesla P4 GPU uses between 50W and 75W of power, for a peak performance of 5.5 FP32 TeraFLOP/s and 21.8 INT8 TOP/s (Tera-Operations per second).


Nvidia compared its Tesla P4 GPU to an Intel Xeon E5 general purpose CPU and alleged that the P4 is up to 40x more efficient on the AlexNet image processing test. The company also claimed that the Tesla P4 is 8x more efficient than an Arria 10-115 FPGA (made by Altera, which Intel acquired).

Tesla P40

The Tesla P40 was designed for scale-up servers, where performance matters most. Thanks to improvements in the Pascal architecture as well as the jump from the 28nm planar process to a 16nm FinFET process, Nvidia claimed that the P40 is up to 4x faster than its predecessor, the Tesla M40.

The P40 GPU has a peak performance of 12 FP32 TeraFLOP/s and 47 INT8 TOP/s, so it’s about twice as fast as its little brother, the Tesla P4. The Tesla P40 has a maximum power consumption of 250W.
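The "about twice as fast" comparison can be checked with a little arithmetic on the peak figures quoted above. The sketch below uses only those numbers; the dictionary layout and helper names are hypothetical, and the per-watt figures assume each card runs at its maximum quoted power (75W and 250W), which is a simplification.

```python
# Peak figures as quoted in this article.
specs = {
    "Tesla P4": {"fp32_tflops": 5.5, "int8_tops": 21.8, "max_watts": 75},
    "Tesla P40": {"fp32_tflops": 12.0, "int8_tops": 47.0, "max_watts": 250},
}


def p40_over_p4(metric):
    """Ratio of the P40's peak figure to the P4's for the given metric."""
    return specs["Tesla P40"][metric] / specs["Tesla P4"][metric]


def int8_tops_per_watt(name):
    """Peak INT8 throughput per watt, assuming maximum quoted power draw."""
    card = specs[name]
    return card["int8_tops"] / card["max_watts"]


fp32_ratio = p40_over_p4("fp32_tflops")
int8_ratio = p40_over_p4("int8_tops")

print(f"P40 vs P4, FP32: {fp32_ratio:.2f}x")
print(f"P40 vs P4, INT8: {int8_ratio:.2f}x")
for name in specs:
    print(f"{name}: {int8_tops_per_watt(name):.3f} INT8 TOP/s per watt")
```

Under these assumptions the P40 comes out a little over 2x faster at peak on both metrics, while the P4 delivers more peak INT8 throughput per watt, which matches the scale-up versus scale-out positioning of the two cards.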

DeepStream SDK

Nvidia also announced the DeepStream SDK, which can utilize a Pascal-based server to decode and analyze up to 93 HD video streams in real time. According to Nvidia, this will allow companies to understand video at scale for applications such as self-driving cars, interactive robots, and filtering and ad placement.

Partnership With Coursera, Udacity, and Microsoft

Nvidia’s Deep Learning Institute, which offers online courses and workshops around the world for using deep learning to solve real problems, partnered with Coursera and Udacity to make its courses available to more people.

The courses cover topics such as becoming a self-driving car engineer and using deep learning to predict the risk of a disease. Thanks to a partnership with Microsoft, there will also be a workshop on teaching robots how to think through the use of deep learning.