RTX On - The Nvidia Turing GPU
TENSOR CORE FOR ACCELERATED DEEP LEARNING
• Integration of Tensor Core into the SM subcore
• Real-time deep learning inference on a consumer GPU
• Acceleration of matrix multiplication (the core operation of deep learning)
-Pascal: serial computation
-Turing Tensor Core: parallel computation across tiles of streamed data
• Throughput (GeForce RTX 2080 Ti GPU):
-114 TFLOPS of FP16 math
-228 TOPS of 8-bit integer math
-455 TOPS of 4-bit integer math
• Multi-thread collaborative matrix math operations (4-8 clocks)
-Transparent sharing of data across threads
=>Saves thread resources and memory bandwidth
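The tile-parallel matrix math described above can be emulated in plain NumPy. A minimal sketch, assuming a 16x16 tile size and FP16 inputs with FP32 accumulation; the tile shape is illustrative, not actual Tensor Core code:

```python
import numpy as np

# Tensor Cores compute D = A*B + C on small independent tiles; here we
# emulate that tile decomposition serially. On hardware, each tile product
# is computed in parallel by a warp's worth of threads.
TILE = 16

def tiled_matmul(A, B, C):
    """Accumulate A @ B + C tile by tile (FP32 accumulation, as on Turing)."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    D = C.astype(np.float32).copy()
    for i in range(0, M, TILE):
        for j in range(0, N, TILE):
            for k in range(0, K, TILE):
                # each (i, j, k) tile product is independent of the others
                D[i:i+TILE, j:j+TILE] += (
                    A[i:i+TILE, k:k+TILE].astype(np.float32)
                    @ B[k:k+TILE, j:j+TILE].astype(np.float32)
                )
    return D

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 64)).astype(np.float16)
B = rng.standard_normal((64, 64)).astype(np.float16)
C = np.zeros((64, 64), dtype=np.float16)
D = tiled_matmul(A, B, C)
print(np.allclose(D, A.astype(np.float32) @ B.astype(np.float32), atol=1e-3))
```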
• Maximum algorithmic flexibility alongside matrix operations:
-Different activation functions, batch normalization variants
• Exploitation of the large capacity and bandwidth of the register file and
shared memory
• Tesla T4 datacenter product: flexible acceleration of all AI workloads
-Up to 5X faster than Tesla P4 solution on DeepSpeech2
-Up to 36X faster than CPU-based solutions for natural language processing
• Combining these workloads enables higher-level solutions
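The peak-throughput figures quoted earlier follow from per-core math rates. As a back-of-envelope check, assuming 544 Tensor Cores and a ~1.635 GHz boost clock (RTX 2080 Ti Founders Edition; these counts are assumptions, not from the slides), with each core performing 64 FP16 FMAs (128 flops) per clock:

```python
# Assumed hardware parameters (RTX 2080 Ti Founders Edition, not from slides)
TENSOR_CORES = 544
BOOST_HZ = 1.635e9
FLOPS_PER_CORE_PER_CLK = 2 * 64          # 64 FMAs per clock, 2 ops per FMA

fp16_tflops = TENSOR_CORES * BOOST_HZ * FLOPS_PER_CORE_PER_CLK / 1e12
print(round(fp16_tflops, 1))             # ~113.8, i.e. the ~114 TFLOPS quoted

# INT8 doubles the rate, and INT4 doubles it again:
print(round(fp16_tflops * 2), round(fp16_tflops * 4))   # ~228 and ~455 TOPS
```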
RAY TRACING
• Rendering technique
• Reverse tracing process:
View camera → 2D viewing plane (pixel plane) → 3D scene → back to light sources
• Final color and illumination level of each pixel:
Determined by the incoming light at the point of intersection and by the surface properties of the object
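The reverse-tracing pipeline above can be sketched end to end for a single pixel. A minimal illustration, assuming a toy scene (one sphere, one point light) and Lambertian shading; all scene contents are assumptions:

```python
import numpy as np

def hit_sphere(origin, d, center, radius):
    """Nearest ray parameter t of a ray/sphere intersection, or None."""
    oc = origin - center
    b = 2.0 * np.dot(oc, d)
    c = np.dot(oc, oc) - radius * radius
    disc = b * b - 4.0 * c          # d is unit length, so a == 1
    if disc < 0:
        return None
    t = (-b - np.sqrt(disc)) / 2.0
    return t if t > 0 else None

camera = np.array([0.0, 0.0, 0.0])     # view camera
light  = np.array([2.0, 2.0, 2.0])     # point light source
center = np.array([0.0, 0.0, -3.0])    # sphere in the 3D scene

# one ray from the camera through the middle of the pixel plane (z = -1)
d = np.array([0.0, 0.0, -1.0])
t = hit_sphere(camera, d, center, 1.0)
p = camera + t * d                                   # intersection point
n = (p - center) / np.linalg.norm(p - center)        # surface normal
to_light = (light - p) / np.linalg.norm(light - p)   # trace back to the light

# Lambertian shading: pixel brightness = incoming light x surface property
brightness = max(0.0, float(np.dot(n, to_light)))
print(t, round(brightness, 3))          # prints 2.0 0.816
```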
END OF PRESENTATION