Fastvideo SDK benchmarks on NVIDIA Quadro RTX 6000

Author: Fyodor Serzhenko

Fastvideo SDK for image and video processing on NVIDIA GPUs offers very high performance and high image quality. We have now tested Fastvideo SDK on the NVIDIA® Quadro RTX™ 6000, which is powered by the NVIDIA Turing™ architecture and the NVIDIA RTX™ platform. This technology is billed as the most significant advancement in computer graphics in over a decade for professional workflows, and the new hardware is expected to boost image and video processing performance considerably. To check that, we have benchmarked the most frequently used image processing features.

 


We measured timings for the most frequently used image processing algorithms: demosaicing, resize, denoising, JPEG encoding and decoding, JPEG2000, etc. These are only a small part of the Fastvideo SDK modules, but they give a good idea of the performance on the new hardware.

To evaluate more complicated image processing pipelines, we suggest downloading and testing the Fast CinemaDNG Processor software, which is based on Fastvideo SDK. With it you can build your own pipeline and check the benchmarks on your own images.

How we do benchmarking

As usual, performance benchmarks only give an idea of processing speed; exact values depend on the OS, hardware, image content, resolution and bit depth, processing parameters, the time measurement method, etc. A particular image processing task may also call for its own type of benchmarking.

To get maximum performance from any GPU software, we need to ensure maximum GPU occupancy, which is not easy to accomplish. That is why we evaluate peak performance in the following ways:

  • Repeating a particular function to get an averaged computation time
  • Multithreading with copy/compute overlap
  • Profiling with NVIDIA Visual Profiler to get the total GPU time of all kernels for a particular image processing module
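
The first approach, repetition with averaging, can be sketched as follows. This is a minimal illustration in Python and not Fastvideo SDK code: `benchmark` and `stand_in_workload` are hypothetical names, and the workload is a CPU placeholder. When timing real GPU code this way, the device has to be synchronized before the clock is read, because kernel launches are asynchronous.

```python
import statistics
import time


def benchmark(fn, *, warmup=3, repeats=20):
    """Return the mean and standard deviation of fn's run time in seconds."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed (caches, lazy initialisation)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)


def stand_in_workload():
    # Hypothetical placeholder for one call of an image processing module.
    return sum(i * i for i in range(50_000))


mean_s, stdev_s = benchmark(stand_in_workload)
print(f"{mean_s * 1e3:.3f} ms per call, about {1.0 / mean_s:,.0f} calls per second")
```

Reporting the spread next to the mean makes it easier to spot runs disturbed by other activity on the machine.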

Hardware and software

  • CPU Intel Core i7-5930K (Haswell-E, 6 cores, 3.5–3.7 GHz)
  • GPU NVIDIA Quadro RTX 6000
  • OS Windows 10 (x64), version 1803
  • CUDA Toolkit 10
  • Fastvideo SDK 0.14.0

Demosaicing benchmarks

In the Fastvideo SDK we have three different GPU-based demosaicing algorithms at the moment:

  • HQLI - High Quality Linear Interpolation, window 5×5
  • DFPD - Directional Filtering and a Posteriori Decision, window 11×11
  • MG - Multiple Gradients, window 23×23

All these algorithms are implemented for 8-bit and 16-bit workflows, and they take into account pixels near image borders. For these measurements we assume that both the initial and the processed data reside in GPU memory, as is the case for complicated pipelines in raw image processing applications.

Demosaicing algorithm    2K (1920 × 1080)    4K (3840 × 2160)
HQLI (8-bit)             30,000 fps          9,300 fps
HQLI (16-bit)            13,000 fps          4,400 fps
DFPD (8-bit)             12,600 fps          4,700 fps
DFPD (16-bit)            7,100 fps           2,700 fps
MG (16-bit)              3,400 fps           1,200 fps
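
To compare rows measured at different resolutions, the frame rates can be converted to time per frame and to pixel throughput. The sketch below only does arithmetic on the table values; the helper names are illustrative and not part of Fastvideo SDK.

```python
# Frame rates copied from the demosaicing table above.
RESOLUTIONS = {"2K": (1920, 1080), "4K": (3840, 2160)}
DEMOSAIC_FPS = {
    "HQLI (8-bit)": {"2K": 30_000, "4K": 9_300},
    "HQLI (16-bit)": {"2K": 13_000, "4K": 4_400},
    "DFPD (8-bit)": {"2K": 12_600, "4K": 4_700},
    "DFPD (16-bit)": {"2K": 7_100, "4K": 2_700},
    "MG (16-bit)": {"2K": 3_400, "4K": 1_200},
}


def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def gigapixels_per_second(fps, width, height):
    """Pixel throughput in units of 1e9 pixels per second."""
    return fps * width * height / 1e9


for algorithm, rates in DEMOSAIC_FPS.items():
    for name, fps in rates.items():
        width, height = RESOLUTIONS[name]
        print(
            f"{algorithm:14s} {name}: {frame_time_ms(fps):.3f} ms/frame, "
            f"{gigapixels_per_second(fps, width, height):.1f} GPix/s"
        )
```

For HQLI (8-bit) this gives about 62.2 GPix/s at 2K and 77.1 GPix/s at 4K. In every row the pixel throughput is higher at 4K than at 2K, which is consistent with larger frames keeping the GPU better occupied.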

To check the image quality of each demosaicing algorithm on real footage, you can download the Fast CinemaDNG Processor software from www.fastcinemadng.com together with sample DNG image series for evaluation.

JPEG encoding and decoding benchmarks

The JPEG codec in Fastvideo SDK offers very high performance for both encoding and decoding. The GPU needs enough data to reach maximum occupancy, and this matters a great deal for the results. Here we present the best total kernel time for JPEG encoding and decoding. Settings: JPEG compression quality q=90%, 4:2:0 subsampling (visually lossless compression), optimal number of restart markers.

Resolution          JPEG Encoding    JPEG Decoding
2K (1920 × 1080)    3,500 fps        1,380 fps
4K (3840 × 2160)    1,900 fps        860 fps
5K (5320 × 3840)    1,100 fps        520 fps

JPEG2000 encoding benchmarks

Fastvideo SDK includes a high performance JPEG2000 codec on GPU. The algorithm runs partly on the CPU, so total performance is also CPU-dependent, but it is still much faster than CPU-based J2K codecs such as OpenJPEG. In the tests we used the optimal number of threads, and the compression ratio corresponded to visually lossless compression.

JPEG2000 encoding parameters    Lossy encoding    Lossless encoding
2K image, 24-bit, cb 32×32      504 fps           281 fps
4K image, 24-bit, cb 32×32      160 fps           85 fps
8K image, 24-bit, cb 32×32      56 fps            23 fps
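
One way to read this table is to check how the frame rate scales with image size. The sketch below assumes that each step (2K to 4K, 4K to 8K) quadruples the pixel count; the table does not give exact dimensions, so that ratio is an assumption, and `slowdown` is an illustrative helper name.

```python
# Frame rates copied from the JPEG2000 table above.
J2K_FPS = {
    "lossy": {"2K": 504, "4K": 160, "8K": 56},
    "lossless": {"2K": 281, "4K": 85, "8K": 23},
}

# Assumption (not stated in the table): each step up quadruples the pixel count.
PIXEL_RATIO = 4.0


def slowdown(fps_small, fps_large):
    """How many times the frame rate falls when moving to the larger image."""
    return fps_small / fps_large


for mode, fps in J2K_FPS.items():
    for small, large in (("2K", "4K"), ("4K", "8K")):
        factor = slowdown(fps[small], fps[large])
        print(f"{mode:8s} {small} -> {large}: {factor:.2f}x slower for {PIXEL_RATIO:.0f}x the pixels")
```

Under that assumption, every step costs less than a 4× drop in frame rate (between roughly 2.9× and 3.7×), so the encoder processes more pixels per second on larger images.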

Image resize

Resize is a frequently used feature; here we present results for GPU-based resize with the Lanczos algorithm.

"1/2 resolution" means 960 × 540 for 2K and 1920 × 1080 for 4K.
"1 pixel" means 1919 × 1079 for 2K and 3839 × 2159 for 4K.

Resize BMP/PPM      1/2 resolution    1 pixel
2K image, 24-bit    4,200 fps         3,300 fps
4K image, 24-bit    1,700 fps         1,120 fps

 

Apart from that, we benchmarked the following pipeline, which is common in web applications: JPEG decoding, resize, JPEG encoding.

Decode JPEG - Resize - Encode JPEG    1/2 resolution    1 pixel
2K jpg image, 24-bit                  996 fps           845 fps
4K jpg image, 24-bit                  586 fps           425 fps

 

To summarize, Fastvideo SDK shows high performance on the Quadro RTX 6000, and we see room to improve it further by optimizing our CUDA kernels for the Turing architecture.
