GPU Denoiser with very high performance for still images and video

Image and video denoising is widely used in camera applications, especially in low-light conditions. We have developed several GPU-accelerated denoising kernels that run on existing NVIDIA hardware under Windows, Linux, and ARM (Jetson) platforms. They deliver very high performance for both image and video processing.

GPU Denoiser Library Features

  • Input format: 8/10/12/14/16-bit per channel input data array from CPU or GPU memory
  • Output format: 24/48-bit output data array in CPU or GPU memory
  • Denoising with 16/32-bit accuracy
  • High speed denoising without AI
  • Denoising algorithms
    • Wavelet denoiser (RAW and RGB): CDF 5/3 and CDF 9/7 wavelets with hard, soft, or garrote thresholding
    • Bilateral denoiser
    • NLM denoiser
  • Compatibility with Fast VCR software for machine vision cameras
  • Timing and performance measurements
  • Operating systems: Windows 10/11, Ubuntu Linux, and L4T (Jetson Nano, TX2, NX, Xavier, Orin)
  • Compatibility with NVIDIA GPUs (Jetson, GeForce, Quadro, Tesla) with compute capability 5.0 or higher, CUDA 12.6
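
The hard, soft, and garrote thresholding rules named in the list above have standard textbook definitions, sketched here in Python for a single wavelet coefficient. This is an illustration only and not the library's GPU code: the function names are hypothetical, and the threshold of 80 is simply borrowed from the luma threshold in the benchmark settings as a sample value.

```python
import math

def hard_threshold(x: float, t: float) -> float:
    """Keep the coefficient unchanged if it exceeds the threshold, else zero it."""
    return x if abs(x) > t else 0.0

def soft_threshold(x: float, t: float) -> float:
    """Shrink the magnitude by t and zero anything at or below the threshold."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def garrote_threshold(x: float, t: float) -> float:
    """Non-negative garrote: shrink by t*t/x, a compromise between hard and soft."""
    return x - (t * t) / x if abs(x) > t else 0.0

# Sample coefficients (illustrative values, not measured data).
coeffs = [-200.0, -90.0, -40.0, 0.0, 40.0, 90.0, 200.0]
for rule in (hard_threshold, soft_threshold, garrote_threshold):
    print(rule.__name__, [round(rule(c, 80.0), 1) for c in coeffs])
```

Hard thresholding leaves large coefficients untouched, soft thresholding shrinks all of them by the threshold, and the garrote rule shrinks large coefficients less than small ones.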

Benchmarks for GPU Denoiser

Image resolution: 4112×2176 (8.9 MPix), 16-bit per channel, RGB

Test description: all data in GPU memory, timing includes GPU computations only

2D Wavelet transform: CDF 9/7
Number of DWT resolutions: up to 7
DWT thresholds for YCbCr: 80;150;150
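
For readers unfamiliar with lifting-based wavelets, the following Python sketch shows one level of the reversible integer CDF 5/3 transform in 1D, plus a loop that repeats it on the approximation band to build several resolutions. It uses CDF 5/3 rather than the CDF 9/7 filter from the benchmark because the 5/3 lifting steps are short enough to show in full. The function names, the 1D simplification, and the mirrored boundary handling are assumptions made for illustration, not a description of the SDK's implementation.

```python
def cdf53_forward(x):
    """One level of the integer CDF 5/3 lifting transform (even-length input)."""
    n = len(x)
    assert n >= 2 and n % 2 == 0, "sketch handles even-length signals only"
    even, odd = x[0::2], x[1::2]
    half = n // 2
    # Predict step: detail = odd sample minus the average of its even neighbours.
    d = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]  # mirror at the right edge
        d.append(odd[i] - ((even[i] + right) >> 1))
    # Update step: approximation = even sample plus a quarter of nearby details.
    s = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]  # mirror at the left edge
        s.append(even[i] + ((left + d[i] + 2) >> 2))
    return s, d

def cdf53_inverse(s, d):
    """Undo cdf53_forward exactly by reversing the two lifting steps."""
    half = len(s)
    even = []
    for i in range(half):
        left = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - ((left + d[i] + 2) >> 2))
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.append(even[i])
        x.append(d[i] + ((even[i] + right) >> 1))
    return x

def cdf53_multilevel(x, levels):
    """Apply the forward transform repeatedly to the approximation band."""
    approx, details = list(x), []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            break  # cannot split further in this simplified sketch
        approx, d = cdf53_forward(approx)
        details.append(d)
    return approx, details

signal = [3, 7, 1, 8, 2, 9, 4, 6]  # illustrative samples
s, d = cdf53_forward(signal)
print("roundtrip ok:", cdf53_inverse(s, d) == signal)
```

A wavelet denoiser of this kind would threshold the detail bands at each resolution before running the inverse transform.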

Supported NLM denoiser parameter ranges: blur window 3×3 or larger, search window 3×3 or larger, strength 1-3000
The NLM algorithm can work with internal 4:4:4 or 4:2:0 subsampling
NLM can also use independent denoising parameters for the Y and Cb/Cr channels in both the 4:2:0 and 4:4:4 subsampling modes
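
To show how a patch window, a search window, and a strength value interact in non-local means, here is a minimal single-channel Python sketch. It assumes that the "blur window" corresponds to the patch used to compare neighbourhoods and that the strength acts like the usual filtering parameter h; neither mapping is confirmed by the text above, and the function name and edge clamping are illustrative choices.

```python
import math

def nlm_denoise(img, patch_radius=1, search_radius=1, h=10.0):
    """Textbook non-local means on a 2D list of numbers (single channel)."""
    height, width = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so windows near the border reuse edge pixels.
        return img[min(max(y, 0), height - 1)][min(max(x, 0), width - 1)]

    def patch_distance(y0, x0, y1, x1):
        # Mean squared difference between the patches centred on two pixels.
        total = 0.0
        for dy in range(-patch_radius, patch_radius + 1):
            for dx in range(-patch_radius, patch_radius + 1):
                diff = px(y0 + dy, x0 + dx) - px(y1 + dy, x1 + dx)
                total += diff * diff
        return total / (2 * patch_radius + 1) ** 2

    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = wsum = 0.0
            for sy in range(-search_radius, search_radius + 1):
                for sx in range(-search_radius, search_radius + 1):
                    # Similar patches get weights near 1, dissimilar ones near 0.
                    w = math.exp(-patch_distance(y, x, y + sy, x + sx) / (h * h))
                    acc += w * px(y + sy, x + sx)
                    wsum += w
            out[y][x] = acc / wsum
    return out

# Tiny illustrative image with one outlier in the centre.
noisy = [[10, 12, 11], [9, 50, 10], [11, 10, 12]]
print(nlm_denoise(noisy, patch_radius=1, search_radius=1, h=20.0)[1][1])
```

With 3×3 patch and search windows, each output pixel is a weighted average of only nine candidates. In a 4:2:0 mode the chroma planes hold a quarter as many samples as luma, so the filter has less data to process than in 4:4:4.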

NLM denoiser parameters used in the benchmark: blur window 3×3, search window 5×5, strength 500
Bilateral denoiser parameters used in the benchmark: diameter 3, sigmaColor 5, sigmaSpace 500
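
The bilateral parameters above can be read against the textbook bilateral filter, sketched below in Python for a single channel. The parameter names resemble those of OpenCV's bilateralFilter, but whether the SDK interprets them identically is an assumption, and the function name and edge clamping are illustrative.

```python
import math

def bilateral_denoise(img, diameter=3, sigma_color=5.0, sigma_space=500.0):
    """Textbook bilateral filter on a 2D list of numbers (single channel)."""
    height, width = len(img), len(img[0])
    radius = diameter // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = img[y][x]
            acc = wsum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    # Clamp coordinates so border pixels reuse the edge values.
                    ny = min(max(y + dy, 0), height - 1)
                    nx = min(max(x + dx, 0), width - 1)
                    value = img[ny][nx]
                    # Spatial term penalises distance, range term penalises
                    # intensity difference from the centre pixel.
                    spatial = (dy * dy + dx * dx) / (2.0 * sigma_space ** 2)
                    rng = (value - center) ** 2 / (2.0 * sigma_color ** 2)
                    w = math.exp(-(spatial + rng))
                    acc += w * value
                    wsum += w
            out[y][x] = acc / wsum
    return out

# Illustrative step edge: a small sigma_color keeps the two sides separate.
edge = [[0, 0, 1000, 1000] for _ in range(3)]
print(bilateral_denoise(edge)[1])
```

With a large sigmaSpace relative to the 3×3 window, the spatial weights are almost uniform, so in this sketch the result is driven mainly by sigmaColor.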

Software: Windows 11, CUDA 12.6
Hardware: NVIDIA GeForce RTX 4090

  • RAW DWT denoiser - 1.8 ms (4.9 GPix/s)
  • DWT denoiser (YCbCr, 4:4:4) - 3.05 ms (2.9 GPix/s)
  • NLM denoiser (RGB) - 1.44 ms (6.2 GPix/s)
  • NLM denoiser (YCbCr, 4:2:0) - 0.93 ms (9.5 GPix/s)
  • NLM denoiser (YCbCr, 4:4:4) - 1.64 ms (5.4 GPix/s)
  • Bilateral denoiser (RGB) - 1.21 ms (7.3 GPix/s)
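
The GPix/s figures follow from dividing the pixel count of the test image by the kernel time. The short Python sketch below reproduces that arithmetic from the timings listed above; the function and variable names are illustrative. Computed values may differ from the published ones in the last digit, presumably because the listed timings are rounded.

```python
WIDTH, HEIGHT = 4112, 2176  # benchmark image resolution

def throughput_gpix_per_s(width: int, height: int, time_ms: float) -> float:
    """Pixels processed per second, expressed in gigapixels."""
    return (width * height) / (time_ms * 1e-3) / 1e9

# Kernel timings in milliseconds, copied from the benchmark list.
timings_ms = {
    "RAW DWT denoiser": 1.8,
    "DWT denoiser (YCbCr, 4:4:4)": 3.05,
    "NLM denoiser (RGB)": 1.44,
    "NLM denoiser (YCbCr, 4:2:0)": 0.93,
    "NLM denoiser (YCbCr, 4:4:4)": 1.64,
    "Bilateral denoiser (RGB)": 1.21,
}

for name, t in timings_ms.items():
    print(f"{name}: {throughput_gpix_per_s(WIDTH, HEIGHT, t):.2f} GPix/s")
```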

These timings are comparable with the processing time of our best MG debayer algorithm, which takes around 1.05 ms (8.5 GPix/s) for the same image on the same GPU. Previously, our denoisers were much slower than demosaicing algorithms.

We designed this software as part of our GPU Image & Video Processing SDK. Our customers can now use these GPU-accelerated denoisers in their applications as part of their image processing pipeline.

Testing

To test our GPU denoiser, please download the Fast VCR software, which works not only with machine vision cameras in real time but also with RAW or PGM images stored on an SSD. This is a practical way to evaluate image quality and performance.

Direct link to download the trial software for Windows 10/11: Fast VCR software.

It runs on CUDA 12.6, so please install the latest NVIDIA driver before testing.

GPU-based denoising roadmap

  • Acceleration of Bilateral denoiser - in progress
  • Temporal denoiser on the GPU - in progress

P.S. The latest version of our NLM denoiser reaches 30 GPix/s on the NVIDIA GeForce RTX 4090 for a 12 MPix, 16-bit color image (blur window 3×3 and search window 3×3).
