GPU Denoiser with very high performance for still images and video

Image and video denoising is widely used in camera applications, especially in low-light conditions. We have developed several GPU-accelerated denoising kernels that run on NVIDIA hardware under Windows, Linux, and ARM-based Jetson platforms. They achieve very high performance for both image and video processing.

GPU Denoiser Library Features

  • Input format: 8/10/12/14/16-bit per channel input data array from CPU or GPU memory
  • Output format: 24/48-bit output data array in CPU or GPU memory
  • Denoising with 16/32-bit accuracy
  • High-speed denoising without AI
  • Denoising algorithms
    • Wavelet denoiser (raw and RGB): CDF 5/3 and CDF 9/7 with hard, soft, and garrote thresholding
    • Bilateral denoiser
    • NLM (non-local means) denoiser
  • Compatibility with Fast VCR software for machine vision cameras
  • Timing and performance measurements
  • OS: Windows-10/11, Linux Ubuntu, and L4T (Jetson Nano, TX2, NX, Xavier, Orin)
  • Supports NVIDIA GPUs (Jetson, GeForce, Quadro) with compute capability 5.0 or higher, CUDA-12.6
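The wavelet denoisers listed above combine a discrete wavelet transform with thresholding of the detail coefficients. The Python sketch below illustrates that idea in 1D, using the CDF 5/3 (LeGall) lifting scheme with symmetric boundary extension and the three thresholding rules named in the list. It is a CPU-only illustration of the principle under our own assumptions, not the library's CUDA implementation, and all function names are hypothetical. CDF 9/7 follows the same lifting pattern with two predict/update pairs plus a scaling step, and a 2D transform applies the 1D step along rows and then along columns.

```python
def threshold(x, t, mode):
    """Apply hard, soft or garrote thresholding to one coefficient (t >= 0)."""
    if mode not in ("hard", "soft", "garrote"):
        raise ValueError(f"unknown mode: {mode}")
    if abs(x) <= t:
        return 0.0  # every rule zeroes coefficients at or below the threshold
    if mode == "hard":
        return x  # keep large coefficients unchanged
    if mode == "soft":
        return x - t if x > 0 else x + t  # shrink the magnitude by t
    return x - t * t / x  # non-negative garrote: between hard and soft


def fwd53(x):
    """One CDF 5/3 lifting level on an even-length list -> (approx, detail)."""
    n = len(x) // 2
    s, d = x[0::2], x[1::2]
    # Predict: detail = odd sample minus the mean of its even neighbours
    # (symmetric extension: the missing right neighbour mirrors to s[n-1]).
    d = [d[i] - 0.5 * (s[i] + s[min(i + 1, n - 1)]) for i in range(n)]
    # Update: approximation = even sample plus a quarter of adjacent details
    # (symmetric extension: the missing left detail mirrors to d[0]).
    s = [s[i] + 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(n)]
    return s, d


def inv53(s, d):
    """Undo fwd53 by running the lifting steps in reverse order."""
    n = len(s)
    s = [s[i] - 0.25 * (d[max(i - 1, 0)] + d[i]) for i in range(n)]
    d = [d[i] + 0.5 * (s[i] + s[min(i + 1, n - 1)]) for i in range(n)]
    out = [0.0] * (2 * n)
    out[0::2], out[1::2] = s, d
    return out


def dwt_denoise_1d(signal, levels, t, mode="soft"):
    """Multi-level DWT denoising: threshold every detail band, then rebuild."""
    if len(signal) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx = [float(v) for v in signal]
    details = []
    for _ in range(levels):
        approx, d = fwd53(approx)  # transform the approximation again
        details.append([threshold(c, t, mode) for c in d])
    for d in reversed(details):  # reconstruct from the coarsest level up
        approx = inv53(approx, d)
    return approx
```

With `t=0` the transform reconstructs the input up to floating-point rounding; larger `t` removes more high-frequency detail. For example, a constant signal with alternating ±1 noise, `[10 + (-1) ** i for i in range(32)]`, produces first-level details of -2, so `dwt_denoise_1d(..., 3, 2.0)` returns a flat signal of 10.0.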

Benchmarks for GPU Denoiser

Image resolution: 4112×2176 (8.9 MPix), 16-bit per channel, RGB/RGGB

Test description: all data in GPU memory, timing includes GPU computations only
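The benchmark uses both RGB and RGGB (Bayer) input. One common way to run a denoiser directly on RGGB raw data is to split the mosaic into four quarter-resolution planes (R, G1, G2, B), denoise each plane, and interleave them again. Whether the RAW DWT denoiser in this library works this way is an assumption; the helper names below are hypothetical.

```python
def split_rggb(raw):
    """Split an RGGB mosaic (list of rows, even width and height) into 4 planes.

    Layout assumed: even rows are R G R G ..., odd rows are G B G B ...
    """
    r = [row[0::2] for row in raw[0::2]]
    g1 = [row[1::2] for row in raw[0::2]]
    g2 = [row[0::2] for row in raw[1::2]]
    b = [row[1::2] for row in raw[1::2]]
    return r, g1, g2, b


def merge_rggb(r, g1, g2, b):
    """Interleave four quarter-resolution planes back into an RGGB mosaic."""
    h, w = 2 * len(r), 2 * len(r[0])
    raw = [[0] * w for _ in range(h)]
    for y in range(len(r)):
        for x in range(len(r[0])):
            raw[2 * y][2 * x] = r[y][x]
            raw[2 * y][2 * x + 1] = g1[y][x]
            raw[2 * y + 1][2 * x] = g2[y][x]
            raw[2 * y + 1][2 * x + 1] = b[y][x]
    return raw
```

For the 4112×2176 test image, each of the four planes would be 2056×1088 pixels.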

2D wavelet transform: CDF 9/7
Number of DWT resolution levels: up to 7
DWT thresholds for Y, Cb, Cr: 80, 150, 150
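The per-channel thresholds above imply that the RGB image is first converted to YCbCr, so that luma (threshold 80) and chroma (threshold 150) can be denoised with different strengths. The sketch below assumes full-range BT.601 (JFIF-style) coefficients and a mid-scale chroma offset of 32768 for 16-bit data; the source does not state which conversion matrix is used. It also shows one simple way to produce 4:2:0 chroma (2×2 averaging, also an assumption) and counts samples per frame, since the 4:2:0 mode described below processes half as many samples as 4:4:4. Function names are hypothetical.

```python
MID_16BIT = 32768  # assumed chroma offset for unsigned 16-bit data


def rgb_to_ycbcr(r, g, b, mid=MID_16BIT):
    """Full-range BT.601 (JFIF-style) RGB -> YCbCr for one pixel."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + mid
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b + mid
    return y, cb, cr


def subsample_420(plane):
    """Average each 2x2 block of a chroma plane (even width and height)."""
    return [
        [(plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
         for x in range(0, len(plane[0]), 2)]
        for y in range(0, len(plane), 2)
    ]


def sample_count(width, height, mode):
    """Total Y + Cb + Cr samples in one frame (even width and height)."""
    luma = width * height
    if mode == "4:4:4":
        chroma = luma  # full-resolution Cb and Cr
    elif mode == "4:2:0":
        chroma = (width // 2) * (height // 2)  # quarter-resolution Cb and Cr
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return luma + 2 * chroma
```

For the 4112×2176 test image, `sample_count` gives 26,843,136 samples in 4:4:4 mode and 13,421,568 in 4:2:0 mode. A gray pixel (R = G = B) maps to Y equal to that value and Cb = Cr = 32768.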

NLM denoiser parameters: blur window 3×3 or larger, search window 3×3 or larger, strength 1-3000
The algorithm can work with internal 4:4:4 or 4:2:0 subsampling
NLM can also use independent denoising parameters for the Y and Cb/Cr channels in both 4:2:0 and 4:4:4 subsampling modes
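Non-local means replaces each pixel with a weighted average of the pixels in its search window, where each weight depends on how similar the patch around the candidate pixel is to the patch around the current pixel. The minimal grayscale sketch below assumes that the "blur window" corresponds to the patch size and that "strength" plays the role of the filtering parameter `h`. Both mappings, the border handling (edge replication), and the function name are hypothetical; `h` here is in pixel-value units and is not calibrated to the 1-3000 strength scale above.

```python
import math


def nlm_denoise(img, patch=3, search=5, h=10.0):
    """Pixel-wise non-local means on a grayscale image given as a list of rows."""
    hgt, wid = len(img), len(img[0])
    pr, sr = patch // 2, search // 2

    def px(y, x):
        # Clamp coordinates at the borders (edge replication).
        return img[min(max(y, 0), hgt - 1)][min(max(x, 0), wid - 1)]

    def patch_dist(y0, x0, y1, x1):
        # Mean squared difference between two patch x patch neighbourhoods.
        acc = 0.0
        for dy in range(-pr, pr + 1):
            for dx in range(-pr, pr + 1):
                diff = px(y0 + dy, x0 + dx) - px(y1 + dy, x1 + dx)
                acc += diff * diff
        return acc / (patch * patch)

    out = [[0.0] * wid for _ in range(hgt)]
    for y in range(hgt):
        for x in range(wid):
            wsum = vsum = 0.0
            for sy in range(-sr, sr + 1):
                for sx in range(-sr, sr + 1):
                    # Similar patches get weights near 1, dissimilar ones near 0.
                    w = math.exp(-patch_dist(y, x, y + sy, x + sx) / (h * h))
                    wsum += w
                    vsum += w * px(y + sy, x + sx)
            out[y][x] = vsum / wsum  # wsum > 0: the centre patch has weight 1
    return out
```

Because every output pixel is a weighted average of input pixels, results stay within the input range, and sharp edges survive when `h` is small relative to the edge contrast.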

NLM denoiser parameters for testing: blur window 3×3, search window 5×5, strength 500
Bilateral denoiser parameters for testing: diameter 3, sigmaColor 5, sigmaSpace 500
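A bilateral filter weights each neighbour by both its spatial distance and its intensity difference from the centre pixel. The parameter names above match those of OpenCV's `cv::bilateralFilter` (`d`, `sigmaColor`, `sigmaSpace`). The sketch below is a plain square-window version with Gaussian spatial and range weights and a hypothetical function name; it is an assumption, not a description of the library's kernel. With diameter 3 and sigmaSpace 500 the spatial weights are essentially 1, so sigmaColor controls how much smoothing occurs.

```python
import math


def bilateral(img, diameter=3, sigma_color=5.0, sigma_space=500.0):
    """Square-window bilateral filter on a grayscale image (list of rows)."""
    hgt, wid = len(img), len(img[0])
    r = diameter // 2
    out = [[0.0] * wid for _ in range(hgt)]
    for y in range(hgt):
        for x in range(wid):
            c = img[y][x]
            wsum = vsum = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    # Clamp coordinates at the borders (edge replication).
                    yy = min(max(y + dy, 0), hgt - 1)
                    xx = min(max(x + dx, 0), wid - 1)
                    v = img[yy][xx]
                    # Spatial term * range term, combined in one exponent.
                    w = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2)
                                 - (v - c) ** 2 / (2 * sigma_color ** 2))
                    wsum += w
                    vsum += w * v
            out[y][x] = vsum / wsum
    return out
```

Small intensity differences (noise) are averaged, while differences much larger than `sigma_color` (edges) receive near-zero weight and are preserved.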

Software: OS Windows-10/11, CUDA-12.6
Hardware: NVIDIA GeForce RTX 4090

  • RAW DWT denoiser - 1.8 ms (4.9 GPix/s)
  • DWT denoiser (YCbCr, 4:4:4) - 3.05 ms (2.9 GPix/s)
  • NLM denoiser (RGB) - 0.19 ms (40 GPix/s)
  • NLM denoiser (YCbCr, 4:2:0) - 0.20 ms (40 GPix/s)
  • NLM denoiser (YCbCr, 4:4:4) - 0.37 ms (21 GPix/s)
  • Bilateral denoiser (RGB) - 0.13 ms (61 GPix/s)

The NLM and bilateral denoisers run faster than our best MG debayer algorithm, which takes about 0.6 ms (13 GPix/s) for the same image on this GPU; the bilateral denoiser is about 4.6 times faster. The DWT denoisers take longer than the debayer, but still only 1.8-3.05 ms per frame. Previously, our denoisers were much slower than this demosaicing algorithm.

This software is part of our GPU Image & Video Processing SDK, so customers can now use these GPU-accelerated denoisers in their own image processing pipelines.

Testing

To test our GPU denoiser, please download the Fast VCR software. It works not only with machine vision cameras in real time, but also with RAW or PGM images loaded from an SSD, so you can evaluate image quality and performance on real data.

Direct download link for the Windows-10/11 trial version: Fast VCR software.

The software uses CUDA-12.6, so please install the latest NVIDIA driver before testing.

GPU-based denoising roadmap

  • Acceleration of the Bilateral denoiser - done
  • Noise profile calibration and implementation for raw denoising - in progress
  • Temporal denoiser on the GPU - in progress
