3D LUT on NVIDIA GPU

The 3D LUT transform is widely used in color grading and toning applications. To solve the task of 3D LUT grading, we have developed dedicated kernels that run on existing NVIDIA GPU hardware. We have implemented several 3D LUT formats and achieved very high color grading performance.

3D LUT Transform Features

  • Input data up to 16-bit with arbitrary width and height
  • 2.5D and 3D LUT file format: .cube
  • Color representation: RGB, HSV
  • Standard color cube resolution of 65×65×65 (up to 256×256×256)
  • Compatibility with Fastvideo Image & Video Processing SDK
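The .cube format mentioned above is a plain-text file: a `LUT_3D_SIZE N` header followed by N³ lines of three floats, with the red index varying fastest. Below is a minimal parser sketch for the 3D variant only; the names are hypothetical (not the SDK API), and real files may also carry `TITLE`, `DOMAIN_MIN`/`DOMAIN_MAX` and comment lines, which this sketch skips.

```cpp
#include <array>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Parsed 3D .cube table (hypothetical type, not the SDK API).
struct CubeLut {
    int size = 0;
    std::vector<std::array<float, 3>> data;  // r varies fastest, then g, then b
};

CubeLut parseCube(std::istream& in) {
    CubeLut lut;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;  // skip comments
        std::istringstream ls(line);
        std::string key;
        ls >> key;
        if (key.empty()) continue;
        if (key == "LUT_3D_SIZE") {
            ls >> lut.size;
            lut.data.reserve(static_cast<size_t>(lut.size) * lut.size * lut.size);
        } else if ((key[0] >= '0' && key[0] <= '9') || key[0] == '-' || key[0] == '.') {
            // A data line: three floats for one lattice point.
            std::array<float, 3> rgb{};
            rgb[0] = std::stof(key);
            ls >> rgb[1] >> rgb[2];
            lut.data.push_back(rgb);
        }
        // Other keywords (TITLE, DOMAIN_MIN, ...) are ignored in this sketch.
    }
    return lut;
}
```

After parsing, the table can be uploaded to GPU memory once and reused for every frame.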

Hardware and software

  • CPU Intel Core i7-5930K (Haswell-E, 6 cores, 3.5–3.7 GHz)
  • GPU NVIDIA GeForce GTX 1080 (Pascal, 20 SMs, 2560 cores, 1.6–1.7 GHz)
  • OS Windows 10 (x64)
  • CUDA Toolkit 10.1

Performance of 2.5D and 3D LUT Transforms on GPU

Test images: 16-bit RGB, 2432×1366 (2.5K) and 4032×2192 (4K), fast trilinear interpolation
Test conditions: all data (input and output) resides in GPU memory; timings include GPU computation only and are given as 2.5K / 4K

  • 2.5D LUT (HSV, 90×30 points) – 0.26 ms / 0.64 ms
  • 2.5D LUT (HSV, 90×117 points) – 0.26 ms / 0.65 ms
  • 3D LUT (HSV, 36×8×8 points) – 0.29 ms / 0.65 ms
  • 3D LUT (HSV, 36×29×16 points) – 0.31 ms / 0.68 ms
  • 3D LUT (HSV, 36×57×61 points) – 0.44 ms / 0.77 ms
  • 3D LUT (RGB, 17×17×17) – 0.22 ms / 0.55 ms
  • 3D LUT (RGB, 33×33×33) – 0.22 ms / 0.56 ms
  • 3D LUT (RGB, 65×65×65) – 0.30 ms / 0.60 ms
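To put these timings in perspective, the per-frame pixel throughput can be computed directly from the table. The helper below is a hypothetical back-of-the-envelope calculation, not part of the SDK:

```cpp
// Pixel throughput implied by a measured kernel time (hypothetical helper).
double gigapixelsPerSecond(int width, int height, double milliseconds) {
    return width * static_cast<double>(height) / (milliseconds * 1e-3) / 1e9;
}
```

For the 65×65×65 RGB case on the 4K frame, `gigapixelsPerSecond(4032, 2192, 0.60)` gives roughly 14.7 Gpixels/s; at 6 bytes per 16-bit RGB pixel read and written, that corresponds to well over 100 GB/s of memory traffic.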

We have designed this software as part of our GPU Image & Video Processing SDK. Our customers can now use fast 3D LUT transforms on NVIDIA GPUs in their real-time color grading applications.
