Machine Vision Software on GPU

Author: Fyodor Serzhenko

Cameras for machine vision, industrial and scientific applications are widespread nowadays. They use USB3, GigE, 5GigE, 10GigE, 25GigE, 50GigE, 100GigE, CameraLink, CameraLink HS, Coax, Thunderbolt and PCI-Express interfaces to send data to a PC. Cameras usually transfer RAW data, so we need to implement a fairly complicated image processing pipeline to convert RAW to RGB in realtime. This is a computationally heavy task, especially for high-speed cameras with high data rates.

Realtime image processing for machine vision cameras can be done on Intel/AMD CPUs, but such a solution is difficult to speed up further, which becomes a problem for multicamera systems. To overcome this bottleneck, one could implement simplified algorithms on the CPU to keep up with the camera, but that is not a good solution because image quality suffers. Most high quality algorithms are slow even on multicore CPUs. The slowest algorithms on CPU are demosaicing, denoising, color grading, undistortion, resize, rotation to an arbitrary angle, compression, etc.

To solve this task we use Fastvideo SDK, which runs on NVIDIA GPUs. With that SDK we can implement a software solution where the full image processing pipeline is done on the graphics processing unit (GPU). In that case the software architecture is much simpler, because the processing part is done entirely on the GPU and no longer interferes with CPU-based features.

Fast CinemaDNG Processor software

As a good example of high performance and high quality raw image processing, we recommend Fast CinemaDNG Processor software for Windows, where all computations are done on NVIDIA GPU and the core is based on the Fastvideo SDK engine. With that software we can get high quality image processing according to the digital cinema workflow. Fast CinemaDNG Processor offers image quality comparable to the results of raw processing in RawTherapee, Adobe Camera Raw and Lightroom Photo Editor, but it is significantly faster. The total speedup can be estimated as 10-20 times or even more.

To check GPU-based performance for your machine vision camera with Fast CinemaDNG Processor, you can convert RAW images to DNG format with our open source PGM2DNG converter, which can be downloaded from GitHub. After such a conversion the software is able to work both with digital cinema cameras like Blackmagic Design and with machine vision / industrial cameras.

GPU Camera Sample Project

Here we show how to implement software with a GPU image processing pipeline for any machine vision camera. To accomplish the task we need the following:

Camera SDK (XIMEA, Balluff, Basler, FLIR, JAI, Lucid Vision Labs, Daheng Imaging, Imperx, Baumer, etc.) for Windows
Optional GenICam package with camera vendor GenTL producer (.cti)
Fastvideo SDK (demo) ver.0.18.0.0 for Windows
NVIDIA CUDA-12.1 for Windows and the latest NVIDIA driver
Qt ver.5.13.1 for Windows
Compiler MSVC 2022

Source code and links to supplementary libraries for the GPU Camera Sample project can be found on GitHub.

As a starting point we've implemented software to capture raw images from XIMEA cameras. We took the sample application from the XIMEA SDK and incorporated that code into our software. The software sets default camera parameters in order to focus on GPU-based image processing, though you can add a GUI to control camera parameters as well.

You can download binaries for Windows to work with XIMEA cameras or with raw images in PGM format with GPU image processing.

Simple image processing pipeline on GPU for machine vision applications

Raw image capture (8-bit, 12-bit packed/unpacked, 16-bit, monochrome or Bayer)
Import to GPU
Raw data conversion and unpacking
Linearization curve
Bad pixel removal
Dark frame subtraction
Flat-field correction
White Balance
Exposure correction (brightness control)
Debayer with HQLI (5×5 window), DFPD (11×11), MG (23×23) algorithms
Wavelet-based denoising
Gamma
JPEG compression
Output to monitor with minimum latency
Export from GPU to CPU memory
Storage of compressed data to SSD or streaming via FFmpeg RTSP
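
The order of a few of the stages above can be illustrated with a small CPU-only sketch. This is not Fastvideo SDK code: all function names are hypothetical, the frame is a tiny list of rows, the Bayer pattern is assumed to be RGGB, and a plain power-law gamma stands in for whatever curve the real pipeline applies. Debayering, denoising and JPEG compression are left out.

```python
from typing import List

Frame = List[List[float]]  # rows of pixel values; a stand-in for a GPU buffer


def subtract_dark(raw: Frame, dark: Frame) -> Frame:
    """Dark frame subtraction, clamped at zero."""
    return [[max(r - d, 0.0) for r, d in zip(rr, dr)] for rr, dr in zip(raw, dark)]


def flat_field(img: Frame, flat: Frame) -> Frame:
    """Flat-field correction: divide by the flat frame normalized to its mean."""
    count = sum(len(row) for row in flat)
    mean = sum(sum(row) for row in flat) / count
    return [[p * mean / f for p, f in zip(pr, fr)] for pr, fr in zip(img, flat)]


def white_balance_rggb(img: Frame, r_gain: float, g_gain: float, b_gain: float) -> Frame:
    """Per-channel gains applied on the mosaic (RGGB layout is an assumption)."""
    out = []
    for y, row in enumerate(img):
        new_row = []
        for x, p in enumerate(row):
            if y % 2 == 0 and x % 2 == 0:
                gain = r_gain          # red site
            elif y % 2 == 1 and x % 2 == 1:
                gain = b_gain          # blue site
            else:
                gain = g_gain          # one of the two green sites
            new_row.append(p * gain)
        out.append(new_row)
    return out


def gamma_8bit(img: Frame, white: float, gamma: float = 2.2) -> List[List[int]]:
    """Normalize to [0, 1], apply a power-law gamma and quantize to 8 bits."""
    return [
        [round(255 * min(max(p / white, 0.0), 1.0) ** (1.0 / gamma)) for p in row]
        for row in img
    ]


def process_frame(raw, dark, flat, gains, white):
    """Run the stages in the same order as in the list above."""
    img = subtract_dark(raw, dark)
    img = flat_field(img, flat)
    img = white_balance_rggb(img, *gains)
    return gamma_8bit(img, white)


if __name__ == "__main__":
    raw = [[110, 210], [210, 410]]
    dark = [[10, 10], [10, 10]]
    flat = [[1.0, 1.0], [1.0, 1.0]]
    print(process_frame(raw, dark, flat, (2.0, 1.0, 0.5), white=400))
```

On a GPU each of these stages would be a kernel working on device memory, so the frame is uploaded once and downloaded once, which is the main reason to keep the whole chain on the device.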

The image processing pipeline can be modified according to your needs, since the source code is available. The Fastvideo SDK offers many more image processing options that can be added to such software for GPU-based image processing in camera applications.
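
As an illustration of a single stage from the list above, here is a minimal sketch of 12-bit unpacking. The byte layout is an assumption: two pixels are stored in three bytes, least significant bits first. Cameras differ in how they pack 12-bit data, so the layout must be checked against the camera documentation. The function names are hypothetical and the code runs on the CPU for clarity.

```python
def pack12(pixels):
    """Pack pairs of 12-bit values into 3 bytes each (assumed LSB-first layout)."""
    if len(pixels) % 2:
        raise ValueError("need an even number of pixels")
    out = bytearray()
    for p0, p1 in zip(pixels[0::2], pixels[1::2]):
        if not (0 <= p0 < 4096 and 0 <= p1 < 4096):
            raise ValueError("pixel value does not fit into 12 bits")
        out.append(p0 & 0xFF)                       # low 8 bits of pixel 0
        out.append((p0 >> 8) | ((p1 & 0x0F) << 4))  # high 4 of p0, low 4 of p1
        out.append(p1 >> 4)                         # high 8 bits of pixel 1
    return bytes(out)


def unpack12(packed):
    """Inverse of pack12: expand every 3 bytes into two 12-bit values."""
    if len(packed) % 3:
        raise ValueError("packed length must be a multiple of 3")
    pixels = []
    for i in range(0, len(packed), 3):
        b0, b1, b2 = packed[i:i + 3]
        pixels.append(b0 | ((b1 & 0x0F) << 8))
        pixels.append((b1 >> 4) | (b2 << 4))
    return pixels


if __name__ == "__main__":
    data = pack12([0xABC, 0x123])
    print(data.hex(), [hex(p) for p in unpack12(data)])
```

Packed transfer reduces the data volume by 25% compared with storing each 12-bit sample in 16 bits, at the cost of this extra unpacking step on the processing side.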

The software has the following architecture:

Thread for GUI and visualization (app main thread)
Thread for image acquisition from a camera
Thread to control CUDA-based image processing
Thread for OpenGL rendering
Thread for async data writing to SSD or streaming
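
The hand-off between these threads can be sketched with bounded queues. The sketch below is a simplified model under stated assumptions, not the gpu-camera-sample code: it keeps only the acquisition, processing and writing threads, all names are hypothetical, and the "processing" is a placeholder for the CUDA pipeline.

```python
import queue
import threading

STOP = object()  # sentinel that tells the next stage to shut down


def acquisition(n_frames, out_q):
    """Stand-in for the camera thread: produces numbered dummy frames."""
    for i in range(n_frames):
        out_q.put({"id": i, "data": [i] * 4})
    out_q.put(STOP)


def processing(in_q, out_q):
    """Stand-in for the thread that drives GPU processing."""
    while True:
        frame = in_q.get()
        if frame is STOP:
            out_q.put(STOP)
            break
        frame["data"] = [v * 2 for v in frame["data"]]  # placeholder work
        out_q.put(frame)


def writer(in_q, sink):
    """Stand-in for asynchronous writing to SSD or streaming."""
    while True:
        frame = in_q.get()
        if frame is STOP:
            break
        sink.append(frame["id"])


def run_pipeline(n_frames):
    # Bounded queues apply back-pressure if a downstream stage falls behind.
    raw_q = queue.Queue(maxsize=4)
    done_q = queue.Queue(maxsize=4)
    written = []
    threads = [
        threading.Thread(target=acquisition, args=(n_frames, raw_q)),
        threading.Thread(target=processing, args=(raw_q, done_q)),
        threading.Thread(target=writer, args=(done_q, written)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written


if __name__ == "__main__":
    print(run_pipeline(5))
```

In this model a slow writer blocks the producer. A real camera application may prefer to drop frames when a queue is full, so that acquisition never stalls; which policy is right depends on the task.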

With that software one could also build a multi-camera solution for any machine vision or industrial cameras with image processing on NVIDIA GPU. In the simplest case, the user can run several processes (one per camera) at the same time to accomplish the task. In a more sophisticated approach it would be better to create one image loader which collects frames from different cameras for further processing on GPU.

There is also an opportunity to use different compression options on GPU at the end of the pipeline. We can use JPEG (Motion JPEG), JPEG2000 (MJ2K), H.264 and H.265 encoders on GPU. Please note that H.264 and H.265 are implemented via the hardware-based NVIDIA NVENC encoder, so video encoding can be done in parallel with CUDA code.

From the benchmarks on NVIDIA GeForce RTX 2080 Ti we can see that GPU-based raw image processing is very fast and can offer very high quality at the same time. The total performance can reach 4 GPix/s. The performance strongly depends on the complexity of the pipeline. Multi-GPU solutions could significantly improve the performance.

The software can also work with raw images in PGM format (Bayer or grayscale) which are stored on an external SSD. This is a good method for software evaluation and testing without a camera. The user can download the source code from GitHub or ready binaries for Windows from the GPU Camera Sample project page.
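
For testing without a camera it can be handy to generate PGM files yourself. The following sketch writes and reads binary PGM (magic number P5) using only the Python standard library. The function names are hypothetical. The only format facts relied upon are those of the Netpbm PGM definition: an ASCII header with width, height and maxval, followed by raster data, with two bytes per sample (most significant byte first) when maxval exceeds 255.

```python
def write_pgm(path, width, height, maxval, pixels):
    """Write a binary (P5) PGM file from a flat list of pixel values."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    header = f"P5\n{width} {height}\n{maxval}\n".encode("ascii")
    if maxval < 256:
        body = bytes(pixels)
    else:
        # Samples above 8 bits are stored as 2 bytes, most significant first.
        body = b"".join(p.to_bytes(2, "big") for p in pixels)
    with open(path, "wb") as f:
        f.write(header + body)


def read_pgm(path):
    """Read a binary (P5) PGM file; returns (width, height, maxval, pixels)."""
    with open(path, "rb") as f:
        data = f.read()
    tokens, pos = [], 0
    while len(tokens) < 4:
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":            # skip a header comment line
            pos = data.index(b"\n", pos) + 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1                                     # single whitespace after maxval
    if tokens[0] != b"P5":
        raise ValueError("not a binary PGM file")
    width, height, maxval = (int(t) for t in tokens[1:])
    count = width * height
    if maxval < 256:
        pixels = list(data[pos:pos + count])
    else:
        pixels = [int.from_bytes(data[pos + 2 * i:pos + 2 * i + 2], "big")
                  for i in range(count)]
    return width, height, maxval, pixels


if __name__ == "__main__":
    write_pgm("test12.pgm", 2, 2, 4095, [0, 1, 4095, 2048])
    print(read_pgm("test12.pgm"))
```

A PGM file does not record whether the data is Bayer or grayscale, nor the Bayer pattern, so that information has to be supplied separately to whatever software loads the file.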

We've done some tests with raw frames in PGM format from the Gpixel GMAX 3265 image sensor, which has a resolution of 9433 × 7000. In 8-bit mode the total processing time on GPU is around 12 ms, which is more than 4 GPix/s. The pipeline includes data copy from host to device, dark frame, FFC, linearization, BPC, white balance, debayer L7, gamma sRGB, 16/8-bit transform, JPEG compression (subsampling 4:2:0, quality 90), viewport texture copy, and monitor output at 30 fps. The same pipeline for 12-bit raw images takes around 14 ms.

Here we can see very high performance of JPEG encoding on NVIDIA GPU: a 65 MPix color image (24-bit) can be compressed within 3.3 ms on NVIDIA GeForce RTX 2080 Ti.
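
The throughput figures quoted above follow from simple arithmetic, which the sketch below reproduces. It uses only the numbers stated in the text (resolution, processing times, 65 MPix and 3.3 ms for JPEG); the function names are hypothetical and the results are rounded estimates, not additional benchmark data.

```python
def throughput_gpix_per_s(pixels, time_ms):
    """Pixels processed per second, in units of 10^9 pixels."""
    return pixels / (time_ms * 1e-3) / 1e9


def max_fps(time_ms):
    """Upper bound on frame rate if frames are processed one after another."""
    return 1000.0 / time_ms


if __name__ == "__main__":
    sensor_pixels = 9433 * 7000          # resolution as stated in the text

    # Full pipeline, 8-bit and 12-bit modes
    for label, t_ms in (("8-bit", 12.0), ("12-bit", 14.0)):
        print(f"{label}: {throughput_gpix_per_s(sensor_pixels, t_ms):.2f} GPix/s, "
              f"up to {max_fps(t_ms):.1f} fps")

    # JPEG encoder alone: 65 MPix, 24-bit color, 3.3 ms
    jpeg_pixels = 65e6
    print(f"JPEG: {throughput_gpix_per_s(jpeg_pixels, 3.3):.1f} GPix/s, "
          f"input rate {jpeg_pixels * 3 / 3.3e-3 / 1e9:.1f} GB/s")
```

Both pipeline timings stay above the 4 GPix/s mark, and the frame-rate bound assumes strictly sequential processing of frames.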

We recommend using that software as a testing tool to evaluate image quality and performance. The user can also test different NVIDIA GPUs to choose the best hardware in terms of price, quality and performance for a particular task.

Glass-to-Glass Time Measurements

To check system latency we've implemented glass-to-glass (G2G) tests in the gpu-camera-sample application.

We have the following choices for G2G tests:

The camera captures frames that show the current time from a high resolution timer displayed on the monitor. We send data from the camera to the software, do image processing on GPU and then show the processed image on the same monitor, close to the window with the timer. If we stop the software, we see two different times on the screen, and their difference is the system latency.
We have implemented a more complicated solution: after image processing on GPU we do JPEG encoding (MJPEG on CPU or on GPU), then send the MJPEG stream to a receiver process, where we do MJPEG parsing and decoding, then output the frame to the monitor. Both processes (sender and receiver) are running on the same PC.
The same solution as in the previous approach, but with H.264 encoding/decoding (CPU or GPU), both processes are at the same PC.
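
Once several pairs of timer readings have been collected in a G2G test, the latency statistics are a matter of subtraction. The sketch below assumes the readings were noted by hand (or extracted from screenshots) as pairs of the live timer value and the value visible in the processed image. The function names and the sample numbers are purely illustrative and are not measurements from the application.

```python
from statistics import mean


def g2g_latencies_ms(samples):
    """samples: (live_timer_ms, timer_in_processed_image_ms) pairs."""
    return [live - shown for live, shown in samples]


def summarize(samples):
    """Minimum, mean and maximum latency over all readings."""
    lat = g2g_latencies_ms(samples)
    return {"min": min(lat), "mean": mean(lat), "max": max(lat)}


def refresh_period_ms(monitor_hz):
    """The on-screen timer only changes once per monitor refresh."""
    return 1000.0 / monitor_hz


if __name__ == "__main__":
    readings = [(1000.0, 960.0), (2000.0, 1950.0), (3000.0, 2955.0)]  # made-up values
    print(summarize(readings))
    print(f"reading granularity at 60 Hz: {refresh_period_ms(60):.1f} ms")
```

A single reading cannot be finer than one refresh period of the monitor, so it is reasonable to average several readings instead of relying on one stopped frame.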

We can also measure the latency for the case when we stream compressed data from one PC to another over network. Latency depends on camera frame rate, monitor fps, NVIDIA GPU performance, network bandwidth, complexity of image processing pipeline, etc.

Custom software design

The GPU Camera Sample project is just a simple application which shows how a user can quickly integrate Fastvideo SDK into a real project with machine vision, industrial and scientific cameras. We can create custom solutions with a specified image processing pipeline on NVIDIA GPU (mobile, laptop, desktop, server), which are much more complicated than that project. We can also build custom GPU-based software to handle multicamera systems for machine vision or industrial applications.

Roadmap for GPU Camera Sample project

GPU pipeline for monochrome cameras - done
GenICam Standard support - done
Support for XIMEA, Balluff, Basler, FLIR, JAI and Daheng Imaging cameras (USB3 and PCIe) - done
Video streaming option (MJPEG via RTSP) - done
Linux version - done
Software for NVIDIA Jetson Nano, TX2, AGX Xavier - done
Glass-to-Glass (G2G) test - done
H.264/H.265 encoders on GPU - done
Support for Imperx and Lucid Vision Labs - done
Support for EVT, IDS, Baumer, FLIR, IOI, Mikrotron cameras - in progress
GenICam option for Fast CinemaDNG Processor
CUDA JPEG2000 encoder
Transforms to Rec.601 (SD), Rec.709 (HD), Rec.2020 (4K)
3D LUT for HSV and RGB with cube size 17, 33, 65, 96, 256, etc.
Interoperability with external FFmpeg and GStreamer

References

Fast VCR software for XIMEA cameras (realtime raw image processing on NVIDIA GPU with integrated camera control)
Source codes and binaries for Windows, Linux and L4T for GPU Camera Sample project
XIMEA SDK
Fastvideo SDK for Image & Video Processing
Fast CinemaDNG Processor software