Fast RAW Compression on GPU

Author: Fyodor Serzhenko

Recording performance is an essential issue in RAW data acquisition for 3D/4D, VR and other camera applications. Quite often we need to record in realtime to a portable SSD, and here we face questions about throughput, compression ratio, image quality, recording duration, etc. Since we need to store RAW data from a camera, the approach to raw image encoding is not exactly the same as for color images. Here we review several methods to solve this task.

Why do we need Raw Image Compression on GPU?

We need to compress the raw stream from a camera (industrial, machine vision, digital cinema, scientific, etc.) in realtime at high fps, for example 4K (12-bit raw data) at 60 fps, 90 fps or faster. This is a vitally important issue for realtime applications, external raw recorders and in-camera raw recording. As an example, we can consider the RAW or RAW-SDI format used to send data from a camera to a PC or to an external recorder.
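To get a feeling for the throughput involved, here is a minimal sketch of the uncompressed data rate calculation. The function name is ours, and we assume that "4K" means 4096×2160 and that 12-bit pixels are tightly packed; a real camera could differ on both points.

```python
def raw_data_rate_mb_per_s(width, height, bits_per_pixel, fps):
    """Uncompressed raw data rate in MB/s (1 MB = 10**6 bytes), tightly packed."""
    bytes_per_frame = width * height * bits_per_pixel / 8
    return bytes_per_frame * fps / 1e6

# Assumption: "4K" is taken as 4096 x 2160; the article does not give exact dimensions.
for fps in (60, 90):
    rate = raw_data_rate_mb_per_s(4096, 2160, 12, fps)
    print(f"{fps} fps: {rate:.0f} MB/s")
```

Under these assumptions the stream is roughly 0.8 GB/s at 60 fps and 1.2 GB/s at 90 fps, which is why compression matters for recording to a portable SSD.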

Since most modern cameras have 12-bit dynamic range, it's a good idea to utilize JPEG compression, which can be implemented for 12-bit data. For 14-bit and 16-bit cameras (including photo cameras) this is not the case, and for such high bit depth cameras we would recommend either Lossless JPEG encoding or lossless/lossy JPEG2000. These algorithms are not very fast, but they can process high bit depth data.

Lossy methods to solve the task of Fast RAW Compression

Standard 12-bit JPEG encoding for raw images
Optimized 12-bit JPEG encoding (image reshaped to double width and half height, then Standard 12-bit JPEG encoding for grayscale images)
Standard JPEG2000 encoding
Raw Bayer encoding (split the RGGB pattern into 4 planes and then apply 12-bit JPEG or JPEG2000 encoding to each plane)

The problem with Standard JPEG or JPEG2000 algorithms for RAW encoding is evident: in a Bayer mosaic, neighbouring pixels belong to different color channels, so pixel values do not vary slowly across the image. This can cause problems with image quality because of the Discrete Cosine Transform (Discrete Wavelet Transform), which is part of the JPEG (JPEG2000) algorithm. In that case the main idea behind JPEG/JPEG2000 compression is questionable, and we expect a higher level of distortion for RAW images.

The "double width" idea is also well known. It works well with Lossless JPEG compression of RAW Bayer data. After such a transform, vertically adjacent pixels in neighbouring rows have the same color, which could decrease the high-frequency values after DCT in Standard JPEG. That method is also utilized in the Blackmagic Design BMD RAW 3:1 and 4:1 formats.
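The "double width" transform can be sketched as follows. This is our reading of it, not a description of any particular codec: each pair of adjacent rows is concatenated into one row, which is the same as reinterpreting the row-major buffer as an image of double width and half height. The function names are hypothetical.

```python
def to_double_width(rows):
    """Turn an H x W mosaic into (H/2) x (2W): each pair of rows becomes one row."""
    if len(rows) % 2 != 0:
        raise ValueError("image height must be even")
    return [rows[i] + rows[i + 1] for i in range(0, len(rows), 2)]

def from_double_width(wide_rows):
    """Inverse transform: cut every wide row in half to restore the original rows."""
    rows = []
    for row in wide_rows:
        half = len(row) // 2
        rows.append(row[:half])
        rows.append(row[half:])
    return rows

# Color labels of a 4 x 4 RGGB mosaic, used only to show the resulting layout.
mosaic = [list("RGRG"), list("GBGB"), list("RGRG"), list("GBGB")]
wide = to_double_width(mosaic)
for row in wide:
    print("".join(row))
```

In the transformed image every row has the same color layout, so each pixel sits directly above a pixel of the same color.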

If we split a RAW image into 4 planes according to its Bayer pattern, we get 4 downsized images, one for each Bayer component. Here we can get slowly varying intensity, but for images with halved resolution. That algorithm looks promising, though we could expect slightly slower performance because of the additional split step in the pipeline.
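A minimal sketch of the split into 4 planes, assuming an RGGB pattern and even image dimensions. The function names are ours, and plain Python lists are used only for illustration; on GPU this step would be a separate kernel.

```python
def split_bayer_rggb(rows):
    """Split an RGGB mosaic (list of rows) into R, G1, G2 and B half-size planes."""
    r = [row[0::2] for row in rows[0::2]]   # even rows, even columns
    g1 = [row[1::2] for row in rows[0::2]]  # even rows, odd columns
    g2 = [row[0::2] for row in rows[1::2]]  # odd rows, even columns
    b = [row[1::2] for row in rows[1::2]]   # odd rows, odd columns
    return r, g1, g2, b

def merge_bayer_rggb(r, g1, g2, b):
    """Inverse of split_bayer_rggb: interleave the 4 planes back into one mosaic."""
    rows = []
    for r_row, g1_row, g2_row, b_row in zip(r, g1, g2, b):
        rows.append([v for pair in zip(r_row, g1_row) for v in pair])
        rows.append([v for pair in zip(g2_row, b_row) for v in pair])
    return rows

# 4 x 4 test mosaic where every pixel value equals its linear index.
mosaic = [[y * 4 + x for x in range(4)] for y in range(4)]
r, g1, g2, b = split_bayer_rggb(mosaic)
print(r, g1, g2, b)
```

Each plane can then be passed to the encoder as an independent grayscale image, and the decoder restores the mosaic with the inverse interleave.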

We focus on JPEG-based methods because we have a high performance JPEG codec on CUDA. That codec works across the whole range of NVIDIA GPUs: mobile Jetson Nano, TK1/TX1/TX2, AGX Xavier, laptop/desktop GeForce series and server GPUs Quadro and Tesla. It also supports 12-bit JPEG encoding, which is the key algorithm for this RAW compression task.

There is also an opportunity to apply JPEG2000 encoding instead of JPEG in all three cases, but here we will consider JPEG only, for the following reasons:

JPEG encoding on GPU is much faster than JPEG2000 encoding (approximately ×20)
Compression ratio is almost the same (it's higher for J2K, but not by much)
There is a patent from RED on implementing J2K encoding for split channels inside the camera

There are no open patent issues connected with the JPEG algorithm, and this is a serious advantage of JPEG. Nevertheless, the case of JPEG2000 compression is very interesting, and it could be applied on a PC rather than in a camera. That approach could give us lossless raw image compression on GPU, which can't be done with Standard JPEG.

To solve the task of RAW image compression, we need to specify both a metric and criteria to measure image quality losses. We will try SSIM, which is considered much more reliable than PSNR and MSE. SSIM stands for structural similarity; it is a well-known image quality metric, widely used to evaluate image resemblance.
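For reference, the SSIM formula for one window of pixels can be sketched as below. This is a simplified illustration, not the tool used for the measurements in this article: a full SSIM implementation evaluates the formula over local windows and averages the results. The function name is ours; the constants 0.01 and 0.03 are the commonly used defaults.

```python
def ssim_single_window(x, y, dynamic_range=65535):
    """SSIM of two equally sized pixel sequences, treated as a single window."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / (n - 1)
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / (n - 1)
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Made-up 16-bit sample values, for illustration only.
original = [1000, 20000, 40000, 60000, 30000, 5000]
distorted = [1500, 18000, 43000, 52000, 33000, 9000]
print(ssim_single_window(original, original))
print(ssim_single_window(original, distorted))
```

Identical inputs give a value of 1, and the value drops as the second image departs from the first.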

Quality and Compression Ratio measurements

To find the best solution among the chosen algorithms, we ran tests to calculate Compression Ratio and SSIM for standard values of the JPEG Quality Factor. We utilized the same Standard JPEG quantization table and the same 12-bit RAW image. Since Compression Ratio is content-dependent, this is just an example of what we could get in terms of SSIM and Compression Ratio.

For testing we utilized an uncompressed RAW Bayer image from a Blackmagic Design URSA camera with resolution 4032×2192, 12-bit. Compression Ratio was measured relative to the size of the packed uncompressed 12-bit image file, 12.6 MB, in which two pixel values are stored in 3 bytes.
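The packed 12-bit layout and the reference file size can be illustrated with a short sketch. The bit order inside each 3-byte group is our assumption (the first pixel occupies the high bits), since it is not specified here, and the function names are hypothetical.

```python
def pack_12bit(pixels):
    """Pack an even number of 12-bit values into bytes, two pixels per 3 bytes."""
    if len(pixels) % 2 != 0:
        raise ValueError("pixel count must be even")
    out = bytearray()
    for a, b in zip(pixels[0::2], pixels[1::2]):
        if not (0 <= a < 4096 and 0 <= b < 4096):
            raise ValueError("pixel values must fit in 12 bits")
        out.append(a >> 4)                        # upper 8 bits of the first pixel
        out.append(((a & 0x0F) << 4) | (b >> 8))  # lower 4 bits of first, upper 4 of second
        out.append(b & 0xFF)                      # lower 8 bits of the second pixel
    return bytes(out)

def unpack_12bit(data):
    """Inverse of pack_12bit."""
    pixels = []
    for i in range(0, len(data), 3):
        b0, b1, b2 = data[i:i + 3]
        pixels.append((b0 << 4) | (b1 >> 4))
        pixels.append(((b1 & 0x0F) << 8) | b2)
    return pixels

packed_bytes = 4032 * 2192 * 3 // 2
print(packed_bytes, round(packed_bytes / 2**20, 1))
```

For the 4032×2192 test image this gives 13,257,216 bytes, which corresponds to the 12.6 MB quoted above if 1 MB is read as 2^20 bytes.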

Output RGB images were created with Fast CinemaDNG Processor software. Output colorspace was sRGB, 16-bit TIFF, no sharpening, no denoising. SSIM measurements were performed on these 16-bit TIFF images. The source image was compared with the processed image, which had been encoded and decoded with each compression algorithm.

Table 1: Results for SSIM for encoding with standard JPEG quantization table

	Q = 50	Q = 60	Q = 70	Q = 80	Q = 90	Q = 95	Q = 100
SSIM - Standard JPEG	0.637	0.654	0.667	0.676	0.685	0.690	0.692
SSIM - JPEG with 2W	0.715	0.716	0.714	0.709	0.706	0.708	0.710
SSIM - RAW Bayer JPEG	0.724	0.722	0.718	0.712	0.706	0.708	0.710

These results show that the SSIM metric is not really suitable for such tests: for the second and third algorithms it barely changes with Q and is even slightly higher at Q = 50 than at Q = 100. According to visual estimation, we can conclude that image quality at Q = 80 and higher could be considered acceptable for all three algorithms, but the images from the third algorithm look better.

Table 2: Compression Ratio (CR) for encoding with standard JPEG quantization table

	Q = 50	Q = 60	Q = 70	Q = 80	Q = 90	Q = 95	Q = 100
CR - Standard JPEG	4.75	4.29	3.77	3.26	2.62	2.15	1.46
CR - JPEG with 2W	7.16	6.44	5.54	4.46	3.21	2.48	1.65
CR - RAW Bayer JPEG	9.7	8.24	6.63	5.08	3.49	2.64	1.75
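
To relate these ratios to recording, here is a small sketch that converts a Compression Ratio from Table 2 into compressed frame size and data rate. The helper names and the 60 fps figure are our assumptions for illustration, and since CR is content-dependent the numbers apply to this test image only.

```python
UNCOMPRESSED_MB = 12.6  # packed 12-bit size of the test image, from the text

def compressed_size_mb(uncompressed_mb, compression_ratio):
    """Compressed frame size for a given compression ratio."""
    return uncompressed_mb / compression_ratio

def stream_rate_mb_per_s(frame_mb, fps):
    """Data rate of a stream of equally sized frames."""
    return frame_mb * fps

# CR values at Q = 80, copied from Table 2.
cr_at_q80 = {"Standard JPEG": 3.26, "JPEG with 2W": 4.46, "RAW Bayer JPEG": 5.08}
for name, cr in cr_at_q80.items():
    frame_mb = compressed_size_mb(UNCOMPRESSED_MB, cr)
    rate = stream_rate_mb_per_s(frame_mb, 60)
    print(f"{name}: {frame_mb:.2f} MB per frame, {rate:.0f} MB/s at 60 fps")
```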

Encoding performance is the same for the first two methods, while the third is slightly slower (the performance drop is around 10-15%) because we need to spend additional time splitting the raw image into 4 planes according to the Bayer pattern. Time measurements were done with Fastvideo SDK on different NVIDIA GPUs. These results are hardware-dependent, and you can do the same measurements on your particular NVIDIA hardware.

How to improve image quality, compression ratio and performance

There are several ways to get even better results in terms of image quality, CR and encoding performance for RAW compression:

Image sensor calibration
RAW image preprocessing: dark frame subtraction, bad pixel correction, white balance, LUT, denoise, etc.
Optimized quantization tables for 12-bit JPEG encoding
Optimized Huffman tables for each frame
Minimum metadata in JPEG images
Multithreading with CUDA Streams to get better performance
Better hardware from NVIDIA