Benchmarks for JPEG2000 encoders on CPU and GPU

Benchmarking JPEG2000 encoders shows the real difference between CPU-based and GPU-based codecs. Here we compare available open source and proprietary J2K encoding software. Some of these encoders are CPU-only, while others use the GPU to accelerate JPEG2000 computations.

Approaches for JPEG2000 performance measurements

There are two standard approaches to measuring the performance of GPU-based JPEG2000 codecs. They correspond to the two most common use cases for J2K encoders and decoders.

1. Single image mode processes one image at a time and could be called the "latency-oriented" or "low latency" approach. Here we measure the time interval (latency) between the moment the original image is available in RAM and the moment the processed image is available in RAM. The software cannot expect any additional images to be processed at the same time, so it cannot take advantage of multiple image encoding or decoding. Overlapping the processing of the current image with other activities is undesirable because it would increase total latency. Single image mode is needed in almost all camera applications. You can get more info from our Image & Video Processing SDK.

2. Batch mode processes a batch of images and could be called the "throughput-oriented" or "maximum performance" approach. Here frame rate becomes more important than latency. Frame rate is calculated by dividing the number of processed images by the total processing time. Some JPEG2000 codecs are optimized for this use case: exploiting task parallelism gives a better frame rate (throughput) at the expense of a longer processing time for each separate image. This is possible because there are actually three devices (CPU, GPU and the bus interface between them) that can be used simultaneously in batch mode, whereas in single image mode these devices are used sequentially for different stages of the JPEG2000 algorithm. Moreover, the GPU can process several images simultaneously to increase frame rate even more, if a single image is too small to keep all GPU cores busy (especially at the Tier-1 stage). The amount of free GPU memory is an important limitation for simultaneous processing of several images. Batch mode is a must for streaming applications whose pipeline contains a J2K encoder or decoder. For a more complicated workflow it could be better to use single image mode, though fps will be reduced.
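The difference between the two metrics can be made concrete with a small calculation. The sketch below is only an illustration: the `FrameTiming` type and all timestamps are hypothetical and are not taken from our measurements. It shows that overlapping the processing of several frames can raise frame rate even though each individual frame takes longer.

```python
from dataclasses import dataclass


@dataclass
class FrameTiming:
    start_s: float  # source image becomes available in RAM
    end_s: float    # encoded image becomes available in RAM


def mean_latency_ms(frames):
    """Average per-image latency, the metric of single image mode."""
    return 1000.0 * sum(f.end_s - f.start_s for f in frames) / len(frames)


def throughput_fps(frames):
    """Frame rate: number of images divided by total wall-clock time."""
    total_s = max(f.end_s for f in frames) - min(f.start_s for f in frames)
    return len(frames) / total_s


# Hypothetical single image mode: 10 ms per frame, frames processed back to back.
single = [FrameTiming(i * 0.010, (i + 1) * 0.010) for i in range(4)]

# Hypothetical batch mode: each frame takes 15 ms, but a new frame
# enters the pipeline every 5 ms, so frames overlap in time.
batch = [FrameTiming(i * 0.005, i * 0.005 + 0.015) for i in range(4)]

for name, frames in (("single", single), ("batch", batch)):
    print(name, round(mean_latency_ms(frames), 2), "ms,",
          round(throughput_fps(frames), 1), "fps")
```

With these made-up numbers, batch mode has the higher per-frame latency and also the higher frame rate, which is exactly the trade-off described above.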

Briefly, JPEG2000 batch mode can exploit the following methods of task parallelism:

  • both upload to GPU and download from GPU could overlap with JPEG2000 processing on GPU (CUDA Streams)
  • Tier-1 and Tier-2 could be done in parallel: Tier-1 on GPU and multithreaded Tier-2 on CPU at the same time (this is also possible at single image mode)
  • multiple (batch) JPEG2000 processing to increase general GPU occupancy
  • multiple JPEG2000 processing at Tier-1 to improve GPU occupancy for that particular stage
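The effect of overlapping stages can be estimated with a simple timing model. This is a sketch under stated assumptions, not a description of our implementation: the three stages (upload, GPU processing, CPU Tier-2 plus download) and their durations are hypothetical, and each stage is assumed to handle one image at a time.

```python
def sequential_total_ms(stage_ms, n_images):
    """No overlap: every image passes through all stages before the next starts."""
    return n_images * sum(stage_ms)


def pipelined_finish_times_ms(stage_ms, n_images):
    """Overlapped stages: an image enters a stage as soon as it has left
    the previous stage and the stage itself is free."""
    stage_free_at = [0.0] * len(stage_ms)
    finish = []
    for _ in range(n_images):
        t = 0.0  # all images are assumed to be available at time 0
        for s, duration in enumerate(stage_ms):
            t = max(t, stage_free_at[s]) + duration
            stage_free_at[s] = t
        finish.append(t)
    return finish


# Hypothetical stage durations in ms: upload, GPU stages, CPU Tier-2 + download.
stages = [1.0, 3.0, 2.0]
n = 100

seq_ms = sequential_total_ms(stages, n)
pipe_ms = pipelined_finish_times_ms(stages, n)[-1]
print("sequential:", round(1000.0 * n / seq_ms, 1), "fps")
print("pipelined: ", round(1000.0 * n / pipe_ms, 1), "fps")
```

In this model the pipelined frame rate approaches one image per duration of the slowest stage, which is why improving GPU occupancy at Tier-1 matters when that stage is the bottleneck.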

CPU-based J2K applications have no explicit implementation of batch mode, because all processing stages run on CPU and all available CPU cores can be fully loaded simply by running multiple encoder or decoder instances in separate processes. The multithreaded mode of CPU-based JPEG2000 codecs decreases the latency of single image processing, so we consider this mode to be single image mode.

At the moment we don't consider here the following possible modes for J2K encoding on GPU:

  • multiple GPU mode
  • multiple tile mode for big images
  • fast parallel J2K processing with RESET, RESTART, CAUSAL and BYPASS modes

Results for all modes will be published as soon as their implementations are ready.

We don't hide anything concerning benchmarking procedures and achieved results. Everyone can reproduce our benchmarks, because we publish not only timing and performance but also full info about hardware, JPEG2000 parameters, test images and testing modes.

J2K encoder benchmarks

We have measured time and performance of JPEG2000 encoding for 24-bit images with 2K and 4K resolutions. The results exclude host I/O latency (loading the image from HDD/SSD to RAM and saving it back) as well as host-to-device transfer time. We made this choice to reproduce J2K encoder usage in our conventional image processing pipeline, where the initial data already reside in GPU memory. Results for GPU-based J2K encoder software do include Tier-2 time on CPU, because in our implementation this stage is performed on CPU. The tables below give averaged measurements for the best series of 1000 encoded frames.
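A measurement loop of this kind could look roughly like the sketch below. It is not our benchmarking tool: `best_series_mean_ms` and its parameters are hypothetical names, the number of series is an assumption (it is not stated above), and `zlib.compress` is only a stand-in for a real encoder call.

```python
import time
import zlib


def best_series_mean_ms(encode, frame, frames_per_series=1000, n_series=5,
                        clock=time.perf_counter):
    """Run several series of encodes and return the lowest mean time per frame."""
    best = float("inf")
    for _ in range(n_series):
        t0 = clock()
        for _ in range(frames_per_series):
            # For a GPU encoder this call must return only after the result
            # is ready (device synchronized), otherwise timing is meaningless.
            encode(frame)
        elapsed_s = clock() - t0
        best = min(best, 1000.0 * elapsed_s / frames_per_series)
    return best


# Stand-in workload, only to show the call shape.
demo_ms = best_series_mean_ms(zlib.compress, bytes(4096),
                              frames_per_series=10, n_series=2)
print("mean time per frame in the best series:", demo_ms, "ms")
```

Image loading and saving stay outside the timed loop, which matches the exclusion of host I/O described above.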

JPEG2000 encoding parameters

  • File format – JP2
  • Algorithm 1 – lossy JPEG 2000 compression with CDF 9/7 wavelet
  • Algorithm 2 – lossless JPEG 2000 compression with CDF 5/3 wavelet
  • Compression ratio (for lossy encoding) ~ 12.0 which corresponds to visually lossless encoding
  • Subsampling mode – 4:4:4
  • Number of DWT resolutions – 7
  • Codeblock size – 32×32
  • MCT – on
  • PCRD – off
  • Tiling – off
  • Window – off
  • Quality layers – one
  • Progression order – LRCP (L = layer, R = resolution, C = component, P = position)
  • Modes of operation – single or batch
  • 2K test image (24-bit) – 2k_wild.ppm
  • 4K test image (24-bit) – 4k_wild.ppm
  • Output check – JP2 images can be inspected with JPEG2000 Viewer on GPU
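As an illustration of how these parameters map to a concrete tool, the sketch below builds a command line for OpenJPEG's `opj_compress`. The exact command lines used for the tables are not given in this article, so this mapping is an assumption; the helper name `opj_compress_args` and the output file name are hypothetical. In particular, OpenJPEG reaches a target ratio through its own rate allocation, so the "PCRD – off" setting has no direct counterpart here.

```python
def opj_compress_args(src, dst, lossless, ratio=12.0):
    """Build an opj_compress argument list approximating the listed parameters."""
    args = [
        "opj_compress",
        "-i", src,
        "-o", dst,          # .jp2 extension selects the JP2 file format
        "-n", "7",          # number of DWT resolutions
        "-b", "32,32",      # codeblock size
        "-p", "LRCP",       # progression order
        "-mct", "1",        # multiple component transform on
    ]
    if not lossless:
        # -I switches to the irreversible 9/7 wavelet; -r sets the target
        # compression ratio for a single quality layer.
        args += ["-I", "-r", str(ratio)]
    return args


print(" ".join(opj_compress_args("2k_wild.ppm", "2k_wild.jp2", lossless=False)))
print(" ".join(opj_compress_args("2k_wild.ppm", "2k_wild.jp2", lossless=True)))
```

Without `-I`, `opj_compress` uses the reversible 5/3 wavelet, which corresponds to Algorithm 2. Tiling is left at its default (a single tile).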

Hardware and software

  • CPU Intel Core i9-9960X
  • GPU NVIDIA GeForce RTX 2080 Ti
  • OS Windows 10 (x64), version 1803
  • CUDA Toolkit 10.2

JPEG2000 Encoders for comparison

  • OpenJPEG 2.3.1
  • Jasper 2.0.16
  • Kakadu JPEG2000 7.10.2
  • CUJ2K 1.1
  • Fastvideo JPEG2000 (SDK version 0.16.0.0)

JPEG2000 lossy encoding at single image mode for 2K image: 2k_wild.ppm (1920×1080, 4:4:4, 24-bit)

| JPEG2000 encoder | Average encoding time | Performance | Frames per second | PSNR (dB) | MSE | Compression ratio | Hardware |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OpenJPEG (single thread) | 271 ms | 21.9 MB/s | 3.7 fps | 39.54 | 7.23 | 12.00 | CPU |
| Jasper | 407 ms | 14.6 MB/s | 2.5 fps | 39.53 | 7.24 | 12.00 | CPU |
| Kakadu 7.10.2 (single thread) | 115 ms | 51.6 MB/s | 8.7 fps | 39.44 | 7.39 | 12.00 | CPU |
| Kakadu 7.10.2 (16 threads) | 20 ms | 297 MB/s | 50.0 fps | 39.44 | 7.39 | 12.00 | CPU |
| CUJ2K Encoder | 75 ms | 79.2 MB/s | 13.3 fps | 35.60 | 17.9 | 12.00 | GPU + CPU |
| Fastvideo JPEG2000 Encoder | 4.06 ms | 1360 MB/s | 229 fps | 39.50 | 7.29 | 12.01 | GPU + CPU |

JPEG2000 lossy encoding at single image mode for 4K image: 4k_wild.ppm (3840×2160, 4:4:4, 24-bit)

| JPEG2000 encoder | Average encoding time | Performance | Frames per second | PSNR (dB) | MSE | Compression ratio | Hardware |
| --- | --- | --- | --- | --- | --- | --- | --- |
| OpenJPEG (single thread) | 1242 ms | 19.1 MB/s | 0.8 fps | 45.10 | 2.01 | 12.02 | CPU |
| Jasper | 1746 ms | 13.6 MB/s | 0.6 fps | 45.09 | 2.02 | 12.02 | CPU |
| Kakadu 7.10.2 (single thread) | 428 ms | 55.4 MB/s | 2.3 fps | 44.78 | 2.16 | 12.00 | CPU |
| Kakadu 7.10.2 (16 threads) | 64 ms | 371 MB/s | 15.6 fps | 44.78 | 2.16 | 12.00 | CPU |
| CUJ2K Encoder | 179 ms | 133 MB/s | 5.6 fps | 41.42 | 4.69 | 12.05 | GPU + CPU |
| Fastvideo JPEG2000 Encoder | 11.9 ms | 1994 MB/s | 84.0 fps | 45.08 | 2.02 | 12.04 | GPU + CPU |

MB/s – MegaBytes per second
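The derived columns in the tables follow from the encoding time, the raw image size and the MSE. The sketch below shows the arithmetic; the function names are hypothetical, and the use of 2^20 bytes per MB is an assumption that appears to match the published values (for example the OpenJPEG rows), not a stated convention.

```python
import math

BYTES_PER_MB = 1 << 20  # assumption: 1 MB = 2**20 bytes


def raw_size_bytes(width, height, channels=3, bytes_per_sample=1):
    """Size of the uncompressed 24-bit source image."""
    return width * height * channels * bytes_per_sample


def throughput_mb_s(width, height, time_ms):
    """Raw input megabytes processed per second of encoding time."""
    return raw_size_bytes(width, height) / BYTES_PER_MB / (time_ms / 1000.0)


def frames_per_second(time_ms):
    return 1000.0 / time_ms


def psnr_db(mse, peak=255.0):
    """PSNR for 8-bit samples: 10 * log10(peak^2 / MSE)."""
    return 10.0 * math.log10(peak * peak / mse)


# OpenJPEG (single thread), lossy, 4K image: 1242 ms, MSE 2.01
print(round(throughput_mb_s(3840, 2160, 1242), 1), "MB/s")
print(round(frames_per_second(1242), 1), "fps")
print(round(psnr_db(2.01), 2), "dB")
```

The same functions can be applied to any other row to check a published value against its encoding time.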

Fig.1: Fastvideo J2K encoder performance on GeForce RTX 2080 Ti (lossy encoding, single image mode)

Figure 1 shows the encoding speed (JPEG 2000 performance for lossy compression) as a function of image size for the Fastvideo J2K encoder in single image mode. In most cases maximum J2K encoder performance is achieved with codeblock size 32×32. For images with frame size above 6 MB, the preferred codeblock size in single image mode is 32×32. The figure also shows performance saturation, and the image size at which it sets in depends on the codeblock size. This is the key point for getting better results in batch mode. For 8K image compression with visually lossless parameters, performance saturation is reached for any codeblock size in single image mode.

Figure 1 shows that on NVIDIA GeForce RTX 2080 Ti it is possible to reach important milestones in single image mode for visually lossless J2K encoding. With 16×16 codeblocks performance exceeds 900 MB/s, with 32×32 codeblocks maximum performance exceeds 1300 MB/s, and with 64×64 codeblocks maximum performance can reach 1100 MB/s. Performance saturation for 16×16 codeblocks occurs at 4K resolution for visually lossless compression.

Fig.2: Fastvideo J2K encoder performance as a function of compression ratio (lossy encoding, single image mode)

Figure 2 shows Fastvideo JPEG 2000 encoder performance as a function of compression ratio for different image resolutions, for lossy compression in single image mode under the standard testing conditions stated above.

Lossless JPEG2000 encoding at single image mode for 2K image: 2k_wild.ppm (1920×1080, 4:4:4, 24-bit)

| JPEG2000 encoder | Average encoding time | Performance | Frames per second | Compression ratio | Hardware |
| --- | --- | --- | --- | --- | --- |
| OpenJPEG (single thread) | 652 ms | 9.1 MB/s | 1.5 fps | 2.097 | CPU |
| Jasper | 807 ms | 7.4 MB/s | 1.2 fps | 2.097 | CPU |
| Kakadu 7.10.2 (single thread) | 493 ms | 12.0 MB/s | 2.0 fps | 2.097 | CPU |
| Kakadu 7.10.2 (16 threads) | 54 ms | 110 MB/s | 18.5 fps | 2.097 | CPU |
| CUJ2K Encoder | 98 ms | 60.5 MB/s | 10.2 fps | 2.095 | GPU + CPU |
| Fastvideo JPEG2000 Encoder | 6.03 ms | 957 MB/s | 161.3 fps | 2.098 | GPU + CPU |

Lossless JPEG2000 encoding at single image mode for 4K image: 4k_wild.ppm (3840×2160, 4:4:4, 24-bit)

| JPEG2000 encoder | Average encoding time | Performance | Frames per second | Compression ratio | Hardware |
| --- | --- | --- | --- | --- | --- |
| OpenJPEG (single thread) | 2247 ms | 10.6 MB/s | 0.4 fps | 2.776 | CPU |
| Jasper | 2647 ms | 9.0 MB/s | 0.4 fps | 2.776 | CPU |
| Kakadu 7.10.2 (single thread) | 1486 ms | 16.0 MB/s | 0.7 fps | 2.776 | CPU |
| Kakadu 7.10.2 (16 threads) | 152 ms | 156.1 MB/s | 6.6 fps | 2.776 | CPU |
| CUJ2K Encoder | 195 ms | 127.1 MB/s | 5.1 fps | 2.773 | GPU + CPU |
| Fastvideo JPEG2000 Encoder | 17.0 ms | 1404 MB/s | 59.2 fps | 2.776 | GPU + CPU |

Superior performance of J2K encoder at batch multithreaded mode

For the multithreaded batch mode we measured JPEG 2000 encoding performance with exactly the same parameters as in single image mode. The results exclude host I/O latency (loading the image from HDD/SSD to RAM and saving it back). The table below gives averaged measurements for the best series of frames (each series lasting 10 seconds).

| JPEG2000 encoding parameters | Lossy | Lossless |
| --- | --- | --- |
| 2K image, 24-bit, cb 32×32 | 765 fps | 413 fps |
| 4K image, 24-bit, cb 32×32 | 212 fps | 117 fps |
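The gain from batch mode can be read off by dividing these frame rates by the single image mode frame rates of the Fastvideo encoder from the tables above. A short sketch of that arithmetic, with the published values entered by hand:

```python
# Fastvideo encoder on GeForce RTX 2080 Ti, frames per second.
single_fps = {
    ("2K", "lossy"): 229.0, ("2K", "lossless"): 161.3,
    ("4K", "lossy"): 84.0, ("4K", "lossless"): 59.2,
}
batch_fps = {
    ("2K", "lossy"): 765.0, ("2K", "lossless"): 413.0,
    ("4K", "lossy"): 212.0, ("4K", "lossless"): 117.0,
}

# Ratio of batch mode frame rate to single image mode frame rate.
speedup = {key: batch_fps[key] / single_fps[key] for key in single_fps}

for (resolution, mode), factor in speedup.items():
    print(f"{resolution} {mode}: x{factor:.2f}")
```

The ratio is largest for the 2K image, which is consistent with the remark above that smaller images leave more of the GPU idle in single image mode.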

To the best of our knowledge, the above performance results for J2K lossy and lossless encoding are the fastest among all existing open source and commercial J2K encoders on CPU or GPU, both for single image mode and for batch mode. To keep this transparent and simple, we have published all info about the time measurements, together with sample images, JPEG2000 parameters and hardware specifications, so that everyone can reproduce our results and check the performance of other J2K encoders under the same testing conditions. Our demo GPU J2K encoder for Windows is available for download. Fastvideo J2K decoder benchmarks are published separately.

One can also download our latest benchmarks for Jetson Nano, TX2, AGX Xavier, GeForce GTX 1080 and Quadro P6000. They include not only results for J2K encoding on GPU, but also benchmarks for other image processing algorithms from Fastvideo Image & Video Processing SDK. Full benchmarks for NVIDIA GeForce RTX 4090 are expected soon; batch mode results for J2K encoding on that GPU are given below.

Please let us know your performance results for any other JPEG2000 encoders you may have: Aware, Comprimato, Elecard, ERDAS ECW, FFmpeg, Kakadu, Leadtools, Lizardtech, Lurawave, Mainconcept, Morgan, etc.

J2K encoder benchmarks on the NVIDIA GeForce RTX 4090 at batch multithreaded mode

These are the same benchmarks for JPEG 2000 encoder on the GeForce RTX 4090:

| JPEG2000 encoding parameters | Lossy encoding | Lossless encoding |
| --- | --- | --- |
| 2K image, 24-bit, cb 32×32 | 2215 fps | 1370 fps |
| 4K image, 24-bit, cb 32×32 | 750 fps | 470 fps |
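To put these frame rates in the same units as the single image tables, they can be converted to MB/s and compared with the GeForce RTX 2080 Ti batch results. The sketch below assumes 1 MB = 2^20 bytes and a raw frame of 3 bytes per pixel; the helper name `batch_mb_s` is hypothetical.

```python
BYTES_PER_MB = 1 << 20  # assumption: 1 MB = 2**20 bytes
FRAME_BYTES = {"2K": 1920 * 1080 * 3, "4K": 3840 * 2160 * 3}

# Batch mode frame rates from the two batch tables in this article.
rtx_2080_ti = {("2K", "lossy"): 765, ("2K", "lossless"): 413,
               ("4K", "lossy"): 212, ("4K", "lossless"): 117}
rtx_4090 = {("2K", "lossy"): 2215, ("2K", "lossless"): 1370,
            ("4K", "lossy"): 750, ("4K", "lossless"): 470}


def batch_mb_s(resolution, fps):
    """Raw input megabytes encoded per second at the given frame rate."""
    return fps * FRAME_BYTES[resolution] / BYTES_PER_MB


# Frame rate ratio between the two GPUs for each test case.
gain = {key: rtx_4090[key] / rtx_2080_ti[key] for key in rtx_4090}

for (resolution, mode), fps in rtx_4090.items():
    print(f"{resolution} {mode}: {batch_mb_s(resolution, fps):.0f} MB/s, "
          f"x{gain[(resolution, mode)]:.2f} vs RTX 2080 Ti")
```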
