Benchmarks for J2K decoders on CPU and GPU

Below are benchmarks for the Fastvideo JPEG2000 decoder on an NVIDIA GPU, compared with freely available open-source and proprietary J2K decoding software running on CPU.

Approaches for J2K decoder performance measurements

There are two standard approaches to measuring the performance of GPU-based JPEG2000 codecs. They correspond to the two most common use cases for J2K decoders.

1. Single image mode processes one image at a time and could be called the "latency-oriented" or "low latency" approach. Here we measure the time interval (latency) between the moment the original image is available in RAM and the moment the processed image is available in RAM. The software cannot expect any additional images to be processed at the same time, so it cannot take advantage of multiple image decoding. Overlapping the processing of the current image with other activities is undesirable because it would increase total latency.

2. Batch mode processes a batch of images and could be called the "throughput-oriented" or "maximum performance" approach. Here frame rate matters more than latency. Frame rate is calculated by dividing the number of processed images by the total processing time. Some J2K decoders are optimized for this use case: exploiting task parallelism gives a better frame rate (throughput) at the expense of a longer processing time for each individual image. This is possible because there are actually three devices (the CPU, the GPU and the bus interface between them) that can work simultaneously in batch mode, whereas in single image mode they are used sequentially for the different stages of the JPEG2000 algorithm. Moreover, the GPU can process several images simultaneously to raise the frame rate further when a single image is too small for the decoder to keep the many GPU cores busy (especially at the Tier-1 stage). The amount of free GPU memory imposes an important limit on how many images can be processed simultaneously. Batch mode is a must for streaming applications whose pipeline contains a J2K decoder. For a more complicated workflow it could be better to use single image mode, though fps will be lower.
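The difference between the two metrics can be illustrated with a short Python sketch. The function names are hypothetical, and the image count of 13170 in 10 seconds is an illustrative figure chosen to give 1317 fps, not a reported measurement.

```python
def batch_metrics(num_images: int, total_seconds: float) -> tuple[float, float]:
    """Return (frames per second, average milliseconds per image) for a batch run."""
    if num_images <= 0 or total_seconds <= 0:
        raise ValueError("num_images and total_seconds must be positive")
    fps = num_images / total_seconds               # throughput
    avg_ms = 1000.0 * total_seconds / num_images   # average time per image
    return fps, avg_ms


def single_image_fps(latency_ms: float) -> float:
    """Frame rate when images are decoded strictly one after another."""
    return 1000.0 / latency_ms


# Illustrative batch run: 13170 images decoded in 10 seconds (assumed count).
fps, avg_ms = batch_metrics(num_images=13170, total_seconds=10.0)
print(f"batch: {fps:.0f} fps, {avg_ms:.3f} ms per image on average")

# Single image mode: a latency of 6.1 ms corresponds to about 164 fps.
print(f"single image: {single_image_fps(6.1):.0f} fps")
```

Note that the average time per image in batch mode is not the latency of any individual image: with several images in flight at once, each one may take longer than in single image mode while the frame rate is still higher.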

In short, a J2K decoder in batch mode can exploit task parallelism in the following ways:

  • both upload to GPU and download from GPU could overlap with JPEG2000 processing on GPU (CUDA Streams)
  • Tier-1 and Tier-2 could be done in parallel: Tier-1 on GPU and multithreaded Tier-2 on CPU at the same time
  • multiple (batch) J2K processing to increase general GPU occupancy
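The effect of overlapping transfers with GPU processing can be estimated with a simple pipeline model. This is a sketch under idealized assumptions: each stage (upload, GPU decode, download) has its own resource and the stages overlap perfectly. The stage times are hypothetical and are not a breakdown of any measurement on this page.

```python
def sequential_ms(stage_ms, num_images):
    """Total time when every image passes through all stages before the next one starts."""
    return num_images * sum(stage_ms)


def pipelined_ms(stage_ms, num_images):
    """Total time for an ideal pipeline in which each stage has its own resource.

    The first image needs the full sum of stage times; after that, one image
    completes every max(stage_ms), because the slowest stage is the bottleneck.
    """
    return sum(stage_ms) + (num_images - 1) * max(stage_ms)


# Hypothetical per-image stage times in ms: upload, GPU decode, download.
STAGES_MS = [1.0, 4.0, 2.0]
N = 100

seq_total = sequential_ms(STAGES_MS, N)    # 700.0 ms
pipe_total = pipelined_ms(STAGES_MS, N)    # 403.0 ms
seq_fps = 1000.0 * N / seq_total
pipe_fps = 1000.0 * N / pipe_total
print(f"sequential: {seq_fps:.1f} fps, pipelined: {pipe_fps:.1f} fps")
```

In this model the pipelined frame rate approaches the reciprocal of the slowest stage as the batch grows, which is why overlapping transfers with computation helps most when transfer time is comparable to compute time.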

CPU-based JPEG2000 solutions have no explicit batch mode: all processing stages run on the CPU, and all available CPU cores can be fully loaded simply by running multiple decoders in separate processes. The multithreaded mode of CPU-based J2K decoders reduces the latency of single image processing, so we treat it as single image mode.

We don't yet consider the following possible modes for J2K decoding on GPU:

  • multiple GPU mode
  • multiple tile mode for big images
  • fast parallel J2K processing with RESET, RESTART, CAUSAL and BYPASS modes

Results for all these modes will be published as soon as their implementations are ready.

We don't hide anything about our benchmarking procedures or the results achieved. Anyone can reproduce our benchmarks, because we publish not only timings and performance figures but also full information about hardware, JPEG2000 parameters, test images and testing modes.

J2K decoding benchmarks

We've carried out time and performance measurements of JPEG2000 decoding for 24-bit images at 2K, 4K and 8K resolutions. The results include neither host I/O latency (loading the image from HDD/SSD into RAM and saving it back) nor host-to-device transfer time. We made this choice to reproduce how the J2K decoder is used in our conventional image processing pipeline, where decompressed data reside in GPU memory. Results for GPU-based JPEG2000 decoding software do include Tier-2 time on CPU, because in our implementation this stage is performed on the CPU. The tables below show averaged results for the best series of 100 measurements.
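A measurement loop of this kind could look like the Python sketch below. It is an illustration, not the benchmark code used for these results: the function name is hypothetical, the number of series is an assumption (the text only fixes 100 measurements per series), and `decode` stands for any callable that decodes one image already loaded in RAM.

```python
import time
from statistics import mean


def best_series_average_ms(decode, num_series=5, runs_per_series=100,
                           timer=time.perf_counter):
    """Time decode() in several series and return the lowest per-series mean in ms."""
    series_means = []
    for _ in range(num_series):
        samples = []
        for _ in range(runs_per_series):
            start = timer()
            decode()  # must return only when the decoded image is complete
            samples.append((timer() - start) * 1000.0)
        series_means.append(mean(samples))
    # "Best series" is taken here to mean the series with the lowest average time.
    return min(series_means)


# Stand-in workload instead of a real decoder call.
best_ms = best_series_average_ms(lambda: sum(range(1000)), num_series=3)
print(f"best series average: {best_ms:.4f} ms")
```

For a GPU decoder, `decode` would need to wait for the GPU to finish before returning, since kernel launches are asynchronous; otherwise the timer would capture only the launch overhead.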

JPEG2000 decoding parameters

  • File format – JP2
  • Lossy JPEG 2000 with CDF 9/7 wavelet
  • Lossless JPEG 2000 with CDF 5/3 wavelet
  • Compression ratio (for lossy algorithm) ~ 12.0 which corresponds to visually lossless compression
  • Subsampling mode – 4:4:4
  • Number of DWT resolutions – 7
  • Codeblock size – 32×32
  • MCT – on
  • PCRD – off
  • Tiling – off
  • Quality layers – one
  • Progression order – LRCP (L = layer, R = resolution, C = component, P = position)
  • Modes of operation – single or batch
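The source does not say which encoder produced the test files. As a sketch only, assuming OpenJPEG's `opj_compress` tool were used, several of the parameters above map onto its command-line options as follows. The helper name and file names are hypothetical, and the PCRD setting is not reproduced here.

```python
def opj_compress_args(src, dst, lossy=True):
    """Build an opj_compress command line approximating the parameters listed above."""
    args = [
        "opj_compress",
        "-i", src,        # input image (for example PPM, BMP, TIFF or PNG)
        "-o", dst,        # a .jp2 extension selects the JP2 file format
        "-n", "7",        # number of DWT resolutions
        "-b", "32,32",    # codeblock size
        "-p", "LRCP",     # progression order
        "-mct", "1",      # multi-component transform on
    ]
    if lossy:
        # Irreversible 9/7 wavelet with a single compression ratio (one quality layer).
        args += ["-I", "-r", "12"]
    # Without -I and -r, opj_compress defaults to lossless coding with the 5/3 wavelet.
    return args


lossy_cmd = opj_compress_args("2k_wild.ppm", "2k_wild_lossy.jp2")
lossless_cmd = opj_compress_args("2k_wild.ppm", "2k_wild_lossless.jp2", lossy=False)
print(" ".join(lossy_cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where OpenJPEG is installed. Tiling is left at the tool's default of a single tile.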

Test images

Hardware and software

  • CPU AMD Ryzen9 7950X (16 cores, 4.5–5.7 GHz)
  • GPU NVIDIA GeForce RTX 4090 (Ada Lovelace, 128 SMs, 16384 cores, 2.2–2.5 GHz)
  • OS Windows 11 Pro (x64), version 23H2
  • CUDA Toolkit 12.6

JPEG2000 Decoders for comparison

  • OpenJPEG 2.5.2
  • Jasper 2.0.16
  • J2K-Codec 2.2
  • Kakadu 7.10.2
  • Fastvideo JPEG2000 (SDK version 0.18.0.0)

J2K decoding at single image mode for 2K image with lossy compression: 2k_wild_lossy.jp2 (1920×1080, 4:4:4, 24-bit)

JPEG2000 decoder            | Average decoding time | Performance | Frames per second | Hardware
OpenJPEG (single thread)    | 66 ms                 | 90 MB/s     | 15.2 fps          | CPU
OpenJPEG (multiple threads) | 23 ms                 | 258 MB/s    | 43.5 fps          | CPU
Jasper                      | 385 ms                | 15 MB/s     | 2.6 fps           | CPU
J2K-Codec                   | 110 ms                | 54 MB/s     | 9.1 fps           | CPU
Kakadu (single thread)      | 84 ms                 | 71 MB/s     | 11.9 fps          | CPU
Kakadu (32 threads)         | 19 ms                 | 312 MB/s    | 52.6 fps          | CPU
Fastvideo JPEG2000 decoder  | 6.1 ms                | 971 MB/s    | 164 fps           | GPU + CPU

J2K decoding at single image mode for 4K image with lossy compression: 4k_wild_lossy.jp2 (3840×2160, 4:4:4, 24-bit)

JPEG2000 decoder            | Average decoding time | Performance | Frames per second | Hardware
OpenJPEG (single thread)    | 270 ms                | 88 MB/s     | 3.7 fps           | CPU
OpenJPEG (multiple threads) | 86 ms                 | 276 MB/s    | 11.6 fps          | CPU
Jasper                      | 1478 ms               | 16 MB/s     | 0.7 fps           | CPU
J2K-Codec                   | 469 ms                | 51 MB/s     | 2.1 fps           | CPU
Kakadu (single thread)      | 372 ms                | 64 MB/s     | 2.7 fps           | CPU
Kakadu (32 threads)         | 71 ms                 | 334 MB/s    | 14.1 fps          | CPU
Fastvideo JPEG2000 decoder  | 11.1 ms               | 2138 MB/s   | 90.1 fps          | GPU + CPU

MB/s – MegaBytes per second
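The Performance and Frames per second columns can be cross-checked against the decoding time. The sketch below assumes that throughput is the uncompressed image size (3 bytes per pixel for 24-bit images) divided by the decoding time, with 1 MB taken as 2^20 bytes. This is inferred from the published numbers rather than stated explicitly, and the function name is hypothetical.

```python
MIB = 1 << 20  # assumption: 1 MB = 2**20 bytes


def decode_metrics(width, height, time_ms, bytes_per_pixel=3):
    """Return (throughput in MB/s, frames per second) for one decoded image."""
    size_mb = width * height * bytes_per_pixel / MIB
    seconds = time_ms / 1000.0
    return size_mb / seconds, 1.0 / seconds


# OpenJPEG (single thread), 2K lossy: 66 ms
mbps_2k, fps_2k = decode_metrics(1920, 1080, 66)
print(f"{mbps_2k:.0f} MB/s, {fps_2k:.1f} fps")   # 90 MB/s, 15.2 fps

# Fastvideo JPEG2000 decoder, 4K lossy: 11.1 ms
mbps_4k, fps_4k = decode_metrics(3840, 2160, 11.1)
print(f"{mbps_4k:.0f} MB/s, {fps_4k:.1f} fps")   # 2138 MB/s, 90.1 fps
```

Both rows reproduce the table values after rounding, which supports the assumed definition of MB/s.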

J2K decoding at single image mode for 8K image with lossy compression: 8k_wild_lossy.jp2 (7680×4320, 4:4:4, 24-bit)

JPEG2000 decoder            | Average decoding time | Performance | Frames per second | Hardware
OpenJPEG (single thread)    | 3640 ms               | 26 MB/s     | 0.3 fps           | CPU
OpenJPEG (multiple threads) | 380 ms                | 250 MB/s    | 2.6 fps           | CPU
Jasper                      | 6792 ms               | 14 MB/s     | 0.15 fps          | CPU
J2K-Codec                   | 812 ms                | 117 MB/s    | 1.2 fps           | CPU
Kakadu (single thread)      | 1470 ms               | 65 MB/s     | 0.7 fps           | CPU
Kakadu (32 threads)         | 239 ms                | 397 MB/s    | 4.2 fps           | CPU
Fastvideo JPEG2000 decoder  | 33.6 ms               | 2825 MB/s   | 29.8 fps          | GPU + CPU

J2K decoding at single image mode for 2K image with lossless compression: 2k_wild_lossless.jp2 (1920×1080, 4:4:4, 24-bit)

JPEG2000 decoder            | Average decoding time | Performance | Frames per second | Hardware
OpenJPEG (single thread)    | 196 ms                | 30 MB/s     | 5.1 fps           | CPU
OpenJPEG (multiple threads) | 27 ms                 | 220 MB/s    | 37 fps            | CPU
Jasper                      | 820 ms                | 7.2 MB/s    | 1.2 fps           | CPU
J2K-Codec                   | 390 ms                | 15 MB/s     | 2.6 fps           | CPU
Kakadu (single thread)      | 500 ms                | 12 MB/s     | 2.0 fps           | CPU
Kakadu (32 threads)         | 47 ms                 | 126 MB/s    | 21 fps            | CPU
Fastvideo JPEG2000 decoder  | 8.5 ms                | 695 MB/s    | 117 fps           | GPU + CPU

J2K decoding at single image mode for 4K image with lossless compression: 4k_wild_lossless.jp2 (3840×2160, 4:4:4, 24-bit)

JPEG2000 decoder            | Average decoding time | Performance | Frames per second | Hardware
OpenJPEG (single thread)    | 673 ms                | 35 MB/s     | 1.5 fps           | CPU
OpenJPEG (multiple threads) | 90 ms                 | 264 MB/s    | 11.1 fps          | CPU
Jasper                      | 3141 ms               | 7.6 MB/s    | 0.3 fps           | CPU
J2K-Codec                   | 1312 ms               | 18 MB/s     | 0.8 fps           | CPU
Kakadu (single thread)      | 1562 ms               | 15.2 MB/s   | 0.6 fps           | CPU
Kakadu (32 threads)         | 139 ms                | 171 MB/s    | 7.2 fps           | CPU
Fastvideo JPEG2000 decoder  | 15.5 ms               | 1527 MB/s   | 64.4 fps          | GPU + CPU

Superior performance of JPEG 2000 decoding at batch mode

For batch mode we've measured JPEG 2000 decoding performance with exactly the same parameters as in single image mode. The table below shows averaged results for the best series of measurements (each lasting 10 seconds). The results do not include host I/O latency (loading the image from HDD/SSD into RAM and saving it back).

JPEG2000 decoding benchmarks at the multithreaded batch mode

JPEG2000 decoder                  | 2k_wild_lossy.jp2 | 4k_wild_lossy.jp2 | 8k_wild_lossy.jp2 | 2k_wild_lossless.jp2 | 4k_wild_lossless.jp2
Fastvideo J2K decoder             | 1317 fps          | 362 fps           | 85 fps            | 524 fps              | 163 fps
Kakadu 7.10.2 (32 threads)        | 76.9 fps          | 23.8 fps          | 6.8 fps           | 62.5 fps             | 18.5 fps
OpenJPEG 2.5.2 (multiple threads) | 43.5 fps          | 11.6 fps          | 2.6 fps           | 37.0 fps             | 11.1 fps
J2K-Codec 2.2                     | 21.3 fps          | 5.8 fps           | 1.2 fps           | 7.1 fps              | 2.0 fps
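To put these frame rates in perspective, the short Python sketch below computes two ratios from the published figures: Fastvideo batch mode against Fastvideo single image mode, and Fastvideo batch mode against Kakadu with 32 threads. The variable and function names are illustrative; the numbers are copied from the tables on this page.

```python
IMAGES = [
    "2k_wild_lossy.jp2", "4k_wild_lossy.jp2", "8k_wild_lossy.jp2",
    "2k_wild_lossless.jp2", "4k_wild_lossless.jp2",
]
FASTVIDEO_BATCH_FPS = [1317, 362, 85, 524, 163]
FASTVIDEO_SINGLE_FPS = [164, 90.1, 29.8, 117, 64.4]
KAKADU_BATCH_FPS = [76.9, 23.8, 6.8, 62.5, 18.5]


def ratios(numerators, denominators):
    """Element-wise ratio of two fps lists, rounded to one decimal place."""
    return [round(n / d, 1) for n, d in zip(numerators, denominators)]


batch_vs_single = ratios(FASTVIDEO_BATCH_FPS, FASTVIDEO_SINGLE_FPS)
fastvideo_vs_kakadu = ratios(FASTVIDEO_BATCH_FPS, KAKADU_BATCH_FPS)

for name, gain, lead in zip(IMAGES, batch_vs_single, fastvideo_vs_kakadu):
    print(f"{name}: batch/single = {gain}x, Fastvideo/Kakadu = {lead}x")
```

With these figures the gain from batch mode is largest for the 2K lossy image (about 8x) and smallest for the larger images (about 2.5x to 2.9x), consistent with the earlier remark that batching helps most when a single image is too small to load the GPU.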

We have published all information about the time measurements, together with sample images, JPEG2000 parameters and hardware specifications, so that anyone can reproduce our results and check the performance of other J2K decoders under the same testing conditions.
