Q1Net: Quality Level Prediction of Image Compression using Block-wise Confidence-aware CNN

Official implementation of the BMVC 2021 paper. Paper: https://bmva-archive.org.uk/bmvc/2021/conference/papers/paper_0813.html

Q1Net predicts the quality level of a compressed image (e.g. the JPEG quality factor) directly from the image, using a block-wise, confidence-aware CNN.

Highlights

  • Real-time: predicts the compression quality level in milliseconds, fast enough to run on mobile devices.
  • Accurate: over 99% accuracy in the paper's experiments.
  • Block-wise & confidence-aware: exploits the characteristic deformations transform coding leaves on small blocks, estimates a per-patch confidence, and fuses only the reliable patches instead of processing the whole image.
  • Deployable: exports to TensorFlow Lite for on-device inference.

How it works

Instead of looking at the whole image, Q1Net samples small patches around coding blocks, runs a lightweight CNN on each patch to predict a quality value together with a confidence, keeps only the high-confidence patches, and fuses them:

flowchart LR
    A[Input image] --> B[Sample small patches<br/>around coding blocks]
    B --> C[CNN backbone]
    C --> D[Per-patch:<br/>confidence + quality]
    D --> E{confidence above<br/>threshold?}
    E -- yes --> F[Keep patch]
    E -- no --> G[Discard patch]
    F --> H[Fuse by median<br/>= predicted quality]
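The filter-and-fuse step at the end of the pipeline can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function name, the 0-100 confidence scale, and the threshold value of 50 are all assumptions made for the example.

```python
from statistics import median


def fuse_patch_predictions(patches, threshold=50.0):
    """Fuse per-patch (confidence, quality) pairs into one quality estimate.

    `patches` is an iterable of (confidence, quality) tuples, both assumed
    to be on a 0-100 scale. `threshold` is a hypothetical cut-off, not a
    value taken from the paper. Returns None if no patch passes it.
    """
    kept = [quality for confidence, quality in patches if confidence > threshold]
    if not kept:
        return None
    return median(kept)


# Toy values: the two low-confidence outliers are discarded before fusing.
patches = [(92.0, 20.3), (88.0, 19.8), (12.0, 75.0), (95.0, 20.1), (30.0, 4.0)]
print(fuse_patch_predictions(patches))  # 20.1
```

Taking the median rather than the mean keeps a few confident but wrong patches from shifting the result.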

The per-patch backbone is a compact residual CNN operating on 16x16x3 patches:

flowchart LR
    I[16x16x3 patch] --> S["CBR + Bottleneck stages<br/>channels 8 - 16 - 32 - 64 - 32 - 16"]
    S --> CV[Conv 3x3, ReLU]
    CV --> P[Global average pooling]
    P --> O["Dense 2, sigmoid x100<br/>= confidence, quality"]
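The final "Dense 2, sigmoid x100" stage can be illustrated without any framework. The sketch below assumes the two outputs are ordered (confidence, quality) as in the diagram and that both are scaled by 100; the function name is hypothetical.

```python
import math


def head_outputs(logit_confidence, logit_quality):
    """Map the two dense-layer outputs to (confidence, quality) in (0, 100)."""

    def sigmoid(x):
        # Numerically stable form that avoids overflow for large |x|.
        if x >= 0:
            return 1.0 / (1.0 + math.exp(-x))
        z = math.exp(x)
        return z / (1.0 + z)

    return 100.0 * sigmoid(logit_confidence), 100.0 * sigmoid(logit_quality)


print(head_outputs(0.0, 0.0))  # (50.0, 50.0)
```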

CBR denotes Conv, BatchNorm, and ReLU applied in sequence; the bottleneck is a residual block made of 1x1, 3x3, and 1x1 convolutions. The confidence-aware loss down-weights unreliable patches during training.
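To make the idea of down-weighting concrete, here is one possible confidence-weighted L1 loss. It is an illustrative assumption, not the formula from the paper: each patch's error is scaled by its confidence, and a hypothetical `penalty` term charges for withheld confidence so the model cannot simply report zero confidence everywhere.

```python
def confidence_weighted_l1(confidences, predictions, targets, penalty=10.0):
    """Illustrative confidence-weighted L1 loss (NOT the paper's formula).

    confidences: per-patch confidence in [0, 1]
    predictions, targets: per-patch quality values (0-100)
    penalty: hypothetical cost per unit of withheld confidence
    """
    if not (len(confidences) == len(predictions) == len(targets)):
        raise ValueError("inputs must have the same length")
    if not confidences:
        raise ValueError("inputs must not be empty")
    total = 0.0
    for c, q, y in zip(confidences, predictions, targets):
        total += c * abs(q - y) + penalty * (1.0 - c)
    return total / len(confidences)


# The second patch is badly wrong (60 vs. 20). Giving it low confidence
# yields a smaller loss than trusting it fully.
low = confidence_weighted_l1([1.0, 0.1], [20.0, 60.0], [20.0, 20.0])
high = confidence_weighted_l1([1.0, 1.0], [20.0, 60.0], [20.0, 20.0])
print(low < high)  # True
```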

Results

Confusion matrices over 10,000 compressed images spanning all 100 quality levels (Figure 4 from the paper). A sharper diagonal means more accurate quality prediction: Q1Net (c) produces a markedly tighter diagonal than MobileNetV2 (a) and JQE (b), staying accurate across the full quality range.

Confusion matrices comparing MobileNetV2, JQE, and Q1Net

Authors

  • Kyuwon Kim (chammoru at gmail, q1.kim at samsung)
  • Chulju Yang (ijn9429 at gmail, chulju at samsung)

Citation

@InProceedings{kim2021q1net,
  title     = {Quality Level Prediction of Image Compression using Block-wise Confidence-aware CNN},
  author    = {Kim, Kyuwon and Yang, Chulju},
  booktitle = {Proceedings of the British Machine Vision Conference (BMVC)},
  month     = {November},
  year      = {2021}
}

Requirements

  • Python 3 (tested on 3.12)
  • The pinned packages in requirements.txt (TensorFlow 2.16, installed in the setup step below)

TensorFlow 2.16 defaults to Keras 3, but this project uses the Keras 2 API. env.sh exports TF_USE_LEGACY_KERAS=1 so the tf-keras (Keras 2) implementation is used.
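If you call the code from your own Python script without sourcing env.sh, the same switch can be set in Python. This is a minimal sketch with a hypothetical helper name; it relies on the variable being set before TensorFlow is first imported.

```python
import os
import sys


def enable_legacy_keras():
    """Request the Keras 2 (tf-keras) implementation for tf.keras.

    Returns False if TensorFlow was already imported, in which case the
    setting probably comes too late to take effect.
    """
    already_imported = "tensorflow" in sys.modules
    os.environ["TF_USE_LEGACY_KERAS"] = "1"
    return not already_imported


in_time = enable_legacy_keras()
# import tensorflow as tf  # import only after the variable is set
```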

Dataset

This project uses the DIV2K dataset.

Clone and setup

The pretrained model weights are tracked with Git LFS, so install it before cloning:

# with the Git LFS package installed on your machine, enable it once for your user account
git lfs install

git clone https://github.com/chammoru/Q1Net.git
cd Q1Net

# (recommended) create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate

# install the pinned dependencies
pip install -r requirements.txt

# go to the source directory and set up the environment
# (adds the repo root to PYTHONPATH and exports TF_USE_LEGACY_KERAS=1)
cd classifier
. ./env.sh

If you already cloned the repository without Git LFS, fetch the weights with:

git lfs pull

Pretrained weights without Git LFS

If you cannot use Git LFS, download q1net-weights.zip from the Releases page and extract it at the repository root so that classifier/save/<comp_type>/best/ contains the checkpoint files (.index and .data-*). The commands that load the model print a clear error if the weights are missing or are still unresolved Git LFS pointers.

The supported compression types (--comp_type) are jpeg_paper and jpeg_paper_k12.
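To verify the weights yourself before running anything, a check along the following lines may help. It is a standalone sketch, not the repository's own check: the helper name is hypothetical, and it relies only on the directory layout described above and on the fact that Git LFS pointer files begin with a fixed `version https://git-lfs.github.com/spec/v1` line.

```python
from pathlib import Path

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def check_weights(repo_root, comp_type):
    """Return a list of problems found with the checkpoint for `comp_type`."""
    best_dir = Path(repo_root) / "classifier" / "save" / comp_type / "best"
    if not best_dir.is_dir():
        return [f"missing directory: {best_dir}"]
    problems = []
    index_files = sorted(best_dir.glob("*.index"))
    data_files = sorted(best_dir.glob("*.data-*"))
    if not index_files:
        problems.append("no .index file found")
    if not data_files:
        problems.append("no .data-* file found")
    for path in index_files + data_files:
        with open(path, "rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            problems.append(f"unresolved Git LFS pointer: {path.name}")
    return problems


# Run from the repository root.
for comp_type in ("jpeg_paper", "jpeg_paper_k12"):
    print(comp_type, check_weights(".", comp_type) or "ok")
```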

Prediction

Predict the quality level of a single image:

python3 ./predict_cls.py --in_path ../sample_image/monarch_jpeg_q20.png --comp_type jpeg_paper

The sample image was JPEG-compressed at quality 20, so the prediction should be close to 20:

predicted quality 20.01, estimated in 0.135 seconds
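When calling the script from another program, the output line can be parsed with a regular expression. The sketch below assumes the output always has exactly the format shown above; the helper name is hypothetical.

```python
import re

OUTPUT_PATTERN = re.compile(
    r"predicted quality (?P<quality>\d+(?:\.\d+)?), "
    r"estimated in (?P<seconds>\d+(?:\.\d+)?) seconds"
)


def parse_prediction(line):
    """Extract (quality, seconds) from a predict_cls.py output line."""
    match = OUTPUT_PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized output: {line!r}")
    return float(match.group("quality")), float(match.group("seconds"))


print(parse_prediction("predicted quality 20.01, estimated in 0.135 seconds"))
# (20.01, 0.135)
```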

Evaluation

Evaluate over a directory of images. Each image is compressed at every quality level, predicted, and compared against the ground truth; the mean absolute error is reported and a confusion matrix is saved to --out_path (default out/).

# Download the validation set
wget https://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_valid_HR.zip
unzip DIV2K_valid_HR.zip

python3 evaluate_cls.py --comp_type jpeg_paper --in_path DIV2K_valid_HR
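The two reported metrics can be reproduced from (ground truth, prediction) pairs as sketched below. This is an illustration rather than the code in evaluate_cls.py; it assumes the quality levels are the integers 1 to 100 and that predictions are rounded and clamped into that range before being counted.

```python
def evaluate(pairs, levels=100):
    """Compute the MAE and a confusion matrix.

    pairs: iterable of (true_quality, predicted_quality), where
           true_quality is an integer in 1..levels (an assumption).
    Returns (mae, matrix) with matrix[true - 1][predicted - 1] as counts.
    """
    matrix = [[0] * levels for _ in range(levels)]
    abs_error = 0.0
    count = 0
    for truth, predicted in pairs:
        abs_error += abs(predicted - truth)
        count += 1
        # Round to the nearest level and clamp into 1..levels.
        column = min(max(round(predicted), 1), levels)
        matrix[truth - 1][column - 1] += 1
    if count == 0:
        raise ValueError("no samples to evaluate")
    return abs_error / count, matrix


mae, matrix = evaluate([(20, 20.01), (50, 48.6), (90, 91.2)])
print(round(mae, 2))  # 0.87
```

A perfect predictor puts every count on the diagonal of `matrix`.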

Training

# Download the training set
wget https://data.vision.ee.ethz.ch/cvl/DIV2K/DIV2K_train_HR.zip
unzip DIV2K_train_HR.zip

sh batch_train_jpeg_paper.sh

During training, gen_data.py generates an HDF5 file of training data that train.py then consumes.

Convert the model to TFLite

python3 ./to_tflite.py --comp_type jpeg_paper

Applications

Q1Net can benefit a wide range of applications, including:

  • Image/photo editors
  • (Streaming) video players and photo viewers
  • Web browsers
  • Video conferencing
  • Instant messaging apps
  • And many more

For example, knowing the compression quality of a photo (such as the ID photo in a mobile driver's-license app below) lets an app decide whether to enhance it before display:

Mobile ID-photo quality use case

Image source: Yonhap News (watermarked); used here for illustration only.

License

This code is released for non-commercial research and evaluation purposes only. The methods implemented here are covered by U.S. Patent No. 12,462,356 B2, owned by Samsung Electronics Co., Ltd.; no patent license is granted, and commercial use requires a separate license. See LICENSE for the full terms.
