AutoCompSysLab/C3-Bench

C3-Bench: A Context-Aware Change Captioning Benchmark

This repository is the official implementation of the paper "C3-Bench: A Context-Aware Change Captioning Benchmark" (ECCV 2026).


Jaewoo Kim · Hyeongbeom Kim · Uehwan Kim
ECCV 2026



Overview of C3-Bench. The examples are from each context in C3-Bench.


💡 Problem Formulation

Change Captioning aims to describe the changes between two images.

However, what counts as a change is inherently context-dependent. For example, when asked to describe the change between the given image pair (see right), the first question that comes to mind is "in which context?", because the correct change depends on the context.

With respect to weather, the valid description is "the snow has covered the ground, and cloud cover has decreased." For railway surveillance, it is "a train has appeared on the left side of the tracks", with weather differences treated as pseudo-changes.



What has changed? (Motivation)
Without context, this generic question can admit multiple logically valid descriptions.


To communicate meaningfully and to select the correct change description among multiple logically valid alternatives in a heterogeneous visual world, each change must be grounded in a specific context and in associated criteria that clearly define the underlying semantics.
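The formulation above can be pictured as a record that pairs two images with a context and its criteria. The following is a minimal sketch only: the class `ChangeSample`, its field names, the file paths, the criteria wording, and `build_prompt` are all hypothetical illustrations and do not reflect the actual C3-Bench data format or evaluation prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeSample:
    """Hypothetical record layout; not the official C3-Bench schema."""
    image_before: str  # path to the first image of the pair
    image_after: str   # path to the second image of the pair
    domain: str        # one of the four visual domains
    context: str       # context that defines what counts as a change
    criteria: str      # context-specific criteria for a valid change
    caption: str       # human-annotated change caption


def build_prompt(sample: ChangeSample) -> str:
    """Turn the generic 'what has changed?' into a context-conditioned question."""
    return (
        f"Context: {sample.context}\n"
        f"Criteria: {sample.criteria}\n"
        "Describe what has changed between the two images, "
        "ignoring differences that are irrelevant to this context."
    )


# Example based on the railway case described above.
sample = ChangeSample(
    image_before="pair_0001_before.jpg",  # placeholder path
    image_after="pair_0001_after.jpg",    # placeholder path
    domain="Natural Scenes",              # assumed domain, for illustration
    context="railway surveillance",
    criteria="Report trains that appear or disappear; "
             "treat weather differences as pseudo-changes.",
    caption="a train has appeared on the left side of the tracks",
)

print(build_prompt(sample))
```

Swapping `context` and `criteria` to a weather setting would make the snow and cloud-cover description the valid caption for the same image pair.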

🌐 C3-Bench

We introduce C3-Bench, a comprehensive benchmark for Context-aware Change Captioning, featuring:

  • 4,996 human-annotated image pairs with change captions and context-specific criteria

  • 51 real-world change contexts

  • 4 visual domains:

    • Natural Scenes
    • Remote Sensing Imagery
    • Image Editing
    • Anomalies
  • Human-aligned LLM-as-a-Judge evaluation for fine-grained semantics and reversibility

  • Comprehensive benchmarking of 32 models, including:

    • 6 conventional change captioning models, such as DUDA
    • 9 leading proprietary MLLMs, such as GPT-5.2 and Gemini 3
    • 17 open-source MLLMs, such as Qwen3 and InternVL3.5


Examples from C3-Bench. Each image pair is displayed with its Domain: Context.

🏆 Results

Key Findings

  • Humans still set the upper bound.
    Human evaluators outperform the strongest LMM, GPT-5.2, by 1.73 points in Aggregation and achieve a high Reversibility score of 0.93, revealing a clear gap between current models and human-level change understanding.

  • Fluency is not understanding.
    Conventional change captioning models often generate fluent sentences, but their performance drops sharply across diverse real-world contexts, showing that linguistic quality alone does not guarantee correct change reasoning.

  • Context matters.
    The failure of conventional models highlights the limitation of prior benchmarks: models trained on narrow, dataset-specific change definitions struggle when the target change semantics shift across contexts.

  • LMMs reshape the landscape.
    Proprietary LMMs deliver the strongest overall performance, with GPT-5.2 leading the benchmark, demonstrating the benefit of large-scale multimodal reasoning under explicit context conditioning.

  • Open-source LMMs are catching up fast.
    Qwen3-VL-32B achieves highly competitive results, approaching proprietary models and trailing GPT-5.2 by only 0.35 points in Aggregation.



C3-Bench results. Mean and standard deviation are reported over three GPT-5.2 runs.
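As a small illustration of how a mean and standard deviation over three judge runs could be computed, here is a sketch using the Python standard library. The function name `summarize_runs` and the scores are placeholders rather than benchmark results, and the sample (n-1) standard deviation is an assumption, since the text does not say which variant is reported.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) over repeated judge runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate a deviation")
    return mean(scores), stdev(scores)


# Placeholder scores for one model across three runs (not real results).
runs = [3.0, 3.5, 4.0]
avg, sd = summarize_runs(runs)
print(f"{avg:.2f} ± {sd:.2f}")
```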

📃 Citation

If you find this work useful for your research, please cite:

@InProceedings{Kim_2026_ECCV,
    author    = {Kim, Jae-Woo and Kim, Hyeongbeom and Kim, Ue-Hwan},
    title     = {C3-Bench: A Context-Aware Change Captioning Benchmark},
    booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
    year      = {2026}
}
