
FIX: Use sequence=0 for both pieces in multimodal dataset loaders #1756

Open
romanlutz wants to merge 1 commit into microsoft:main from romanlutz:romanlutz/fix-multimodal-sequence


@romanlutz
Contributor

Description

Four multimodal remote dataset loaders assigned sequence=0 to one piece (image or text) and sequence=1 to the other, even though both pieces shared the same prompt_group_id. Per SeedPrompt.sequence (pyrit/models/seeds/seed_prompt.py:43-44), prompts are grouped into a single multimodal user message only when they share both prompt_group_id and sequence. With mismatched sequences, the image and text were delivered as two separate turns rather than as one multimodal message, which defeats the purpose of these datasets: the model is supposed to reason over the image and text together.

This PR brings the four affected loaders in line with the correct pattern already used by harmbench_multimodal_dataset.py and the recently added msts_dataset.py: both pieces share prompt_group_id and sequence=0.
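To illustrate the grouping rule described above, here is a minimal, self-contained sketch. It does not use PyRIT's classes: `Piece` and `group_into_messages` are hypothetical stand-ins, and the bucketing logic only assumes that pieces are combined when they share both prompt_group_id and sequence.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    """Hypothetical stand-in for a seed prompt; not PyRIT's SeedPrompt class."""

    prompt_group_id: str
    sequence: int
    data_type: str
    value: str


def group_into_messages(pieces):
    """Bucket pieces by (prompt_group_id, sequence); each bucket is one message."""
    buckets = defaultdict(list)
    for piece in pieces:
        buckets[(piece.prompt_group_id, piece.sequence)].append(piece)
    # Sort by key so that turns within a group come out in sequence order.
    return [buckets[key] for key in sorted(buckets)]


# Old behavior: same group id, different sequences -> two separate turns.
before = [
    Piece("g1", 0, "text", "Describe the image."),
    Piece("g1", 1, "image_path", "example.png"),
]

# New behavior: same group id and sequence=0 -> one multimodal message.
after = [
    Piece("g1", 0, "text", "Describe the image."),
    Piece("g1", 0, "image_path", "example.png"),
]

messages_before = group_into_messages(before)
messages_after = group_into_messages(after)
```

Under these assumptions, `messages_before` holds two single-piece messages, while `messages_after` holds one message containing both the text and the image.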

Loader changes (pyrit/datasets/seed_datasets/remote/):

  • vlguard_dataset.py - image sequence=1 -> 0.
  • vlsu_multimodal_dataset.py - image sequence=1 -> 0.
  • visual_leak_bench_dataset.py - text sequence=1 -> 0. Reworded class and fetch_dataset_async docstrings that described the old behavior.
  • comic_jailbreak_dataset.py - text sequence=1 -> 0. Reworded fetch_dataset_async and _build_seed_group docstrings. The SeedObjective in the group is unchanged - only the image+text pair needs to share sequence=0.

Tests and Documentation

Updated the four corresponding unit tests under tests/unit/datasets/ to assert the new shared sequence == 0 for both pieces (one assertion change per file).
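The shape of the updated assertion can be sketched as follows. This is not the actual test code: the helper name `assert_single_multimodal_message` and the dict-based pieces are hypothetical, chosen only so the example runs without PyRIT installed.

```python
def assert_single_multimodal_message(pieces):
    """Check that all pieces share one prompt_group_id and have sequence == 0."""
    group_ids = {piece["prompt_group_id"] for piece in pieces}
    assert len(group_ids) == 1, f"expected one group id, got {group_ids}"
    for piece in pieces:
        assert piece["sequence"] == 0, (
            f"{piece['data_type']} piece has sequence {piece['sequence']}"
        )


# Hypothetical loader output after the fix: both pieces use sequence=0.
fixed_pair = [
    {"prompt_group_id": "g1", "sequence": 0, "data_type": "image_path"},
    {"prompt_group_id": "g1", "sequence": 0, "data_type": "text"},
]
assert_single_multimodal_message(fixed_pair)
```

A pair produced by the old loaders (one piece with sequence=1) would fail the second assertion.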

  • uv run ruff format pyrit tests - clean
  • uv run ruff check pyrit tests - clean
  • uv run -m ty check pyrit/datasets/seed_datasets/remote - clean
  • uv run pytest tests/unit/datasets -q - 422 passed
  • pre-commit hooks passed on commit

No JupyText/doc changes needed (no docs reference these sequence numbers).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>