PRX pixel pipeline by DavidBert · Pull Request #6 · Photoroom/diffusers

DavidBert · 2026-06-04T12:19:23Z

PRX-Pixel in 🧨 diffusers

Quick note for running a trained PRX-Pixel checkpoint (7B, pixel-space RGB, no VAE, Qwen3-VL
text tower) through PRXPixelPipeline. Three steps: convert → load → predict.

Checkpoints

The research checkpoints live on the other cluster (point --checkpoint_path at one of these):

Model	Path
Base model (SFT)	`/mnt/data/users/davidb/checkpoints/PRX7B-ckpt/SFT`
RLHF (FDFO)	`/mnt/data/users/davidb/checkpoints/PRX7B-ckpt/FDFO_forensic_omniaid`

1. Setup

PRX-Pixel needs the Qwen3-VL text tower → transformers >= 4.57 (pin < 5, 5.x breaks torchvision).

# from the diffusers repo root
uv venv --python 3.12 --system-site-packages .venv_prxpixel
uv pip install --python .venv_prxpixel/bin/python "transformers>=4.57,<5" accelerate
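
To confirm that the environment honors the pin before converting anything, a quick check along these lines can help. It is a minimal sketch: `satisfies_pin` and `installed_transformers_ok` are hypothetical helpers, not part of diffusers or transformers, and the comparison looks only at the major and minor version components.

```python
from importlib import metadata


def satisfies_pin(version: str) -> bool:
    """Return True if `version` is >= 4.57 and < 5 (major/minor only)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (4, 57) and major < 5


def installed_transformers_ok() -> bool:
    """Check the installed transformers package, if any, against the pin."""
    try:
        return satisfies_pin(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        # transformers is not installed in this environment
        return False
```

Running `installed_transformers_ok()` with `.venv_prxpixel/bin/python` should return `True` once the install above has succeeded.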

2. Convert the checkpoint

Reads the research checkpoint (a DCP dir *.distcp or a .pt file) and writes a diffusers folder.

CUDA_VISIBLE_DEVICES=5 uv run --no-project --python .venv_prxpixel/bin/python \
  scripts/convert_prx_to_diffusers.py \
  --checkpoint_path /path/to/ep0-ba400 \
  --output_path     checkpoints_prx/prxpixel-diffusers \
  --variant         pixel \
  --resolution      1024

Look for `✓ All parameters loaded successfully (0 missing, 0 unexpected)!` in the output. The script also downloads the
Qwen3-VL text encoder + tokenizer into the output folder.
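
Since the converter accepts either a DCP directory or a .pt file, it can be handy to check which one a path is before launching the script. The sketch below is only an illustration: `checkpoint_kind` is a hypothetical helper, not something the conversion script is known to expose, and it assumes a DCP directory can be recognized by the presence of `*.distcp` files.

```python
from pathlib import Path


def checkpoint_kind(path: str) -> str:
    """Classify a research checkpoint as 'dcp' (directory of *.distcp files) or 'pt'."""
    p = Path(path)
    if p.is_dir() and any(p.glob("*.distcp")):
        return "dcp"
    if p.is_file() and p.suffix == ".pt":
        return "pt"
    raise ValueError(f"{path!r} is neither a DCP directory nor a .pt file")
```

Calling it on the value intended for `--checkpoint_path` fails fast on a mistyped path instead of partway through the conversion.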

3. Load + predict

import torch, numpy as np
from PIL import Image
from diffusers import PRXPixelPipeline

pipe = PRXPixelPipeline.from_pretrained("checkpoints_prx/prxpixel-diffusers", torch_dtype=torch.bfloat16).to("cuda:0")

out = pipe(
    "A polished brass weathervane shaped like a rooster against a deep blue sky",
    height=1024, width=1024,
    num_inference_steps=50,
    guidance_scale=1.0,                          # CFG 1 = no guidance; works great here
    output_type="pt",                            # pixel-space, no VAE -> get the raw tensor
    generator=torch.Generator("cuda:0").manual_seed(0),
).images                                         # tensor in [-1, 1]

img = (out.float().clamp(-1, 1) + 1) / 2         # -> [0, 1]
arr = (img[0].permute(1, 2, 0).cpu().numpy() * 255).round().astype(np.uint8)
Image.fromarray(arr).save("prxpixel.png")

The pipeline already handles the PRX-Pixel specifics (x0-prediction, noise_scale=2, 256-token
budget, full-res RGB). Good defaults: 50 steps, CFG 1, scheduler shift ≈ 3.
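
For intuition about the "scheduler shift ≈ 3" default, the sketch below applies the time-shift formula used by flow-matching schedulers such as diffusers' `FlowMatchEulerDiscreteScheduler`: `sigma' = shift * sigma / (1 + (shift - 1) * sigma)`. Whether the PRX-Pixel scheduler uses exactly this form is an assumption here, and `shift_sigma` / `shifted_schedule` are illustrative names, not pipeline API.

```python
def shift_sigma(sigma: float, shift: float = 3.0) -> float:
    """Apply the flow-matching time shift to one noise level in [0, 1]."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(num_steps: int = 50, shift: float = 3.0) -> list[float]:
    """Linearly spaced sigmas from 1 down to 1/num_steps, then shifted."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps)]
    return [shift_sigma(s, shift) for s in sigmas]
```

With `shift=3`, the midpoint `sigma = 0.5` maps to `0.75`, so more of the 50 steps are spent at high noise levels than with an unshifted schedule.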

Run scripts with .venv_prxpixel/bin/python <script> (or uv run --no-project --python .venv_prxpixel/bin/python <script> — --no-project is needed since this repo has no [project] table).

- Use relative imports in pipeline_prx_pixel.py
- Register prompt_max_tokens/noise_scale to config for save/load round-trip
- Fix Optional[int] annotation for bottleneck_size
- Fix PRXResolutionEmbedder dtype handling (cast to compute dtype, fixes layerwise casting with float8 storage)
- Fix import ordering (PRXPipeline before PRXPixelPipeline) in __init__ files and dummy objects
- Add PRXPixelPipeline autodoc entry and pixel-variant mention to docs
- Add fast pipeline tests (tests/pipelines/prx/test_pipeline_prx_pixel.py)
- make style/quality/fix-copies clean

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The inherited PRXPipeline.__call__ raised for output_type='pil'/'np' when no VAE was loaded. Pixel-space outputs are already images in [-1, 1], so PRXPixelPipeline now creates a PixArtImageProcessor (vae_scale_factor=1) and the base post-processing denormalizes the denoised latents directly instead of requiring a VAE decode. This also enables resolution binning for the pixel pipeline.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

check_inputs only checked divisibility by vae_scale_factor, which is 1 for the pixel pipeline (and ignores the patch size for latent ones), so sizes like 1000px passed validation and crashed mid-denoising with an opaque reshape RuntimeError in img2seq.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
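
A size check of the kind this commit message calls for might look like the sketch below. It is not the actual check_inputs code: `check_size` is a hypothetical standalone function, and the patch size of 16 is an assumption chosen so that 1024 passes and 1000 fails, matching the example above.

```python
def check_size(height: int, width: int, vae_scale_factor: int = 1, patch_size: int = 16) -> None:
    """Raise ValueError unless both sides are divisible by vae_scale_factor * patch_size."""
    multiple = vae_scale_factor * patch_size
    for name, value in (("height", height), ("width", width)):
        if value % multiple != 0:
            raise ValueError(
                f"`{name}` must be divisible by {multiple} "
                f"(vae_scale_factor={vae_scale_factor} * patch_size={patch_size}), got {value}."
            )
```

Under these assumptions `check_size(1024, 1024)` returns silently, while `check_size(1000, 1024)` raises a readable `ValueError` up front rather than a reshape error during denoising.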

Add the pixel model to the available-models table, a pixel-space loading example in the docs, and an Examples block in the PRXPixelPipeline docstring now that the weights are public. Verified end-to-end: from_pretrained + 1024px generation.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Lead the intro, models table, loading examples, and autodoc sections with the pixel-space model; present the latent-space checkpoints as earlier PRX versions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

- PRXPixelPipeline now inherits DiffusionPipeline directly (not PRXPipeline); shared methods copied via # Copied from, __call__ and pixel-specific methods reimplemented standalone
- tokenizer_max_length and skip_text_cleaning added as explicit __call__ and encode_prompt args in PRXPipeline (per comment 1)
- prediction_type removed entirely (baked per-class); noise_scale is a proper PRXPixelPipeline __init__ arg registered to config (per comment 2)
- Remove xfail mark from pixel tests (per comment 4)
- Add docs/source/en/api/pipelines/prx_pixel.md + toctree entry; restore prx.md to pre-PR state

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

PRX pixel pipeline

35ba10e

github-actions Bot added pipelines models utils labels Jun 4, 2026

DavidBert and others added 3 commits June 11, 2026 21:34

github-actions Bot added documentation tests labels Jun 12, 2026

DavidBert and others added 3 commits June 12, 2026 12:04

Restructure PRX docs around PRXPixel as the flagship model

9b03b6f


DavidBert wants to merge 7 commits into main from prx-pixel-pipeline
