PyC

A unified HPC toolchain — deterministic compiler, async runtime, CUTLASS kernels, and SciML inference.

PyC pairs a stable, cross-platform CMake core with an experimental "compiler-next" optimization stack: an SSA-style IR, a deterministic pass pipeline, a memory planner, kernel autotuning, and a reason-coded CUDA dispatch path with graph replay.

Why PyC

PyC is built around two contracts that evolve independently:

Contract	Promise	Scope
Stable core	Reproducible CMake targets, deterministic CI, clean downstream linking	`pyc_core_obj`, `pyc_core`, `pyc_foundation`, `pyc`
Compiler-next	Fast-moving optimization research with explicit, observable behavior	IR, passes, allocator, kernel registry, runtime rails, CUDA backend, AI bridge

The guiding design rule is isolation: the experimental compiler stack can move quickly without ever destabilizing the stable artifacts that CI and downstream consumers depend on.

What sets the compiler-next stack apart:

Deterministic by construction — module fingerprinting, a content-addressed compile cache, and strict-mode guards that fail loud with explicit reasons instead of drifting silently.
Reason-coded fallback — every CUDA dispatch records why it took the native path, replayed a captured graph, or fell back to the CPU executor.
Observable optimization — decision logs and pyc_run_stats expose objective mode, pressure score, kernel selection traces, graph-break taxonomy, and autotune state.
Co-designed memory + kernels — a first-fit lifetime-reuse allocator and a scoring kernel registry that share pressure signals.
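
To make the "first-fit lifetime-reuse" idea concrete, here is a minimal Python sketch of the general technique. It is not PyC's allocator: the Tensor record, the plan_first_fit name, and the (start, name) ordering are assumptions chosen for illustration, and the real planner (including how it shares pressure signals with the kernel registry) is described in docs/compiler-next/runtime-memory-planner.md.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tensor:
    """Hypothetical record for one buffer in an op graph."""
    name: str
    size: int   # bytes
    start: int  # index of the first op that touches the buffer
    end: int    # index of the last op that touches it (inclusive)


def plan_first_fit(tensors):
    """Return ({name: offset}, peak_bytes) using first-fit placement."""
    placed = []   # (offset, tensor) for every buffer placed so far
    offsets = {}
    # Fixed ordering (first use, then name) keeps the plan deterministic.
    for t in sorted(tensors, key=lambda t: (t.start, t.name)):
        # Only buffers whose lifetimes overlap with t can conflict with it.
        busy = sorted(
            (off, off + p.size)
            for off, p in placed
            if not (p.end < t.start or t.end < p.start)
        )
        candidate = 0
        for lo, hi in busy:
            if candidate + t.size <= lo:
                break                      # first gap that is large enough
            candidate = max(candidate, hi)  # skip past this busy range
        offsets[t.name] = candidate
        placed.append((candidate, t))
    peak = max((off + t.size for off, t in placed), default=0)
    return offsets, peak


if __name__ == "__main__":
    demo = [Tensor("a", 64, 0, 1), Tensor("b", 32, 1, 2), Tensor("c", 64, 2, 3)]
    print(plan_first_fit(demo))
```

In the demo, c reuses the bytes that a released, so the plan peaks at 96 bytes rather than the 160 bytes needed if every buffer had its own slot.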

Architecture

PyC is a polyglot stack. A thin Python/Rust surface sits over a deterministic C engine, which dispatches to CUDA/CUTLASS kernels when a GPU is present and falls back to a CPU executor otherwise.

flowchart TD
    subgraph Surface["Surface — Python / Rust"]
        PY["pyc Python package<br/>(CLI, runtime control plane)"]
        API["FastAPI inference portal<br/>apps/inference_api"]
        RS["vortex_core (Rust)<br/>async runtime + PyO3 bindings"]
    end

    subgraph Engine["Compiler-next Engine — C11"]
        CAPI["compiler_api.c<br/>compile/run lifecycle, cache, decision log"]
        IR["ir.c<br/>op graph, verify, serialize"]
        PASS["pass_manager.c<br/>canonicalize · shape · fuse · liveness · graph-break"]
        ALLOC["runtime_allocator.c<br/>lifetime reuse + rematerialization"]
        KREG["kernel_registry.c<br/>candidate scoring + autotune"]
        CTRL["runtime_control.c<br/>objective mode + rollback rails"]
    end

    subgraph Backends["Backends"]
        CUDA["cuda_backend.c<br/>native cuBLAS/cuBLASLt + graph replay"]
        CUTLASS["CUTLASS kernels<br/>GEMM · attention · conv2d · Ada fast path"]
        CPU["CPU executor<br/>deterministic reference path"]
        COMM["collective comm<br/>MPI · NCCL · RCCL · stub"]
    end

    PY --> RS --> CAPI
    API --> CAPI
    CAPI --> IR --> PASS --> ALLOC --> KREG --> CTRL
    CTRL --> CUDA
    CUDA --> CUTLASS
    CUDA -. reason-coded fallback .-> CPU
    CTRL --> COMM
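
The dashed "reason-coded fallback" edge in the diagram above can be illustrated with a small Python sketch. Everything here is hypothetical: the DispatchReason values, the dispatch signature, and the use of RuntimeError to model a failed native launch are assumptions for illustration, not the contents of cuda_backend.h.

```python
from enum import Enum


class DispatchReason(Enum):
    """Hypothetical reason codes; PyC's real taxonomy may differ."""
    NATIVE = "native"
    GRAPH_REPLAY = "graph_replay"
    FALLBACK_NO_GPU = "fallback_no_gpu"
    FALLBACK_NATIVE_ERROR = "fallback_native_error"


def dispatch(key, args, *, gpu_available, graphs, native, cpu, log):
    """Run one op and append (key, reason) to log for every decision."""
    if not gpu_available:
        log.append((key, DispatchReason.FALLBACK_NO_GPU))
        return cpu(*args)
    if key in graphs:
        # A previously captured graph exists for this key: replay it.
        log.append((key, DispatchReason.GRAPH_REPLAY))
        return graphs[key](*args)
    try:
        result = native(*args)
    except RuntimeError:
        # The native path failed: record why, then use the CPU executor.
        log.append((key, DispatchReason.FALLBACK_NATIVE_ERROR))
        return cpu(*args)
    log.append((key, DispatchReason.NATIVE))
    return result
```

The point of the pattern is that no path returns without writing a reason, so a log reader can always tell which executor produced a result.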

Compile → run lifecycle:

flowchart LR
    A["pyc_compile_model"] --> B["verify IR"]
    B --> C["fingerprint + cache key"]
    C --> D{"cache hit?"}
    D -- yes --> E["restore plan/kernel"]
    D -- no --> F["pass pipeline"]
    F --> G["allocation plan"]
    G --> H["select kernel + autotune"]
    H --> I["store cache entry"]
    E --> J["pyc_run_model"]
    I --> J
    J --> K{"GPU available?"}
    K -- yes --> L["CUDA dispatch<br/>(native / graph replay)"]
    K -- no --> M["CPU executor"]
    L --> N["runtime rails + stats"]
    M --> N

For the full component matrix, IR rules, pass order, allocator algorithm, selection scoring, and CUDA control variables, see docs/architecture/system-architecture.md.
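
The "fingerprint + cache key" and "cache hit?" steps of the lifecycle above follow the usual content-addressed cache pattern, sketched below in Python. This is an illustration under stated assumptions, not PyC's implementation: the dict-based module representation, the choice of SHA-256 over canonical JSON, and the CompileCache class are all hypothetical.

```python
import hashlib
import json


def fingerprint(module, options):
    """Hash a canonical serialization so equal inputs share one key."""
    payload = json.dumps(
        {"module": module, "options": options},
        sort_keys=True,            # key order must not change the hash
        separators=(",", ":"),     # no whitespace variation
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class CompileCache:
    """In-memory content-addressed cache keyed by fingerprint."""

    def __init__(self):
        self._entries = {}
        self.hits = 0
        self.misses = 0

    def compile(self, module, options, build_plan):
        key = fingerprint(module, options)
        if key in self._entries:
            self.hits += 1         # restore the stored plan
        else:
            self.misses += 1       # run the pipeline once, then store
            self._entries[key] = build_plan(module, options)
        return key, self._entries[key]
```

Because the key depends only on content, recompiling an identical module with identical options skips build_plan entirely, while any change to either input produces a different key and a fresh entry.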

Repository Layout

PyC/
├── src/
│   ├── core/              # Stable C core: file adapter, symbol table, stack, CI driver
│   ├── compiler/          # Compiler-next: IR, passes, runtime rails, CUDA backend, AI bridge
│   │   ├── ir/            # Op-graph IR + deterministic serialization
│   │   ├── passes/        # Canonicalize, shape inference, fusion, liveness, graph-break
│   │   ├── runtime/       # Allocator, kernel registry, controller, CUDA + comm backends
│   │   └── cutlass_kernels/  # GEMM, attention, conv2d, Ada async fast path
│   └── runtime/           # Rust async runtime (vortex_core) + PyO3 bindings
├── include/pyc/           # Public C headers (the compiler-next API surface)
├── python/pyc/            # Python package: CLI, runtime control plane, telemetry
├── apps/inference_api/    # FastAPI inference portal
├── kernels/               # Kernel lab tooling + Ada/Hopper prototype families
├── benchmark/             # Deterministic benchmark harness, workloads, regression gates
├── tests/                 # C compiler-next tests + Python tests + repo validators
├── infra/                 # GPU-host bootstrap (Docker image + setup scripts)
├── scripts/               # Developer + benchmark automation
├── docs/                  # Architecture, references, plans, reports, roadmap
└── web/site/              # Static site, download page, inference portal, published results

Quick Start

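Prerequisites: CMake ≥ 3.10, a C11 compiler, and Python 3 (for the benchmark harness). The commands below use the -S/-B options (added in CMake 3.13) and ctest --test-dir (added in CMake 3.20), so running them exactly as written needs a correspondingly newer CMake.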

# Configure and build the stable core targets
cmake -S . -B build
cmake --build build --parallel --target pyc pyc_core pyc_foundation

# Smoke test
./build/pyc          # Linux/macOS
# .\build\Release\pyc.exe   # Windows multi-config generators

Expected output:

PyC CI driver: core targets configured successfully.

To build the experimental compiler-next stack and its smoke test:

cmake -S . -B build -D PYC_BUILD_COMPILER_NEXT=ON -D PYC_BUILD_COMPILER_NEXT_TESTS=ON
cmake --build build --parallel --target pyc_compiler_next pyc_compiler_next_smoke
./build/pyc_compiler_next_smoke
ctest --test-dir build --output-on-failure

Incremental builds: reuse the same build/ directory and re-run cmake --build build --parallel. CI's Linux/macOS jobs use ccache to cut repeated compile time.

Build Targets

Target	Type	Description
`pyc_core_obj`	object library	Stable core objects, compiled once
`pyc_core`	static library	Canonical static library for downstream linking
`pyc_foundation`	static library	Compatibility alias of the same objects
`pyc`	executable	Minimal deterministic CI/smoke driver
`pyc_compiler_next`	static library	Compiler-next engine (requires `PYC_BUILD_COMPILER_NEXT=ON`)
`pyc_compiler_next_smoke`	executable	Compiler-next smoke test
`pyc_compiler_next_test_*`	executables	Deterministic compiler-next test suite (requires `PYC_BUILD_COMPILER_NEXT_TESTS=ON`)

Continuous Integration

A single canonical workflow — cmake-multi-platform.yml — runs on Ubuntu, macOS, and Windows:

CMake configure
Build + smoke-test the compiler-next targets
ctest across the deterministic suite
Build + smoke-test the stable targets
Benchmark regression guardrail (Ubuntu)

CI also enforces source coverage: every active .c file under src/core/C_Files, src/compiler, and tests/compiler_next must be referenced by CMakeLists.txt, or the build fails.
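
A source-coverage rule like this can be checked with a few lines of Python. The sketch below is only an illustration of the idea: the unreferenced_sources helper is hypothetical, it treats every .c file as "active", and it uses a plain substring match against CMakeLists.txt, which may be looser or stricter than the repository's actual validator.

```python
from pathlib import Path


def unreferenced_sources(repo_root, source_dirs, cmake_file="CMakeLists.txt"):
    """Return repo-relative .c paths that never appear in the CMake file."""
    root = Path(repo_root)
    cmake_text = (root / cmake_file).read_text(encoding="utf-8")
    missing = []
    for rel_dir in source_dirs:
        # sorted() keeps the report order stable across platforms.
        for path in sorted((root / rel_dir).rglob("*.c")):
            rel = path.relative_to(root).as_posix()
            if rel not in cmake_text:
                missing.append(rel)
    return missing


if __name__ == "__main__":
    dirs = ["src/core/C_Files", "src/compiler", "tests/compiler_next"]
    missing = unreferenced_sources(".", dirs)
    for rel in missing:
        print(f"not referenced by CMakeLists.txt: {rel}")
    raise SystemExit(1 if missing else 0)
```

Run from the repository root, the script exits non-zero when any source file is missing from the build description, which is the behavior a CI gate needs.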

Using PyC

Link `pyc_core` into your project

cmake -S . -B build
cmake --build build --parallel --target pyc_core

In your C/C++ project, add the include path src/core/Header_Files/ and link build/libpyc_core.a (or the platform equivalent). Use pyc_foundation only where downstream compatibility requires it.

Compiler-next public API

The compiler-next surface is the set of headers under include/pyc/:

Header	Purpose
`compiler_api.h`	Compile/run lifecycle, options, output stats, decision log
`ir.h`	Op-graph IR construction and verification
`pass_manager.h`	Pass pipeline configuration and reports
`runtime_allocator.h`	Lifetime-based allocation planner
`kernel_registry.h`	Kernel candidate registration, scoring, autotune
`runtime_control.h`	Objective-mode switching and rollback rails
`cuda_backend.h`	CUDA dispatch, graph replay, reason-coded fallback
`ai_bridge.h`	AI-assisted optimization bridge layer

Benchmarking

PyC ships a deterministic benchmark harness for the stable core targets.

python3 benchmark/harness.py --repeats 7 --micro-rounds 4000

Outputs:

benchmark/benchmarks/results/json/latest_core.json
benchmark/benchmarks/results/reports/latest_core.md
docs/reports/performance-results.md

Publish website-ready artifacts:

python3 scripts/publish_site_results.py
# -> web/site/results/manifest.json, latest-summary.json, artifacts/**

See docs/reference/benchmarking.md for methodology.

GPU Benchmarking (Remote CUDA)

For real GPU testing on a rented Linux host:

# 1. Provision Ubuntu + NVIDIA GPU, then reuse the bootstrap image
bash infra/build_bootstrap_image.sh
INSTALL_SYSTEM_DEPS=0 bash infra/run_bootstrap_image.sh

# 2. Validate the toolchain if needed
bash scripts/setup_cuda_remote_ubuntu.sh
source .venv/bin/activate

# 3. Run the standardized GPU suite
python3 benchmark/benchmarks/gpu/run_gpu_suite.py --device cuda --tag gpu_baseline

Human-readable single run:

bash scripts/run_pyc_bench_pretty.sh cuda 64 1024 5 2

Kernel-level Ada work via the kernel lab:

python3 kernels/lab/kernel_lab.py task-create ada-sm89-gemm --task-kind gemm --candidate-tag ada
python3 benchmark/benchmarks/gpu/run_gemm_suite.py \
    --matrix-file benchmark/benchmarks/gpu/configs/ada_fp32_gemm_shapes.json --dry-run

The adapter comparison covers torch_eager, torch_compile, pyc, tvm, xla, tensorrt, and glow. For non-PyTorch backends, set the corresponding environment variable (e.g. TVM_BENCH_CMD, TENSORRT_BENCH_CMD, PYC_GPU_BENCH_CMD) to a command that emits JSON. Full guide: docs/compiler-next/gpu-testing-playbook.md.

Documentation

Start with the documentation index. High-value entry points:

Topic	Document
System architecture & data flow	`docs/architecture/system-architecture.md`
Project overview & terminology	`docs/reference/project-overview.md`
Build & CI reference	`docs/reference/build-and-ci.md`
Benchmark methodology	`docs/reference/benchmarking.md`
Compiler-next overview	`docs/compiler-next/overview.md`
IR specification	`docs/compiler-next/ir-spec.md`
Pass pipeline	`docs/compiler-next/pass-pipeline.md`
Memory planner	`docs/compiler-next/runtime-memory-planner.md`
CUDA GEMM fast path	`docs/compiler-next/cuda-gemm-fast-path.md`
Project status	`docs/reports/project-status.md`

Project Status

Stable core — cross-platform CMake/link targets are in place and gated by deterministic CI.
Compiler-next — functional behind PYC_BUILD_COMPILER_NEXT=ON; covered by a deterministic test suite but not yet part of the stable CI guarantees. The stack spans deterministic guards, compile cache, speculative plans, phantom-graph tracking, runtime controller rails, rematerialization policy, kernel + allocator co-selection, and the Ada FP32 CUDA fast path.
Release binaries — published per OS by release-binaries.yml (pyc-linux-x86_64.tar.gz, pyc-macos-arm64.tar.gz, pyc-windows-x86_64.zip), with an OS-detecting download page at web/site/index.html.

Contributing

Contributions are welcome. Please read CONTRIBUTING.md for the workflow and validation checklist, docs/reference/repository-rules.md for layout guardrails, and CODE_OF_CONDUCT.md. Security issues: see SECURITY.md.

License

Licensed under the Apache License 2.0.
