InfiniTensor/ntops.lab

ntops.lab

ntops.lab is an experimental operator lab for NineToothed. It collects generated and hand-shaped NineToothed kernels, keeps a runnable operator catalog, and records the current gaps that need compiler/runtime support.

The project is intentionally a lab: many kernels are produced with LLM-assisted operator development, then smoke-tested against PyTorch on CUDA.

Current Coverage

The canonical catalog contains:

  • 246 runnable NineToothed implementations
  • 0 unsupported/scaffold source files included in the commit-ready tree
  • gap analysis retained in documentation only

Primary categories:

  • pointwise
  • reduction
  • linear
  • layout
  • creation
  • normalization
  • fused/general
  • fused/fla

See docs/operator-coverage.md for the full list.

Repository Layout

src/ntops_lab/
  catalog.py                 # manifest loading and query helpers
  cli.py                     # ntops-lab CLI
  operator_manifest.json     # canonical runnable operator catalog
  kernels/                   # NineToothed kernel implementations
  ops/                       # PyTorch-facing callable wrappers
  testing/specs/             # input generation and PyTorch references
docs/
  operator-coverage.md       # runnable coverage
  runnable-operators.md      # runnable operator list
  unrunnable-operators.md    # excluded unsupported operator notes
  support-analysis.md        # missing capability analysis
tests/
  test_catalog.py            # CPU-only metadata tests
scripts/
  run_operator.py            # run one operator's check()
  check_manifest.py          # validate manifest/file consistency

Install

For metadata inspection and CPU-only tests:

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

For running kernels, start from an environment that already has CUDA, PyTorch, and NineToothed installed, then install the package the same way:

pip install -e ".[dev]"

NineToothed and PyTorch are kept as optional GPU dependencies because this repository is often inspected on machines without a GPU.
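
Because the GPU stack is optional, it can help to probe for it before importing anything that needs it. The following is a minimal sketch, not code from this repository: the helper names are hypothetical, and it assumes the packages import as `torch` and `ninetoothed`.

```python
import importlib.util

# Assumed import names for the optional GPU stack.
GPU_MODULES = ("torch", "ninetoothed")


def missing_modules(modules=GPU_MODULES):
    """Return the names in `modules` that cannot be located on this machine."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def gpu_stack_available(modules=GPU_MODULES):
    """Return True only when every optional module can be located."""
    return not missing_modules(modules)


if not gpu_stack_available():
    print("GPU extras not found:", ", ".join(missing_modules()))
```

`find_spec` only locates a module without importing it, so the probe stays cheap on CPU-only machines. It does not verify that a CUDA device is actually usable.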

Use The Catalog

ntops-lab summary
ntops-lab list --status done
ntops-lab list --category reduction

From Python:

from ntops_lab import list_operators, runnable_operators

print(len(runnable_operators()))
print(list_operators(category="linear"))
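
To illustrate the kind of filtering these helpers perform, here is a self-contained sketch over in-memory entries. It is not the implementation in `catalog.py`: the `OperatorEntry` type, the `filter_operators` function, the field names, and the category assigned to each sample operator are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorEntry:
    """Hypothetical manifest record; the real manifest schema may differ."""
    name: str
    category: str
    status: str


# Illustrative entries only; the category assignments are assumed.
SAMPLE_CATALOG = [
    OperatorEntry("add", "pointwise", "done"),
    OperatorEntry("softmax", "reduction", "done"),
    OperatorEntry("mm", "linear", "done"),
]


def filter_operators(entries, category=None, status=None):
    """Return names of entries matching every filter that is not None."""
    return [
        entry.name
        for entry in entries
        if (category is None or entry.category == category)
        and (status is None or entry.status == status)
    ]


print(filter_operators(SAMPLE_CATALOG, category="linear"))
```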

Use Operators From PyTorch

Operator wrappers accept and return torch.Tensor objects while dispatching to the underlying NineToothed kernels:

import torch
from ntops_lab import ops

x = torch.randn(1024, device="cuda")
y = torch.randn(1024, device="cuda")
z = ops.add(x, y)

mm = ops.get_op("mm")
out = mm(torch.randn(32, 32, device="cuda", dtype=torch.float16),
         torch.randn(32, 32, device="cuda", dtype=torch.float16))
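
A name-based lookup such as `ops.get_op("mm")` is commonly backed by a registry that maps names to callables. The sketch below shows that pattern in isolation. It is an assumption about the general approach, not the repository's code: `register` is a hypothetical decorator, and the sample operator works on plain Python lists so the example runs without a GPU.

```python
_REGISTRY = {}


def register(name):
    """Decorator that stores a callable under `name` (hypothetical helper)."""
    def decorator(fn):
        if name in _REGISTRY:
            raise ValueError(f"operator {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn
    return decorator


def get_op(name):
    """Look up a registered operator and fail with a readable message."""
    try:
        return _REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_REGISTRY)) or "<none>"
        raise KeyError(f"unknown operator {name!r}; known: {known}") from None


@register("add")
def add(x, y):
    # Stand-in for a kernel wrapper: elementwise add on Python lists.
    return [a + b for a, b in zip(x, y)]


print(get_op("add")([1, 2], [3, 4]))
```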

Run A Kernel Smoke Test

On a CUDA machine with NineToothed available:

ntops-lab check add
python scripts/run_operator.py softmax

Each kernel module exposes run. Each test spec exposes:

make_inputs()
run(...)
run_pytorch(...)
check()

check() generates representative inputs, runs the NineToothed kernel, runs the PyTorch reference, and compares results.
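
The four-function shape can be sketched without a GPU. In this illustrative version, plain Python lists stand in for tensors, `run` stands in for the NineToothed kernel, and `run_pytorch` stands in for the PyTorch reference. The softmax bodies and the tolerances are assumptions, not values taken from the repository.

```python
import math
import random


def make_inputs(n=8, seed=0):
    """Generate a small, reproducible input vector."""
    rng = random.Random(seed)
    return ([rng.uniform(-1.0, 1.0) for _ in range(n)],)


def run(x):
    # Stand-in for the kernel under test: max-subtracted softmax.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def run_pytorch(x):
    # Stand-in for the reference: direct softmax (no PyTorch used here).
    exps = [math.exp(v) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def check(rtol=1e-6, atol=1e-9):
    """Run both implementations on the same inputs and compare elementwise."""
    inputs = make_inputs()
    actual = run(*inputs)
    expected = run_pytorch(*inputs)
    if len(actual) != len(expected):
        raise AssertionError("output lengths differ")
    for i, (a, e) in enumerate(zip(actual, expected)):
        if not math.isclose(a, e, rel_tol=rtol, abs_tol=atol):
            raise AssertionError(f"mismatch at index {i}: {a} vs {e}")
    return True


print("check passed:", check())
```

Keeping input generation, the kernel call, and the reference in separate functions means each step can be invoked on its own when a comparison fails.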

Run Repository Tests

CPU-only catalog checks:

python -m pytest
python scripts/check_manifest.py
python -m compileall -q src scripts tests
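
As a rough picture of what a manifest/file consistency check involves, the sketch below compares the names listed in a manifest with the Python files found on disk. It is not the logic of `scripts/check_manifest.py`. It assumes a hypothetical manifest schema (a JSON list of objects with a `name` field) and one kernel file per operator named after it.

```python
import json
import tempfile
from pathlib import Path


def find_inconsistencies(manifest_path, kernels_dir):
    """Return (listed but missing on disk, on disk but not listed)."""
    entries = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    listed = {entry["name"] for entry in entries}
    on_disk = {
        path.stem
        for path in Path(kernels_dir).rglob("*.py")
        if path.stem != "__init__"
    }
    return sorted(listed - on_disk), sorted(on_disk - listed)


def demo():
    """Build a throwaway tree with one missing and one unlisted kernel."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        kernels = root / "kernels"
        kernels.mkdir()
        (kernels / "__init__.py").write_text("", encoding="utf-8")
        (kernels / "add.py").write_text("# kernel\n", encoding="utf-8")
        (kernels / "softmax.py").write_text("# kernel\n", encoding="utf-8")
        manifest = root / "operator_manifest.json"
        manifest.write_text(
            json.dumps([{"name": "add"}, {"name": "mm"}]), encoding="utf-8"
        )
        return find_inconsistencies(manifest, kernels)


missing, unlisted = demo()
print("listed but missing:", missing)
print("on disk but unlisted:", unlisted)
```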

GPU smoke checks should be run selectively:

ntops-lab check add
ntops-lab check mm
ntops-lab check softmax

Running every GPU check can take a while because many operators compile kernels on first use.

Compilation Cache

NineToothed and Triton already use content-addressed disk caches. Generated NineToothed source is stored under ~/.ninetoothed, while compiled Triton artifacts are stored under ${TRITON_CACHE_DIR:-~/.triton/cache}. Re-running an operator in a new Python process reuses these artifacts when the generated kernel source and compilation environment are unchanged.

Inspect the current caches:

ntops-lab cache status
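
The two cache locations described above can also be resolved and summarized from Python. This is a hedged sketch rather than the implementation behind `ntops-lab cache status`: `cache_dirs` and `dir_stats` are hypothetical helpers, and the only facts taken from this README are the `~/.ninetoothed` path and the `TRITON_CACHE_DIR` fallback to `~/.triton/cache`.

```python
import os
from pathlib import Path


def cache_dirs(env=None):
    """Resolve both cache directories, honoring TRITON_CACHE_DIR if set."""
    env = os.environ if env is None else env
    # Mirrors ${TRITON_CACHE_DIR:-~/.triton/cache}: unset or empty uses the default.
    triton = env.get("TRITON_CACHE_DIR") or str(Path.home() / ".triton" / "cache")
    return {
        "ninetoothed": Path.home() / ".ninetoothed",
        "triton": Path(triton).expanduser(),
    }


def dir_stats(path):
    """Return (file count, total bytes) for a directory, or (0, 0) if absent."""
    if not path.is_dir():
        return (0, 0)
    files = [p for p in path.rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    for label, path in cache_dirs().items():
        count, size = dir_stats(path)
        print(f"{label}: {path} ({count} files, {size} bytes)")
```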

Precompile and validate selected kernels before a test or benchmark session:

ntops-lab cache warm softmax mm
ntops-lab cache warm --category linear
ntops-lab cache warm --all

Changing a kernel implementation or its NineToothed compilation configuration changes the generated-source hash and causes the affected kernel to compile again. Unchanged kernels continue to reuse their cached artifacts.
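
The recompile-on-change behavior follows from how content-addressed caching works in general: the cache key is a hash of the inputs, so any change to the source or configuration yields a new key. The sketch below demonstrates the idea with an in-memory cache. The use of SHA-256, the `KernelCache` class, and the `num_warps` configuration field are illustrative assumptions; the actual hashing schemes used by NineToothed and Triton are not described here.

```python
import hashlib
import json


def cache_key(source, config):
    """Derive a deterministic key from kernel source and configuration."""
    payload = json.dumps({"source": source, "config": config}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class KernelCache:
    """In-memory stand-in for a disk cache; counts simulated compilations."""

    def __init__(self):
        self._artifacts = {}
        self.compile_count = 0

    def get_or_compile(self, source, config):
        key = cache_key(source, config)
        if key not in self._artifacts:
            # Simulated compilation: only happens on a cache miss.
            self.compile_count += 1
            self._artifacts[key] = f"compiled:{key[:12]}"
        return self._artifacts[key]


cache = KernelCache()
cache.get_or_compile("def kernel(x): return x + 1", {"num_warps": 4})
cache.get_or_compile("def kernel(x): return x + 1", {"num_warps": 4})  # hit
cache.get_or_compile("def kernel(x): return x + 2", {"num_warps": 4})  # miss
print("compilations:", cache.compile_count)
```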

What Is Not Included Yet?

Unsupported scaffold source files are intentionally excluded from this commit-ready tree. Remaining gaps mostly need one of these capabilities:

  • complex manual layout/indexing
  • dynamic gather/scatter, cache writes, or masked updates
  • top-k/sort/histogram/atomic primitives
  • cross-tile scan, recurrent state, or triangular solve
  • RNG/stateful random distribution support
  • normalization backward or running-stat APIs

See docs/support-analysis.md.

Project Philosophy

This is a practical operator workshop:

  • keep each operator in its own file
  • keep PyTorch references close to implementations
  • make every runnable operator independently smoke-testable
  • record why excluded operators are blocked
  • prefer clear templates over pretending incomplete kernels are done

License

Released under the standard MIT License. Because the MIT license is not versioned, the repository uses the canonical MIT text and SPDX-compatible MIT package metadata.
