span bridges a LaTeX mathematics paper and its Lean 4 formalization. It
indexes raw Lean declarations, parses the paper's standard \label{...} commands, and
uses a ledger (build/<paper>.lea, JSON) to align paper objects
and Lean declarations. The ledger records, for every object, which Lean declarations realize it and whether they are formalized.
Optional span: comments in Lean can provide hints, but they are not required:
the core model is raw Lean facts + paper objects + a curated ledger. For
compatibility, span can also read a
leanblueprint content.tex.
It is both a CLI and a Python library.
<project>/
latex/ main.tex # paper, indexed by \label{...}
lean/ *.lean # Lean source; span indexes declarations directly
build/ <paper>.lea # curated/generated ledger (+ spines)
span_notes.json # optional per-entry notes, keyed by align id
span auto-detects the project root (the nearest ancestor with latex/ and
lean/); override with --project/--paper/--lean-source/--blueprint/--out.
--paper, --lean-source, and --blueprint are repeatable. With no explicit
Lean input, span indexes lean/ directly and, when build/<paper>.lea
already exists, uses that ledger as the curated alignment.
span build # reconcile -> build/main.lea (+ spines) and run checks
span init-ledger # create a starter ledger from paper ids and raw Lean declarations
span build --lean-source lean
# preferred: index raw Lean declarations directly
span build --lean-source lean --ledger build/main.lea
# validate/update using an existing curated alignment ledger
span build --full-lean-coverage
# include every unaligned raw Lean declaration as a lean half-span
span build --paper latex/main.tex --paper latex/appendix.tex
# reconcile a corpus of multiple labeled paper roots
span build --blueprint lean/blueprint/src/content.tex --blueprint other/content.tex
# reconcile against multiple leanblueprint roots
span build --latex # also run a pdflatex compile check (C8)
span build --decls # also check named lean decls exist in source (C13)
span check [--strict] # run the verification suite (C1-C15); nonzero exit on errors
span check --decls # add C13: cross-check \lean{...} names against the Lean source
span check --deps # add C14: cross-check \uses vs the elaborator's statement deps (runs Lean)
span check --deps --deps-strict # make missing dependency edges hard errors
span compare --ledger build/main.lea
# print the curated paper<->Lean alignment table
span compare --against build/main.lea # diff the entry set against a saved ledger
span stats [--json] # statistics from the paper, Lean spine, and alignment
span graph [--overlay] [--numbered] [--statements] [--source paper|lean|both]
[--render] [--dot g.dot]
# Lean dependency graph (Graphviz), filled by kind.
# --overlay: paint the paper side on; --numbered: paper reading-order
# numbers; --statements: label nodes with the human statement;
# --source: take that statement from the paper, the Lean spine, or
# both stacked for comparison (default paper); --render: typeset with LaTeX
span order # paper statement order vs a dependency-respecting order (C12 detail)
span lean | span paper # emit either spine as JSON
span vocabulary [--json] # print the controlled vocabulary: kinds, statuses, checks
span --version
The directory examples/one_theorem/ is a complete ledger-first
project aligning a human-readable proof the 1=1 with a Lean 4 formalization of this statement.:
examples/one_theorem/
latex/main.tex
lean/One.lean
build/main.lea
From that directory, the normal workflow is:
span build
span check
span compare
span stats
span graph --overlay --statements --dot build/one.dotTo initialize the ledger from the paper and raw Lean source:
rm build/main.lea
span init-ledgerThe paper label names the paper object:
\begin{theorem}[One equals one]\label{thm:one}
\(1=1\).
\end{theorem}The ledger supplies the Lean declaration:
"lean_decls": ["One.one_eq_one"]For corpus builds, the output ledger is build/corpus.lea unless --out points
elsewhere. The paper side is the coproduct of the input paper summands, the Lean
side is the coproduct of the input Lean summands, and the ledger records the
alignment span between them. If a ledger entry needs to point to a Lean object
whose id is duplicated across Lean summands, qualify the ref with the source path
recorded by span, the basename, or the basename stem. Ambiguous unqualified
refs are reported by C15 as ambiguity across coproduct summands.
Paper labels identify paper objects:
\begin{theorem}[One equals one]\label{thm:one}
...
\end{theorem}The ledger supplies the Lean declarations:
{
"id": "align:one",
"kind": "theorem",
"title": "One equals one",
"paper_ids": ["thm:one"],
"lean_decls": ["One.one_eq_one"]
}span check --lean-source lean --ledger build/main.lea then deterministically
checks that the named paper objects and Lean declarations still exist and remain
kind-compatible. The ledger can be created by hand, by an LLM-assisted matching
workflow, or by starting from blueprint output and editing it.
Direct Lean mode also understands inert span: comments immediately before
declarations. These are optional hints; they are not required when a ledger
supplies the alignment.
Single-line form:
-- span: id=thm:one kind=theorem title="One equals one" uses={def:one}
theorem one_eq_one ... := by
...Doc-comment form:
/-- span:
id=thm:one
kind=theorem
title=One equals one
uses=def:one
proof_uses=lem:reflexivity
statement=The distinguished object equals itself.
-/
theorem one_eq_one ... := by
...Supported keys are id, kind, title, uses, proof_uses, and
statement. The declaration itself supplies the fully-qualified Lean
declaration name. If no span: comment is present, the declaration is still
indexed under its fully-qualified name.
import span
lean = span.parse_lean_sources(["lean"])
paper = span.parse_paper("latex/main.tex")
entries = span.reconcile(lean, paper)
ledger = span.build_ledger_dict(lean, paper, entries, "main.tex", "lean")
for f in span.run_checks(lean, paper, entries):
print(f.severity, f.code, f.message)Recognized paper/Lean spine kinds are:
definition, lemma, theorem, proposition, corollary, claim, conjecture,
remark, example, notation, convention, assumption, constant
constant is normally a paper-side label kind. Run span vocabulary for the
full compatibility table, span statuses, and check families.
The span tool is, at the top level, the one span that bridges the paper side and the Lean side:
Paper <--- span ---> Lean
That global span is assembled from one small span per aligned concept. The
structure is self-similar, not a name clash: you run span (the tool) and
read spans (the entries) in the ledger (.lea) it writes.
- Each ledger entry is an alignment span: an apex (the aligned concept) with
a paper leg and a lean leg. The canonical
span_statusis the span's shape:coherent,paper_half,lean_half, orincoherent. - The checks are grouped by alignment family, shown in every report:
object (well-defined and kind-preserving: C1-C4, C8, C10, C13),
arrow (
\usesdependencies: C9, C14), coherence (dependency order: C12), coverage (paper and Lean half-spans: C6, C7), and fiber (shared formalizations and fan-out: C5). - Fan-out is a fiber (
largest_fiberinspan stats), andbuild/compare/statsprint an alignment verdict naming each obstruction.
C1 id uniqueness · C2 dangling refs · C3 kind compatibility · C4 stated results
in labeled environments · C5 duplicate/shared formalizations · C6 lean half-span
coverage · C7 partial formalization · C8 LaTeX compile check · C9 Lean
dependency integrity · C10 compatible-kind notes · C12 statement order is a
linear extension of the dependency DAG · C13 (opt-in, --decls) named
\lean{...} decls exist in the Lean source with a matching kind · C14 (opt-in,
--deps) declared \uses matches the elaborator's statement dependencies · C15
corpus id ambiguity/source qualification. See CHECKS.md.
C13 is the strongest machine-checkable paper<->Lean consistency check that
needs no proof-assistant run and no LLM: a heuristic source scan confirms every
named declaration exists and that a definition node is backed by a def (a
theorem node by a theorem/lemma).
C14 goes one level deeper by actually running Lean (lake env lean): it
traverses the constants each declaration's statement (type) references --
through project-internal helpers, stopping at spine-owned decls -- to build
the true statement-level dependency graph, and reports edges the spine omits
(warn, or error with --deps-strict) or declares without type-level support
(info). It needs the Lean toolchain and a built project. Scope: statement
dependencies only -- Lean drops theorem proof terms from oleans, so proof-\uses
cannot be validated this way.
Neither check can verify that the two statements mean the same thing -- that semantic faithfulness is the specification gap, which only a human or (heuristically) an LLM can judge.