Skip to content

cuda.core: add GraphBuilder.graph_definition property#2026

Open
Andy-Jost wants to merge 9 commits intoNVIDIA:mainfrom
Andy-Jost:graph-step-3
Open

cuda.core: add GraphBuilder.graph_definition property#2026
Andy-Jost wants to merge 9 commits intoNVIDIA:mainfrom
Andy-Jost:graph-step-3

Conversation

@Andy-Jost
Copy link
Copy Markdown
Contributor

Summary

Completes step 3 of #1330 by exposing the captured graph as an explicit GraphDefinition view that shares ownership of the same graph the builder is producing. The handle-layer plumbing landed in #2008; this PR wires up the user-facing surface and the state-guard rules.

The new property unlocks two hybrid flows:

  • Inspect or augment a captured graph through the explicit API after end_building(), then re-complete() to pick up the changes.
  • Populate a conditional body (returned by if_then / if_else / while_loop / switch) entirely through the explicit API without ever calling begin_building() on it.

API addition

GraphBuilder.graph_definition: GraphDefinition (read-only property)

Availability rules:

Builder kind Before begin_building While building After end_building
Primary RuntimeError RuntimeError OK
Conditional body OK RuntimeError OK
Forked RuntimeError RuntimeError RuntimeError

The returned GraphDefinition is a view, not an owning wrapper: nodes added through it appear in subsequent complete() and debug_dot_print() calls on the builder.

Test plan

Nine new tests in test_graph_builder.py:

  • Happy path: graph_definition returns a GraphDefinition after end_building() and reflects the captured nodes.
  • Three error paths: forked builder, primary pre-begin_building, any builder mid-capture.
  • Shared validity: the returned GraphDefinition keeps working after the builder is closed.
  • Round-trip: mutate via the explicit API after capture, complete(), and run end-to-end on a stream.
  • Conditional-body hybrid flow: populate the body entirely through the explicit API and run.
  • Conditional-body augmentation flow: capture into the body, then add extra nodes through the explicit view, and run.
  • Conditional-body during-capture rejection.

Local pre-commit clean. Local test run on a 2-GPU machine: all cuda_core/tests/graph pass.

Related

Made with Cursor

Andy-Jost and others added 8 commits May 1, 2026 14:52
…hine

Refactor GraphBuilder from a Python class using _MembersNeededForFinalize
to a cdef class with explicit _BuilderKind (PRIMARY/FORKED/CONDITIONAL_BODY)
and _CaptureState (NOT_STARTED/CAPTURING/ENDED) tracking. Cleanup moves
into __dealloc__/close, and the builder now uses GraphHandle/StreamHandle
from _resource_handles instead of holding raw driver objects. Drop the
is_stream_owner flag now that StreamHandle owns the lifetime.

End-capture paths in __dealloc__ and close guard on _h_stream so cleanup
is safe even if _init* fails before completing assignment.

Made-with: Cursor
Add a GraphExecHandle to the resource-handle layer (parallel to
GraphHandle) wrapping CUgraphExec with RAII cleanup via
cuGraphExecDestroy on shared_ptr release. Convert Graph from a Python
class using _MembersNeededForFinalize to a cdef class holding a typed
_h_graph_exec attribute, dropping the weakref.finalize machinery.
update/upload/launch move to nogil cydriver paths consistent with the
GraphBuilder rewrite.

Also drop quoted forward-reference annotations on create_graph_builder
and _instantiate_graph/complete now that GraphBuilder is cimported in
_device.pyx and _stream.pyx and Cython accepts the in-module forward
reference to Graph. Clears the related "Strings should no longer be
used for type declarations" warnings.

Made-with: Cursor
The cdef-class member declarations live in the .pxd, so the .pyx does
not need to re-cimport GraphExecHandle, GraphHandle, or StreamHandle.

Made-with: Cursor
… cycle

cimport-ing GraphBuilder at the top of _stream.pyx and _device.pyx made
Cython emit a Python-level import of cuda.core.graph._graph_builder
during _stream module init. That triggered the chain
graph -> _graph_node -> _kernel_arg_handler -> _memory._buffer
-> _device, which then re-entered the still-initializing _stream module
via "from cuda.core._stream import IsStreamT", failing with
ImportError: cannot import name IsStreamT.

Restore the original lazy "import GraphBuilder" inside
create_graph_builder (Stream and Device) and Stream_accept. The return
annotations stay as bare names; "from __future__ import annotations" in
both files defers their evaluation, so they need not resolve at
function-definition time.

Made-with: Cursor
The previous import-cycle fix changed _stream/_device.create_graph_builder
to a lazy Python "import GraphBuilder" instead of a module-level cimport.
With _init declared as @staticmethod cdef, Python attribute lookup
cannot find it, so every test that builds a graph failed with
"AttributeError: type object 'GraphBuilder' has no attribute '_init'"
at _device.pyx:1376 / _stream.pyx:376.

Convert _init from @staticmethod cdef to @staticmethod def (matches the
Stream._init pattern) and drop the cdef declaration from the .pxd.
_init runs once per builder creation, so the loss of cdef-level
dispatch is irrelevant. Graph._init stays cdef; it is only called
intra-module.

Made-with: Cursor
Every graph-builder test failed with CUDA_ERROR_INVALID_VALUE on the
new ``GraphBuilder.begin_building`` path. The driver rejects
``cuStreamGetCaptureInfo`` when ``captureStatus_out`` is NULL, but the
new ``_get_capture_info`` helper accepted a NULL status pointer and
``begin_building`` was calling it that way (it just wanted the freshly
captured graph handle and assumed the status was implied by the
preceding ``cuStreamBeginCapture``).

Pass a stack-local ``CUstreamCaptureStatus`` and document the helper's
requirement that ``status`` be non-NULL. ``graph`` is still allowed to
be NULL (``is_building`` calls it that way and the driver accepts it).

Co-authored-by: Cursor <cursoragent@cursor.com>
@Andy-Jost Andy-Jost added this to the cuda.core v1.0.0 milestone May 5, 2026
@Andy-Jost Andy-Jost added enhancement Any code-related improvements P1 Medium priority - Should do cuda.core Everything related to the cuda.core module labels May 5, 2026
@Andy-Jost Andy-Jost self-assigned this May 5, 2026
@Andy-Jost
Copy link
Copy Markdown
Contributor Author

The builds on #2008. Commits unique to this PR begin at de85b92

Completes step 3 of NVIDIA#1330 by exposing the captured graph as an explicit
`GraphDefinition` view that shares ownership of the underlying `CUgraph`.
The handle-layer plumbing landed in PR NVIDIA#2008; this commit wires up the
user-facing surface and locks in the state-guard rules.

State semantics:

- PRIMARY builder: only valid after `end_building()`. Before
  `begin_building()` no graph exists; during capture the driver is the
  sole writer, so explicit access is unsafe.
- CONDITIONAL_BODY builder: valid both before `begin_building()` (the
  body graph is allocated at conditional-node creation time) and after
  `end_building()`. This enables a hybrid flow where a conditional body
  is populated entirely via the explicit API, with no capture at all.
- FORKED builder: never valid. Forked builders share the primary's
  graph; access through the primary instead.

Tests cover the happy path, both hybrid flows on conditional bodies
(populate-via-explicit-API and capture-then-augment), the three error
states (forked, capturing, primary pre-capture), and the
shared-ownership guarantee (the `GraphDefinition` survives the
builder's `close()`).

Co-authored-by: Cursor <cursoragent@cursor.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 5, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda.core Everything related to the cuda.core module enhancement Any code-related improvements P1 Medium priority - Should do

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant