cuda.core: add GraphBuilder.graph_definition property#2026
Open
Andy-Jost wants to merge 9 commits into NVIDIA:main
…hine Refactor GraphBuilder from a Python class using _MembersNeededForFinalize to a cdef class with explicit _BuilderKind (PRIMARY/FORKED/CONDITIONAL_BODY) and _CaptureState (NOT_STARTED/CAPTURING/ENDED) tracking. Cleanup moves into __dealloc__/close, and the builder now uses GraphHandle/StreamHandle from _resource_handles instead of holding raw driver objects. Drop the is_stream_owner flag now that StreamHandle owns the lifetime. End-capture paths in __dealloc__ and close guard on _h_stream so cleanup is safe even if _init* fails before completing assignment. Made-with: Cursor
Add a GraphExecHandle to the resource-handle layer (parallel to GraphHandle) wrapping CUgraphExec with RAII cleanup via cuGraphExecDestroy on shared_ptr release. Convert Graph from a Python class using _MembersNeededForFinalize to a cdef class holding a typed _h_graph_exec attribute, dropping the weakref.finalize machinery. update/upload/launch move to nogil cydriver paths consistent with the GraphBuilder rewrite. Also drop quoted forward-reference annotations on create_graph_builder and _instantiate_graph/complete now that GraphBuilder is cimported in _device.pyx and _stream.pyx and Cython accepts the in-module forward reference to Graph. Clears the related "Strings should no longer be used for type declarations" warnings. Made-with: Cursor
The cdef-class member declarations live in the .pxd, so the .pyx does not need to re-cimport GraphExecHandle, GraphHandle, or StreamHandle. Made-with: Cursor
… cycle cimport-ing GraphBuilder at the top of _stream.pyx and _device.pyx made Cython emit a Python-level import of cuda.core.graph._graph_builder during _stream module init. That triggered the chain graph -> _graph_node -> _kernel_arg_handler -> _memory._buffer -> _device, which then re-entered the still-initializing _stream module via "from cuda.core._stream import IsStreamT", failing with ImportError: cannot import name IsStreamT. Restore the original lazy "import GraphBuilder" inside create_graph_builder (Stream and Device) and Stream_accept. The return annotations stay as bare names; "from __future__ import annotations" in both files defers their evaluation, so they need not resolve at function-definition time. Made-with: Cursor
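The failure mode described above can be reproduced in plain Python. The following is only a sketch under stated assumptions: `toy_stream` and `toy_builder` are hypothetical modules written to a scratch directory, the chain is shortened to two modules, and Cython's `cimport` behaviour is not modelled; only the eager-versus-lazy import shape is.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical toy modules; only the import shape mirrors the commit message.
BUILDER_SRC = "from toy_stream import IsStreamT\nclass GraphBuilder:\n    pass\n"

EAGER_STREAM_SRC = (
    "from toy_builder import GraphBuilder  # runs before IsStreamT is bound\n"
    "IsStreamT = object\n"
    "def create_graph_builder():\n"
    "    return GraphBuilder()\n"
)

LAZY_STREAM_SRC = (
    "IsStreamT = object\n"
    "def create_graph_builder():\n"
    "    from toy_builder import GraphBuilder  # deferred until first call\n"
    "    return GraphBuilder()\n"
)

TOY_MODULES = ("toy_stream", "toy_builder")


def try_import(stream_src):
    """Import toy_stream from a scratch directory and report what happened."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "toy_stream.py").write_text(stream_src)
        Path(tmp, "toy_builder.py").write_text(BUILDER_SRC)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            stream = importlib.import_module("toy_stream")
            return type(stream.create_graph_builder()).__name__
        except ImportError:
            # toy_builder re-entered the half-initialized toy_stream module.
            return "ImportError"
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            for name in TOY_MODULES:
                sys.modules.pop(name, None)


print(try_import(EAGER_STREAM_SRC))
print(try_import(LAZY_STREAM_SRC))
```

In the eager variant the import of `toy_builder` runs while `toy_stream` is still initializing, so `IsStreamT` is not yet bound; in the lazy variant the import is deferred until `create_graph_builder()` is called, by which point `toy_stream` is fully initialized.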
The previous import-cycle fix changed _stream/_device.create_graph_builder to a lazy Python "import GraphBuilder" instead of a module-level cimport. With _init declared as @staticmethod cdef, Python attribute lookup cannot find it, so every test that builds a graph failed with "AttributeError: type object 'GraphBuilder' has no attribute '_init'" at _device.pyx:1376 / _stream.pyx:376. Convert _init from @staticmethod cdef to @staticmethod def (matches the Stream._init pattern) and drop the cdef declaration from the .pxd. _init runs once per builder creation, so the loss of cdef-level dispatch is irrelevant. Graph._init stays cdef; it is only called intra-module. Made-with: Cursor
Every graph-builder test failed with CUDA_ERROR_INVALID_VALUE on the new ``GraphBuilder.begin_building`` path. The driver rejects ``cuStreamGetCaptureInfo`` when ``captureStatus_out`` is NULL, but the new ``_get_capture_info`` helper accepted a NULL status pointer and ``begin_building`` was calling it that way (it just wanted the freshly captured graph handle and assumed the status was implied by the preceding ``cuStreamBeginCapture``). Pass a stack-local ``CUstreamCaptureStatus`` and document the helper's requirement that ``status`` be non-NULL. ``graph`` is still allowed to be NULL (``is_building`` calls it that way and the driver accepts it). Co-authored-by: Cursor <cursoragent@cursor.com>
Completes step 3 of NVIDIA#1330 by exposing the captured graph as an explicit `GraphDefinition` view that shares ownership of the underlying `CUgraph`. The handle-layer plumbing landed in PR NVIDIA#2008; this commit wires up the user-facing surface and locks in the state-guard rules.

State semantics:
- PRIMARY builder: only valid after `end_building()`. Before `begin_building()` no graph exists; during capture the driver is the sole writer, so explicit access is unsafe.
- CONDITIONAL_BODY builder: valid both before `begin_building()` (the body graph is allocated at conditional-node creation time) and after `end_building()`. This enables a hybrid flow where a conditional body is populated entirely via the explicit API, with no capture at all.
- FORKED builder: never valid. Forked builders share the primary's graph; access through the primary instead.

Tests cover the happy path, both hybrid flows on conditional bodies (populate-via-explicit-API and capture-then-augment), the three error states (forked, capturing, primary pre-capture), and the shared-ownership guarantee (the `GraphDefinition` survives the builder's `close()`).

Co-authored-by: Cursor <cursoragent@cursor.com>
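The state rules and the shared-ownership guarantee above can be modelled in a few lines of plain Python. This is a hedged sketch, not the `cuda.core` implementation: `BuilderKind`, `CaptureState`, `graph_definition_available`, and `ToyBuilder` are hypothetical names, a plain list stands in for the graph, and Python reference counting stands in for the shared handle.

```python
from enum import Enum, auto


class BuilderKind(Enum):
    PRIMARY = auto()
    FORKED = auto()
    CONDITIONAL_BODY = auto()


class CaptureState(Enum):
    NOT_STARTED = auto()
    CAPTURING = auto()
    ENDED = auto()


def graph_definition_available(kind, state):
    """Return True when the rules above would allow explicit graph access."""
    if kind is BuilderKind.FORKED:
        return False  # forked builders never expose the graph
    if state is CaptureState.CAPTURING:
        return False  # the driver is the sole writer during capture
    if kind is BuilderKind.PRIMARY:
        return state is CaptureState.ENDED  # no graph exists before capture
    return True  # conditional body: graph exists from node creation onward


class ToyBuilder:
    """Hypothetical stand-in for a builder; a plain list plays the graph."""

    def __init__(self, kind):
        self.kind = kind
        self.state = CaptureState.NOT_STARTED
        self._nodes = []  # shared object standing in for the graph handle

    @property
    def graph_definition(self):
        if not graph_definition_available(self.kind, self.state):
            raise RuntimeError(
                f"graph_definition unavailable: {self.kind.name}, {self.state.name}"
            )
        return self._nodes  # a second reference to the same object, not a copy

    def close(self):
        self._nodes = None  # drop only the builder's reference


# Usage: populate a conditional body with no capture, then close the builder.
body = ToyBuilder(BuilderKind.CONDITIONAL_BODY)
view = body.graph_definition
view.append("kernel_node")
body.close()
print(view)  # the view still holds the node after close()
```

Checking the forked case first keeps the rule "never valid" independent of capture state, which matches how the commit message phrases it.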
Summary
Completes step 3 of #1330 by exposing the captured graph as an explicit `GraphDefinition` view that shares ownership of the same graph the builder is producing. The handle-layer plumbing landed in #2008; this PR wires up the user-facing surface and the state-guard rules.

The new property unlocks two hybrid flows:
- … `end_building()`, then re-`complete()` to pick up the changes.
- … `if_then`/`if_else`/`while_loop`/`switch`) entirely through the explicit API without ever calling `begin_building()` on it.

API addition
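`GraphBuilder.graph_definition: GraphDefinition` (read-only property)

Availability rules:
- Primary builder: available only after `end_building()`; access before `begin_building()` or during capture is an error.
- Conditional-body builder: available before `begin_building()` and after `end_building()`, but not during capture.
- Forked builder: never available; go through the primary builder instead.

The returned `GraphDefinition` is a view, not an owning wrapper: nodes added through it appear in subsequent `complete()` and `debug_dot_print()` calls on the builder.

Test plan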
Nine new tests in `test_graph_builder.py`:
- `graph_definition` returns a `GraphDefinition` after `end_building()` and reflects the captured nodes.
- … `begin_building`, any builder mid-capture.
- `GraphDefinition` keeps working after the builder is closed.
- … `complete()`, and run end-to-end on a stream.

Local pre-commit clean. Local test run on a 2-GPU machine: all tests in `cuda_core/tests/graph` pass.

Related
… `graph-builder-refactor`); branched off that PR's head, plus `origin/main` (so reviewers see only this commit).

Made with Cursor