Skip to content

gh-149342: _remote_debugging: Fix binary profile corruption when sampling a (temporarily) empty stack#149343

Merged
pablogsal merged 12 commits intopython:mainfrom
maurycy:remote-debugger-empty-stack-exc
May 5, 2026
Merged

gh-149342: _remote_debugging: Fix binary profile corruption when sampling a (temporarily) empty stack#149343
pablogsal merged 12 commits intopython:mainfrom
maurycy:remote-debugger-empty-stack-exc

Conversation

@maurycy
Copy link
Copy Markdown
Contributor

@maurycy maurycy commented May 4, 2026

My understanding is that an empty stack is a bit racy but perfectly valid?

Closes #149342

&shared_count, &pop_count, &push_count);

if (encoding == STACK_REPEAT && !is_new_thread) {
if (encoding == STACK_REPEAT) {
Copy link
Copy Markdown
Contributor Author

@maurycy maurycy May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's safe to drop because compare_stacks would return STACK_REPEAT only if the prev stack was empty, too; otherwise it'd give STACK_FULL...

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should as well not emit this at all in the profiler frontend code

Copy link
Copy Markdown
Contributor Author

@maurycy maurycy May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pablogsal

Do you mean something like?

diff --git c/Modules/_remote_debugging/module.c i/Modules/_remote_debugging/module.c
index df3ade371de..7cd28540efc 100644
--- c/Modules/_remote_debugging/module.c
+++ i/Modules/_remote_debugging/module.c
@@ -646,6 +646,15 @@ _remote_debugging_RemoteUnwinder_get_stack_trace_impl(RemoteUnwinderObject *self
                 goto exit;
             }
 
+            PyObject *thread_frames = PyStructSequence_GET_ITEM(frame_info, 2);
+            if (PyList_GET_SIZE(thread_frames) == 0) {
+                Py_DECREF(frame_info);
+                if (self->tstate_addr || self->only_active_thread) {
+                    break;
+                }
+                continue;
+            }
+
             if (PyList_Append(interpreter_threads, frame_info) == -1) {
                 Py_DECREF(frame_info);
                 Py_DECREF(interpreter_threads);

This looks like a band aid, but it's clean?

Perhaps the fix should be deeper... I can take a stab but couldn’t figure out anything as clean, only (a result of me poking around without deeper understanding, and I trust no LLM to do proper reviews in deeper circles of hell):

diff --git c/Modules/_remote_debugging/module.c i/Modules/_remote_debugging/module.c
index df3ade371de..c53b6eafa9c 100644
--- c/Modules/_remote_debugging/module.c
+++ i/Modules/_remote_debugging/module.c
@@ -622,10 +622,8 @@ _remote_debugging_RemoteUnwinder_get_stack_trace_impl(RemoteUnwinderObject *self
                                                            gc_frame,
                                                            main_thread_tstate);
             if (!frame_info) {
-                // Check if this was an intentional skip due to mode-based filtering
-                if ((self->mode == PROFILING_MODE_CPU || self->mode == PROFILING_MODE_GIL ||
-                     self->mode == PROFILING_MODE_EXCEPTION) && !PyErr_Occurred()) {
-                    // Detect cycle: if current_tstate didn't advance, we have corrupted data
+                if (!PyErr_Occurred()) {
+                    // Intentional skip: mode filtering or no Python frames.
                     if (current_tstate == prev_tstate) {
                         Py_DECREF(interpreter_threads);
                         PyErr_Format(PyExc_RuntimeError,
@@ -636,7 +634,9 @@ _remote_debugging_RemoteUnwinder_get_stack_trace_impl(RemoteUnwinderObject *self
                         Py_CLEAR(result);
                         goto exit;
                     }
-                    // Thread was skipped due to mode filtering, continue to next thread
+                    if (self->tstate_addr || self->only_active_thread) {
+                        break;
+                    }
                     continue;
                 }
                 // This was an actual error
diff --git c/Modules/_remote_debugging/threads.c i/Modules/_remote_debugging/threads.c
index e303c667ea0..cb8b35c7caa 100644
--- c/Modules/_remote_debugging/threads.c
+++ i/Modules/_remote_debugging/threads.c
@@ -467,6 +467,12 @@ unwind_stack_for_thread(
 
     *current_tstate = GET_MEMBER(uintptr_t, ts, unwinder->debug_offsets.thread_state.next);
 
+    if (PyList_GET_SIZE(frame_info) == 0) {
+        Py_DECREF(frame_info);
+        cleanup_stack_chunks(&chunks);
+        return NULL;
+    }
+
     thread_id = PyLong_FromLongLong(tid);
     if (thread_id == NULL) {
         set_exception_cause(unwinder, PyExc_RuntimeError, "Failed to create thread ID");

This whole issue makes the case from PyCon even stronger.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I was referring in the Python side so we never call the writer if we have an empty frame.

Copy link
Copy Markdown
Contributor Author

@maurycy maurycy May 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pablogsal 83b8f8e

It skips the whole sample but it's a hot path so I want to avoid elaborate filtering.

@read-the-docs-community
Copy link
Copy Markdown

read-the-docs-community Bot commented May 4, 2026

Comment thread Modules/_remote_debugging/module.c Outdated
@maurycy maurycy requested a review from pablogsal May 4, 2026 11:07
Comment thread Modules/_remote_debugging/module.c
@pablogsal pablogsal enabled auto-merge (squash) May 4, 2026 22:27
Copy link
Copy Markdown
Member

@pablogsal pablogsal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed a bunch of fixes but LGTM

@maurycy great job! 💪

maurycy and others added 12 commits May 5, 2026 01:01
```
Objects/descrobject.c:194:16: runtime error: call to function BinaryWriter_get_total_samples through pointer to incorrect function type 'struct _object *(*)(struct _object *, void *)'
/home/maurycy/cpython/./Modules/_remote_debugging/module.c:1660: note: BinaryWriter_get_total_samples defined here
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Objects/descrobject.c:194:16
```

https://github.com/python/cpython/actions/runs/25297636845/job/74159079502
@pablogsal pablogsal force-pushed the remote-debugger-empty-stack-exc branch from 19586a0 to 913b5c7 Compare May 5, 2026 00:04
@pablogsal pablogsal merged commit 6f8c964 into python:main May 5, 2026
55 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

_remote_debugging: Invalid RLE count XXX exceeds maximum possible XXX for remaining data

2 participants