Hot/fix: Stack overflow crash when loading multiple translation units in parallel #74

@SizzleUnrlsd

Summary

coretrace-stack-analyzer can crash when it analyzes a compile_commands.json
batch with more than one worker. The failure happens while the analyzer compiles
source files to LLVM IR through compilerlib::compile(...), which runs Clang
frontend actions in-process.

The observed crash is not caused by CoreTrace CLI parsing or by cross-TU summary
logic. It is triggered by parallel module loading/compilation inside the stack
analyzer.

Environment

  • Platform: macOS arm64
  • LLVM/Clang: Homebrew LLVM 20.1.2
  • Binary: ctrace, linked against libclang-cpp.dylib and libLLVM.dylib
  • Shell stack limit: 8176 KB
  • Hardware concurrency observed by the analyzer: 8

Reproduction

From the parent CoreTrace checkout that embeds coretrace-stack-analyzer:

./build/ctrace \
  --compile-commands=./build/compile_commands.json \
  --invoke ctrace_stack_analyzer \
  --config config/tool-config.json

With stack_analyzer.jobs unset/empty, the analyzer resolves to jobs=auto and
starts multiple workers.

The crash also reproduces with cross-TU disabled:

{
  "stack_analyzer": {
    "jobs": "2",
    "resource_cross_tu": false,
    "uninitialized_cross_tu": false
  }
}

jobs=2 is enough to reproduce. jobs=1 completes successfully.

Actual behavior

The process exits with a native crash shortly after:

== CoreTrace == [INFO] Running specific tools on 16 file(s)
== CoreTrace == [INFO] Running CoreTrace Stack Analyzer on 16 files
bus error

or:

illegal hardware instruction

Under lldb, the actual stop reason is an EXC_BAD_ACCESS in Clang Sema:

* thread #4, stop reason = EXC_BAD_ACCESS (code=2, address=0x16ff1bb58)
frame #0: libclang-cpp.dylib`CheckConvertibilityForTypeTraits(...) + 136

The failing instruction writes to the current stack:

libclang-cpp.dylib`CheckConvertibilityForTypeTraits:
-> stp x21, x24, [sp, #0x28]

Registers at the crash:

sp = 0x000000016ff1bb30
pc = libclang-cpp.dylib`CheckConvertibilityForTypeTraits(...) + 136

The faulting address is sp + 0x28, and the stack pointer is in an inaccessible
region:

memory region $sp
[0x000000016ff18000-0x000000016ff1c000) ---

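This points to a worker-thread stack overflow while Clang is deeply instantiating
C++ templates. The stack pointer sits inside a 16 KB region with no access
permissions, which is consistent with the guard page at the end of a thread
stack. The 8176 KB shell stack limit listed under Environment governs the main
thread; on macOS, secondary threads typically default to a much smaller stack
(512 KB) unless the code that creates them requests more.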
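
To check how much stack a worker thread actually receives, a small probe along the following lines could be run on the affected machine. This is a minimal sketch, not code from the analyzer: the function names `currentThreadStackSize` and `workerThreadStackSize` are hypothetical, and it assumes workers are created with default attributes (for example through `std::thread`), which the issue does not confirm.

```cpp
#include <cstddef>
#include <pthread.h>
#include <thread>

// Hypothetical helper: returns the stack size, in bytes, of the calling thread.
std::size_t currentThreadStackSize() {
#if defined(__APPLE__)
    // macOS-specific query for an already-running thread.
    return pthread_get_stacksize_np(pthread_self());
#else
    // glibc/musl: read the attributes of the running thread.
    pthread_attr_t attr;
    std::size_t size = 0;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        pthread_attr_getstacksize(&attr, &size);
        pthread_attr_destroy(&attr);
    }
    return size;
#endif
}

// Hypothetical helper: spawns a default-attribute thread and reports its stack size.
std::size_t workerThreadStackSize() {
    std::size_t size = 0;
    std::thread worker([&size] { size = currentThreadStackSize(); });
    worker.join();
    return size;
}
```

Comparing `workerThreadStackSize()` with `currentThreadStackSize()` called from the main thread would show whether the workers run with less stack than the shell limit suggests.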

Expected behavior

The analyzer should either:

  • complete analysis successfully, or
  • report a per-translation-unit compilation/loading failure without crashing the
    hosting process.

ctrace should not be terminated by a native crash from an embedded analyzer
worker.

Relevant code path

The CoreTrace bridge invokes the analyzer in-process:

ctrace::stack::app::runAnalyzerApp(std::move(parseResult.parsed));

The analyzer schedules module loading in worker threads:

runParallelWork(inputFilenames.size(), loadJobs,
                [&](std::size_t index) { loadSingleModule(index); });

Each worker calls:

analysis::loadModuleForAnalysis(inputFilename, cfg, *moduleContext, localErr);

The input pipeline compiles non-IR inputs through compilerlib:

return compilerlib::compile(compileArgs, outputMode);

compilerlib executes Clang frontend actions in-process, including:

clang::EmitBCAction
clang::EmitLLVMAction
clang::EmitLLVMOnlyAction

Verification results

The following matrix was observed:

Configuration                     Result
jobs=auto, cross-TU enabled       crashes
jobs=auto, cross-TU disabled      crashes
jobs=2, cross-TU disabled         crashes
jobs=1, cross-TU enabled          exits 0
jobs=1, cross-TU disabled         exits 0

This isolates the failure to parallel in-process Clang compilation/loading, not
to cross-TU resource or uninitialized summary construction.

Workaround

Set:

{
  "stack_analyzer": {
    "jobs": "1"
  }
}

This serializes module loading/compilation and avoids the stack overflow in the
observed environment.

Proposed fix direction

Do not assume that a jobs value that is safe for the analysis phases is also safe
for in-process Clang compilation. The current architecture is fast, but any crash
in the Clang frontend takes down the whole analyzer process, and with it ctrace.
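
One way to decouple the two concerns is to resolve a separate job count for compilation and for analysis. The following is a minimal sketch under stated assumptions: `ExecutionPolicy` and `resolveExecutionPolicy` are hypothetical names that do not exist in the analyzer, and a requested value of 0 is assumed to stand for jobs=auto.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical policy type: separate limits for the two phases.
struct ExecutionPolicy {
    unsigned compileJobs;   // workers allowed to run source-to-IR compilation
    unsigned analysisJobs;  // workers allowed to analyze already-loaded modules
};

// Hypothetical resolver. requestedJobs == 0 is assumed to mean "auto".
ExecutionPolicy resolveExecutionPolicy(unsigned requestedJobs, bool inProcessCompile) {
    unsigned jobs = requestedJobs;
    if (jobs == 0) {
        // hardware_concurrency() may return 0 when unknown, so clamp to 1.
        jobs = std::max(1u, std::thread::hardware_concurrency());
    }
    ExecutionPolicy policy;
    policy.analysisJobs = jobs;
    // In-process Clang compilation is serialized; isolated compilation may fan out.
    policy.compileJobs = inProcessCompile ? 1u : jobs;
    return policy;
}
```

With this shape, the existing jobs setting keeps its meaning for analysis, while the compile phase only runs in parallel once it no longer shares the analyzer process.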

Recommended direction:

  1. Introduce a dedicated module loading / compile execution policy.
  2. Serialize source-to-IR compilation when using in-process compilerlib.
  3. Keep parallelism for analysis phases that operate on already-loaded modules.
  4. Prefer subprocess isolation for Clang compilation as the robust long-term
    path. If a subprocess crashes, the analyzer can report a failed TU instead of
    crashing the parent process.
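
The subprocess option in item 4 can be illustrated with a POSIX sketch that runs a job in a child process and turns a fatal signal into a reportable result. This is an illustration only: `IsolatedResult` and `runIsolated` are hypothetical names, and the job is a plain callable standing in for a compilation. Calling `fork()` from a multithreaded process is itself hazardous, so a real implementation would more likely spawn a separate compiler executable; the parent-side handling of the wait status would look much the same.

```cpp
#include <cerrno>
#include <csignal>
#include <cstdlib>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type describing how the isolated job ended.
struct IsolatedResult {
    bool launched = false;  // fork() succeeded
    bool crashed = false;   // child was terminated by a signal
    int signalNumber = 0;   // valid when crashed is true
    int exitCode = -1;      // valid when the child exited normally
};

// Hypothetical helper: runs job() in a child process and reports the outcome.
IsolatedResult runIsolated(const std::function<int()>& job) {
    IsolatedResult result;
    const pid_t pid = fork();
    if (pid < 0) {
        return result;  // could not create the child
    }
    if (pid == 0) {
        // Child: run the job and leave without returning into the parent's code.
        try {
            _exit(job());
        } catch (...) {
            _exit(70);  // arbitrary non-zero code for an escaped exception
        }
    }
    result.launched = true;
    int status = 0;
    pid_t waited;
    do {
        waited = waitpid(pid, &status, 0);
    } while (waited < 0 && errno == EINTR);
    if (waited < 0) {
        return result;
    }
    if (WIFSIGNALED(status)) {
        result.crashed = true;
        result.signalNumber = WTERMSIG(status);
    } else if (WIFEXITED(status)) {
        result.exitCode = WEXITSTATUS(status);
    }
    return result;
}
```

A caller could map `crashed == true` to a per-translation-unit failure message and continue with the remaining inputs instead of terminating.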

Increasing the worker thread stack size via platform-specific thread attributes
can make this specific crash less likely, but it is less robust than isolating
Clang compilation or serializing the in-process frontend. The issue is generic:
any sufficiently template-heavy translation unit can exceed a fixed worker stack
when compiled in-process.
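
For reference, the stack-size mitigation could look roughly like the following with POSIX threads. This is a sketch under assumptions, not the analyzer's threading code: `runOnThreadWithStack` is a hypothetical helper, and the appropriate size is a judgment call (8 MB would match the usual main-thread limit). The requested size must be at least `PTHREAD_STACK_MIN`, and some platforms also require a multiple of the page size.

```cpp
#include <cstddef>
#include <functional>
#include <pthread.h>

namespace {
// Adapter from the pthread entry-point signature to std::function.
void* trampoline(void* arg) {
    auto* fn = static_cast<std::function<void()>*>(arg);
    (*fn)();  // an exception escaping here would terminate the process
    return nullptr;
}
}  // namespace

// Hypothetical helper: runs fn on a new thread with the requested stack size
// and waits for it. Returns false if the thread could not be created or joined.
bool runOnThreadWithStack(std::size_t stackBytes, std::function<void()> fn) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return false;
    }
    bool ok = pthread_attr_setstacksize(&attr, stackBytes) == 0;
    pthread_t thread;
    // fn outlives the thread because this function joins before returning.
    ok = ok && pthread_create(&thread, &attr, trampoline, &fn) == 0;
    pthread_attr_destroy(&attr);
    if (!ok) {
        return false;
    }
    return pthread_join(thread, nullptr) == 0;
}
```

This only moves the limit: a translation unit that needs more than the chosen size would still overflow, which is why the isolation or serialization options remain preferable.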

Labels: bug
