Hot/fix: Stack overflow crash when loading multiple translation units in parallel

## Summary

`coretrace-stack-analyzer` can crash when it analyzes a `compile_commands.json`
batch with more than one worker. The failure happens while the analyzer compiles
source files to LLVM IR through `compilerlib::compile(...)`, which runs Clang
frontend actions in-process.

The observed crash is not caused by CoreTrace CLI parsing or by cross-TU summary
logic. It is triggered by parallel module loading/compilation inside the stack
analyzer.

## Environment

- Platform: macOS arm64
- LLVM/Clang: Homebrew LLVM 20.1.2
- Binary: `ctrace`, linked against `libclang-cpp.dylib` and `libLLVM.dylib`
- Shell stack limit: `8176 KB`
- Hardware concurrency observed by the analyzer: `8`

## Reproduction

From the parent CoreTrace checkout that embeds `coretrace-stack-analyzer`:

```bash
./build/ctrace \
  --compile-commands=./build/compile_commands.json \
  --invoke ctrace_stack_analyzer \
  --config config/tool-config.json
```

With `stack_analyzer.jobs` unset/empty, the analyzer resolves to `jobs=auto` and
starts multiple workers.

The crash also reproduces with cross-TU disabled:

```json
{
  "stack_analyzer": {
    "jobs": "2",
    "resource_cross_tu": false,
    "uninitialized_cross_tu": false
  }
}
```

`jobs=2` is enough to reproduce. `jobs=1` completes successfully.

## Actual behavior

The process dies with a native crash shortly after printing:

```text
== CoreTrace == [INFO] Running specific tools on 16 file(s)
== CoreTrace == [INFO] Running CoreTrace Stack Analyzer on 16 files
bus error
```

or:

```text
illegal hardware instruction
```

Under `lldb`, the actual stop reason is an `EXC_BAD_ACCESS` in Clang Sema:

```text
* thread #4, stop reason = EXC_BAD_ACCESS (code=2, address=0x16ff1bb58)
frame #0: libclang-cpp.dylib`CheckConvertibilityForTypeTraits(...) + 136
```

The failing instruction writes to the current stack:

```text
libclang-cpp.dylib`CheckConvertibilityForTypeTraits:
-> stp x21, x24, [sp, #0x28]
```

Registers at the crash:

```text
sp = 0x000000016ff1bb30
pc = libclang-cpp.dylib`CheckConvertibilityForTypeTraits(...) + 136
```

The faulting address `0x16ff1bb58` is exactly `sp + 0x28`, and the stack pointer
lies in a region with no access permissions:

```text
memory region $sp
[0x000000016ff18000-0x000000016ff1c000) ---
```

The inaccessible 16 KiB region is consistent with a thread stack guard page, so
this points to a worker-thread stack overflow while Clang is deeply
instantiating C++ templates.
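
To check how much stack a worker actually gets, the default for newly created threads can be queried through POSIX thread attributes. The following is a minimal sketch, not analyzer code: `defaultThreadStackSize` is a hypothetical helper, and it only reflects reality if the analyzer's workers are created with default attributes, which is an assumption. On macOS, secondary threads default to 512 KiB, far below the 8176 KB limit that applies to the main thread.

```cpp
#include <pthread.h>

#include <cstddef>

// Hypothetical helper (not part of coretrace-stack-analyzer).
// Returns the stack size, in bytes, that a thread created with default
// attributes would receive, or 0 if the query fails.
std::size_t defaultThreadStackSize() {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return 0;
    }
    std::size_t size = 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0) {
        size = 0;
    }
    pthread_attr_destroy(&attr);
    return size;
}
```

Logging this value once at analyzer startup, next to the resolved `jobs` value, would make the gap between main-thread and worker-thread stack visible in bug reports.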

## Expected behavior

The analyzer should either:

- complete analysis successfully, or
- report a per-translation-unit compilation/loading failure without crashing the
  hosting process.

`ctrace` should not be terminated by a native crash from an embedded analyzer
worker.

## Relevant code path

The CoreTrace bridge invokes the analyzer in-process:

```cpp
ctrace::stack::app::runAnalyzerApp(std::move(parseResult.parsed));
```

The analyzer schedules module loading in worker threads:

```cpp
runParallelWork(inputFilenames.size(), loadJobs,
                [&](std::size_t index) { loadSingleModule(index); });
```

Each worker calls:

```cpp
analysis::loadModuleForAnalysis(inputFilename, cfg, *moduleContext, localErr);
```

The input pipeline compiles non-IR inputs through `compilerlib`:

```cpp
return compilerlib::compile(compileArgs, outputMode);
```

`compilerlib` executes Clang frontend actions in-process, including:

```cpp
clang::EmitBCAction
clang::EmitLLVMAction
clang::EmitLLVMOnlyAction
```

## Verification results

The following matrix was observed:

| Configuration | Result |
| --- | --- |
| `jobs=auto`, cross-TU enabled | crashes |
| `jobs=auto`, cross-TU disabled | crashes |
| `jobs=2`, cross-TU disabled | crashes |
| `jobs=1`, cross-TU enabled | exits 0 |
| `jobs=1`, cross-TU disabled | exits 0 |

This isolates the failure to parallel in-process Clang compilation/loading, not
to the construction of cross-TU resource or uninitialized summaries.

## Workaround

Set:

```json
{
  "stack_analyzer": {
    "jobs": "1"
  }
}
```

This serializes module loading/compilation and avoids the stack overflow in the
observed environment.

## Proposed fix direction

Do not assume that the `jobs` value used for analysis is also safe for
in-process Clang compilation. The current architecture is fast, but any Clang
frontend crash takes down the whole analyzer process and, with it, the hosting
`ctrace`.

Recommended direction:

1. Introduce a dedicated module loading / compile execution policy.
2. Serialize source-to-IR compilation when using in-process `compilerlib`.
3. Keep parallelism for analysis phases that operate on already-loaded modules.
4. Prefer subprocess isolation for Clang compilation as the robust long-term
   path. If a subprocess crashes, the analyzer can report a failed TU instead of
   crashing the parent process.
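
Item 4 can be sketched as follows. This is an illustration under stated assumptions, not the analyzer's implementation: `TuCompileResult` and `compileInSubprocess` are hypothetical names, and the sketch uses `posix_spawnp` so that it stays safe to call from a multithreaded parent. A child killed by a signal (for example a stack overflow in Clang) becomes an ordinary per-TU error.

```cpp
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

#include <cerrno>
#include <string>
#include <vector>

extern char** environ;  // POSIX: environment handed on to the child

// Hypothetical result type for illustration; not part of the analyzer.
struct TuCompileResult {
    bool ok;
    std::string error;  // empty when ok is true
};

// Runs `args` (program name followed by its arguments) as a child process
// and maps the way the child ended to a per-TU result.
TuCompileResult compileInSubprocess(const std::vector<std::string>& args) {
    if (args.empty()) {
        return {false, "empty command line"};
    }

    // Build a null-terminated argv; the strings stay owned by `args`.
    std::vector<char*> argv;
    for (const std::string& a : args) {
        argv.push_back(const_cast<char*>(a.c_str()));
    }
    argv.push_back(nullptr);

    pid_t pid = 0;
    const int rc =
        posix_spawnp(&pid, argv[0], nullptr, nullptr, argv.data(), environ);
    if (rc != 0) {
        return {false, "posix_spawnp failed with errno " + std::to_string(rc)};
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            return {false, "waitpid failed"};
        }
    }

    if (WIFSIGNALED(status)) {
        // The compiler crashed; only the child is gone, the parent survives.
        return {false,
                "compiler killed by signal " + std::to_string(WTERMSIG(status))};
    }
    if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        return {true, ""};
    }
    if (WIFEXITED(status)) {
        return {false, "compiler exited with status " +
                           std::to_string(WEXITSTATUS(status))};
    }
    return {false, "compiler ended abnormally"};
}
```

A caller could pass a command such as `{"clang++", "-emit-llvm", "-c", "foo.cpp", "-o", "foo.bc"}` (the file names are placeholders) and then load the resulting bitcode, reporting `error` for that TU when `ok` is false. The cost is one process launch per TU, which is the trade-off against the in-process speed mentioned above.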

Increasing the worker thread stack size via platform-specific thread attributes
can make this specific crash less likely, but it is less robust than isolating
Clang compilation or serializing the in-process frontend. The issue is generic:
any sufficiently template-heavy translation unit can exceed a worker stack of
fixed size when compiled in-process.
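
For completeness, the stack-size mitigation could look roughly like the sketch below. `runWithStackSize` is a hypothetical helper, not an existing analyzer or LLVM API, and the 8 MiB figure in the usage note is only an assumption chosen to match the main-thread limit. It raises the ceiling without removing it.

```cpp
#include <pthread.h>

#include <cstddef>
#include <functional>

namespace {

// Entry point with the signature pthread_create expects; forwards to the
// std::function passed through `arg`.
void* stackSizeTrampoline(void* arg) {
    auto* work = static_cast<std::function<void()>*>(arg);
    (*work)();
    return nullptr;
}

}  // namespace

// Hypothetical helper: runs `work` on a new thread whose stack is
// `stackBytes` large and blocks until it finishes. Returns false if the
// thread could not be configured, created, or joined. `work` must not
// throw, because an exception escaping the thread terminates the process.
bool runWithStackSize(std::size_t stackBytes, std::function<void()> work) {
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return false;
    }
    bool ok = pthread_attr_setstacksize(&attr, stackBytes) == 0;
    pthread_t tid;
    ok = ok && pthread_create(&tid, &attr, stackSizeTrampoline, &work) == 0;
    pthread_attr_destroy(&attr);
    if (!ok) {
        return false;
    }
    // `work` lives in this frame, so join before returning.
    return pthread_join(tid, nullptr) == 0;
}
```

A worker pool could call `runWithStackSize(8u * 1024 * 1024, task)` instead of spawning threads with default attributes. A translation unit that needs more than the chosen size would still overflow, which is why this is listed as a mitigation rather than the recommended direction.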


