Conversation
Run GCC -fanalyzer on the project's internal sources, scoped via two new OBJECT libraries so external/ third-party code is not analyzed. Requires GCC >= 13 for both C and C++ to cover the C++ frontend; configuration fails fast otherwise. Enabling the option surfaces -fanalyzer diagnostics across the internal sources; builds are not warning-clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run clang-tidy with the clang-analyzer-* check group on the project's internal sources via CMake's <LANG>_CLANG_TIDY target property; scope matches ENABLE_GCC_ANALYZER (external/ third-party code excluded). The configured C/C++ compiler does not need to be Clang. Configuration fails fast if clang-tidy is not on PATH. Enabling the option surfaces clang-analyzer-* diagnostics across the internal sources; builds are not warning-clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump cmake_minimum_required to 3.16. Move directory-level globals (include_directories, link_libraries, add_definitions) to target-level properties on the OBJECT libraries, propagating to the executables via target_link_libraries. Replace hard-coded -fopenmp with find_package(OpenMP) and OpenMP::OpenMP_C/CXX. Use C_STANDARD/CXX_STANDARD per target instead of mutating CMAKE_*_FLAGS. Gate -march=native behind a new BUILD_NATIVE option (default ON). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Check each dynamic memory allocation and for simplicity abort() on failure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 07d75ee added "if (!ptr) abort()" guards after every malloc() in the algorithm files, but several of the affected callsites do "malloc(n * sizeof(...))" where n is the input size received directly from main(). On a conforming allocator, "malloc(0)" may legally return NULL without indicating OOM, in which case the new guard fires and aborts on an empty input. The exposed sites are all public sort entry points -- the wrapper functions that take (unsigned char** strings, size_t n) and allocate a workspace immediately, before any "n < threshold" early-out can absorb the empty case. Internal recursive helpers are unaffected because they already start with an "if (n < 32) insertion_sort; return;" guard that handles n == 0. Add "if (n == 0) return;" at the very top of every affected entry: mergesort.cpp: 6 entries mergesort_lcp.cpp: 8 entries mergesort_losertree.cpp: 10 entries mergesort_unstable.cpp: 3 entries funnelsort_impl.h: 1 (funnelsort_Kway template) multikey_simd.cpp: 3 (multikey_simd_b_1/_2/_4) msd_a.cpp: 2 (msd_A, msd_A_adaptive) msd_a2.cpp: 2 (msd_A2, msd_A2_adaptive) msd_ce.cpp: 4 (msd_CE5/6/7/8) msd_lsd.cpp: 2 (msd_A_lsd, msd_A_lsd_adaptive templates) Sortstring's main() rejects empty input before reaching any routine, and the existing unit-test loops started at k=1, so this bug was not exercised in practice. Extend test_routines() to start each k-loop at 0, giving every registered routine n=0 coverage and locking in the contract that "sorting an empty array succeeds." Running the extended test exposed a related n == 0 bug in burstsort_mkq_simpleburst and burstsort_mkq_recursiveburst (the templates behind all six burstsort_mkq_* registrations): the first operation is "pseudo_median<CharT>(strings, N, 0)", which dereferences strings[N/2] -- segfault when N == 0. Same fix: "if (N == 0) return;" at the top of both templates. Also fix a latent underflow in util/debug.h's check_result: the loop "for (size_t i=0; i < n-1; ++i)" wraps to SIZE_MAX iterations when n == 0. Guard with "if (n < 2) return 0;" at the top. With these fixes, ./build/unit-test passes end-to-end against the broadened k=0 coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Idea is to speed up parallel builds, previously funnelsort.cpp was one of slowing files to compile. Move template machinery to a new header src/funnelsort_impl.h, and split the per-K instantiations and routine registrations into one TU per BufferLayout: src/funnelsort_bfs.cpp and src/funnelsort_dfs.cpp. Each TU owns its layout's K=4 specialization too. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix GCC analyzer findings where realloc results were assigned directly to vector storage before checking for failure. Store realloc results in temporary pointers first so the original allocation remains reachable if realloc fails and aborts. Co-authored-by: OpenAI Codex <codex@openai.com>
Default stays "native". Override the -march= value to target a different ISA level. Running unit_test executable under Valgrind can trip on AVX-512 instructions, can be avoided for example by building with `-DMARCH=x86-64`. ``` ==1437032== valgrind: Unrecognised instruction at address 0x40c34ed. ==1437032== at 0x40C34ED: mergesort_4way_unstable(unsigned char**, unsigned long, unsigned char**) (mergesort_unstable.cpp:1862) ==1437032== by 0x40C3648: mergesort_4way_unstable(unsigned char**, unsigned long) (mergesort_unstable.cpp:1879) ==1437032== by 0x40DD641: test_routines (main.cpp:226) ==1437032== by 0x40DD641: main (main.cpp:302) ==1437032== Your program just tried to execute an instruction that Valgrind ==1437032== did not recognise. There are two possible reasons for this. ==1437032== 1. Your program has a bug and erroneously jumped to a non-code ==1437032== location. If you are running Memcheck and you just saw a ==1437032== warning about a bad jump, it's probably your program's fault. ==1437032== 2. The instruction is legitimate but Valgrind doesn't handle it, ==1437032== i.e. it's Valgrind's fault. If you think this is the case or ==1437032== you are not sure, please let us know and we'll try to fix it. ==1437032== Either way, Valgrind will now raise a SIGILL signal which will ==1437032== probably kill your program. ==1437032== ==1437032== Process terminating with default action of signal 4 (SIGILL) ==1437032== Illegal opcode at address 0x40C34ED ==1437032== at 0x40C34ED: mergesort_4way_unstable(unsigned char**, unsigned long, unsigned char**) (mergesort_unstable.cpp:1862) ==1437032== by 0x40C3648: mergesort_4way_unstable(unsigned char**, unsigned long) (mergesort_unstable.cpp:1879) ==1437032== by 0x40DD641: test_routines (main.cpp:226) ==1437032== by 0x40DD641: main (main.cpp:302) ``` Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the cascading-if implementation of med3char(a, b, c) with the branchless median identity max(min(a, b), min(max(a, b), c)). Semantics match on all inputs including ties (a==b -> a, c==a -> a, c==b -> b, strict order picks the middle). Clears a clang-analyzer false positive at src/util/median.h:67 where the analyzer could not prove that every path through the cascading-if returned an initialized value (three clang-analyzer-core.CallAndMessage warnings on the pseudo_median call site). std::min and std::max are fully modeled. The reference-and-comparator overload is not flagged and stays as-is. Assembly impact (release build, gcc -O2 -march=native): the value-form med3char is inlined into pseudo_median<CharT>; the three instantiations shrink consistently across all four TUs that call them (burstsort_mkq, multikey_block, multikey_dynamic, multikey_simd): pseudo_median<unsigned char>: 146 -> 98 insns (-33%) pseudo_median<unsigned short>: 301 -> 222 insns (-26%) pseudo_median<unsigned int>: 491 -> 388 insns (-21%) The cascading-if produced ~8 insns and 4 conditional jumps per nested med3char call (three string-equality early-outs plus a slow/fast split the compiler could not fuse). The min/max identity compiles to mostly straight-line cmp/cmov pairs with a single branch, so the hot path of pseudo_median is now four nested cmp+cmov sequences followed by ret. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reimplement src/util/vmainfo.c as src/util/vmainfo.cpp using std::ifstream, std::string, and std::vector. The C ABI is preserved via extern "C" so the sole caller (src/sortstring.c) keeps working unchanged. Output layout matches the original byte-for-byte. The rewrite incidentally clears the clang-analyzer diagnostics flagged on the C version: - 16 clang-analyzer-security.insecureAPI.strcpy (unbounded strcat) - 8 clang-analyzer-unix.Stream (getline after EOF) - 2 clang-analyzer-unix.Malloc (getline buffer leaked at done:) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the vmainfo C++ rewrite for the remaining /proc-parsing helper. std::ifstream / std::string / std::ostringstream replace the manual getline + goto-cleanup + asprintf chains. C linkage is preserved via extern "C" so src/sortstring.c keeps building unchanged. Incidental correctness fix: CPU_ALLOC returns uninitialized memory, so the original cpus_allowed left bits for cleared mask positions in indeterminate state. The new implementation CPU_ZERO_S's the set before populating it from the hex string. Output verified identical to the C version on the unrestricted CPU set and a taskset-restricted subset. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the vmainfo / cpus_allowed C++ rewrite for the remaining util. Anonymous-namespace statics replace file-scope statics; a small ms_between() helper deduplicates the per-clock delta computation (four call sites collapse to one expression each). C linkage is preserved via extern "C". Incidental precision fix: the C version divided tv_nsec by an integer 1000000, truncating sub-millisecond resolution for wall-clock and PROCESS_CPUTIME. user/sys already had microsecond resolution via tv_usec/1e3. Switching to /1000000.0 makes all five reported timings consistent at microsecond precision. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Finish the src/ migration: the executable entry point and the routine
registry are the last two C translation units. Convert them and drop
the C-compat scaffolding that was carrying their callers.
Mechanical translation:
- File-scope statics move into anonymous namespaces.
- C-style void-pointer casts become static_cast<>.
- Implicit conversions through munmap drop their casts.
- Stdlib includes use the <c*> form.
- nullptr, std::strcmp, std::qsort, std::fopen etc.
Linkage cleanup:
- routine_register stays extern "C" (called by external/*.c at 149
ROUTINE_REGISTER sites; src/routine.h keeps its extern "C" guard).
- routine_from_name and routine_get_all become plain C++ linkage
(only C++ callers remain).
- src/util/{vmainfo,cpus_allowed,timing}.h drop their extern "C"
guards along with the extern "C" specifiers on the .cpp definitions.
- _GNU_SOURCE defines drop too: g++ predefines it for libstdc++.
Behavior preserved: output of -L / -A / --help / a full sort run
(modulo ASLR mmap addresses and timing values) is byte-identical
to the pre-rewrite binary. Error-path stderr and exit codes match.
external/*.c routines still self-register correctly via the
ROUTINE_REGISTER macro.
clang-analyzer warning count: 29 -> 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CMakeLists.txt bumps both target properties from CXX_STANDARD 11 to 17.
C_STANDARD stays at 99 (external/*.c is untouched). The algorithm
implementation files in src/ compile under the new standard with no
source changes -- no removed/deprecated C++17 constructs were in use
(no register, auto_ptr, random_shuffle, bind1st/bind2nd, ptr_fun,
unary/binary_function, dynamic exception specs, or trigraphs).
Cleanups taking advantage of the new standard, scoped to sortstring +
util + routines per the user's guidance ("minimal in algorithm files,
extensive in sortstring/util"):
vma_info now returns std::string instead of a malloc'd char*; the
strdup_malloc helper goes away and the caller's free()/strcmp pair
collapses to operator==.
cpus_allowed_list returns std::string ("" means not present).
cpus_allowed returns a cpus_info struct -- the two out-params turn
into a structured-binding consumer. cpu_scaling_{min,max}_freq
return std::optional<int>, retiring the -1 sentinel. Internal
helpers (status_entry, high_bit_order, set_cpu_bits) take
string_view. [[nodiscard]] on all four public entry points.
Incidental fix at the caller: std::free(cpus) -> CPU_FREE(cpus)
(functionally identical on glibc, but spec-correct).
routine_from_name returns std::optional<const routine*> and takes
string_view; the call site uses if-with-init. routine_get_all
returns std::pair, consumed by structured bindings in both
sortstring.cpp and unit-test/main.cpp. qsort -> std::sort with a
typed lambda comparator. [[nodiscard]] on both.
sortstring.cpp: opts bitfields become plain bools with default
member initializers; write_filename is std::string. Two asprintf
sites (log file and write-output path) become std::string
concatenation, dropping the malloc/free/return-code dance. bazename
returns std::string, retiring the static char buf[300] (not
thread-safe, though never actually exercised concurrently here).
[[nodiscard]] on alloc_bytes, alloc_text, alloc_pointers, file_size.
util/debug.h: [[nodiscard]] on check_result.
Verified output byte-identical to pre-change binary for -L, -A,
--help, full sort run (regular and taskset-restricted), and all
error paths (modulo ASLR addresses, random seed, timing values,
and RSS/Pss). Unit tests pass. Algorithm-file object code unchanged
in shape (standard bump only).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing per-CPU line printed only the kernel governor's scaling
window via scaling_min_freq / scaling_max_freq. On a system with
boost disabled, or on hybrid CPUs where different cores have
different boost ceilings, that window is the same number on every
core and conveys nothing about whether the CPU can actually run
faster than displayed.
Add three signals so the benchmark banner reflects what the silicon
can do:
1. Turbo/boost state -- a single global line. Reads
/sys/devices/system/cpu/cpufreq/boost first (AMD CPB,
acpi-cpufreq, amd-pstate); falls back to
/sys/devices/system/cpu/intel_pstate/no_turbo (intel_pstate,
inverse-sense). Reported as on/off; the line is omitted when
neither file is readable.
2. Hardware peak frequency -- per CPU, when it exceeds the
scaling cap. Reads amd_pstate_max_freq first (the per-core
boost ceiling reported by amd-pstate; differs across cores on
AMD hybrid parts), then cpuinfo_max_freq (Intel turbo cap).
Suppressed when equal to scaling_max_freq so the line stays
short on non-hybrid / boost-on systems.
3. P-core / E-core label -- per CPU. Resolved from
/sys/devices/cpu_core/cpus (P) and /sys/devices/cpu_atom/cpus
(E), the Intel hybrid PMU sysfs. Empty on non-hybrid Intel and
on AMD (where the hardware-peak value already exposes the
split, e.g. 5090 MHz Zen 5 vs 3506 MHz Zen 5c on a Ryzen AI
7 PRO 350).
Sample on the AMD Ryzen AI 7 PRO 350 (boost disabled,
2 GHz scaling cap):
CPU information:
CPUs allowed: 0-1,4-5
Turbo/boost: off
CPU0, scaling [623MHz .. 2000MHz], hw peak 5090MHz
CPU1, scaling [623MHz .. 2000MHz], hw peak 3506MHz
CPU4, scaling [623MHz .. 2000MHz], hw peak 5090MHz
CPU5, scaling [623MHz .. 2000MHz], hw peak 3506MHz
Incidental cleanup: the scaling range now prints [min .. max]
(was [max .. min]) since "scaling ... hw peak X" only reads
naturally with an ascending range.
API shape: cpu_scaling_min_freq / cpu_scaling_max_freq collapse
into a single cpu_info_for(cpu) returning a struct (scaling
min/max, hw peak, core class). cpu_boost_enabled() returns
std::optional<bool>.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
16 commits spanning build modernization, completing the C→C++ migration in
src/, and a few correctness fixes the static analyzers surfaced along the way.Build & CI
ENABLE_GCC_ANALYZER/ENABLE_CLANG_ANALYZERcmake options (scoped to project sources via OBJECT libraries;external/excluded). Builds are not warning-clean when either is on — they're diagnostic-surfacing modes, not gates.CMakeLists.txtmodernized (target-level properties, OpenMP viafind_package,CXX_STANDARDper target);cmake_minimum_requiredbumped to 3.16.build.yml: refresh runner pins, dependencies, andactions/checkout@v6(clears the Node.js 20 deprecation annotation).MARCHcache variable (defaultnative) — useful for-DMARCH=x86-64when running under Valgrind on AVX-512 hosts.src/ migration to C++
util/vmainfo,util/cpus_allowed,util/timing,routines,sortstringrewritten in C++. The only remaining.cfiles are inexternal/.routine.hkeeps itsextern "C"guard (149ROUTINE_REGISTERcallsites inexternal/*.c).[[nodiscard]],if-with-init) scoped tosortstring.cppandutil/; algorithm files unchanged.Correctness fixes
vector_realloc: don't lose realloc pointers on failure (GCC analyzer finding).Explicit abort()on every malloc failure across algorithm files.Handle empty input (n == 0)in every sort routine — theabort()guards above could fire onmalloc(0)returning NULL (conforming behavior); also fixed a pre-existing segfault inburstsort_mkq_*and an underflow incheck_result. Unit-test now starts each k-loop at 0, locking in the contract.median.h: rewrite value-formmed3charasmax(min(a,b), min(max(a,b), c))— branchless, clears a clang-analyzer false positive, and shrinkspseudo_median<CharT>static instruction count by 21–33%.Banner
cpu_informationnow reports turbo/boost state (cross-driver:cpufreq/boostthenintel_pstate/no_turbo), per-CPU hardware peak frequency when it exceeds the scaling cap (amd_pstate_max_freq/cpuinfo_max_freq— surfaces P/E split on AMD by value), and explicit(P-core)/(E-core)labels on Intel hybrid.Funnelsort compile-time
funnelsort.cppperK×BufferLayoutinto independent TUs — parallel builds no longer bottleneck on this file.Test plan
./build/unit-test→*** All OK ***../sortstring -L,-A,--help, and a full sort run (incl. taskset-restricted) output byte-identical to pre-rewrite binary (modulo ASLR mmap addresses, RSS/Pss, random seed, timing values)..croutines (quicksort,msd_nilsson) still self-register through theextern "C"routine_registerboundary.n == 0invocation of every registered routine returns without abort or crash.🤖 Generated with Claude Code