Skip to content

bugfix(network): Add command id overflow check to keep network commands in chronological order#2736

Open
Caball009 wants to merge 6 commits into
TheSuperHackers:mainfrom
Caball009:fix_network_commandID_overflow
Open

bugfix(network): Add command id overflow check to keep network commands in chronological order#2736
Caball009 wants to merge 6 commits into
TheSuperHackers:mainfrom
Caball009:fix_network_commandID_overflow

Conversation

@Caball009

@Caball009 Caball009 commented May 19, 2026

Copy link
Copy Markdown

Synchronized network commands are given a unique id so that they can be sorted chronologically. The id needs to be unique per execution frame. It's a uint16 value so it'll overflow quite fast, especially because it's never reset and the starting value is 64000 by default.

When the network runahead decreases it's expected that multiple frames are executed right after each other. I noticed an issue with the overflow when I set the CRC interval to once per frame. When the overflow happened for a client combined with a decrease of the runahead, the order of the CRC messages was no longer guaranteed to be chronological for that execution frame. That would cause a mismatch if the overflow didn't happen for all clients at the same time.

With a starting value of 64000 and a CRC message every frame the first overflow happens after ~25 seconds. You can see it mismatch here because of the overflow (this was with a special test build):
https://www.youtube.com/live/V_l-q9Y-DmA?t=621s
https://www.youtube.com/live/V_l-q9Y-DmA?t=9499s

I've added some test code in the first commit. If that's used without this fix, it becomes very apparent that the messages are not in the right order when the command id overflows.

@Caball009 Caball009 added Bug Something is not working right, typically is user facing Major Severity: Minor < Major < Critical < Blocker Network Anything related to network, servers Gen Relates to Generals ZH Relates to Zero Hour labels May 19, 2026
@Caball009 Caball009 force-pushed the fix_network_commandID_overflow branch 2 times, most recently from ee6bcd8 to 1f31061 Compare May 31, 2026 18:32
@Caball009 Caball009 marked this pull request as ready for review May 31, 2026 18:46
@Caball009 Caball009 requested a review from Mauller May 31, 2026 18:46
@greptile-apps

greptile-apps Bot commented May 31, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a chronological ordering bug for synchronized network commands when the uint16 command ID counter overflows, by replacing simple > comparisons with a half-range serial-number comparison (isCommandIdNewer) that correctly handles wrap-around.

  • NetCommandList.cpp: Introduces isCommandIdNewer(newVal, oldVal) using the classic half-range trick (diff = newVal - oldVal; return diff != 0 && diff < 0x8000), gated behind !RETAIL_COMPATIBLE_NETWORKING to preserve backward-compatible behavior for retail builds. All three sort comparison sites in addMessage() are updated.
  • NetworkUtil.cpp: Resets the starting ID from 64001 (first returned under old pre-increment) to 0 (post-increment), ensuring the full 65536-value range is usable before overflow; the file-scope s_commandID replaces the function-local static.
  • Network.cpp / CommandLine.cpp (both variants): Gates the per-frame NET_CRC_INTERVAL=1 debug path behind !RETAIL_COMPATIBLE_NETWORKING so retail-compatible builds always use the safe interval of 100.

Confidence Score: 5/5

Safe to merge; the half-range comparison is the standard algorithm for uint16 sequence number ordering and all changed call sites are consistent.

The core fix is a well-understood technique (RFC 1982-style serial number arithmetic) applied correctly to all three comparison sites. The retail-compatible guard preserves the old behavior for builds that require it, and the CRC interval gating is a straightforward companion change. The only open question is whether ID 0 is safe as a real command value, but nothing in the examined code treats 0 as a sentinel.

NetworkUtil.cpp — verify that no call site treats a generated ID of 0 as unset or invalid.

Important Files Changed

Filename Overview
Core/GameEngine/Source/GameNetwork/NetCommandList.cpp Adds isCommandIdNewer() using the classic half-range (RFC 1982-style) uint16 serial number comparison behind a RETAIL_COMPATIBLE_NETWORKING guard; all three call sites in addMessage() updated correctly
Core/GameEngine/Source/GameNetwork/NetworkUtil.cpp Moves commandID to file-scope static, changes start from 64001 (first returned) to 0 (post-increment); first returned value is now 0 which may need validation as a safe non-sentinel ID
Core/GameEngine/Source/GameNetwork/Network.cpp Guards NET_CRC_INTERVAL=1 behind both DEBUG_CRC and !RETAIL_COMPATIBLE_NETWORKING to prevent per-frame CRC checks from triggering the overflow bug in retail-compatible builds
Generals/Code/GameEngine/Source/Common/CommandLine.cpp Mirrors Network.cpp change: parseNetCRCInterval is now gated by both DEBUG_CRC and !RETAIL_COMPATIBLE_NETWORKING
GeneralsMD/Code/GameEngine/Source/Common/CommandLine.cpp Identical change to Generals/Code counterpart for the Zero Hour build

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["addMessage(msg)"] --> B{"Fast-path eligible?\nsame type and playerID\nas lastMessageInserted"}
    B -- Yes --> C{"isCommandIdNewer\nmsg.ID vs last.ID"}
    C -- No --> F["Full linear search"]
    C -- Yes --> D{"theNext is null\nor theNext comes after msg"}
    D -- Yes --> E["Insert after lastMessageInserted\nfast path"]
    D -- No --> F
    F --> G["Scan forward by type"]
    G --> H["Scan forward by playerID"]
    H --> I{"isCommandIdNewer\nmsg.sortNum vs tempmsg.sortNum"}
    I -- Yes --> J["Advance tempmsg"]
    J --> I
    I -- No --> K["Insert before tempmsg"]

    subgraph compare["isCommandIdNewer logic"]
        direction LR
        P["diff = uint16(newVal - oldVal)"] --> Q{"diff == 0"}
        Q -- Yes --> R["false - equal IDs"]
        Q -- No --> S{"diff < 0x8000"}
        S -- Yes --> T["true - newVal is newer"]
        S -- No --> U["false - newVal is older"]
    end
Loading

Reviews (8): Last reviewed commit: "Tweaked comment." | Re-trigger Greptile

@Caball009 Caball009 force-pushed the fix_network_commandID_overflow branch from 1f31061 to ebf1fa2 Compare May 31, 2026 18:59
Comment thread Core/GameEngine/Source/GameNetwork/NetworkUtil.cpp Outdated
@Caball009 Caball009 force-pushed the fix_network_commandID_overflow branch from ebf1fa2 to a7ff356 Compare June 2, 2026 11:45
@helmutbuhler

Copy link
Copy Markdown

Resetting the command id just delays the potential mismatch, right? Wouldn't it be better to sort the command ids properly, even if that rarely causes mismatches with retail clients?

@Caball009

Copy link
Copy Markdown
Author

Resetting the command id just delays the potential mismatch, right?

It's reset every N frames, so it's not just delayed.

@Caball009 Caball009 force-pushed the fix_network_commandID_overflow branch from a7ff356 to 3ae2622 Compare June 3, 2026 01:29

@xezon xezon left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The id needs to be unique per execution frame.

Can we just reset it to 0 or 1 at the beginning of every new frame? Or does that cause issues? What is the reason for it to reset every 128 frames now?

Comment thread Core/GameEngine/Source/GameNetwork/NetworkUtil.cpp Outdated
Comment thread Core/GameEngine/Source/GameNetwork/NetworkUtil.cpp Outdated
Comment thread Core/GameEngine/Source/GameNetwork/Network.cpp Outdated
Comment thread Core/GameEngine/Source/GameNetwork/Network.cpp Outdated
@Caball009 Caball009 force-pushed the fix_network_commandID_overflow branch from 3ae2622 to 93c5ce7 Compare June 4, 2026 17:07
@helmutbuhler

Copy link
Copy Markdown

When it's reset every 128 frames, isn't it possible that a packet from frame 127 comes in after it's already reset and then gets sorted wrong?

@xezon

xezon commented Jun 5, 2026

Copy link
Copy Markdown

My current understanding is that the ids only matter per frame, not across different frames. But then I do not understand why it needs to be reset every 128 frames instead of every frame.

@Caball009 Caball009 force-pushed the fix_network_commandID_overflow branch from 93c5ce7 to 1bccb3a Compare June 7, 2026 18:54
@Caball009

Copy link
Copy Markdown
Author

I noticed that at extremely high latency (beyond 64 frames runahead) the disconnect screen could be triggered by the reset somehow. I tried a ton of things, but I was unable to find out why that happens. That's why I made some changes to the implementation and only reset the command id when it gets close to overflowing.

When it's reset every 128 frames, isn't it possible that a packet from frame 127 comes in after it's already reset and then gets sorted wrong?

if (frame + m_runAhead > m_lastExecutionFrame)

This check should guard against that. If the check is inverted, you'll see wrong sorting once in a while.

My current understanding is that the ids only matter per frame, not across different frames. But then I do not understand why it needs to be reset every 128 frames instead of every frame.

If the runahead decreases the ids matter across multiple frames.

@xezon xezon left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that at extremely high latency (beyond 64 frames runahead) the disconnect screen could be triggered by the reset somehow. I tried a ton of things, but I was unable to find out why that happens. That's why I made some changes to the implementation and only reset the command id when it gets close to overflowing.

Ok so this means every ~16 minutes there is a miniscule chance of network breakdown? It would be good to find a definite fix for it so we never need to worry about this again.

If the runahead decreases the ids matter across multiple frames.

This is a big problem for resetting. Can you gather more intel about this? Can we avoid it?

Maybe a good command reset opportunity is when there is a time frame where little to no new network commands were sent to peers?

Comment thread Core/GameEngine/Source/GameNetwork/Network.cpp Outdated
@xezon

xezon commented Jun 9, 2026

Copy link
Copy Markdown

Also inquired with Chad Gippy:

If you have a monotonically increasing uint16_t counter that wraps from 65535 to 0, a direct comparison like:

if (a > b)

fails across the wrap boundary.

A common solution is to compare the difference modulo 2¹⁶ and interpret it as a signed value:

bool is_newer(uint16_t a, uint16_t b)
{
    return static_cast<int16_t>(a - b) > 0;
}

Examples:

is_newer(100, 50);      // true
is_newer(50, 100);      // false

is_newer(0, 65535);     // true  (wrapped by 1)
is_newer(65535, 0);     // false

The subtraction is performed modulo 65536:

0 - 65535 == 1

which becomes int16_t(1) and therefore indicates that 0 is newer than 65535.

Limitation

This only works if the counters are never more than half the range apart.

For a 16-bit counter:

maximum unambiguous distance = 32767

The value exactly half a range away (32768) is fundamentally ambiguous:

a = 0;
b = 32768;

You cannot tell whether a is newer than b or vice versa because both interpretations are equally valid on a circular counter.

A robust version is:

bool is_newer(uint16_t a, uint16_t b)
{
    uint16_t diff = a - b;
    return diff != 0 && diff < 0x8000;
}

This is essentially the same technique used for sequence numbers in networking protocols such as TCP, where wraparound is expected.

@xezon

xezon commented Jun 9, 2026

Copy link
Copy Markdown

So what you could do for the non retail compatible sorting:

bool isCommandIdNewer(uint16_t newVal, uint16_t oldVal)
{
#if RETAIL_COMPATIBLE_NETWORKING
    return newVal > oldVal;
#else
    uint16_t diff = newVal - oldVal;
    return diff != 0 && diff < 0x8000;
#endif
}

This would require no manual ID resetting. Needs testing.

@Caball009 Caball009 force-pushed the fix_network_commandID_overflow branch 2 times, most recently from 9f1d368 to 60e1229 Compare June 13, 2026 20:17
@Caball009 Caball009 changed the title bugfix(network): Reset network command id to keep commands in chronological order bugfix(network): Add command id overflow check to keep network commands in chronological order Jun 13, 2026
@Caball009 Caball009 added the NoRetail This fix or change is not applicable with Retail game compatibility label Jun 13, 2026
@Caball009

Caball009 commented Jun 13, 2026

Copy link
Copy Markdown
Author

So what you could do for the non retail compatible sorting:

bool isCommandIdNewer(uint16_t newVal, uint16_t oldVal)
{
#if RETAIL_COMPATIBLE_NETWORKING
    return newVal > oldVal;
#else
    uint16_t diff = newVal - oldVal;
    return diff != 0 && diff < 0x8000;
#endif
}

This would require no manual ID resetting. Needs testing.

Yeah, this is probably the simplest solution after all (but not 100% retail compatible). Code and PR description updated.

@xezon

xezon commented Jun 14, 2026

Copy link
Copy Markdown

Yeah, this is probably the simplest solution after all (but not 100% retail compatible). Code and PR description updated.

Maybe still use the command id reset approach for the retail compatible network, if it reduces the chance of mismatch?

Comment thread Core/GameEngine/Source/GameNetwork/NetCommandList.cpp Outdated
@Caball009

Copy link
Copy Markdown
Author

Yeah, this is probably the simplest solution after all (but not 100% retail compatible). Code and PR description updated.

Maybe still use the command id reset approach for the retail compatible network, if it reduces the chance of mismatch?

It's only a potential mismatch issue if NET_CRC_INTERVAL is well below the original value (100). I'd have liked to have a retail-compatible fix, but it's fine this way.

The only reason I found out about this issue at all is because Legionnaire was using a special build for mismatch analysis that had the interval set to 1, which resulted in spurious mismatches that coincided with the command id overflow.

Comment thread Core/GameEngine/Source/GameNetwork/NetCommandList.cpp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Something is not working right, typically is user facing Gen Relates to Generals Major Severity: Minor < Major < Critical < Blocker Network Anything related to network, servers NoRetail This fix or change is not applicable with Retail game compatibility ZH Relates to Zero Hour

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants