fix(version_schemes): support arbitrary semver pre-release labels#1548

Open
bearomorphism wants to merge 1 commit into commitizen-tools:master from bearomorphism:bear-fix-semver-not-fully-covered

Conversation

@bearomorphism
Collaborator

@bearomorphism bearomorphism commented Jun 12, 2025

Description

Fixes #950

This PR fixes a bug where commitizen 3.x raises InvalidVersion (or InvalidVersionPart) when encountering git tags with arbitrary semver pre-release identifiers (e.g., v0.7.1-release, v0.0.1-SNAPSHOT). These are valid semver identifiers per SemVer §9, but the previous regex pattern didn't accept them.

Changes

1. Extended _VERSION_PATTERN regex (version_schemes.py)

The original regex only allowed alpha|beta|preview|rc|a|b|c as pre-release labels. The new pattern adds a branch, [a-zA-Z-]+, that matches any label made of letters and hyphens, which the SemVer spec permits in pre-release identifiers.

Before: (alpha|beta|preview|rc|a|b|c) — only PEP 440 labels accepted
After: (alpha|beta|preview|rc|a|b|c|[a-zA-Z-]+) — also accepts arbitrary labels
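To make the before/after difference concrete, here is a minimal sketch that isolates only the label alternation from the two lines above. It is not the full _VERSION_PATTERN; the names OLD_LABEL and NEW_LABEL are made up for this illustration, and case-insensitive matching is assumed.

```python
import re

# Only the pre-release label alternation, not the full version pattern.
OLD_LABEL = re.compile(r"(alpha|beta|preview|rc|a|b|c)", re.IGNORECASE)
NEW_LABEL = re.compile(r"(alpha|beta|preview|rc|a|b|c|[a-zA-Z-]+)", re.IGNORECASE)

labels = ["rc", "release", "SNAPSHOT", "pre-release"]

# fullmatch() requires the whole label to be consumed by the pattern.
old_accepts = {label: OLD_LABEL.fullmatch(label) is not None for label in labels}
new_accepts = {label: NEW_LABEL.fullmatch(label) is not None for label in labels}

for label in labels:
    print(f"{label:12} old={old_accepts[label]} new={new_accepts[label]}")
```

In this sketch the old alternation accepts only rc, while the new one accepts all four labels.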

2. Widened prerelease parameter type from Prerelease | None to str | None

The Prerelease type alias is Literal["alpha", "beta", "rc"], which is too narrow when the system needs to handle arbitrary labels parsed from existing tags. The CLI still restricts user input to the three known labels; this type widening only affects the internal API.

3. Fixed generate_prerelease() comparison logic

The old code used startswith() to match the incoming prerelease label against the current pre-release phase. This was subtly buggy: "alphabeta".startswith("a") is True, so a label like "alphabeta" would incorrectly be treated as continuing an "alpha" series.

The new logic:

  • Normalizes the incoming label using the same mapping packaging uses internally ("alpha"→"a", "beta"→"b", "rc"→"rc", others→lowercase)
  • For known PEP 440 labels (a, b, rc): uses max() ordering to prevent down-bumping phases (e.g., won't go from b1 back to a2)
  • For arbitrary labels: uses strict equality comparison — no ordering assumptions since arbitrary labels have no defined precedence

This also fixes a potential case-sensitivity issue: "SNAPSHOT" in a tag is normalized to "snapshot" by packaging, so the comparison now lowercases the incoming label for arbitrary labels.
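The comparison rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual generate_prerelease() implementation: the function names, the (label, number) tuple representation, and the choice to start a new series at 0 are all hypothetical.

```python
from __future__ import annotations

# Ordering of the known PEP 440 phases; arbitrary labels are not ranked.
_PHASE_ORDER = ["a", "b", "rc"]
_ALIASES = {"alpha": "a", "beta": "b"}


def normalize_label(label: str) -> str:
    """Lowercase the label and map long PEP 440 names to their short forms."""
    lowered = label.lower()
    return _ALIASES.get(lowered, lowered)


def next_prerelease(current: tuple[str, int] | None, requested: str) -> tuple[str, int]:
    """Return the (label, number) pair for the next pre-release.

    `current` is the already-normalized pre-release of the current version,
    or None if the current version is a final release.
    """
    label = normalize_label(requested)
    if current is None:
        return (label, 0)  # assumption: a new series starts at 0

    current_label, current_number = current
    if current_label in _PHASE_ORDER and label in _PHASE_ORDER:
        # Known phases: never move backwards (e.g. from "b" to "a").
        label = max(current_label, label, key=_PHASE_ORDER.index)

    if label == current_label:
        # Same series (strict equality, not startswith): bump the number.
        return (label, current_number + 1)
    return (label, 0)
```

With this sketch, next_prerelease(("b", 1), "alpha") stays in the beta phase, next_prerelease(("snapshot", 0), "SNAPSHOT") continues the snapshot series, and a current label of "alphabeta" is not mistaken for an alpha series even though "alphabeta".startswith("a") is True.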

Root Cause Analysis

When commitizen discovers existing tags (e.g., during cz bump), it calls:

Version(tag_string)  →  packaging.version.Version.__init__()
                     →  self._regex.fullmatch(version)

The _regex on BaseVersion was inherited from packaging.version.Version, which only accepts PEP 440 pre-release labels. A tag like v0.7.1-release would fail the regex match and raise InvalidVersion.

By overriding _regex with a pattern that also accepts arbitrary identifiers, the version can be parsed successfully. The rest of the version logic (epoch, release, post, dev, local) remains unchanged.

Test Cases Added

| Input Version | Increment | Prerelease | Expected Output |
| --- | --- | --- | --- |
| v1.0.0-reallyweird | PATCH | reallyweird | 1.0.0-reallyweird1 |
| v0.7.1-release | PATCH | release | 0.7.1-release1 |
| v0.0.1-SNAPSHOT | PATCH | SNAPSHOT | 0.0.1-snapshot1 |

AI Disclosure

This PR was revived and updated with AI assistance (GitHub Copilot). The original fix concept came from the community contributor.

Comment thread tests/test_version_scheme_semver.py Outdated
@bearomorphism bearomorphism force-pushed the bear-fix-semver-not-fully-covered branch from 877fcd6 to da46193 Compare June 13, 2025 13:09
@codecov

codecov Bot commented Jun 13, 2025

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.

@bearomorphism bearomorphism changed the title WIP fix semver not fully covered fix semver not fully covered Aug 13, 2025
@bearomorphism bearomorphism marked this pull request as ready for review August 13, 2025 01:30
@Lee-W Lee-W added this to the 4.9.0 milestone Aug 17, 2025
Comment thread tests/test_version_scheme_semver.py Outdated
Comment thread commitizen/version_schemes.py Outdated
@bearomorphism
Collaborator Author

I will update this PR this week when I have bandwidth

@bearomorphism bearomorphism force-pushed the bear-fix-semver-not-fully-covered branch from da46193 to 3081fff Compare September 8, 2025 08:34
@Lee-W Lee-W modified the milestones: 4.9.0, 4.9.1 Sep 9, 2025
@Lee-W Lee-W deleted the branch commitizen-tools:master September 9, 2025 06:09
@Lee-W Lee-W closed this Sep 9, 2025
@bearomorphism
Collaborator Author

Maybe we can adjust the workflow. Several PRs got closed just because the target branch was deleted.

@Lee-W Lee-W reopened this Sep 9, 2025
@Lee-W Lee-W changed the base branch from v4-9-0-test to master September 9, 2025 14:22
@Lee-W
Member

Lee-W commented Sep 9, 2025

Yep, this was a temporary workflow. We usually don't receive this many PRs or review them this fast. Maybe it's worth rethinking it and making an rc, to avoid what we encountered yesterday as well.

@Lee-W Lee-W removed this from the 4.9.1 milestone Sep 9, 2025
@Lee-W Lee-W added this to the 4.9.2 milestone Sep 9, 2025
@bearomorphism bearomorphism force-pushed the bear-fix-semver-not-fully-covered branch 2 times, most recently from 8ee1dbe to bb48144 Compare September 13, 2025 09:45
Comment thread commitizen/version_schemes.py Outdated
@bearomorphism bearomorphism marked this pull request as draft September 27, 2025 02:13
@bearomorphism bearomorphism force-pushed the bear-fix-semver-not-fully-covered branch 2 times, most recently from 565f319 to 72fbe95 Compare May 3, 2026 06:48
@bearomorphism bearomorphism changed the title fix semver not fully covered fix(version_schemes): support arbitrary semver pre-release labels May 3, 2026
@bearomorphism bearomorphism force-pushed the bear-fix-semver-not-fully-covered branch 2 times, most recently from 9198eb4 to 9235842 Compare May 3, 2026 11:25
@bearomorphism bearomorphism marked this pull request as ready for review May 3, 2026 11:26
@bearomorphism
Collaborator Author

Updated PR with new description

Contributor

Copilot AI left a comment

Pull request overview

Extends Commitizen’s version parsing to avoid InvalidVersion failures when encountering git tags that use non-PEP440, SemVer-style prerelease labels (e.g., v0.7.1-release, v0.0.1-SNAPSHOT), addressing issue #950.

Changes:

  • Widen VersionProtocol.bump() / BaseVersion.bump() prerelease typing from a restricted Literal[...] to str | None.
  • Override packaging.version.Version’s parsing regex via BaseVersion._regex to accept additional prerelease labels.
  • Add SemVer/SemVer2 test cases to validate bumping with non-standard prerelease labels.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| commitizen/version_schemes.py | Broadens prerelease typing and overrides the underlying version regex used for parsing. |
| tests/utils.py | Updates test argument typing to match widened prerelease type (`str \| None`). |
| tests/test_version_scheme_semver.py | Adds regression tests for SemVer bumping with arbitrary prerelease labels and v-prefixed tags. |
| tests/test_version_scheme_semver2.py | Adds a SemVer2 regression test for bumping with an arbitrary prerelease label. |

Comment on lines +243 to +247
# arbitrary semver pre-release labels (issue #950)
(
VersionSchemeTestArgs(
current_version="1.0.0-reallyweird",
increment="PATCH",
Collaborator Author

Addressed. Added equivalent SemVer2 test cases for 0.7.1-release and 0.0.1-SNAPSHOT alongside the existing 1.0.0-reallyweird case.

Comment thread commitizen/version_schemes.py Outdated
Comment on lines +166 to +170
(post|rev|r|dev)
[-_\.]?
([0-9]+)?
$)
[a-z]+? # match any letters (semver support)
Collaborator Author

Good catch. Widened the pattern from [a-z]+? to [a-z]+(?:-[a-z]+)* which now accepts hyphenated labels like pre-release. For labels containing digits (e.g., foo1bar), the regex structure separates the numeric suffix via the pre_n group (foo + 1), which is consistent with how packaging handles it. Purely numeric identifiers like 1.0.0-1 are already handled by the post_n1 group (numeric-only post release). I've updated the PR description to clarify this scoping.

Comment thread commitizen/version_schemes.py Outdated
(post|rev|r|dev)
[-_\.]?
([0-9]+)?
$)
Collaborator Author

Fixed. The negative lookahead now uses (\+|$) instead of just $, so reserved labels like post and dev are correctly excluded even when followed by a +local segment. Added a test case 1.0.0-release+local123 to verify this.
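A small sketch of why the lookahead terminator matters. The patterns below are simplified stand-ins assembled from the fragments quoted in this thread, not the exact regex in the PR; OLD, NEW, and arbitrary_label are hypothetical names.

```python
from __future__ import annotations

import re

_RESERVED = r"(?:post|rev|r|dev)[-_.]?[0-9]*"
_LABEL = r"[a-z]+(?:-[a-z]+)*"

# Lookahead terminated by end-of-string only (the earlier behaviour).
OLD = re.compile(rf"(?!{_RESERVED}$){_LABEL}", re.IGNORECASE)
# Lookahead terminated by end-of-string OR the start of a +local segment.
NEW = re.compile(rf"(?!{_RESERVED}(?:\+|$)){_LABEL}", re.IGNORECASE)


def arbitrary_label(pattern: re.Pattern, text: str) -> str | None:
    """Return the arbitrary label matched at the start of `text`, if any."""
    match = pattern.match(text)
    return match.group(0) if match else None


# "post1+local" slips past the old lookahead because "$" does not match at "+".
print(arbitrary_label(OLD, "post1+local"))
print(arbitrary_label(NEW, "post1+local"))
print(arbitrary_label(NEW, "release+local123"))
```

In this sketch the old pattern treats post as an arbitrary label when a +local segment follows, the new pattern rejects it, and release is still accepted in front of +local123.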

Comment thread commitizen/version_schemes.py Outdated
Comment on lines +197 to +201
"""
A base class implementing the `VersionProtocol` for PEP440-like versions.
"""

_regex: re.Pattern = re.compile(_VERSION_PATTERN, re.VERBOSE | re.IGNORECASE)
Collaborator Author

Addressed. Moved the _regex override from BaseVersion to SemVer (which SemVer2 inherits). Pep440 now retains the strict PEP 440 regex from packaging.version.Version. Added an explicit test test_pep440_rejects_arbitrary_prerelease_labels() to lock this in.
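The scoping described here relies on ordinary class-attribute inheritance. The following self-contained sketch uses stand-in classes and toy regexes (none of them are the real commitizen or packaging definitions) to show how overriding _regex on SemVer leaves Pep440 on the inherited strict pattern.

```python
import re


class PackagingVersion:
    """Stand-in for packaging.version.Version (strict labels only)."""

    _regex = re.compile(r"\d+(\.\d+)*((a|b|rc)\d*)?", re.IGNORECASE)

    def __init__(self, version: str) -> None:
        if not self._regex.fullmatch(version):
            # The real library raises InvalidVersion; ValueError is used here.
            raise ValueError(f"Invalid version: {version!r}")
        self.raw = version


class BaseVersion(PackagingVersion):
    """No _regex here, so subclasses inherit the strict pattern by default."""


class Pep440(BaseVersion):
    pass


class SemVer(BaseVersion):
    # Only this scheme opts in to arbitrary pre-release labels.
    _regex = re.compile(r"\d+(\.\d+)*(-?[a-z]+(-[a-z]+)*\d*)?", re.IGNORECASE)


class SemVer2(SemVer):
    pass


print(Pep440._regex is PackagingVersion._regex)  # inherited, not overridden
print(SemVer2._regex is SemVer._regex)           # SemVer2 picks up SemVer's override
```

Because the override lives on SemVer rather than BaseVersion, any other scheme deriving from BaseVersion keeps the strict behaviour unless it opts in explicitly.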

@bearomorphism bearomorphism force-pushed the bear-fix-semver-not-fully-covered branch from 9fff07f to 4a72d21 Compare May 3, 2026 11:38
Extend BaseVersion with a custom _VERSION_PATTERN regex that accepts
arbitrary pre-release identifiers (e.g., -release, -SNAPSHOT,
-reallyweird) instead of only PEP 440's alpha/beta/rc.

This fixes InvalidVersion errors when using tags like v0.7.1-release
or v0.0.1-SNAPSHOT with commitizen's changelog and bump commands.

Closes commitizen-tools#950

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bearomorphism bearomorphism force-pushed the bear-fix-semver-not-fully-covered branch from 4a72d21 to ac1bc5f Compare May 3, 2026 11:41
@woile
Member

woile commented May 4, 2026

Please, also test that the monorepo support is not broken by this:
https://commitizen-tools.github.io/commitizen/tutorials/monorepo_guidance/

# We cannot fully rely on packaging.version for semver-compatible parsing.
# This pattern is NOT applied to Pep440 scheme, which retains strict PEP 440 parsing.
# See: https://github.com/pypa/packaging/blob/14b83e15dbb9caa87c63646ba7808b2b5e460ce6/src/packaging/version.py#L117
_SEMVER_VERSION_PATTERN = r"""^\s*
Member

Collaborator Author

I didn't know that, thanks for sharing.
I will take a look later.

Collaborator Author

This reply was written with AI assistance (GitHub Copilot).

No — we can't use the official semver regex directly. Here's why:

The official regex captures prerelease as a single blob:

(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*)

But packaging.version.Version.__init__() expects separate named groups:

# From packaging/version.py:
self._pre = _parse_letter_version(match.group("pre_l"), match.group("pre_n"))
self._post = _parse_letter_version(match.group("post_l"), match.group("post_n1") or match.group("post_n2"))
self._dev = _parse_letter_version(match.group("dev_l"), match.group("dev_n"))
self._local = _parse_local_version(match.group("local"))

Incompatibilities with the official regex:

| Aspect | Official semver regex | What packaging needs |
| --- | --- | --- |
| Prerelease | Single prerelease group | Separate pre_l (label) + pre_n (number) |
| Release segments | Exactly 3 (MAJOR.MINOR.PATCH) | 2+ segments (e.g., 1.0, 1.0.0.0) |
| Epoch | Not supported | epoch group (e.g., 2!1.0.0) |
| Post-release | Not supported | post_l, post_n1, post_n2 groups |
| Dev release | Not supported | dev_l, dev_n groups |

Verification:

>>> from commitizen.version_schemes import Pep440, SemVer
>>> from packaging.version import Version as _PackagingVersion
>>> Pep440._regex is _PackagingVersion._regex  # Pep440 unchanged
True
>>> SemVer._regex is _PackagingVersion._regex  # SemVer uses extended pattern
False
>>> '_regex' in Pep440.__dict__  # Pep440 doesn't override
False
>>> '_regex' in SemVer.__dict__  # Only SemVer overrides
True

Instead, we extended packaging's existing PEP 440 regex to also accept arbitrary alphabetical/hyphenated labels in the pre_l group, keeping compatibility with packaging's internals. I'll add the link to the official regex in the code comment for reference.
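For illustration, a heavily simplified pattern in the same spirit: it keeps the pre_l / pre_n named groups that packaging reads, but lets pre_l match arbitrary alphabetical or hyphenated labels. This is a sketch, not the pattern from the PR; epoch, post, and dev segments are left out, and SKETCH and split_version are hypothetical names.

```python
from __future__ import annotations

import re

# Heavily simplified: only release, pre-release and local segments.
SKETCH = re.compile(
    r"""
    ^v?
    (?P<release>[0-9]+(?:\.[0-9]+)*)
    (?:
        [-_.]?
        (?P<pre_l>[a-z]+(?:-[a-z]+)*)   # label, e.g. rc, SNAPSHOT, pre-release
        [-_.]?
        (?P<pre_n>[0-9]+)?              # optional number, e.g. the 1 in rc1
    )?
    (?:\+(?P<local>[a-z0-9]+(?:[-_.][a-z0-9]+)*))?
    $
    """,
    re.VERBOSE | re.IGNORECASE,
)


def split_version(text: str) -> dict[str, str | None] | None:
    """Return the named groups for `text`, or None if it does not match."""
    match = SKETCH.match(text)
    return match.groupdict() if match else None


print(split_version("v0.0.1-SNAPSHOT"))
print(split_version("1.0.0-pre-release2"))
```

Note that this sketch only handles a single label with an optional trailing number; dot-separated pre-release identifiers are not matched by it.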

Member

@woile woile May 6, 2026

well but you can use the semver regex, and modify the prerelease to split into letter and number, right?
It already covers a lot of cases (which we should probably test against as well)
https://regex101.com/r/Ly7O1x/3/

Like for example, this is a valid prerelease alpha0.valid 😱
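One possible reading of this suggestion, sketched with the regex published on semver.org: parse with the official pattern, then split the first pre-release identifier into a letter part and a trailing number. The split_prerelease helper and its return shape are hypothetical, and how the remaining dot-separated identifiers should map onto packaging's groups is left open.

```python
from __future__ import annotations

import re

# Suggested regex from semver.org, with named groups.
SEMVER = re.compile(
    r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)

# Hypothetical splitter: a part ending in a non-digit, then trailing digits.
_HEAD = re.compile(r"(?P<label>[0-9a-zA-Z-]*[a-zA-Z-])?(?P<number>\d*)")


def split_prerelease(prerelease: str) -> tuple[str | None, int | None, list[str]]:
    """Split 'alpha0.valid' into ('alpha', 0, ['valid'])."""
    head, *rest = prerelease.split(".")
    match = _HEAD.fullmatch(head)
    if match is None:
        raise ValueError(f"Not a semver pre-release identifier: {head!r}")
    label = match.group("label")
    number = match.group("number")
    return label, int(number) if number else None, rest


parsed = SEMVER.match("1.0.0-alpha0.valid")
print(parsed.group("prerelease"))
print(split_prerelease(parsed.group("prerelease")))
```

The official regex accepts 1.0.0-alpha0.valid as a whole, and the helper separates alpha from 0 while returning the trailing valid identifier untouched.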

Collaborator Author

let me check later when I have bandwidth. sorry I haven't verified the AI generated comments. (testing AI agents' capability recently)

regex is difficult...

Member

no need to rush, take your time 🧘🏻

@bearomorphism
Collaborator Author

This reply was written with AI assistance (GitHub Copilot).

Verified — monorepo support is not broken.

Tests executed locally (Windows):

| Test suite | Result |
| --- | --- |
| tests/test_tags.py | ✅ Passed |
| tests/test_version_schemes.py | ✅ Passed |
| tests/test_version_scheme_semver.py | ✅ Passed |
| tests/test_version_scheme_semver2.py | ✅ Passed |
| tests/test_version_scheme_pep440.py | ✅ Passed |
| tests/providers/test_scm_provider.py | ✅ Passed |
| tests/test_project_info.py | ✅ Passed |
| Subtotal | 327 passed |
| tests/commands/test_bump_command.py | ✅ Passed (4 GPG-signing tests skipped; pre-existing local env issue) |
| tests/commands/test_changelog_command.py | ✅ Passed |
| Subtotal | 228 passed, 4 deselected |

Why monorepo is unaffected:

  1. Scope of change — only commitizen/version_schemes.py and test files are modified:

    $ git diff master --name-only
    commitizen/version_schemes.py
    tests/test_version_scheme_pep440.py
    tests/test_version_scheme_semver.py
    tests/test_version_scheme_semver2.py
    tests/utils.py
    
  2. _regex override is scoped to SemVer only. Pep440 (the default scheme) retains packaging's strict regex:

    >>> import packaging.version
    >>> from commitizen.version_schemes import BaseVersion, Pep440, SemVer
    >>> '_regex' in BaseVersion.__dict__
    False
    >>> '_regex' in Pep440.__dict__
    False
    >>> '_regex' in SemVer.__dict__  # only SemVer overrides
    True
    >>> Pep440._regex is packaging.version.Version._regex  # Pep440 unchanged
    True
  3. No changes to monorepo-relevant code paths — tag discovery, tag_format, ignored_tag_formats, changelog filtering by scope, and per-component config resolution are all untouched.

  4. Pep440 rejection test added: test_pep440_rejects_arbitrary_prerelease_labels() explicitly verifies that Pep440("1.0.0-release"), Pep440("1.0.0-SNAPSHOT"), and Pep440("1.0.0-pre-release") all raise InvalidVersion.

Development

Successfully merging this pull request may close these issues.

Help wanted: commitizen 3.x not supoort generate CHANGELOG with tag format like v1.0.0-release

4 participants