fix(version_schemes): support arbitrary semver pre-release labels#1548
fix(version_schemes): support arbitrary semver pre-release labels#1548bearomorphism wants to merge 1 commit intocommitizen-tools:masterfrom
Conversation
0972ba2 to
877fcd6
Compare
877fcd6 to
da46193
Compare
|
|
I will update this PR this week when I have bandwidth |
da46193 to
3081fff
Compare
|
Maybe we can adjust the workflow. Several PRs got closed just because the target branch is deleted |
|
Yep, this was a temporary workflow. we usually don't receive this many PR and don't review this fast. Maybe worth rethink it to make a rc to avoid what we encountered yesterday as well |
8ee1dbe to
bb48144
Compare
565f319 to
72fbe95
Compare
9198eb4 to
9235842
Compare
|
Updated PR with new description |
9235842 to
9fff07f
Compare
There was a problem hiding this comment.
Pull request overview
Extends Commitizen’s version parsing to avoid InvalidVersion failures when encountering git tags that use non-PEP440, SemVer-style prerelease labels (e.g., v0.7.1-release, v0.0.1-SNAPSHOT), addressing issue #950.
Changes:
- Widen
VersionProtocol.bump()/BaseVersion.bump()prerelease typing from a restrictedLiteral[...]tostr | None. - Override
packaging.version.Version’s parsing regex viaBaseVersion._regexto accept additional prerelease labels. - Add SemVer/SemVer2 test cases to validate bumping with non-standard prerelease labels.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
commitizen/version_schemes.py |
Broadens prerelease typing and overrides the underlying version regex used for parsing. |
tests/utils.py |
Updates test argument typing to match widened prerelease type (`str |
tests/test_version_scheme_semver.py |
Adds regression tests for SemVer bumping with arbitrary prerelease labels and v-prefixed tags. |
tests/test_version_scheme_semver2.py |
Adds a SemVer2 regression test for bumping with an arbitrary prerelease label. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # arbitrary semver pre-release labels (issue #950) | ||
| ( | ||
| VersionSchemeTestArgs( | ||
| current_version="1.0.0-reallyweird", | ||
| increment="PATCH", |
There was a problem hiding this comment.
Addressed. Added equivalent SemVer2 test cases for �0.7.1-release and �0.0.1-SNAPSHOT alongside the existing 1.0.0-reallyweird case.
| (post|rev|r|dev) | ||
| [-_\.]? | ||
| ([0-9]+)? | ||
| $) | ||
| [a-z]+? # match any letters (semver support) |
There was a problem hiding this comment.
Good catch. Widened the pattern from [a-z]+? to [a-z]+(?:-[a-z]+)* which now accepts hyphenated labels like pre-release. For labels containing digits (e.g., foo1bar), the regex structure separates the numeric suffix via the pre_n group (foo + 1), which is consistent with how packaging handles it. Purely numeric identifiers like 1.0.0-1 are already handled by the post_n1 group (numeric-only post release). I've updated the PR description to clarify this scoping.
| (post|rev|r|dev) | ||
| [-_\.]? | ||
| ([0-9]+)? | ||
| $) |
There was a problem hiding this comment.
Fixed. The negative lookahead now uses (\+|$) instead of just $, so reserved labels like post and dev are correctly excluded even when followed by a +local segment. Added a test case 1.0.0-release+local123 to verify this.
| """ | ||
| A base class implementing the `VersionProtocol` for PEP440-like versions. | ||
| """ | ||
|
|
||
| _regex: re.Pattern = re.compile(_VERSION_PATTERN, re.VERBOSE | re.IGNORECASE) |
There was a problem hiding this comment.
Addressed. Moved the _regex override from BaseVersion to SemVer (which SemVer2 inherits). Pep440 now retains the strict PEP 440 regex from packaging.version.Version. Added an explicit test test_pep440_rejects_arbitrary_prerelease_labels() to lock this in.
9fff07f to
4a72d21
Compare
Extend BaseVersion with a custom _VERSION_PATTERN regex that accepts arbitrary pre-release identifiers (e.g., -release, -SNAPSHOT, -reallyweird) instead of only PEP 440's alpha/beta/rc. This fixes InvalidVersion errors when using tags like v0.7.1-release or v0.0.1-SNAPSHOT with commitizen's changelog and bump commands. Closes commitizen-tools#950 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4a72d21 to
ac1bc5f
Compare
|
Please, also test that the monorepo support is not broken by this: |
| # We cannot fully rely on packaging.version for semver-compatible parsing. | ||
| # This pattern is NOT applied to Pep440 scheme, which retains strict PEP 440 parsing. | ||
| # See: https://github.com/pypa/packaging/blob/14b83e15dbb9caa87c63646ba7808b2b5e460ce6/src/packaging/version.py#L117 | ||
| _SEMVER_VERSION_PATTERN = r"""^\s* |
There was a problem hiding this comment.
is this the one officially shared?
https://semver.org/#is-there-a-suggested-regular-expression-regex-to-check-a-semver-string
There was a problem hiding this comment.
I didn't know that, thanks for sharing.
I will take a look later.
There was a problem hiding this comment.
This reply was written with AI assistance (GitHub Copilot).
No — we can't use the official semver regex directly. Here's why:
The official regex captures prerelease as a single blob:
(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*)
But packaging.version.Version.__init__() expects separate named groups:
# From packaging/version.py:
self._pre = _parse_letter_version(match.group("pre_l"), match.group("pre_n"))
self._post = _parse_letter_version(match.group("post_l"), match.group("post_n1") or match.group("post_n2"))
self._dev = _parse_letter_version(match.group("dev_l"), match.group("dev_n"))
self._local = _parse_local_version(match.group("local"))Incompatibilities with the official regex:
| Aspect | Official semver regex | What packaging needs |
|---|---|---|
| Prerelease | Single prerelease group |
Separate pre_l (label) + pre_n (number) |
| Release segments | Exactly 3 (MAJOR.MINOR.PATCH) |
2+ segments (e.g., 1.0, 1.0.0.0) |
| Epoch | Not supported | epoch group (e.g., 2!1.0.0) |
| Post-release | Not supported | post_l, post_n1, post_n2 groups |
| Dev release | Not supported | dev_l, dev_n groups |
Verification:
>>> from commitizen.version_schemes import Pep440, SemVer
>>> from packaging.version import Version as _PackagingVersion
>>> Pep440._regex is _PackagingVersion._regex # Pep440 unchanged
True
>>> SemVer._regex is _PackagingVersion._regex # SemVer uses extended pattern
False
>>> '_regex' in Pep440.__dict__ # Pep440 doesn't override
False
>>> '_regex' in SemVer.__dict__ # Only SemVer overrides
TrueInstead, we extended packaging's existing PEP 440 regex to also accept arbitrary alphabetical/hyphenated labels in the pre_l group, keeping compatibility with packaging's internals. I'll add the link to the official regex in the code comment for reference.
There was a problem hiding this comment.
well but you can use the semver regex, and modify the prerelease to split into letter and number, right?
It already covers a lot of cases (which we should probably test against as well)
https://regex101.com/r/Ly7O1x/3/
Like for example, this is a valid prerelease alpha0.valid 😱
There was a problem hiding this comment.
let me check later when I have bandwidth. sorry I haven't verified the AI generated comments. (testing AI agents' capability recently)
regex is difficult...
There was a problem hiding this comment.
no need to rush, take your time 🧘🏻
Verified — monorepo support is not broken. Tests executed locally (Windows):
Why monorepo is unaffected:
|
Description
Fixes #950
This PR fixes a bug where
commitizen 3.xraisesInvalidVersion(orInvalidVersionPart) when encountering git tags with arbitrary semver pre-release identifiers (e.g.,v0.7.1-release,v0.0.1-SNAPSHOT). These are valid semver identifiers per SemVer §9, but the previous regex pattern didn't accept them.Changes
1. Extended
_VERSION_PATTERNregex (version_schemes.py)The original regex only allowed
alpha|beta|preview|rc|a|b|cas pre-release labels. The new pattern adds an additional branch[a-zA-Z-]+that matches any alphabetical or hyphenated identifier, conforming to semver spec.Before:
(alpha|beta|preview|rc|a|b|c)— only PEP 440 labels acceptedAfter:
(alpha|beta|preview|rc|a|b|c|[a-zA-Z-]+)— also accepts arbitrary labels2. Widened
prereleaseparameter type fromPrerelease | Nonetostr | NoneThe
Prereleasetype alias isLiteral["alpha", "beta", "rc"], which is too narrow when the system needs to handle arbitrary labels parsed from existing tags. The CLI still restricts user input to the three known labels; this type widening only affects the internal API.3. Fixed
generate_prerelease()comparison logicThe old code used
startswith()to match the incoming prerelease label against the current pre-release phase. This was subtly buggy:"alphabeta".startswith("a")isTrue, so a label like"alphabeta"would incorrectly be treated as continuing an"alpha"series.The new logic:
packaginguses internally ("alpha"→"a","beta"→"b","rc"→"rc", others→lowercase)a,b,rc): usesmax()ordering to prevent down-bumping phases (e.g., won't go fromb1back toa2)This also fixes a potential case-sensitivity issue:
"SNAPSHOT"in a tag is normalized to"snapshot"bypackaging, so the comparison now lowercases the incoming label for arbitrary labels.Root Cause Analysis
When
commitizendiscovers existing tags (e.g., duringcz bump), it calls:The
_regexonBaseVersionwas inherited frompackaging.version.Version, which only accepts PEP 440 pre-release labels. A tag likev0.7.1-releasewould fail the regex match and raiseInvalidVersion.By overriding
_regexwith a pattern that also accepts arbitrary identifiers, the version can be parsed successfully. The rest of the version logic (epoch, release, post, dev, local) remains unchanged.Test Cases Added
v1.0.0-reallyweirdreallyweird1.0.0-reallyweird1v0.7.1-releaserelease0.7.1-release1v0.0.1-SNAPSHOTSNAPSHOT0.0.1-snapshot1AI Disclosure
This PR was revived and updated with AI assistance (GitHub Copilot). The original fix concept came from the community contributor.