Fix #405: no spurious space between emphasis and following punctuation by assinscreedFC · Pull Request #442 · Alir3z4/html2text

assinscreedFC · 2026-06-05T17:24:52Z

Summary

Fixes #405.

After a closing emphasis marker, html2text inserted a separating space before anything except whitespace, brackets and .!?. That wrongly added a space before other punctuation:

>>> import html2text
>>> html2text.html2text("<em>hello</em>,")
'_hello_ ,\n\n'      # expected '_hello_,'

The same spurious space appeared before : " ; and other punctuation.

Change

In handle_data, the separating space after stressed (emphasis) text is only needed before a word character, which would otherwise attach to the closing _/* marker and stop Markdown from recognising the emphasis. Punctuation never merges with the marker, so it must not get a space. The condition changes from a broad blocklist:

re.match(r"[^][(){}\s.!?]", data[0])

to:

re.match(r"\w", data[0])

\w keeps the needed space before letters, digits and _ (all of which break a closing _), while dropping it before punctuation.
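To make the difference between the two conditions concrete, here is a minimal standalone sketch that applies both patterns to a handful of characters. The needs_space helper is hypothetical and mirrors only the regex check on data[0]; it does not reproduce the rest of handle_data.

```python
import re

OLD_PATTERN = re.compile(r"[^][(){}\s.!?]")  # blocklist used before this change
NEW_PATTERN = re.compile(r"\w")              # condition introduced by this change


def needs_space(pattern, data):
    """Return True if `pattern` would request a separating space before `data`.

    Hypothetical helper: it mirrors only the regex check on data[0],
    not the surrounding state handling in handle_data.
    """
    return bool(data) and pattern.match(data[0]) is not None


samples = [",", ":", ";", '"', "'", ".", "!", "a", "7", "_", " "]

# Map each sample character to (old_result, new_result).
comparison = {
    ch: (needs_space(OLD_PATTERN, ch), needs_space(NEW_PATTERN, ch))
    for ch in samples
}

for ch, (old, new) in comparison.items():
    print(f"{ch!r:>5}  old={old!s:<5}  new={new}")
```

The two patterns disagree only on punctuation outside the old blocklist (, : ; " '), which is exactly the set of cases #405 is about; word characters still get the space under both.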

This is scoped to the punctuation case (#405). The separate **strong** + alphanumeric case (#413) uses a different code path and is left untouched.

Tests

Added a regression fixture test/emphasis_punctuation.html / .md (the test driver auto-discovers *.html/*.md pairs). It pins both behaviours: no space before punctuation/apostrophe, space kept before a following word or digit.
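For readers unfamiliar with fixture-pair testing, the sketch below shows one way *.html/*.md pairs could be discovered by file stem. It is an illustration under assumptions, not the project's actual test driver: discover_fixture_pairs is a hypothetical helper, and the fixture contents written to the temporary directory are invented for the demo.

```python
import pathlib
import tempfile


def discover_fixture_pairs(directory):
    """Return sorted (html_path, md_path) pairs that share a file stem.

    Hypothetical helper for illustration; the real test driver may differ.
    """
    root = pathlib.Path(directory)
    pairs = []
    for html_path in sorted(root.glob("*.html")):
        md_path = html_path.with_suffix(".md")
        if md_path.exists():  # skip HTML inputs that have no expected output
            pairs.append((html_path, md_path))
    return pairs


# Demo with throwaway files; the contents are assumptions, not the real fixture.
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    (tmp_path / "emphasis_punctuation.html").write_text("<em>hello</em>,")
    (tmp_path / "emphasis_punctuation.md").write_text("_hello_,\n\n")
    (tmp_path / "orphan.html").write_text("<p>no expected output</p>")
    found_names = [(h.name, m.name) for h, m in discover_fixture_pairs(tmp_path)]

print(found_names)
```

With this kind of discovery, adding a regression case only requires dropping a new input/expected-output pair into the test directory.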

All 199 tests pass; black, isort, mypy, and flake8 are clean. ChangeLog and AUTHORS updated.

AI assistance (Claude) was used; I reviewed every line and ran the tests.

…ctuation After a closing emphasis marker html2text inserted a separating space before anything except whitespace, brackets and `.!?`. That wrongly added a space before other punctuation, e.g. `<em>hello</em>,` produced `_hello_ ,` instead of `_hello_,`. The separating space is only needed before a word character, which would otherwise attach to the closing marker and stop Markdown from recognising the emphasis. The condition is now `re.match(r"\w", data[0])`. Adds a regression fixture (test/emphasis_punctuation.*) and a ChangeLog entry. Co-authored-by: Claude <noreply@anthropic.com>

assinscreedFC wants to merge 1 commit into Alir3z4:master from assinscreedFC:fix/405-emphasis-punctuation-space
