feat: make patch_xml respect custom Jinja2 delimiters #651
Open
Neal-Ding wants to merge 1 commit into
patch_xml() had hardcoded regex patterns for {{ }}, {% %}, and {# #},
ignoring any custom delimiters set via jinja_env. This caused template
variables to be silently left unreplaced when using non-default delimiters
(like single braces { }) because XML tags split by Word were never stripped
from inside the custom blocks.
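The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not docxtpl's actual code: `SPLIT_XML`, `DEFAULT_BLOCK`, and `strip_tags_in_blocks` are hypothetical names, and the regex is a simplified stand-in for a tag-stripping pattern hardcoded to `{{ }}`.

```python
import re

# Hypothetical fragment: Word has split the variable "{name}" across two runs.
SPLIT_XML = "<w:r><w:t>{na</w:t></w:r><w:r><w:t>me}</w:t></w:r>"

# The same split, but written with the default "{{ }}" delimiters.
SPLIT_XML_DEFAULT = "<w:r><w:t>{{na</w:t></w:r><w:r><w:t>me}}</w:t></w:r>"

# Simplified stand-in for a pattern hardcoded to the default variable delimiters.
DEFAULT_BLOCK = re.compile(r"\{\{.*?\}\}", re.DOTALL)


def strip_tags_in_blocks(xml, block_pattern):
    """Remove XML tags found inside each matched template block."""
    return block_pattern.sub(
        lambda m: re.sub(r"<[^>]*>", "", m.group(0)), xml
    )


# Default delimiters: the block is matched and the run boundary is removed.
repaired = strip_tags_in_blocks(SPLIT_XML_DEFAULT, DEFAULT_BLOCK)

# Single-brace delimiters: the hardcoded pattern never matches, so the
# XML fragments stay inside the variable and nothing is reported.
unchanged = strip_tags_in_blocks(SPLIT_XML, DEFAULT_BLOCK)
```

In this sketch `repaired` contains a contiguous `{{name}}`, while `unchanged` is identical to its input, which is the silent skip the message describes.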
Changes:
- Added jinja_env parameter to patch_xml() and all its call sites
- Dynamic regex patterns for stripping XML tags inside Jinja2 blocks (pattern ②)
- Dynamic regex patterns for HTML entity cleanup inside Jinja2 tags (pattern ⑥)
- Default behavior unchanged when jinja_env is None
- Added test with intentionally split XML runs and custom { } delimiters
Co-Authored-By: Claude <noreply@anthropic.com>
Problem
patch_xml() has hardcoded regex patterns that only recognize the default Jinja2 delimiters ({{ }}, {% %}, {# #}). When users configure custom delimiters via jinja_env: … the patch_xml preprocessing step silently skips over their template variables. Specifically:
- Pattern ② (striptags, the core issue): Strips XML tags from inside {{...}} / {%...%} / {#...#} blocks. The hardcoded regex only matches default delimiters, so custom {var} blocks retain Word XML fragments and Jinja2 cannot parse them.
- Pattern ⑥ (clean_tags): HTML entity cleanup inside Jinja2 tags. Same hardcoded pattern.
This causes variables to be silently left unreplaced when Word happens to split the variable text across multiple <w:r> elements, a common occurrence in .docx files.
Root Cause
Changes
- patch_xml: Added jinja_env=None parameter. When provided, dynamically builds regex patterns from the configured delimiters instead of using the hardcoded {{ }}, {% %}, {# #}.
- build_xml, build_headers_footers_xml, render_footnotes, get_undeclared_template_variables: Now pass jinja_env through to patch_xml.
- When jinja_env=None, falls back to standard delimiters; fully backward compatible.
- tests/custom_delimiters.py: New test with intentionally split XML runs and custom { } delimiters.
Scope
This PR handles the two patterns that directly affect user template variables (② striptags, ⑥ clean_tags). docxtpl's own DSL tags (colspan, cellbg, vm, hm, {{y ...}}, {%y ... %}, {%-, -%}, {{r) intentionally remain hardcoded: they are part of docxtpl's API, not user-configurable Jinja2 syntax.
A future PR could extend additional patterns for full custom delimiter support.
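The dynamic pattern construction listed under Changes could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in this PR: `delimiter_pairs`, `build_block_pattern`, and `strip_tags_in_blocks` are hypothetical helpers, and a `SimpleNamespace` stands in for a `jinja2.Environment` so the example needs only the standard library. The six attribute names are the ones a real Jinja2 environment exposes for its delimiters.

```python
import re
from types import SimpleNamespace

DEFAULT_PAIRS = [("{{", "}}"), ("{%", "%}"), ("{#", "#}")]


def delimiter_pairs(jinja_env=None):
    """Return (start, end) delimiter pairs, falling back to the defaults."""
    if jinja_env is None:
        return list(DEFAULT_PAIRS)
    return [
        (jinja_env.variable_start_string, jinja_env.variable_end_string),
        (jinja_env.block_start_string, jinja_env.block_end_string),
        (jinja_env.comment_start_string, jinja_env.comment_end_string),
    ]


def build_block_pattern(jinja_env=None):
    """Compile one regex that matches any configured template block."""
    # Longer start delimiters go first so "{%" is tried before a bare "{".
    pairs = sorted(delimiter_pairs(jinja_env), key=lambda p: len(p[0]), reverse=True)
    alternatives = [
        "(?:" + re.escape(start) + ".*?" + re.escape(end) + ")"
        for start, end in pairs
    ]
    return re.compile("|".join(alternatives), re.DOTALL)


def strip_tags_in_blocks(xml, jinja_env=None):
    """Remove XML tags that Word inserted inside template blocks."""
    pattern = build_block_pattern(jinja_env)
    return pattern.sub(lambda m: re.sub(r"<[^>]*>", "", m.group(0)), xml)


# Stand-in for an environment configured with single-brace variables.
custom_env = SimpleNamespace(
    variable_start_string="{", variable_end_string="}",
    block_start_string="{%", block_end_string="%}",
    comment_start_string="{#", comment_end_string="#}",
)

split_xml = "<w:r><w:t>{na</w:t></w:r><w:r><w:t>me}</w:t></w:r>"
fixed = strip_tags_in_blocks(split_xml, custom_env)
default_result = strip_tags_in_blocks(split_xml)  # jinja_env=None
```

With the custom environment the split variable is joined back into `{name}`; with `jinja_env=None` the input is returned unchanged, matching the backward-compatible fallback. Single-character delimiters can also collide with literal braces elsewhere in a document, so a real implementation may need more care than this sketch takes.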
Co-Authored-By: Claude <noreply@anthropic.com>