
feat: make patch_xml respect custom Jinja2 delimiters #651

Open: Neal-Ding wants to merge 1 commit into elapouya:master from Neal-Ding:feat/custom-delimiter-patch-xml



Problem

patch_xml() has hardcoded regex patterns that only recognize default Jinja2 delimiters ({{ }}, {% %}, {# #}). When users configure custom delimiters via jinja_env:

```python
from jinja2 import Environment

jinja_env = Environment(variable_start_string="{", variable_end_string="}")
tpl.render(context, jinja_env)
```

…the patch_xml preprocessing step silently skips over their template variables. Specifically:

  1. Pattern ② (striptags, the core issue): strips XML tags from inside {{...}} / {%...%} / {#...#} blocks. The regex only matches the default delimiters, so custom {var} blocks keep their Word XML fragments and Jinja2 cannot parse them.

  2. Pattern ⑥ (clean_tags): cleans up HTML entities inside Jinja2 tags, using the same hardcoded delimiters.

As a result, variables are silently left unreplaced whenever Word splits the variable text across multiple <w:r> runs, which is common in .docx files.
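To illustrate the failure, here is a small sketch that applies an approximation of the hardcoded pattern ② (assumed, not copied from docxtpl) to one variable that Word has split across two runs. The default-delimiter block is repaired, while the custom {...} block passes through with its XML fragments still inside.

```python
import re

# Approximation (assumption) of docxtpl's hardcoded pattern ②: from an opening
# default delimiter up to, but not including, its closing counterpart.
DEFAULT_BLOCK = re.compile(r"{%(?:(?!%}).)*|{#(?:(?!#}).)*|{{(?:(?!}}).)*", re.DOTALL)


def striptags(m):
    # Drop the "</w:t> ... <w:t>" run boundaries that Word inserted inside the block.
    return re.sub(r"</w:t>.*?(?:<w:t>|<w:t [^>]*>)", "", m.group(0), flags=re.DOTALL)


# The same variable, typed once but split by Word across two <w:r> runs.
split_default = "<w:r><w:t>{{na</w:t></w:r><w:r><w:t>me}}</w:t></w:r>"
split_custom = "<w:r><w:t>{na</w:t></w:r><w:r><w:t>me}</w:t></w:r>"

print(DEFAULT_BLOCK.sub(striptags, split_default))
# <w:r><w:t>{{name}}</w:t></w:r>
print(DEFAULT_BLOCK.sub(striptags, split_custom))
# <w:r><w:t>{na</w:t></w:r><w:r><w:t>me}</w:t></w:r>  (unchanged)
```

Because the hardcoded alternation requires {%, {# or {{, a single { never starts a match, so nothing inside the custom block is touched.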

Root Cause

```python
# build_xml receives jinja_env but discards it before calling patch_xml
def build_xml(self, context, jinja_env=None):
    xml = self.get_xml()
    xml = self.patch_xml(xml)          # ← jinja_env not passed
    ...
```

Changes

  • patch_xml: Added a jinja_env=None parameter. When an environment is provided, the regex patterns are built from its configured delimiters instead of the hardcoded {{ }}, {% %} and {# #}.
  • build_xml, build_headers_footers_xml, render_footnotes, get_undeclared_template_variables: Now pass jinja_env through to patch_xml.
  • Default behavior: When jinja_env=None, patch_xml falls back to the standard delimiters, so the change is fully backward compatible.
  • tests/custom_delimiters.py: New test with intentionally split XML runs and custom { } delimiters.
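A minimal sketch of how delimiter-aware versions of patterns ② and ⑥ could be built is shown below; it is not the code in this PR. The names DEFAULTS, delimiter_pairs and patch_xml_sketch are hypothetical, the regex shapes approximate docxtpl's default-delimiter patterns, and html.unescape stands in for whatever entity replacements docxtpl actually performs (an assumption). The environment is read through the *_string attributes that jinja2.Environment exposes, so passing None falls back to {{ }}, {% %} and {# #}.

```python
import html
import re
from types import SimpleNamespace

# Jinja2's default delimiters, used when jinja_env is None.
DEFAULTS = {
    "block_start_string": "{%", "block_end_string": "%}",
    "comment_start_string": "{#", "comment_end_string": "#}",
    "variable_start_string": "{{", "variable_end_string": "}}",
}


def delimiter_pairs(jinja_env=None):
    """(start, end) pairs read from a jinja2.Environment or any object with the
    same *_string attributes; missing attributes fall back to the defaults."""
    def get(name):
        return getattr(jinja_env, name, DEFAULTS[name])

    pairs = [
        (get("block_start_string"), get("block_end_string")),
        (get("comment_start_string"), get("comment_end_string")),
        (get("variable_start_string"), get("variable_end_string")),
    ]
    # Try longer openers first so "{%" wins over a custom "{" (stable sort).
    return sorted(pairs, key=lambda p: len(p[0]), reverse=True)


def striptags(m):
    # Drop the "</w:t> ... <w:t>" run boundaries Word inserted inside a tag.
    return re.sub(r"</w:t>.*?(?:<w:t>|<w:t [^>]*>)", "", m.group(0), flags=re.DOTALL)


def patch_xml_sketch(src_xml, jinja_env=None):
    pairs = delimiter_pairs(jinja_env)
    # Pattern ②: from an opener up to, but not including, its closer.
    open_tag = "|".join(f"{re.escape(s)}(?:(?!{re.escape(e)}).)*" for s, e in pairs)
    src_xml = re.sub(open_tag, striptags, src_xml, flags=re.DOTALL)
    # Pattern ⑥: whole tags, entity-decoded (html.unescape is a stand-in here).
    full_tag = "|".join(f"{re.escape(s)}.*?{re.escape(e)}" for s, e in pairs)
    return re.sub(full_tag, lambda m: html.unescape(m.group(0)), src_xml, flags=re.DOTALL)


# Stands in for Environment(variable_start_string="{", variable_end_string="}").
custom_env = SimpleNamespace(variable_start_string="{", variable_end_string="}")
xml = ("<w:r><w:t>{na</w:t></w:r><w:r><w:t>me}</w:t></w:r>"
       "<w:r><w:t>{% if n &gt; 1 %}s{% endif %}</w:t></w:r>")
print(patch_xml_sketch(xml, custom_env))
# <w:r><w:t>{name}</w:t></w:r><w:r><w:t>{% if n > 1 %}s{% endif %}</w:t></w:r>
print(patch_xml_sketch("<w:t>{{na</w:t><w:t>me}}</w:t>"))
# <w:t>{{name}}</w:t>
```

Pattern ⑥ is written to match the whole tag rather than use a lookbehind on the opener, because Python's re only supports fixed-width lookbehind and custom start delimiters can differ in length. re.escape keeps delimiters such as { or # from being read as regex syntax.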

Scope

This PR only handles the two patterns that directly affect user template variables (② striptags and ⑥ clean_tags). docxtpl's own tag syntax (colspan, cellbg, vm, hm, the {{y ...}} / {%y ... %} forms where y is p, tr, tc or r, the {%- / -%} forms, and {{r for RichText) intentionally remains hardcoded, because it is part of docxtpl's API rather than user-configurable Jinja2 syntax.

A future PR could extend the remaining patterns to provide full custom-delimiter support.

Co-Authored-By: Claude <noreply@anthropic.com>

patch_xml() had hardcoded regex patterns for {{ }}, {% %}, and {# #},
ignoring any custom delimiters set via jinja_env. This caused template
variables to be silently left unreplaced when using non-default delimiters
(like single braces { }) because XML tags split by Word were never stripped
from inside the custom blocks.

Changes:
- Added jinja_env parameter to patch_xml() and all its call sites
- Dynamic regex patterns for stripping XML tags inside Jinja2 blocks (pattern ②)
- Dynamic regex patterns for HTML entity cleanup inside Jinja2 tags (pattern ⑥)
- Default behavior unchanged when jinja_env is None
- Added test with intentionally split XML runs and custom { } delimiters

Co-Authored-By: Claude <noreply@anthropic.com>