Optimization of REXML::XPathParser#step by naitoh · Pull Request #310 · ruby/rexml

naitoh · 2026-05-05T06:39:53Z

Benchmark

$ benchmark-driver benchmark/xpath.yaml
                                                           before       after  before(YJIT)  after(YJIT) 
  REXML::XPath.match(REXML::Document.new(xml), 'a//a')     4.170k      4.300k        4.091k       4.087k i/s -     100.000 times in 0.023979s 0.023257s 0.024444s 0.024470s
REXML::XPath.match(REXML::Document.new(xml), '//a//a')    196.250      1.378k       293.984       1.701k i/s -     100.000 times in 0.509554s 0.072574s 0.340154s 0.058796s

Comparison:
               REXML::XPath.match(REXML::Document.new(xml), 'a//a')
                                                 after:      4299.8 i/s 
                                                before:      4170.3 i/s - 1.03x  slower
                                          before(YJIT):      4091.0 i/s - 1.05x  slower
                                           after(YJIT):      4086.6 i/s - 1.05x  slower

             REXML::XPath.match(REXML::Document.new(xml), '//a//a')
                                           after(YJIT):      1700.8 i/s 
                                                 after:      1377.9 i/s - 1.23x  slower
                                          before(YJIT):       294.0 i/s - 5.79x  slower
                                                before:       196.3 i/s - 8.67x  slower

YJIT=ON : 0.99x - 5.79x faster
YJIT=OFF : 1.03x - 7.01x faster

tompng · 2026-05-05T12:14:35Z

+      new_nodes = {}
      nodeset.each do |node|
-        new_nodeset = []
-        new_nodes = {}


With this change, the result of this xpath has changed.

REXML::XPath.match(REXML::Document.new('<a id="a1"><a id="a2"></a></a>'), '//a/descendant::b[1]')
# => [ ... </>, ... </>] Expected (in master branch)
# => [ ... </>] Actual (this pull request)

Reference:

document.body.innerHTML='<a id="a1"><a id="a2"></a></a>'
snapshot = document.evaluate('//a/descendant::b[1]', document, null, XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
[...new Array(snapshot.snapshotLength)].map((_, i) => snapshot.snapshotItem(i))
// => [b#b1, b#b2]

I'm sorry.
Since this change has unintended consequences, I will revert the change to #descendant.
Thanks!
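
To illustrate why hoisting the dedup hash out of the per-node loop can change a positional predicate such as [1], here is a toy model. It is only a sketch: ToyNode, select_first_b and the share_seen flag are hypothetical names made up for this illustration, the tree shape (a b element under each a) is an assumption, and none of this is REXML's actual code.

```ruby
# Toy tree node (hypothetical; not a REXML class).
class ToyNode
  attr_reader :name, :children

  def initialize(name, children = [])
    @name = name
    @children = children
  end

  # All descendants in document order (depth-first, pre-order).
  def descendants
    children.flat_map { |child| [child, *child.descendants] }
  end
end

# Models "for each context node, take descendant b nodes, then apply [1]".
# share_seen: true mimics a dedup hash that lives outside the loop.
def select_first_b(contexts, share_seen:)
  shared = {}.compare_by_identity
  result = []
  contexts.each do |context|
    seen = share_seen ? shared : {}.compare_by_identity
    candidates = []
    context.descendants.each do |node|
      next unless node.name.start_with?("b")
      next if seen.key?(node)
      seen[node] = true
      candidates << node
    end
    first = candidates.first # the [1] predicate, per context node
    result << first if first && result.none? { |n| n.equal?(first) }
  end
  result.map(&:name)
end

# Assumed tree: a1 contains b1 and a2; a2 contains b2.
b1 = ToyNode.new("b1")
b2 = ToyNode.new("b2")
a2 = ToyNode.new("a2", [b2])
a1 = ToyNode.new("a1", [b1, a2])
contexts = [a1, a2] # what //a would select in this toy tree

per_context = select_first_b(contexts, share_seen: false)
shared_seen = select_first_b(contexts, share_seen: true)
```

In this model the per-context version keeps b2 as the first b descendant of a2, while the shared version drops it because b2 was already recorded while processing a1.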

tompng · 2026-05-05T12:31:59Z

+              next if seen.key?(raw_node.object_id)
+              seen[raw_node.object_id] = true


object_id will lazily assign a new id when it is called. We can avoid this by using seen = {}.compare_by_identity instead.

Thanks for pointing that out.
I’ll fix it.

With compare_by_identity, the Hash compares keys by identity (the same as comparing by object_id, but without assigning an actual object_id), so seen.key?(raw_node) and seen[raw_node] = true are enough.

These two snippets give the same result:

h = {}
h[key.object_id] = value
h[key.object_id] #=> value
h[key.dup.object_id] #=> nil

h = {}.compare_by_identity
h[key] = value
h[key] #=> value
h[key.dup] #=> nil

OK, I see.
Fixed it.
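
For anyone reading along, a small standalone check of the difference between a default Hash and one using compare_by_identity. The variable names are only for illustration; Hash#compare_by_identity and Hash#key? are standard Ruby.

```ruby
# A String is handy here because dup yields an equal but distinct object.
key = String.new("node")

# Default Hash: keys match by eql?/hash, so an equal copy is found.
by_value = {}
by_value[key] = :seen
value_hit_on_copy = by_value.key?(key.dup)

# Identity Hash: only the very same object matches, and no object_id
# has to be requested by the caller.
by_identity = {}.compare_by_identity
by_identity[key] = :seen
identity_hit_on_same = by_identity.key?(key)
identity_hit_on_copy = by_identity.key?(key.dup)

puts value_hit_on_copy    # true
puts identity_hit_on_same # true
puts identity_hit_on_copy # false
```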


Copilot

Pull request overview

This PR optimizes REXML::XPathParser#step by deduplicating nodes earlier when merging multiple node sets, improving performance (notably for queries like //a//a) while preserving correct XPath node-set semantics (no duplicates, document order) and predicate behavior.

Changes:

Deduplicate merged node sets in XPathParser#step using identity-based tracking before sorting and assigning positions.
Add regression tests covering descendant-axis ordering, duplicate elimination, and position predicates across overlapping subtrees.
Update the XPath benchmark to include a //a//a scenario and reduce XML nesting depth.
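
The first change above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in lib/rexml/xpath_parser.rb: ToyItem, doc_order and merge_unique_in_order are hypothetical names, and document order is modeled as a plain integer.

```ruby
# Hypothetical stand-in for a raw node; doc_order models document order.
ToyItem = Struct.new(:label, :doc_order)

# Merge several node sets, dropping repeated objects by identity first,
# then sort the (smaller) merged list once.
def merge_unique_in_order(node_sets)
  seen = {}.compare_by_identity
  merged = []
  node_sets.each do |set|
    set.each do |node|
      next if seen.key?(node) # already collected from an earlier set
      seen[node] = true
      merged << node
    end
  end
  merged.sort_by(&:doc_order)
end

x = ToyItem.new("a", 1)
y = ToyItem.new("b", 2)
z = ToyItem.new("c", 3)

# Overlapping sets, as can happen when subtrees overlap.
merged_labels = merge_unique_in_order([[z, x], [x, y], [y, z]]).map(&:label)
puts merged_labels.inspect
```

Deduplicating before the sort means the sort only sees each node once, which is where overlapping queries such as //a//a would be expected to benefit.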

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

File	Description
lib/rexml/xpath_parser.rb	Deduplicates merged raw nodes (by identity) before sorting and wrapping into `XPathNode`s.
test/xpath/test_base.rb	Adds regression tests for descendant-axis behavior (no duplicates, document order, predicate position semantics).
benchmark/xpath.yaml	Adds a new benchmark case for `//a//a` and reduces generated XML depth.


naitoh requested a review from kou May 5, 2026 06:47

tompng reviewed May 5, 2026


naitoh force-pushed the fix_XPathParser_step_descendant_performance_xpath branch from 5b3e01a to 2d3346f May 5, 2026 16:20

naitoh changed the title ~~Optimization of REXML::XPathParser#step,#descendant~~ Optimization of REXML::XPathParser#step May 5, 2026

naitoh requested a review from tompng May 5, 2026 16:23

naitoh added 2 commits May 6, 2026 05:12

Add test

6456800

naitoh force-pushed the fix_XPathParser_step_descendant_performance_xpath branch from 2d3346f to 6456800 May 5, 2026 20:14

kou requested review from Copilot and removed request for tompng May 6, 2026 03:17

Copilot started reviewing on behalf of kou May 6, 2026 03:18

Copilot AI reviewed May 6, 2026


Comment thread benchmark/xpath.yaml

naitoh wants to merge 2 commits into ruby:master from naitoh:fix_XPathParser_step_descendant_performance_xpath


3 participants
