Bug updates by rbs333 · Pull Request #33 · redis-developer/sql-redis

rbs333 · 2026-05-27T12:00:38Z

Fix 10 trial-user findings in the SQL to RediSearch translator

Closes the bugs catalogued in ISSUE_REPORT.md (originally reported in error_spec.md). Every defect now has a dedicated regression test that fails on main and passes here.

Issue-by-issue resolution

Two flavors of resolution are used. Where RediSearch has a faithful translation, the bug is fixed so the query produces the right result. Where RediSearch has no equivalent semantics (e.g., ordering on TAG values), the translator now raises a clear ValueError so users get a fast, accurate signal instead of silently wrong output.

#	Severity	Reported behavior	Worth fixing?	Resolution in this PR	Notes
1	Critical	`NOT IN` on TAG silently emits the positive match (`@status:{a\|b}`)	Yes. RediSearch supports negation as a `-` prefix; the parser was setting `negated=True` and the builder was ignoring it.	Threaded `negated` through `build_tag_condition` and the translator's `is_negated`. Now emits `-@status:{a\|b}`.	Faithful 1:1 mapping; no semantic compromise.
2	Critical	`NOT BETWEEN` on NUMERIC silently emits the in-range query	Yes. Same root cause as 1 for numeric ranges.	Threaded `negated` through `build_numeric_condition`. Now emits `-@age:[10 20]`.	Faithful 1:1 mapping.
3	Critical	`NOT field > 5` silently emits the positive comparison	Yes. Same root cause.	Same fix as 2; covers `>`, `>=`, `<`, `<=`, `BETWEEN`, `IN`, `=`, `!=` for both TAG and NUMERIC.	Faithful 1:1 mapping.
4	Critical	Literal `\|` in a TAG value is read by RediSearch as a value separator (`status = 'a\|b'` matches `a` OR `b`)	Yes. `\|` is the IN-list separator at the syntax level, so it must be escaped inside a value.	Added `\|` to `QueryBuilder.TAG_SPECIAL_CHARS`. Now emits `@status:{a\\\|b}` and `IN ('x\|y','z')` correctly produces two values, not three.	Cosmetic in indexes that never store pipes; critical anywhere a tag value can contain user input.
5	High	`IN` on NUMERIC crashes with `TypeError: float([1,2,3])`	Yes. RediSearch has no native NUMERIC `IN`, but the union of equality ranges is faithful and cheap.	The translator expands `field IN (a,b,c)` to `(@field:[a a]\|@field:[b b]\|@field:[c c])`, or to an ANDed set of negated ranges for `NOT IN`. No more crash.	Query string grows linearly with list length. For very large IN lists, a single `BETWEEN min AND max` may be cheaper if the values are contiguous.
6	High	`LIKE ''` emits `@field:` (bare colon) which fails at runtime	Yes, but no sensible behavior to map to. Empty pattern has no defined match.	Raises `ValueError` with a hint pointing at `IS NULL` or `%`. Better than emitting invalid syntax.	Behavioral change: callers that previously got a runtime error from Redis now get an earlier `ValueError` from the translator.
7	High	`BETWEEN` on TAG emits `@status:{\\('a'\\, 'z'\\)}` (a literal string match on the seven-character token `('a', 'z')`)	No, this cannot be faithfully fixed. RediSearch TAG fields are an unordered inverted index; there is no `<` / `<=` on tags, no collation, no lexicographic range. The previous output was already junk.	Raises `ValueError` explaining the limitation and pointing at `IN (...)`.	If the user truly wants lexicographic ranges, the field must be declared as TEXT (or NUMERIC) at index creation. That is a schema decision the translator cannot make.
8	High	`status = "active"` becomes `@status:{None}` (and crashes on NUMERIC with `float(None)`)	Yes. Users clearly intend a value comparison; the issue is sqlglot parsing `"active"` as a quoted identifier instead of a literal.	`_extract_literal_value` now treats a quoted `exp.Column(Identifier(quoted=True))` as the identifier's string value. `status = "active"` → `@status:{active}`.	Aligns with MySQL's permissive double-quote handling. Standard ANSI SQL reserves `"..."` for identifiers, so any future strict-mode flag should turn this back into an error.
9	Medium	`SELECT DISTINCT` is silently ignored and returns duplicates	Yes. `FT.AGGREGATE` with `GROUPBY` on the projected columns produces the same set of distinct rows.	Parser detects `select.args["distinct"]` and promotes the projected columns into `groupby_fields`, routing through `FT.AGGREGATE` with no `REDUCE`. `DISTINCT *` and `DISTINCT` mixed with aggregates raise a clear `ValueError`.	Underlying Redis command changes from `FT.SEARCH` to `FT.AGGREGATE`. The executor normalizes both into `QueryResult`, so user-visible row shape is unchanged.
10	Medium	Multi-column `ORDER BY` silently drops every key after the first	Yes. `FT.AGGREGATE SORTBY` accepts multiple keys; we can route there automatically.	Added `len(parsed.orderby_fields) > 1` to the `use_aggregate` trigger in `_build_command`. `FT.AGGREGATE` emits `SORTBY 4 @price ASC @title DESC` for two keys. ORDER BY fields are validated by the analyzer and added to `LOAD` so non-`SORTABLE` columns work.	Tradeoff: `FT.AGGREGATE` is heavier than `FT.SEARCH` because it materializes the result pipeline, and `LIMIT` uses cursor semantics rather than score-ordered windows. For a multi-key sort this is the right tool anyway; the side effect is that SQL queries differing only by a second ORDER BY key now hit a different Redis command.

What changed in code

sql_redis/query_builder.py

build_tag_condition and build_numeric_condition now accept negated: bool = False and apply the - prefix to the entire range expression. Existing != behavior is preserved by ORing the two sources.
Added | to TAG_SPECIAL_CHARS so a literal pipe inside a TAG value is escaped.
build_text_condition rejects empty LIKE patterns with a ValueError.

sql_redis/translator.py

_build_condition passes the XOR-resolved is_negated to the tag and numeric builders.
IN on a NUMERIC field is expanded to a |-joined union of equality ranges (or an ANDed set of negated equalities for NOT IN).
BETWEEN on a TAG field raises ValueError.
Multi-key ORDER BY auto-routes to FT.AGGREGATE (added to the use_aggregate condition in _build_command). ORDER BY fields are added to the LOAD set so non-SORTABLE columns sort correctly.

sql_redis/parser.py

_extract_literal_value treats a quoted identifier (exp.Column(Identifier(quoted=True))) as the string value of the identifier.
ParsedQuery gained a distinct: bool field. SELECT DISTINCT promotes the projected columns to groupby_fields. SELECT DISTINCT * and DISTINCT mixed with aggregates raise.

sql_redis/analyzer.py

ORDER BY fields are now added to referenced_fields so they're validated against the schema and available in field_types for the LOAD step.
The alias set used to skip schema lookups was broadened to include score(), vector-distance, and aggregation aliases. Previously these were not in the set because no path put them in referenced_fields; now that ORDER BY does, the alias set has to be complete.

Tests

24 new tests, all failing before this change and passing after:

tests/test_query_builder.py::TestQueryBuilderNegationBug (9 tests) covers the builder-level negation contract for tag and numeric.
tests/test_query_builder.py::TestQueryBuilderTagPipeEscape (2 tests) covers literal pipe escaping in both scalar and IN-list values.
tests/test_translator.py::TestTranslatorTrialUserBug (13 tests) covers the end-to-end SQL to RediSearch translation for every case from the bug report, including the FT.AGGREGATE routing and LOAD behavior for multi-key ORDER BY.

Full suite: 550 passed.

Behavioral changes that callers may notice

Input	Before	After
`LIKE ''`	Invalid query at runtime	`ValueError`
`BETWEEN` on TAG	Garbage query, no matches	`ValueError`
Multi-key `ORDER BY`	Silently dropped trailing keys (`FT.SEARCH`)	All keys honored via `FT.AGGREGATE SORTBY`
`SELECT DISTINCT col`	`FT.SEARCH` with duplicates	`FT.AGGREGATE` with `GROUPBY`
`status = "active"`	`@status:{None}`	`@status:{active}`
`NOT field > x` on NUMERIC	Positive match (wrong)	Negated match (correct)
`NOT IN` on TAG	Positive match (wrong)	Negated match (correct)
`status = 'a\|b'` on TAG	Matches `a` OR `b`	Matches literal `a\|b`
`IN` on NUMERIC	`TypeError`	Union of equality ranges

No existing test required updating; all changes preserved the prior contract where it was correct.

Files touched

sql_redis/query_builder.py
sql_redis/translator.py
sql_redis/parser.py
sql_redis/analyzer.py
tests/test_query_builder.py
tests/test_translator.py

github-actions · 2026-05-27T17:00:24Z

🚀 PR was released in v0.6.0 🚀

not, in, etc. changes

18ad59e

rbs333 requested a review from nkanu17 May 27, 2026 12:00

rbs333 added the auto:release Create a release when this PR is merged label May 27, 2026

rbs333 merged commit 4063514 into main May 27, 2026
8 checks passed

rbs333 deleted the bug/RAAE-1685 branch May 27, 2026 16:34

github-actions Bot added the released This issue/pull request has been released. label May 27, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug updates#33

Bug updates#33
rbs333 merged 1 commit into
mainfrom
bug/RAAE-1685

rbs333 commented May 27, 2026

Uh oh!

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

rbs333 commented May 27, 2026

Fix 10 trial-user findings in the SQL to RediSearch translator

Issue-by-issue resolution

What changed in code

Tests

Behavioral changes that callers may notice

Files touched

Uh oh!

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant