Bug updates#33
Merged
Merged
Conversation
Contributor
|
🚀 PR was released in |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix 10 trial-user findings in the SQL to RediSearch translator
Closes the bugs catalogued in
ISSUE_REPORT.md(originally reported inerror_spec.md). Every defect now has a dedicated regression test that fails onmainand passes here.Issue-by-issue resolution
Two flavors of resolution are used. Where RediSearch has a faithful translation, the bug is fixed so the query produces the right result. Where RediSearch has no equivalent semantics (e.g., ordering on TAG values), the translator now raises a clear
ValueErrorso users get a fast, accurate signal instead of silently wrong output.NOT INon TAG silently emits the positive match (@status:{a|b})-prefix; the parser was settingnegated=Trueand the builder was ignoring it.negatedthroughbuild_tag_conditionand the translator'sis_negated. Now emits-@status:{a|b}.NOT BETWEENon NUMERIC silently emits the in-range querynegatedthroughbuild_numeric_condition. Now emits-@age:[10 20].NOT field > 5silently emits the positive comparison>,>=,<,<=,BETWEEN,IN,=,!=for both TAG and NUMERIC.|in a TAG value is read by RediSearch as a value separator (status = 'a|b'matchesaORb)|is the IN-list separator at the syntax level, so it must be escaped inside a value.|toQueryBuilder.TAG_SPECIAL_CHARS. Now emits@status:{a\\|b}andIN ('x|y','z')correctly produces two values, not three.INon NUMERIC crashes withTypeError: float([1,2,3])IN, but the union of equality ranges is faithful and cheap.field IN (a,b,c)to(@field:[a a]|@field:[b b]|@field:[c c]), or to an ANDed set of negated ranges forNOT IN. No more crash.BETWEEN min AND maxmay be cheaper if the values are contiguous.LIKE ''emits@field:(bare colon) which fails at runtimeValueErrorwith a hint pointing atIS NULLor%. Better than emitting invalid syntax.ValueErrorfrom the translator.BETWEENon TAG emits@status:{\\('a'\\, 'z'\\)}(a literal string match on the seven-character token('a', 'z'))</<=on tags, no collation, no lexicographic range. The previous output was already junk.ValueErrorexplaining the limitation and pointing atIN (...).status = "active"becomes@status:{None}(and crashes on NUMERIC withfloat(None))"active"as a quoted identifier instead of a literal._extract_literal_valuenow treats a quotedexp.Column(Identifier(quoted=True))as the identifier's string value.status = "active"→@status:{active}."..."for identifiers, so any future strict-mode flag should turn this back into an error.SELECT DISTINCTis silently ignored and returns duplicatesFT.AGGREGATEwithGROUPBYon the projected columns produces the same set of distinct rows.select.args["distinct"]and promotes the projected columns intogroupby_fields, routing throughFT.AGGREGATEwith noREDUCE.DISTINCT *andDISTINCTmixed with aggregates raise a clearValueError.FT.SEARCHtoFT.AGGREGATE. The executor normalizes both intoQueryResult, so user-visible row shape is unchanged.ORDER BYsilently drops every key after the firstFT.AGGREGATE SORTBYaccepts multiple keys; we can route there automatically.len(parsed.orderby_fields) > 1to theuse_aggregatetrigger in_build_command.FT.AGGREGATEemitsSORTBY 4 @price ASC @title DESCfor two keys. ORDER BY fields are validated by the analyzer and added toLOADso non-SORTABLEcolumns work.FT.AGGREGATEis heavier thanFT.SEARCHbecause it materializes the result pipeline, andLIMITuses cursor semantics rather than score-ordered windows. For a multi-key sort this is the right tool anyway; the side effect is that SQL queries differing only by a second ORDER BY key now hit a different Redis command.What changed in code
sql_redis/query_builder.pybuild_tag_conditionandbuild_numeric_conditionnow acceptnegated: bool = Falseand apply the-prefix to the entire range expression. Existing!=behavior is preserved by ORing the two sources.|toTAG_SPECIAL_CHARSso a literal pipe inside a TAG value is escaped.build_text_conditionrejects emptyLIKEpatterns with aValueError.sql_redis/translator.py_build_conditionpasses the XOR-resolvedis_negatedto the tag and numeric builders.INon a NUMERIC field is expanded to a|-joined union of equality ranges (or an ANDed set of negated equalities forNOT IN).BETWEENon a TAG field raisesValueError.ORDER BYauto-routes toFT.AGGREGATE(added to theuse_aggregatecondition in_build_command). ORDER BY fields are added to theLOADset so non-SORTABLEcolumns sort correctly.sql_redis/parser.py_extract_literal_valuetreats a quoted identifier (exp.Column(Identifier(quoted=True))) as the string value of the identifier.ParsedQuerygained adistinct: boolfield.SELECT DISTINCTpromotes the projected columns togroupby_fields.SELECT DISTINCT *andDISTINCTmixed with aggregates raise.sql_redis/analyzer.pyreferenced_fieldsso they're validated against the schema and available infield_typesfor the LOAD step.score(), vector-distance, and aggregation aliases. Previously these were not in the set because no path put them inreferenced_fields; now that ORDER BY does, the alias set has to be complete.Tests
24 new tests, all failing before this change and passing after:
tests/test_query_builder.py::TestQueryBuilderNegationBug(9 tests) covers the builder-level negation contract for tag and numeric.tests/test_query_builder.py::TestQueryBuilderTagPipeEscape(2 tests) covers literal pipe escaping in both scalar and IN-list values.tests/test_translator.py::TestTranslatorTrialUserBug(13 tests) covers the end-to-end SQL to RediSearch translation for every case from the bug report, including the FT.AGGREGATE routing and LOAD behavior for multi-key ORDER BY.Full suite: 550 passed.
Behavioral changes that callers may notice
LIKE ''ValueErrorBETWEENon TAGValueErrorORDER BYFT.SEARCH)FT.AGGREGATE SORTBYSELECT DISTINCT colFT.SEARCHwith duplicatesFT.AGGREGATEwithGROUPBYstatus = "active"@status:{None}@status:{active}NOT field > xon NUMERICNOT INon TAGstatus = 'a|b'on TAGaORba|bINon NUMERICTypeErrorNo existing test required updating; all changes preserved the prior contract where it was correct.
Files touched
sql_redis/query_builder.pysql_redis/translator.pysql_redis/parser.pysql_redis/analyzer.pytests/test_query_builder.pytests/test_translator.py