[SQL-2712] Update IN translation to use $in#144
Conversation
…QL-2712-in-operator
mattChiaravalloti
left a comment
There was a problem hiding this comment.
I made it through everything except translator and codegen. I'm getting nervous I'll lose these big comments at this point so I'm submitting this partial review now. I'll follow up with thoughts on translator and codegen. This is a very large change! Nice job getting to all of the areas that need to be addressed. There is more work to do to get this generally correct but we're on a good path here.
| current_db: index_usage_filter | ||
| query: "SELECT * FROM nullable_fields WHERE a > 100 AND b IN (SELECT b FROM non_nullable_fields WHERE a < 300)" | ||
| expected_utilization: IX_SCAN | ||
| - description: in operator uses the index |
There was a problem hiding this comment.
| - description: in operator uses the index | |
| - description: in operator uses the index |
[nit] All other tests in this file have a newline between them, so I suggest adding one here as well
There was a problem hiding this comment.
[Questions/Thoughts] Should we also add any of the following tests?
NOT INfornullable_fieldsshould use anIX_SCANINfornon_nullable_fieldsshould use anIX_SCANI think... you can check that against mongodb manually and then add the test if you observe index utilization is possible in this case (I suspect it is)NOT INfornon_nullable_fieldsshould use anIX_SCAN(again, check this and if it does then we should add the test so we don't regress on this support in the future)
| let op = BinaryOp::arbitrary(g); | ||
| // The parser always produces a Tuple on the RHS of IN/NOT IN, so arbitrary | ||
| // generation must match that shape to satisfy the round-trip property. | ||
| let right = if op == BinaryOp::In || op == BinaryOp::NotIn { |
There was a problem hiding this comment.
[Praise] Smart!
|
|
||
| #[cfg(test)] | ||
| mod test; | ||
|
|
There was a problem hiding this comment.
[Question] Is there a reason you got rid of this newline?
| }), | ||
| ); | ||
|
|
||
| test_algebrize!( |
There was a problem hiding this comment.
[Question] Is this the same test as the one from binary.rs? Does it need to live in both places? I think the test file names correspond to input types, not output types, so this test is better placed in binary.rs. We can likely remove this here.
There was a problem hiding this comment.
Question still stands. My suggestion is that we remove this test; this PR should not need to make changes to this file.
| /// Rewrites `IN` and `NOT IN` scalar functions to native match language when the | ||
| /// left-hand side is a field access and every value in the list is a literal. | ||
| /// | ||
| /// Any other shape (subqueries, field refs in the list, functions) falls through | ||
| /// to `None` so the expression stays in `ExprLanguage`. |
There was a problem hiding this comment.
[Praise] Nice! Good thinking here, and the tests for this look great too.
mattChiaravalloti
left a comment
There was a problem hiding this comment.
Rest of my thoughts -- this time focused on translator and codegen (and, implicitly, air::desugarer).
| }; | ||
|
|
||
| let expression = match in_op.input { | ||
| None => return Err(Error::InvalidMatchLanguageInputRef), |
There was a problem hiding this comment.
[Suggestion] Why would this ever be None? Can we update mir::MatchLanguageIn.input to be FieldPath instead of an Option<FieldPath>? Looking at the optimization that creates mir::MatchLanguageIn, there doesn't seem to be a need to make it an option since it is always set to Some(field_path).
| pub struct | ||
| FieldRef { |
There was a problem hiding this comment.
| pub struct | |
| FieldRef { | |
| pub struct FieldRef { |
[nit/suggestion/question] Why the newline?
| // $nin is match-language-only and rejected by the server in $project; NotIn must | ||
| // emit { "$not": [{ "$in": [...] }] } via SqlSemanticOperator codegen. | ||
| In => ScalarFunctionType::Sql(SqlOperator::In), | ||
| NotIn => ScalarFunctionType::Sql(SqlOperator::NotIn), |
There was a problem hiding this comment.
[Thought] (blocking) I'm thinking through how algrebrization, translation, and codegen will work for these operators. It seems that we may still have additional work to do in the algebrization step which will impact these other steps.
In particular, we need to determine if the algebrized expression is nullable -- right now, we're doing that the way every other binary op does it but that's wrong! Since IN/NOT IN deal with an expr and an array of exprs, we determine nullability by either the LHS being nullable or any element of the RHS being nullable, right? nullable_a IN (1, 2, 3) is possibly null, non_nullable_a IN (1, 2, NULL, 3) is possibly null, non_nullable_a IN (1, 2, 3) is never null, and nullable_a IN (1, 2, NULL, 3) is possibly null. We'll need to update algebrization to account for this. Then, when we're doing translation here, we'll follow the correct path to translate these as SQL semantic or MQL semantic operators (if the scalar function is not nullable, we can translate to the MQL version, if it is nullable we need to translate to the SQL version).
On top of that, we may need to deal with the SQL version in the desugarer. Specifically, a IN (1, 2, NULL, 3) should be true if a is 1, 2, or 3, or it should be null if it itself is null or if it is not 1, 2, or 3 (because the result of the third comparison is null). However, as currently implemented we will just codegen {$in: ["$a", [1, 2, null, 3]]} and not get the expected null result when we should. The desugarer could handle this as part of the sql_null_semantics_operators pass.
Ultimately, the reason I'm commenting this here is because I think we'll need to update this translation util function to return MqlOperator::In/MqlOperator::NotIn in the even the function itself is not nullable.
There was a problem hiding this comment.
I updated the algebrizer and parsing logic here:
6d16cde
There was a problem hiding this comment.
I'll make the desugarer change in another PR
| ); | ||
| } | ||
|
|
||
| mod in_operator { |
There was a problem hiding this comment.
[Chore] We'll need to add tests for when the functions are nullable (is_nullable: true) -- these tests only handle when it that value is false.
| translator::{self, test::match_query::mir_field_input}, | ||
| }; | ||
|
|
||
| fn make_translator() -> translator::MqlTranslator { |
There was a problem hiding this comment.
[Question] This helper function, along with the test functions themselves, look identical to the implementation of the test_translate_match_query macro. Is there any difference between the two? Is there a reason you didn't use the macro?
|
One last idea is that we may want to update the |
mattChiaravalloti
left a comment
There was a problem hiding this comment.
Marking as approved even though I am leaving a ton of comments. I just don't want to block this from merging while I'm gone.
There is still work to be done even though I marked as approved:
- all outstanding open comment threads should either result in code changes or have answers to them that explain why not (I resolved the ones that are addressed in one way or another)
- A lot of the threads are still just some bookkeeping changes like fixing test names and/or inputs, addressing odd newlines, etc.
- still need to add the spec/query tests as discussed offline
- still need to update the desugarer before this can merge since it won't behave correctly until we do (we'll see the spec/query tests fail without this work done)
Feel free to tag someone else in in addition to Nick to give this another thorough pass if you want!
| }), | ||
| ); | ||
|
|
||
| test_algebrize!( |
There was a problem hiding this comment.
Question still stands. My suggestion is that we remove this test; this PR should not need to make changes to this file.
| env = map! { | ||
| ("foo", 1u16).into() => Schema::Document( Document { | ||
| keys: map! { | ||
| "a".into() => Schema::Atomic(Atomic::String), |
There was a problem hiding this comment.
| "a".into() => Schema::Atomic(Atomic::String), | |
| "a".into() => Schema::Atomic(Atomic::Integer), |
We still need to update this or else it will ultimately fail when schema checking is implemented correctly.
| env = map! { | ||
| ("foo", 1u16).into() => Schema::Document( Document { | ||
| keys: map! { | ||
| "a".into() => Schema::Atomic(Atomic::String), |
There was a problem hiding this comment.
| "a".into() => Schema::Atomic(Atomic::String), | |
| "a".into() => Schema::Atomic(Atomic::Integer), |
Same issue applies here.
| env = map! { | ||
| ("foo", 1u16).into() => Schema::Document( Document { | ||
| keys: map! { | ||
| "a".into() => Schema::Atomic(Atomic::String), |
There was a problem hiding this comment.
| "a".into() => Schema::Atomic(Atomic::String), | |
| "a".into() => Schema::Atomic(Atomic::Boolean), |
Same issue applies here as well.
| env = map! { | ||
| ("foo", 1u16).into() => Schema::Document( Document { | ||
| keys: map! { | ||
| "a".into() => Schema::Atomic(Atomic::Integer), | ||
| }, | ||
| required: set!{}, | ||
| additional_properties: false, | ||
| ..Default::default() | ||
| }), | ||
| }, |
There was a problem hiding this comment.
| env = map! { | |
| ("foo", 1u16).into() => Schema::Document( Document { | |
| keys: map! { | |
| "a".into() => Schema::Atomic(Atomic::Integer), | |
| }, | |
| required: set!{}, | |
| additional_properties: false, | |
| ..Default::default() | |
| }), | |
| }, |
Since this test uses all Literals, we don't even need to include an env.
| @@ -1 +1 @@ | |||
| /// Optimizes IS and LIKE expressions such that they can be translated and | |||
There was a problem hiding this comment.
[Suggestion] We should update this comment a bit to reflect that this now supports IN/NOT IN as well.
| use super::*; | ||
|
|
||
| test_schema!( | ||
| in_operator_requires_two_args, |
There was a problem hiding this comment.
| in_operator_requires_two_args, | |
| not_in_operator_requires_two_args, |
[Question/Suggestion] Should we name these for in and not_in distinctly?
There was a problem hiding this comment.
This question applies throughout this file but I'll only comment it once.
| ); | ||
|
|
||
| test_schema!( | ||
| basic_in_operator_schema_is_boolean, |
There was a problem hiding this comment.
Question still stands, thought there's a typo in my suggestion. Should be not_in_operator_schema_is_boolean.
| /// Contains an `unreachable!()` branch for the non-array RHS match arm. This branch is | ||
| /// structurally unreachable: the preceding guard already returns | ||
| /// [`Error::SchemaChecking`] for any RHS that does not satisfy `Schema::Array(Any)`. | ||
| fn get_in_operator_comparison_schema( |
There was a problem hiding this comment.
[Praise] The implementation of the method is fantastic!
| use crate::{air, mapping_registry::MqlMappingRegistryValue}; | ||
|
|
||
| test_translate_expression!( | ||
| it_converts_to_mql_when_lhs_is_nullable, |
There was a problem hiding this comment.
[Question] (blocking) Shouldn't it be that if either side is nullable it translates to SQL? And when both are not nullable it translates to MQL?
There was a problem hiding this comment.
It's really just the test name that is misleading, and the input that is invalid. If either operand is marked with is_nullable: true, the entire expression should also be marked with is_nullable: true. This test sets is_nullable: false for the overall In expression, but marks one of its operands, the LHS, with is_nullable: true.
Can you fix this test to be valid to demonstrate that when none of the args is nullable, the translation is MQL. And then a separate test that shows when either arg is nullable (and therefore the expression is nullable) it translates to SQL?
Summary
This change updates mongosql to preserve the IN operator in SQL queries rather than translate them into a disjunction of ORs. Index Utilization is preserved.
Before, our rewriters would translate the query to a disjunction of ORs:
Now, the code will preserve the IN operator, and later translate the code to the corresponding mql operator:
Along with these changes, I've made changes throughout the rest of the code, to allow IN / NOT IN to be translated to $in and { "$not": { "$in": [...] } } expressions in match stage aggregation pipelines, and use $in and $nin in $expr expressions.
./target/debug/mongosql-cli "SELECT * FROM nullable_fields WHERE a IN (100, 250, 322)"Will produce this aggregation pipeline:
Changes
InandNotInenums to mir, air, ast, and codegen codepathsTupleenum type throughut mir, air, and codegen. It operates the same as the Array type but I felt like it made it clearer to understand the cases.Testing