-
Notifications
You must be signed in to change notification settings - Fork 257
fix: allow a column to share its name with the source relation #6020
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
lukapeschke
wants to merge
1
commit into
PRQL:main
Choose a base branch
from
lukapeschke:fix-table-name-shadowing
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this makes sense. If I as a user wrote
from bar | derive { x2 = this.bar }I would not expect the generated SQL to beSELECT *, * FROM BAR. I would be expecting thatthis.barwould be referring to a column inside thebarrelation namedbar, so the generated SQL ought to beSELECT *, bar AS x2 FROM bar. This is what's generated if the derive references any other name --from bar | derive { x2 = this.baz }results inSELECT *, baz AS x2 FROM bar.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess maybe the broader question is, why does
this.barresolve to the parent relationbar? It seems that specifyingthis.anythingshould always resolveanythingto only within the namespace of the current relation and never to the parent namespace. Although maybe I'm missing some currently established bit of PRQL functionality...There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two things worth separating here, with the behavior verified against the compiler:
This is pre-existing — not introduced by this PR. On
main(aab7438, before this branch) the same query already produces the*:The PR documents it in
test_column_shadows_relation_name_forward_referencerather than changing it, and it's orthogonal to what this PR actually fixes (a shadowing derived column likederive { bar = ... }previously broke resolution of all its sibling columns).Why
this.barresolves to the relation:from barbinds the relation itself to the namebarin the current (this) namespace.this.barlooksbarup there, finds the relation, and expands it to*. A name that doesn't collide with the relation —this.baz— finds nothing under that name and is inferred as a column, givingbaz AS x2. So the asymmetry you're seeing is the bare leaf colliding with the relation's own name inthis.To reach a column named
bartoday, the qualified form does the right thing:Your broader question — should
this.Xonly ever resolve within the relation's columns and never to the relation name itself — is a real design question, but a wider change than this PR's scope. That call is the maintainers'; flagging that the answer to "am I missing established functionality?" is: yes,from <rel>putting<rel>into thethisnamespace is the current established behavior, and<rel>.<col>is the qualified escape hatch.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no sense in adding a test to the test suite that encodes bad behavior -- that's why this question is relevant against this PR. I'm fine with this test if the behavior is intended but would suggest dropping the test if we'd like to fix this in the future.
That said, what design intent is met by adding the name of the current relation to the
thisnamespace? Is there a reason why I would want to refer tothis.barand have that reflect a relation rather than a column?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question — I traced this through the resolver and verified each result against the compiler on this branch.
Short version:
thisisn't a flat bag of columns. Each input relation is registered inthisas a named sub-namespace, keyed by the relation's name. That registration is exactly what makes qualified access work —bar.x, and after a join the disambiguation betweenbar.xandfoo.y(you can even writethis.bar.x). Sobarlives inthisfor the same reasonfoodoes after a join: it's an input. The bare leafthis.barthen collides with that input name, and an input resolves to its wildcard;bar.baris the escape hatch to the column. So it isn't a parent-namespace leak — the relation is a first-class member ofthis, colliding with a column of the same name inside that one namespace.On "encoding bad behavior": this PR actually moves toward the resolution you're describing. When a derived column realizes a name that collides with the relation, the column now wins:
The residual that
test_column_shadows_relation_name_forward_referencedocuments is narrower than the general case — it's a forward reference within the same tuple, wherethis.baris read before the shadowing column is added:Whether that narrow residual is "bad behavior to drop the test for" or "intentional, with
bar.baras the documented escape hatch" is genuinely a maintainer design call, so I won't adjudicate it. The two readings are clear though: if the intent is that leafthis.<rel>should always prefer a column (your broader proposal), then the test does encode behavior you'd want to change, and dropping it or marking it a known-limitation TODO would signal that; if the wildcard expansion is accepted as-is, keeping the test as documentation of the edge is reasonable.Mechanism + evidence
thisby name when a pipeline step resolves:resolver/functions.rs:290(insert_frame(frame, NS_THIS)).module.rslookup: a column nested underNS_SHADOWING_COLwins; otherwise the bare relation/input name is returned and expands to*. The shadowing column is nested rather than overwriting the input atmodule.rs:487-502.prqlc compileon this branch (b992209), not hand-derived.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kgutwin Thank you for your review :) The test was mostly added to flag existing behavior that I'll try to address later. If you want me to, I can delete the test, or put the correct expected output in it and skip it with an associated issue number ?
Regarding your question:
I believe this is a side effect of a bare
thismeaning*to have thecount thisresolve toCOUNT(*): https://prql-lang.org/book/reference/syntax/keywords.html#this--thatI grepped through teh docs a little and
this.barreferring to a "everything in thebarrelation" is not documented so I'd be in favor of removing that behavior. We need to keep supportingcount thisthough since it's in the docs.Also, I'm not sure what a bare
thisshould mean in other contetxts ? for example what should the behavior be forfrom bar | select this? Could beSELECT * FROM bar, could be an error where we suggest usingselect this.*or explicit column names instead.I'm happy to help on that but the design is not my decision to make, so I think the work on it should happen in another PR.
Please let me know what you think the behavior should be. I'd be happy over input from other maintainers as well 😄