Skip to content

Conversation

@xuzifu666
Copy link
Member

dssysolyatin
dssysolyatin previously approved these changes Nov 12, 2025
Copy link
Contributor

@dssysolyatin dssysolyatin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. About the discussion, I’ll start it on the mailing list when I have time (I added it to my personal to-do list, but it will be not soon). If you want to do it yourself or open a jira ticket, go ahead.

@xuzifu666
Copy link
Member Author

LGTM. About the discussion, I’ll start it on the mailing list when I have time (I added it to my personal to-do list, but it will be not soon). If you want to do it yourself or open a jira ticket, go ahead.

@dssysolyatin Thank you for your suggestions and help,I learned a lot!
@mihaibudiu could you also give a final review? If OK,I would squash the commits, thanks

@mihaibudiu
Copy link
Contributor

The point of having visitors is exactly so you don't have to dispatch manually based on types (LogicalSnapshot), but have the compiler do the dispatch for you.

This deserves a discussion in JIRA. There are many ways to design the visitors, and I haven't seen a spec for Calcite's visitors. For example, it's not clear whether the visitor method for LogicalSnapshot is supposed to call the visitor method for superclasses.

But in general, if you have a new class that extends the visited objects (LogicalSnapshot extending RelNode), all visitors must potentially be extended to cope with the new class.

@xiedeyantu
Copy link
Member

For example, it's not clear whether the visitor method for LogicalSnapshot is supposed to call the visitor method for superclasses.

Should the implementation of the visitor be set as final or private as much as possible? However, it seems that it is currently inheritable. If modifications are made to the existing visitor, would this have an impact on forward compatibility?

@xuzifu666
Copy link
Member Author

The point of having visitors is exactly so you don't have to dispatch manually based on types (LogicalSnapshot), but have the compiler do the dispatch for you.

This deserves a discussion in JIRA. There are many ways to design the visitors, and I haven't seen a spec for Calcite's visitors. For example, it's not clear whether the visitor method for LogicalSnapshot is supposed to call the visitor method for superclasses.

But in general, if you have a new class that extends the visited objects (LogicalSnapshot extending RelNode), all visitors must potentially be extended to cope with the new class.

I also think it makes sense. Since it is open, it must be visible to other callers. So your suggestion is to open the RelShuttle interface for LogicalSnapshot, right? @mihaibudiu

@dssysolyatin
Copy link
Contributor

@xuzifu666 I recommend reading the comments on [CALCITE-7288]

@mihaibudiu
Copy link
Contributor

I think that the visitor should have methods for all Rel classes that are in core.
It is a breaking change, but I think it's a bug - the methods should have been added when the LogicalSnapshot was added.
But that's just my opinion, our compiler moves very quickly to accommodate such changes.

@dssysolyatin
Copy link
Contributor

dssysolyatin commented Nov 13, 2025

@xuzifu666 I approved your PR, but it doesn’t really make sense to merge it as it is. The reason is that AbstractRelNode already has a default accept implementation that calls RelShuttle.visit(RelNode other) and your change does nothing. So you can just close your PR if you don't plan add method for LogicalSnapshot.

If you want to add a visitor method for LogicalSnapshot, keep in mind this behavior: If there’s a class X that extends LogicalSnapshot and a class Y extends Snapshot, but not LogicalSnapshot, then X will call RelShuttle.visit(LogicalSnapshot) and Y will call RelShuttle.visit(RelNode other). This should also be documented as a breaking change as well

@xuzifu666
Copy link
Member Author

Okay, it seems that everyone is basically in agreement, so I would add visit method to LogicalSnapshot and I would document it as a breaking change.

By the way where is the appropriate place to write the breaking change document? (I haven't written this before) @dssysolyatin

@dssysolyatin
Copy link
Contributor

Okay, it seems that everyone is basically in agreement

I think I clearly explained my point of view during the PR review and in [CALCITE-7288]. As long as RelShuttle.visit(RelNode other) exists, we will keep forgetting to add new visit methods to RelShuttle for new RelNode types. If we add these methods later, it becomes a breaking change, and it’s easy to forget to document it. And user can forgot to read breaking changing if it is not compile error.

About breaking changes, I already sent you a link with example:

See the example in Stamatis’s commit 1ab5c7f#diff-c0eccc3130d0bc82b14113843abec157190ebb2551380fa754dbcdd56f61ea51

@xuzifu666
Copy link
Member Author

I have updated the PR and added the breaking change document. Reviews can check if it is OK.

@xuzifu666
Copy link
Member Author

It seems OK and I had squashed the commits @mihaibudiu

@mihaibudiu
Copy link
Contributor

I personally think this is the right way to solve this problem.
But I think @dssysolyatin is not satisfied.
Can we get a third party opinion on this PR? It's a small PR, should be easy to review.

@xiedeyantu
Copy link
Member

Until we have a solid refactoring plan, I agree with adding a visit method as a temporary fix for the current omission. However, I saw a concern raised in the refactoring Jira ticket: the generic visit(RelNode other) method could mask missing specific visit methods. Can we add a test to ensure that when a new RelNode is added, the shuttle does not fall back to the generic visit(RelNode other)?

@xuzifu666
Copy link
Member Author

xuzifu666 commented Nov 18, 2025

Until we have a solid refactoring plan, I agree with adding a visit method as a temporary fix for the current omission. However, I saw a concern raised in the refactoring Jira ticket: the generic visit(RelNode other) method could mask missing specific visit methods. Can we add a test to ensure that when a new RelNode is added, the shuttle does not fall back to the generic visit(RelNode other)?

OK,had added tests for visit(RelNode other), thanks for suggestion~ @xiedeyantu

@sonarqubecloud
Copy link

@dssysolyatin
Copy link
Contributor

dssysolyatin commented Nov 18, 2025

Can we add a test to ensure that when a new RelNode is added

@xuzifu666 I don’t think what you did in your last commit matches what @xiedeyantu meant. It is not so easy to write such test, although it should be possible using reflection. But before writing such a test, you need to clearly define the specification. For example, should RelShuttle handle both TableFunctionScan and LogicalTableFunctionScan in the same visit(TableFunctionScan) method (which already exists), or should LogicalTableFunctionScan have its own visit(LogicalTableFunctionScan) method?

as a temporary fix for the current omission

If you do such "temporary fix". Make sure we don’t add a new "temporary fix" with every release that includes breaking changes related to this topic. Based on HintCollector, I can say that there's an issue with Window and TableFunctionScan. For example, RelShuttle.visit(TableFunctionScan) exists, but TableFunctionScan doesn’t implement accept for it. So it falls back to RelShuttle.visit(RelNode other). This is yet another breaking change.

@xuzifu666
Copy link
Member Author

Can we add a test to ensure that when a new RelNode is added

@xuzifu666 I don’t think what you did in your last commit matches what @xiedeyantu meant. It is not so easy to write such test, although it should be possible using reflection. But before writing such a test, you need to clearly define the specification. For example, should RelShuttle handle both TableFunctionScan and LogicalTableFunctionScan in the same visit(TableFunctionScan) method (which already exists), or should LogicalTableFunctionScan have its own visit(LogicalTableFunctionScan) method?

as a temporary fix for the current omission

If you do such "temporary fix". Make sure we don’t add a new "temporary fix" with every release that includes breaking changes related to this topic. Based on HintCollector, I can say that there's an issue with Window and TableFunctionScan. For example, RelShuttle.visit(TableFunctionScan) exists, but TableFunctionScan doesn’t implement accept for it. So it falls back to RelShuttle.visit(RelNode other). This is yet another breaking change.

It seems we haven't reached a consensus. If you feel this is inappropriate, should I close this PR first? @dssysolyatin

@dssysolyatin
Copy link
Contributor

I’m okay to merge this PR if:

1 The specifications for RelShuttle methods will be clearly defined. Right now, it has visitors for both org.apache.calcite.rel.core (two of them) and org.apache.calcite.rel.logical (the rest).
2. The test that @xiedeyantu mentioned (which was actually a very good suggestion) is written based on this specification.

This will help us avoid forgetting new visit methods and ensure all existing methods work correctly, so any breaking changes will happen in only one release. It doesn’t matter to me if it is a separate Jira/PR or not

@xiedeyantu
Copy link
Member

Thanks for the explanation @dssysolyatin . Indeed, it's quite challenging to write a comprehensive test #4641 . I've drafted a general approach (still a rough sketch) – please feel free to review its feasibility. As you mentioned, we probably need to clarify which RelNode classes should have visit methods in RelShuttle and which shouldn't (those should be excluded in the test). I'm not sure if this type of test would be acceptable to the team.

@github-actions
Copy link

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@calcite.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Dec 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants