Add an initial implementation of a kcp native garbage collector #3768
base: main
Conversation
Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com> Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED.

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing `/approve` in a comment.
/test pull-kcp-test-integration
```go
)

func (gc *GarbageCollector) registerHandlers(ctx context.Context) func() {
	// TODO(ntnn): Handle sharding? Could add a filter on all watches to
```
Do you think this even matters? If we are dealing with a local shard informer, it will not even see logical clusters from other shards, will it?
True; I was originally thinking of watching all logical clusters and wrote the comment based on that. If we only need cross-cluster ownership for resources in the cache, that is moot.
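If watching across shards were ever needed, the filter hinted at in the TODO could look roughly like the sketch below. This is purely illustrative, since the discussion above suggests it is likely unnecessary for local shard informers; `localShardOwns` and the wrapper function are hypothetical names, not part of the PR.

```go
package garbagecollector

import (
	"k8s.io/apimachinery/pkg/api/meta"
	"k8s.io/client-go/tools/cache"

	"github.com/kcp-dev/logicalcluster/v3"
)

// localShardOwns is a hypothetical placeholder for a shard-assignment
// check; the real mechanism would depend on kcp's shard topology.
func localShardOwns(cluster logicalcluster.Name) bool {
	return true
}

// filterToLocalShard wraps an event handler so it only sees objects
// from logical clusters assigned to the local shard.
func filterToLocalShard(inner cache.ResourceEventHandler) cache.ResourceEventHandler {
	return cache.FilteringResourceEventHandler{
		FilterFunc: func(obj interface{}) bool {
			metaObj, err := meta.Accessor(obj)
			if err != nil {
				return false
			}
			return localShardOwns(logicalcluster.From(metaObj))
		},
		Handler: inner,
	}
}
```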
@ntnn: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
mjudeikis left a comment:
Did a quick read.
| "github.com/kcp-dev/kcp/pkg/reconciler/garbagecollector/syncmap" | ||
| ) | ||
|
|
||
| type Graph struct { |
How much of this is upstream-like and how much is our custom stuff? It would be nice to have a mini README on top describing how the data is stored, and, if some of this is borrowed from upstream, links to the original source.
It's pretty much all custom stuff; I originally wanted to reuse code from upstream, but the upstream code is very tightly knit. My initial attempt was to start a graph builder per logical cluster with the same queues, feeding into a single garbage collector modified for cluster awareness, but that needed so many adjustments that it would have been a nightmare to maintain.
I'll add some more docs
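Until those docs land, here is a heavily simplified sketch of how a per-cluster ownership graph could be laid out. All names are hypothetical, and it substitutes a plain RWMutex-guarded map for the PR's own syncmap package, whose API is not visible in this thread:

```go
package garbagecollector

import (
	"sync"

	"github.com/kcp-dev/logicalcluster/v3"
)

// objectReference identifies one object within one logical cluster.
// Hypothetical; the PR defines its own reference type.
type objectReference struct {
	APIVersion string
	Kind       string
	Namespace  string
	Name       string
	UID        string
}

// Graph maps every known owner to the set of objects it owns, grouped
// by logical cluster so lookups never cross workspace boundaries.
type Graph struct {
	mu       sync.RWMutex
	clusters map[logicalcluster.Name]map[objectReference]map[objectReference]struct{}
}

// AddDependent records that owner owns dependent within cluster.
func (g *Graph) AddDependent(cluster logicalcluster.Name, owner, dependent objectReference) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.clusters == nil {
		g.clusters = map[logicalcluster.Name]map[objectReference]map[objectReference]struct{}{}
	}
	owners := g.clusters[cluster]
	if owners == nil {
		owners = map[objectReference]map[objectReference]struct{}{}
		g.clusters[cluster] = owners
	}
	deps := owners[owner]
	if deps == nil {
		deps = map[objectReference]struct{}{}
		owners[owner] = deps
	}
	deps[dependent] = struct{}{}
}

// Dependents returns the objects owned by owner in cluster, e.g. to
// enqueue them for deletion once the owner is gone.
func (g *Graph) Dependents(cluster logicalcluster.Name, owner objectReference) []objectReference {
	g.mu.RLock()
	defer g.mu.RUnlock()
	var out []objectReference
	for dep := range g.clusters[cluster][owner] {
		out = append(out, dep)
	}
	return out
}
```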
There are a couple …
Also, I'm getting pretty consistent failures with … and …. The first run succeeds (probably always); the second run (probably always) fails.
```go
// The graph add/update events are not queued into their own worker
// queue because updating the graph cannot fail and is reasonably
// fast.
handlers := cache.ResourceEventHandlerFuncs{
```
We're inserting all objects in a shard into the GC graph. Can they be filtered and inserted only if they actually contain owner references?
Well, they're references, not the objects themselves. I was thinking of making it so that a node in the graph only exists when the respective object owns other objects, but decided to leave that for later because I don't think it makes much of a difference in the initial implementation.
Memory-wise, if an object doesn't own any other objects, its node in the graph will just be an empty slice.
Performance-wise it might be better, because the underlying sync.Map performs better with fewer writes and many reads, so not writing empty nodes would avoid busying the underlying sync.Map when the information isn't needed. But I'd prefer to benchmark that to see how much of an impact it has (and also to compare the hashtriemap against other potential implementations [like a simple map with a lock :D], though I strongly suspect the hashtriemap will be the best option).
> making it so that a node in the graph only exists when the respective object owns other objects

Yes, that's what I meant (or the inverse of that) :D but fair enough, this can be left for a next PR.
For the rest, I agree: benchmarks needed. Atomics have their costs too, so it would be interesting to see what the difference is.
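A minimal sketch of that deferred filtering, assuming client-go's cache and meta packages; the wiring and the `insert` callback are hypothetical. Note that add events for ownerless objects can be skipped because their nodes would be created lazily when a dependent names them, while delete events could not be filtered the same way:

```go
package garbagecollector

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

// newAddHandler forwards only objects that carry owner references, so
// objects owning nothing never create empty nodes in the graph.
func newAddHandler(insert func(metav1.Object)) cache.ResourceEventHandlerFuncs {
	return cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			metaObj, err := meta.Accessor(obj)
			if err != nil {
				return
			}
			if len(metaObj.GetOwnerReferences()) == 0 {
				// No edges to record: this object becomes a node only
				// if some other object later lists it as an owner.
				return
			}
			insert(metaObj)
		},
	}
}
```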
+ a benchmark of this GC versus the stock one.
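As a starting point, a benchmark comparing sync.Map against a simple RWMutex-guarded map under a read-heavy parallel load could be as small as the sketch below (standard library only; the hashtriemap variant would slot in the same way):

```go
package syncmap_test

import (
	"sync"
	"testing"
)

// BenchmarkSyncMapRead measures parallel reads from a sync.Map.
func BenchmarkSyncMapRead(b *testing.B) {
	var m sync.Map
	m.Store("owner", []string{"dependent"})
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			m.Load("owner")
		}
	})
}

// BenchmarkMutexMapRead measures parallel reads from a map guarded by
// an RWMutex, the "simple map with a lock" alternative mentioned above.
func BenchmarkMutexMapRead(b *testing.B) {
	var mu sync.RWMutex
	m := map[string][]string{"owner": {"dependent"}}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.RLock()
			_ = m["owner"]
			mu.RUnlock()
		}
	})
}
```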
Can you link to some design doc or tickets to give this more context? Or maybe describe with a broad brush the problems this is solving, the goal, and the assumptions you made?
Summary
Add an initial implementation of a kcp native garbage collector.
It is only enabled for e2e tests for now; e2e-shared and -sharded produced some odd errors.
Enabling it for e2e is done in a separate commit to allow bisecting.
There are still blind spots and unhandled cases (e.g. blocking owner deletion), but these need tests added.
And the PR is already relatively sizeable; I'd rather iterate off of this going forward instead of making one massive PR.
Cross-cluster ownership is technically possible but needs adjustments in the kube fork to allow owner references to include the cluster name.
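To make that fork adjustment concrete, the owner reference shape might grow a cluster field along the lines of the hypothetical sketch below; upstream metav1.OwnerReference has no such field today, and the actual change may look different:

```go
package sketch

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// ClusterAwareOwnerReference illustrates a hypothetical cluster-aware
// owner reference: an empty ClusterName would mean "same logical
// cluster as the dependent", preserving today's semantics.
type ClusterAwareOwnerReference struct {
	metav1.OwnerReference
	ClusterName string `json:"clusterName,omitempty"`
}
```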
What Type of PR Is This?
/kind feature
Related Issue(s)
Fixes #
Release Notes
Not adding a release note yet. I'd leave that for when the gc is ready to be tested in production-like environments.