
Conversation


@ntnn ntnn commented Dec 19, 2025

Summary

Add an initial implementation of a kcp-native garbage collector.

It is only enabled for the e2e tests for now; e2e-shared and e2e-sharded produced some odd errors.
Enabling it for e2e is done in a separate commit to ease bisecting.

There are still blind spots and unhandled cases (e.g. blocking owner deletion), but these need tests added first.
The PR is also already relatively sizeable, and I'd rather iterate on it going forward instead of making one massive PR.

Cross-cluster ownership is technically possible, but needs adjustments in the kube fork to allow owner references to include the cluster name.
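For illustration, a minimal sketch of what such an extended owner reference might look like; the field name and JSON tag here are assumptions, not the actual shape of the fork:

```go
package sketch

import (
	"github.com/kcp-dev/logicalcluster/v3"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ClusterAwareOwnerReference is a hypothetical extension of
// metav1.OwnerReference; upstream owner references carry no cluster
// information, so the kube fork would need roughly one extra field.
type ClusterAwareOwnerReference struct {
	metav1.OwnerReference

	// ClusterName is the logical cluster the owner lives in. An
	// empty value would mean the owner is in the same logical
	// cluster as the dependent. Assumed name, not the fork's.
	ClusterName logicalcluster.Name `json:"clusterName,omitempty"`
}
```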

What Type of PR Is This?

/kind feature

Related Issue(s)

Fixes #

Release Notes

Not adding a release note yet; I'd leave that for when the GC is ready to be tested in production-like environments.

NONE

Signed-off-by: Nelo-T. Wallus <red.brush9525@fastmail.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
@kcp-ci-bot kcp-ci-bot added the release-note-none, kind/feature, and dco-signoff: yes labels Dec 19, 2025
@kcp-ci-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from ntnn. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added the size/XXL label Dec 19, 2025
ntnn added 5 commits December 19, 2025 12:53
@ntnn
Member Author

ntnn commented Dec 19, 2025

/test pull-kcp-test-integration

1 similar comment

)

func (gc *GarbageCollector) registerHandlers(ctx context.Context) func() {
// TODO(ntnn): Handle sharding? Could add a filter on all watches to
Contributor

Do you think this even matters? If we are dealing with a local shard informer, it will not even see logical clusters from other shards, will it?

Member Author

True; I was originally thinking of watching all logical clusters and wrote the comment based on that. If we only need cross-cluster ownership for resources in the cache, that is moot.
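For reference, a sketch of what such a watch filter could look like; cache.FilteringResourceEventHandler is real client-go API, but the annotation key and helper are made up for illustration:

```go
package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/cache"
)

// filterToShard wraps an event handler so it only sees objects from
// the given shard. With per-shard informers this is unnecessary, since
// a local informer never sees other shards' objects.
func filterToShard(shardName string, handler cache.ResourceEventHandler) cache.ResourceEventHandler {
	return cache.FilteringResourceEventHandler{
		FilterFunc: func(obj interface{}) bool {
			o, ok := obj.(metav1.Object)
			if !ok {
				return false
			}
			// Hypothetical annotation key, made up for this sketch.
			return o.GetAnnotations()["kcp.io/shard"] == shardName
		},
		Handler: handler,
	}
}
```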

@kcp-ci-bot
Contributor

@ntnn: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                        Commit   Required  Rerun command
pull-kcp-test-e2e                bf6e24b  true      /test pull-kcp-test-e2e
pull-kcp-test-e2e-multiple-runs  bf6e24b  true      /test pull-kcp-test-e2e-multiple-runs

Full PR test history


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Contributor

@mjudeikis mjudeikis left a comment

did quick read

"github.com/kcp-dev/kcp/pkg/reconciler/garbagecollector/syncmap"
)

type Graph struct {
Contributor

How much of this looks like upstream and how much is our custom stuff? It would be nice to have a mini README on top explaining how the data is stored, and if some of this is borrowed from upstream, links to the original source.

Member Author

It's pretty much all custom stuff; I originally wanted to reuse code from upstream, but the upstream code is very tightly knit. My initial attempt was to start a graph builder per logical cluster with the same queues, feeding into a single garbage collector modified for cluster-awareness, but that needed so many adjustments that it would have been a nightmare to maintain.
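To make the cluster-awareness concrete, a rough sketch of the key structural difference from upstream; the type names are illustrative, not the PR's actual code:

```go
package sketch

import (
	"github.com/kcp-dev/logicalcluster/v3"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
)

// Upstream's graph keys nodes by UID alone, which is fine inside a
// single cluster. A kcp-native graph serves many logical clusters at
// once, so every node has to be qualified with its cluster.
type nodeKey struct {
	Cluster logicalcluster.Name
	UID     types.UID
}

// node records the ownership relations of one object.
type node struct {
	owners     []metav1.OwnerReference // references this object declares
	dependents []nodeKey               // objects declaring this one as owner
}
```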

Member Author

I'll add some more docs

@gman0
Contributor

gman0 commented Dec 29, 2025

There are a couple of gc.log.* lines that are a bit too verbose, printing full (partial metadata) objects. Debugging leftovers?
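One common pattern for keeping such dumps available without the default noise, assuming gc.log is a logr.Logger, is to gate them behind a verbosity level:

```go
// Only emit the full object dump when running with a high verbosity,
// e.g. -v=4 or above.
gc.log.V(4).Info("processing object", "object", obj)
```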

@gman0
Contributor

gman0 commented Dec 29, 2025

Also, I'm getting pretty consistent failures with:

./bin/kcp start --feature-gates=WorkspaceMounts=true,CacheAPIs=true,WorkspaceAuthentication=true,KcpNativeGarbageCollector=true

and

go test -p 1 -v -run '^TestGarbageCollectorNormalCRDs$' ./test/e2e/garbagecollector/... --kcp-kubeconfig=$(pwd)/.kcp/admin.kubeconfig

The first run succeeds (probably always); the second run (probably always) fails.

// The graph add/update events are not queued into their own worker
// queue because updating the graph cannot fail and is reasonably
// fast.
handlers := cache.ResourceEventHandlerFuncs{
Contributor

We're inserting all objects in a shard into the GC graph. Can they be filtered and inserted only if they actually contain owner references?

Member Author

Well, they're references, not the objects themselves. I was thinking of making it so that a node in the graph only exists when the respective object owns other objects, but decided to leave that for later because I don't think it makes much of a difference in the initial implementation.

Memory-wise, if an object doesn't own any other objects, its node in the graph will just be an empty slice.

Performance-wise it might be better, because the underlying sync.Map performs best with few writes and many reads, so not writing empty nodes would avoid busying it when the information isn't needed. But I'd prefer to benchmark that to see how much of an impact it has, and also to compare the hashtriemap against other potential implementations (like a simple map with a lock :D), though I strongly suspect the hashtriemap will be the best option.
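A minimal sketch of the filtering idea under discussion, assuming handlers receive metav1.Object values; the helper name is made up, not the PR's actual code:

```go
package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// needsNode reports whether an object has to be inserted into the
// graph eagerly: only objects declaring owner references do. An object
// that merely *owns* others still gets a node lazily, created the
// moment a dependent names it in an owner reference.
func needsNode(o metav1.Object) bool {
	return len(o.GetOwnerReferences()) > 0
}
```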

Contributor

@gman0 gman0 Dec 30, 2025

a node in the graph only exists when the respective object owns other objects

Yes, that's what I meant (or the inverse of that) :D but fair enough, this can be left for a follow-up PR.

For the rest, I agree that benchmarks are needed. Atomics have their costs too, so it would be interesting to see what the difference is.

Contributor

@gman0 gman0 Dec 30, 2025

+ a benchmark of this GC versus the stock one.
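As a starting point for those measurements, a rough Go benchmark sketch comparing sync.Map against a mutex-guarded map under a read-heavy parallel load; comparing against the stock GC end to end would need a separate harness:

```go
package syncmap_test

import (
	"sync"
	"testing"
)

// Read-heavy access is where sync.Map is expected to shine; adding
// write-heavy variants would make the comparison more interesting.
func BenchmarkSyncMapLoad(b *testing.B) {
	var m sync.Map
	m.Store("key", 1)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			m.Load("key")
		}
	})
}

func BenchmarkRWMutexMapLoad(b *testing.B) {
	m := map[string]int{"key": 1}
	var mu sync.RWMutex
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			mu.RLock()
			_ = m["key"]
			mu.RUnlock()
		}
	})
}
```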

@xrstf
Contributor

xrstf commented Jan 5, 2026

Can you link to some design doc or tickets to give this more context? Or, with a broad brush, describe the problems this is solving, the goals, and the assumptions you made?

