add unmanged crd detection to remove duplicated openapi path#348
add unmanged crd detection to remove duplicated openapi path#348krishbajaj1609 wants to merge 1 commit into
Conversation
| // the corresponding OpenAPI path prefix is returned so it can be added to | ||
| // IgnorePrefixes — suppressing reports-server's OpenAPI paths for that group | ||
| // while keeping the APIService request routing fully functional. | ||
| func DetectConflictingCRDs(restConfig *rest.Config) []string { |
There was a problem hiding this comment.
its better if this function returns an error and the error is ignored at the caller level. Also it doesn't have to be public
|
|
||
| for _, crd := range crds { | ||
| group := crd.Spec.Group | ||
| if !conflictGroups[group] || seen[group] { |
There was a problem hiding this comment.
can you make this a variable that gets supplied to the function rather than a global ? it's fine if the caller references the global. but its ok for the caller to pass the global variable. it makes it easier to guess about the function's intent and what it does.
|
|
||
| ctx, cancel := context.WithCancel(context.Background()) | ||
| go func() { | ||
| <-stopCh |
There was a problem hiding this comment.
There are a few things i wanna comment on in this bit:
First, if the PeriodicallyCheckForNewConflicts detect a conflict, it will cancel the context. and RunUntil will exit because the done channel became readable. This means that cleanups we are running on events received through the stopCh won't run.
Second, I think its added complexity to create a context that you cancel when you detect a conflict, run a goroutine that checks if stopCh became readable (something else triggered a server shutdown signal) and cancel the periodic check goroutine.
Why not just pass the stopCh as an argument to the function, run a for select on it becoming readable (something else cancelled) and when a conflict is detected you close this channel yourself ?
Lastly, i am a bit against making this whole thing periodic in the first place. I prefer if its only once on startup. If the admin installed a conflicting CRD, it should be on them to restart the server (to make sure they are doing it in line with keeping their services available).
We can help them out by logging a message that nudges them to do so. Something like there's a CRD installed that conflicts with the CRDs the reports server serves, please ensure to restart the server once after CRD installation to avoid serving duplicate prefixes. Or something less verbose?
| "k8s.io/klog/v2" | ||
| ) | ||
|
|
||
| var conflictGroups = map[string]bool{ |
There was a problem hiding this comment.
a map of string to empty struct is better here. an empty struct is 0 bytes.
|
|
||
| var conflictGroups = map[string]bool{ | ||
| "wgpolicyk8s.io": true, | ||
| "openreports.io": true, |
There was a problem hiding this comment.
why do we only have these and not the other crds the reports server serves ?
| return nil | ||
| } | ||
|
|
||
| ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second) |
There was a problem hiding this comment.
why not pass the context given to PeriodicallyCheckForNewConflicts ?
| return detectFromCRDList(crdList.Items) | ||
| } | ||
|
|
||
| func detectFromCRDList(crds []apiextensionsv1.CustomResourceDefinition) []string { |
There was a problem hiding this comment.
i think its best to drop this helper entirely and move its logic to DetectConflictingCrds
| case <-ticker.C: | ||
| current := DetectConflictingCRDs(restConfig) | ||
| for _, p := range current { | ||
| if !known[p] { |
There was a problem hiding this comment.
here, you are checking if we detected new conflicting CRDs other than the ones in config.OpenAPIIgnorePrefixes but this field won't persist across startups anyways, and the user doesn't have a way to supply their own ignore prefixes. so why not just skip passing and comparing against the initial prefixes array and report if we detect any of the api groups that can cause problems in a CRD ?
| // If a new conflicting CRD appears that was not present at startup, the cancel | ||
| // function is called to trigger a graceful restart. | ||
| func PeriodicallyCheckForNewConflicts(ctx context.Context, cancel context.CancelFunc, restConfig *rest.Config, initialPrefixes []string) { | ||
| known := make(map[string]bool, len(initialPrefixes)) |
|
@krishbajaj1609 |
Summary
IgnorePrefixes,eliminating duplicate-path errors while keeping APIService request routing fully functional
Context
When both a CRD and an APIService exist for the same API group/version (e.g.,
wgpolicyk8s.io/v1alpha2), the kube-apiserver's OpenAPI aggregator can produce duplicate-path errors (K8s 1.28+), breaking ArgoCD, kubectl apply, and Helm validation. Actual API request handling is unaffected — the problem is only in the OpenAPI documentation/discovery layer.Related: kubernetes/kubernetes#122668, kubernetes/kubernetes#129499
Evidence
(kind, K8s 1.32)
Periodic detection and restart:
When a conflicting CRD (
policyreports.wgpolicyk8s.io) was installed while reports-server was running, the periodic checker detected it within 2 minutes and triggered a graceful restart detect.go:59 "detected third-party CRD for API group served by reports-server, suppressing OpenAPI paths to avoid duplicate-path errors" group="wgpolicyk8s.io"periodic.go:31 "new conflicting CRD detected since startup, restarting to apply OpenAPI suppression" prefix="/apis/wgpolicyk8s.io/"
Startup detection after restart:
On restart, the CRD was detected at startup and
IgnorePrefixeswas set beforeComplete():detect.go:59 "detected third-party CRD for API group served by reports-server suppressing OpenAPI paths to avoid duplicate-path errors" group="wgpolicyk8s.io"
OpenAPI suppression verified (per-group OpenAPI v3):
wgpolicyk8s.io(with fix)wgpolicyk8s.io(without fix)openreports.ioreports.kyverno.ioFunctional verification (with fix + CRDs present):
kubectl applyPolicyReport: works, stored in reports-server (served-by: reports-serverannotation)kubectl get/delete: worksv1alpha2.wgpolicyk8s.io: Available=TrueTest plan
closes #344