Demote steady-state discovery and per-task logs that spam at INFO/ERROR#7489
Open
pvditt wants to merge 4 commits into
Open
Demote steady-state discovery and per-task logs that spam at INFO/ERROR#7489pvditt wants to merge 4 commits into
pvditt wants to merge 4 commits into
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
Reduces steady-state log noise across the connector discovery loop, k8s resolver watcher, webapi phase monitoring, resource override logging, and log-fetching helpers by demoting repetitive logs and tightening high-volume messages.
Changes:
- Demotes repetitive steady-state INFO/ERROR logs to DEBUG/WARN across resolver/connector/webapi/logging utilities.
- Fixes noisy watcher-event logging by logging only the event type (instead of dumping full Endpoints objects).
- Logs connector supported task types at INFO only when the set changes (normalized via sort/dedup) rather than every poll.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| flytestdlib/storage/stow_store.go | Demotes contentMD5 cast-miss logging to DEBUG (but see comment about logging the wrong value / logging on missing key). |
| flytestdlib/resolver/k8s_resolver.go | Demotes resolver lifecycle logs to DEBUG and reduces watcher-event verbosity to event type. |
| flyteplugins/go/tasks/plugins/webapi/connector/plugin.go | Adds “log only on change” behavior for supported task types at INFO. |
| flyteplugins/go/tasks/plugins/webapi/connector/client.go | Demotes per-deployment and per-registry rebuild task-type logs to DEBUG. |
| flyteplugins/go/tasks/pluginmachinery/internal/webapi/monitor.go | Demotes phase transition logging to DEBUG and includes cache item/task id in the message. |
| flyteplugins/go/tasks/pluginmachinery/flytek8s/container_helper.go | Demotes resource override / adjusted resources struct dumps to DEBUG. |
| flyteplugins/go/tasks/logs/logging_utils.go | Demotes expected containerStatus out-of-bounds condition from ERROR to WARN for Pending pods. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
244
to
247
| contentMD5, ok := metadata[strings.ToLower(FlyteContentMD5)].(string) | ||
| if !ok { | ||
| logger.Infof(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5) | ||
| logger.Debugf(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5) | ||
| } |
Comment on lines
181
to
185
| case event, ok := <-watcher.ResultChan(): | ||
| logger.Info(k.ctx, "k8s resolver watcher event response: [%v]", event) | ||
| logger.Debugf(k.ctx, "k8s resolver watcher event: [%s]", event.Type) | ||
| if !ok { | ||
| logger.Debugf(k.ctx, "k8s resolver: watcher closed") | ||
| return |
Connector discovery rebuilds its registry every pollInterval (10s) and logged the full cycle at INFO each time; the k8s resolver watcher also dumped entire Endpoints objects via an unformatted %v. The aggregate supported-task-types line now logs only when the set changes; the rest moves to Debug. Also demotes per-task resource-override dumps, the contentMD5 cast miss, and the containerStatus IndexOutOfBound error that fires for every Pending pod, and attaches the task ID to the webapi phase-transition log. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
containerStatuses is empty by design until the kubelet starts the containers, so every Pending pod hit this branch on every evaluation and the message read as an error. Pending now logs at Debug; any other phase keeps the Warn since a short containerStatuses there is genuinely unexpected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
The setter owns the field and the mutex, so comparing there covers every caller uniformly instead of only the watchConnectors poll loop, and drops the closure-side lastSupported bookkeeping. The stored slice is now normalized (sorted, deduped), which ContainTaskType's slices.Contains lookup is indifferent to. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
6cf9f81 to
fdfa5f2
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
flytestdlib/resolver/k8s_resolver.go:186
- The watcher event is logged before checking
ok. When the result channel is closed,eventwill be the zero value and this will emit a misleading log line (empty type) right before the "watcher closed" message. Checkokfirst, then log the event type.
logger.Debugf(k.ctx, "k8s resolver watcher event: [%s]", event.Type)
if !ok {
logger.Debugf(k.ctx, "k8s resolver: watcher closed")
return
}
Comment on lines
+51
to
53
| func (p *ConnectorService) SetSupportedTaskType(ctx context.Context, taskTypes []string) { | ||
| normalized := slices.Compact(slices.Sorted(slices.Values(taskTypes))) | ||
| p.mu.Lock() |
Member
There was a problem hiding this comment.
should we apply its suggestion?
Comment on lines
244
to
247
| contentMD5, ok := metadata[strings.ToLower(FlyteContentMD5)].(string) | ||
| if !ok { | ||
| logger.Infof(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5) | ||
| logger.Debugf(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5) | ||
| } |
Covers the behavior the change-detection depends on: stored set is sorted and deduped, order/duplicate variations of the same set normalize identically, and ContainTaskType tracks adds and removals. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
Comment on lines
244
to
+246
| contentMD5, ok := metadata[strings.ToLower(FlyteContentMD5)].(string) | ||
| if !ok { | ||
| logger.Infof(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5) | ||
| logger.Debugf(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5) |
Comment on lines
181
to
185
| case event, ok := <-watcher.ResultChan(): | ||
| logger.Info(k.ctx, "k8s resolver watcher event response: [%v]", event) | ||
| logger.Debugf(k.ctx, "k8s resolver watcher event: [%s]", event.Type) | ||
| if !ok { | ||
| logger.Debugf(k.ctx, "k8s resolver: watcher closed") | ||
| return |
pingsutw
approved these changes
Jun 11, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
Production log review of two data planes running the leaseworker (which hosts the connector webapi plugin) showed the overwhelming majority of log output was repetitive steady-state noise rather than signal:
pollInterval(default 10s) and logs the full cycle at INFO every time — resolver start, watcher event, resolver close, per-deployment task types, aggregate task types — even when nothing changed.logger.Info(ctx, "...: [%v]", event)never applies the format string, so it printed a literal[%v]followed by the entireEndpointsobject (managedFields and all, ~3–4 KB per line, every 10s).containerStatus IndexOutOfBoundlogs at ERROR for every Pending pod that simply has no container statuses yet — an expected transient state that pollutes error-rate dashboards."Moving Phase for from X to Y"— a broken message with no task identifier, making it unattributable.ApplyResourceOverrides/Adjusted container resourcesstruct dumps and acontentMD5cast miss (expected for objects without MD5 metadata) logged at INFO on every occurrence.Changes
flytestdlib/resolver/k8s_resolver.go%vbug, logsevent.Typeinstead of the full Endpoints dumpconnector/client.goconnector/plugin.goSetSupportedTaskTypenormalizes (sort + dedup) and compares against the stored set under its existing mutex, covering both the init path and the poll loopflytestdlib/storage/stow_store.gowebapi/monitor.gocacheItemID)logs/logging_utils.goflytek8s/container_helper.goNet effect on the sampled deployments: ~95% reduction in INFO/ERROR line volume; what remains is actual lifecycle signal. The useful "what task types can this worker run" line is preserved at INFO — it now fires once at startup and again only on change.
Verification
go build+go vetclean for all touched packagesgo testpasses:webapi/connector,internal/webapi,tasks/logs,flytek8s,flytestdlib/storagegofmtclean🤖 Generated with Claude Code