Skip to content

Demote steady-state discovery and per-task logs that spam at INFO/ERROR#7489

Open
pvditt wants to merge 4 commits into
mainfrom
pvditt/clean-up-logs
Open

Demote steady-state discovery and per-task logs that spam at INFO/ERROR#7489
pvditt wants to merge 4 commits into
mainfrom
pvditt/clean-up-logs

Conversation

@pvditt

@pvditt pvditt commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Problem

Production log review of two data planes running the leaseworker (which hosts the connector webapi plugin) showed the overwhelming majority of log output was repetitive steady-state noise rather than signal:

  • In one 8-minute sample, 88% of lines came from the connector discovery poll loop, which rebuilds its registry every pollInterval (default 10s) and logs the full cycle at INFO every time — resolver start, watcher event, resolver close, per-deployment task types, aggregate task types — even when nothing changed.
  • The k8s resolver watcher line had a format bug: logger.Info(ctx, "...: [%v]", event) never applies the format string, so it printed a literal [%v] followed by the entire Endpoints object (managedFields and all, ~3–4 KB per line, every 10s).
  • containerStatus IndexOutOfBound logs at ERROR for every Pending pod that simply has no container statuses yet — an expected transient state that pollutes error-rate dashboards.
  • The webapi phase-transition log read "Moving Phase for from X to Y" — a broken message with no task identifier, making it unattributable.
  • Per-task ApplyResourceOverrides/Adjusted container resources struct dumps and a contentMD5 cast miss (expected for objects without MD5 metadata) logged at INFO on every occurrence.

Changes

File Change
flytestdlib/resolver/k8s_resolver.go Resolver start/close → Debug; watcher event → Debug, fixed %v bug, logs event.Type instead of the full Endpoints dump
connector/client.go Per-deployment and per-rebuild task-type lines → Debug
connector/plugin.go Aggregate supported-task-types now logs at INFO only when the set changesSetSupportedTaskType normalizes (sort + dedup) and compares against the stored set under its existing mutex, covering both the init path and the poll loop
flytestdlib/storage/stow_store.go contentMD5 cast miss → Debug
webapi/monitor.go Phase transition → Debug; fixed message and added the task ID (cacheItemID)
logs/logging_utils.go containerStatus IndexOutOfBound: Debug when the pod is Pending (containerStatuses is empty by design until containers start), Warn for any other phase where it is genuinely unexpected; kept the spec-containers sibling check at Error
flytek8s/container_helper.go Resource-override struct dumps → Debug

Net effect on the sampled deployments: ~95% reduction in INFO/ERROR line volume; what remains is actual lifecycle signal. The useful "what task types can this worker run" line is preserved at INFO — it now fires once at startup and again only on change.

Verification

  • go build + go vet clean for all touched packages
  • go test passes: webapi/connector, internal/webapi, tasks/logs, flytek8s, flytestdlib/storage
  • gofmt clean

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings June 10, 2026 05:24

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Reduces steady-state log noise across the connector discovery loop, k8s resolver watcher, webapi phase monitoring, resource override logging, and log-fetching helpers by demoting repetitive logs and tightening high-volume messages.

Changes:

  • Demotes repetitive steady-state INFO/ERROR logs to DEBUG/WARN across resolver/connector/webapi/logging utilities.
  • Fixes noisy watcher-event logging by logging only the event type (instead of dumping full Endpoints objects).
  • Logs connector supported task types at INFO only when the set changes (normalized via sort/dedup) rather than every poll.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
flytestdlib/storage/stow_store.go Demotes contentMD5 cast-miss logging to DEBUG (but see comment about logging the wrong value / logging on missing key).
flytestdlib/resolver/k8s_resolver.go Demotes resolver lifecycle logs to DEBUG and reduces watcher-event verbosity to event type.
flyteplugins/go/tasks/plugins/webapi/connector/plugin.go Adds “log only on change” behavior for supported task types at INFO.
flyteplugins/go/tasks/plugins/webapi/connector/client.go Demotes per-deployment and per-registry rebuild task-type logs to DEBUG.
flyteplugins/go/tasks/pluginmachinery/internal/webapi/monitor.go Demotes phase transition logging to DEBUG and includes cache item/task id in the message.
flyteplugins/go/tasks/pluginmachinery/flytek8s/container_helper.go Demotes resource override / adjusted resources struct dumps to DEBUG.
flyteplugins/go/tasks/logs/logging_utils.go Demotes expected containerStatus out-of-bounds condition from ERROR to WARN for Pending pods.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 244 to 247
contentMD5, ok := metadata[strings.ToLower(FlyteContentMD5)].(string)
if !ok {
logger.Infof(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5)
logger.Debugf(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5)
}
Comment on lines 181 to 185
case event, ok := <-watcher.ResultChan():
logger.Info(k.ctx, "k8s resolver watcher event response: [%v]", event)
logger.Debugf(k.ctx, "k8s resolver watcher event: [%s]", event.Type)
if !ok {
logger.Debugf(k.ctx, "k8s resolver: watcher closed")
return
Copilot AI review requested due to automatic review settings June 10, 2026 05:57
pvditt and others added 3 commits June 9, 2026 22:59
Connector discovery rebuilds its registry every pollInterval (10s) and
logged the full cycle at INFO each time; the k8s resolver watcher also
dumped entire Endpoints objects via an unformatted %v. The aggregate
supported-task-types line now logs only when the set changes; the rest
moves to Debug. Also demotes per-task resource-override dumps, the
contentMD5 cast miss, and the containerStatus IndexOutOfBound error
that fires for every Pending pod, and attaches the task ID to the
webapi phase-transition log.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
containerStatuses is empty by design until the kubelet starts the
containers, so every Pending pod hit this branch on every evaluation
and the message read as an error. Pending now logs at Debug; any other
phase keeps the Warn since a short containerStatuses there is genuinely
unexpected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
The setter owns the field and the mutex, so comparing there covers
every caller uniformly instead of only the watchConnectors poll loop,
and drops the closure-side lastSupported bookkeeping. The stored slice
is now normalized (sorted, deduped), which ContainTaskType's
slices.Contains lookup is indifferent to.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
@pvditt pvditt force-pushed the pvditt/clean-up-logs branch from 6cf9f81 to fdfa5f2 Compare June 10, 2026 05:59

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

flytestdlib/resolver/k8s_resolver.go:186

  • The watcher event is logged before checking ok. When the result channel is closed, event will be the zero value and this will emit a misleading log line (empty type) right before the "watcher closed" message. Check ok first, then log the event type.
			logger.Debugf(k.ctx, "k8s resolver watcher event: [%s]", event.Type)
			if !ok {
				logger.Debugf(k.ctx, "k8s resolver: watcher closed")
				return
			}

Comment on lines +51 to 53
func (p *ConnectorService) SetSupportedTaskType(ctx context.Context, taskTypes []string) {
normalized := slices.Compact(slices.Sorted(slices.Values(taskTypes)))
p.mu.Lock()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we apply its suggestion?

Comment on lines 244 to 247
contentMD5, ok := metadata[strings.ToLower(FlyteContentMD5)].(string)
if !ok {
logger.Infof(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5)
logger.Debugf(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5)
}
Covers the behavior the change-detection depends on: stored set is
sorted and deduped, order/duplicate variations of the same set
normalize identically, and ContainTaskType tracks adds and removals.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Paul Dittamo <pvdittamo@gmail.com>
Copilot AI review requested due to automatic review settings June 10, 2026 06:15

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment on lines 244 to +246
contentMD5, ok := metadata[strings.ToLower(FlyteContentMD5)].(string)
if !ok {
logger.Infof(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5)
logger.Debugf(ctx, "Failed to cast contentMD5 [%v] to string", contentMD5)
Comment on lines 181 to 185
case event, ok := <-watcher.ResultChan():
logger.Info(k.ctx, "k8s resolver watcher event response: [%v]", event)
logger.Debugf(k.ctx, "k8s resolver watcher event: [%s]", event.Type)
if !ok {
logger.Debugf(k.ctx, "k8s resolver: watcher closed")
return
@pingsutw pingsutw added this to the V2 GA milestone Jun 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants