Skip to content

Add automatic reconnect on Ironic authentication failure#10

Merged
tzumainn merged 1 commit into
osac-project:mainfrom
DanNiESh:auth-reconnect
Jun 5, 2026
Merged

Add automatic reconnect on Ironic authentication failure#10
tzumainn merged 1 commit into
osac-project:mainfrom
DanNiESh:auth-reconnect

Conversation

@DanNiESh
Copy link
Copy Markdown
Contributor

@DanNiESh DanNiESh commented Jun 5, 2026

Port the reconnect logic from the old internal/ironic package (PR #7) into the new internal/management layer. When gophercloud's built-in token refresh fails, the client detects auth errors (401, reauth failures), recreates the service client from scratch, and retries the operation once.
Part of osac-project/issues#394

Summary by CodeRabbit

  • Bug Fixes

    • OpenStack baremetal service now automatically detects authentication failures and reconnects before retrying operations, improving service resilience during temporary auth issues.
  • Tests

    • Added comprehensive test coverage for authentication error handling and client reconnection logic.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 5, 2026

Review Change Stack

Walkthrough

The OpenStack baremetal client now automatically recovers from authentication failures. The client stores a service client factory function, detects auth-related errors (HTTP 401 and reauthentication exceptions), and reconnects with a fresh client before retrying failed power-state operations in both GetPowerState and SetPowerState methods.

Changes

Authentication Failure Recovery with Reconnect and Retry

Layer / File(s) Summary
Client factory and initialization refactoring
internal/management/openstack.go
OpenStackClient struct now carries a newServiceClient factory function. NewOpenStackClient is refactored to build provider and baremetal client through a reusable closure, returning both the initial client and the factory for later on-demand reconnects.
Auth error detection, reconnect mechanism, and GetPowerState retry
internal/management/openstack.go, internal/management/openstack_internal_test.go
New isAuthError helper detects HTTP 401 and gophercloud reauthentication failure types. New reconnect method rebuilds the baremetal client using the factory, validates endpoint presence, and swaps the client in OpenStackClient. GetPowerState detects auth errors, invokes reconnect, and retries node fetch; non-auth errors are wrapped and returned. Comprehensive test coverage validates isAuthError across nil, generic, 401, and reauthentication error types (including wrapped variants) and validates reconnect success/failure with proper client swapping and error preservation.
SetPowerState resilience with context logging and retry
internal/management/openstack.go
SetPowerState initializes a request-scoped logger. When nodes.ChangePowerState fails with detected auth error, the method reconnects via the factory and retries the operation. HTTP 409 conflict responses map to ErrTransitioning; all other errors are wrapped and returned from both initial and post-reconnect attempts.

Sequence Diagram(s)

No sequence diagram generated; the changes are localized error-handling enhancements with straightforward retry logic best understood through flowchart representation already provided in the review stack artifact.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Complexity reasoning: This PR introduces a stateful reconnection mechanism with error-type detection, factory-based client rebuilding, and dual-retry paths in two methods. The changes span imports, struct definitions, initialization refactoring, helper functions, method implementations, and comprehensive test coverage. The logic is coherent but requires careful review of error classification, client state transitions, and retry semantics to ensure authentication recovery does not mask other failure types or leave the client in an inconsistent state. No high-density logic, but multiple interconnected concerns across the codebase.

Possibly related PRs

  • osac-project/host-management-openstack#7: Implements the same auth-error detection and reconnect/retry pattern for Ironic node and power operations, using an identical factory-based client rebuild design.

Suggested reviewers

  • larsks
  • tzumainn

🔄 A phoenix client rises,

When auth credentials expire—

Reconnect, retry, thrive. 🌟


Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

  • Ignore

❌ Failed checks (1 error, 2 warnings)

Check name Status Explanation Resolution
No-Sensitive-Data-In-Logs ❌ Error Line 110 logs sc.Endpoint which exposes internal Ironic hostname/URL, violating the security guideline against logging internal hostnames in routine logs. Remove the "endpoint", sc.Endpoint argument from line 110 log statement to prevent exposure of internal service URLs in routine logs.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Ai-Attribution ⚠️ Warning Commit uses Co-Authored-By trailer for AI (Claude Opus 4.6), but Red Hat attribution guidelines recommend Assisted-by or Generated-by trailers instead. Co-Authored-By is meant for human collaborators. Replace Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com with Assisted-by: Claude Opus 4.6 per Red Hat open source attribution norms for AI-assisted development.
✅ Passed checks (8 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: adding automatic reconnect logic triggered by Ironic authentication failures, which aligns with the changeset's core objective of porting reconnect logic to handle auth failures.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
No-Hardcoded-Secrets ✅ Passed No hardcoded secrets (API keys, tokens, passwords, private keys, credentials, base64 strings >32 chars, or URLs with embedded credentials) detected in the PR code.
No-Weak-Crypto ✅ Passed No weak crypto algorithms (MD5, SHA1, DES, RC4, 3DES, Blowfish, ECB), custom crypto implementations, or non-constant-time secret comparisons detected in the modified files.
No-Injection-Vectors ✅ Passed No injection vectors detected. Code safely uses gophercloud API with parameters passed as structured arguments, not shell/SQL/eval commands. Error wrapping and logging use proper format specifiers.
Container-Privileges ✅ Passed No privileged K8s settings detected. All 21 YAML manifests enforce: runAsNonRoot, allowPrivilegeEscalation=false, drop ALL caps, RuntimeDefault seccomp policy.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai[bot]
coderabbitai Bot previously requested changes Jun 5, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/management/openstack_internal_test.go`:
- Around line 127-140: Test currently only checks error text when reconnect
returns a client with empty Endpoint; also assert that the existing client
pointer remains unchanged: after calling c.reconnect(context.Background()) and
Expect(err).To(HaveOccurred()), add an assertion that c.client is still the
original oldSC (e.g., Expect(c.client).To(Equal(oldSC))) so the test verifies
the old client survives the failed swap in the OpenStackClient.reconnect path.

In `@internal/management/openstack.go`:
- Line 110: Remove the sensitive endpoint value from the reconnect success log:
update the log.Info call that currently logs "ironic service client reconnected
successfully" with the "endpoint", sc.Endpoint pair so it no longer includes
sc.Endpoint (either omit the "endpoint" key/value or replace it with a redacted
placeholder). Locate the log.Info invocation referencing sc.Endpoint in
internal/management/openstack.go (the reconnect success log) and modify it to
log only the success message (or a redacted endpoint) to avoid exposing internal
hostnames/URLs.
- Around line 27-29: The shared managementClient mutates c.client in reconnect
while GetPowerState and SetPowerState read it, causing a data race; protect
c.client with a sync.RWMutex on the managementClient struct (add a mutex field),
acquire a read lock when reading/copying c.client in GetPowerState/SetPowerState
and a write lock when swapping it in reconnect/newServiceClient, or copy the
client under lock then operate on the copy to avoid holding the lock long; also
remove the "endpoint", sc.Endpoint argument from the log call in reconnect to
avoid emitting internal hostnames.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: osac-project/coderabbit/.coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: ca3d0c8d-6fc6-403f-b65a-54ee14ce8e2e

📥 Commits

Reviewing files that changed from the base of the PR and between a126183 and 693a781.

📒 Files selected for processing (2)
  • internal/management/openstack.go
  • internal/management/openstack_internal_test.go

Comment thread internal/management/openstack_internal_test.go
Comment thread internal/management/openstack.go
Comment thread internal/management/openstack.go Outdated
Port the reconnect logic from the old internal/ironic package (PR osac-project#7)
into the new internal/management layer. When gophercloud's built-in
token refresh fails, the client detects auth errors (401, reauth
failures), recreates the service client from scratch, and retries
the operation once.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai coderabbitai Bot dismissed their stale review June 5, 2026 16:28

Issue tracked in #11 for future resolution.

@DanNiESh DanNiESh requested a review from tzumainn June 5, 2026 16:29
Copy link
Copy Markdown

@tzumainn tzumainn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good - thanks!

@tzumainn tzumainn merged commit a1bda84 into osac-project:main Jun 5, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants