@drduker drduker commented Nov 28, 2025

This implementation adds Google Cloud Platform OAuth 2.0 authentication to Headlamp, providing a replacement for the deprecated Identity Service for GKE. Users can now authenticate with their Google Cloud accounts when accessing Headlamp deployed on GKE clusters.

Backend Changes

  • Add GCP OAuth configuration system (backend/pkg/config/config.go)
  • Implement GCP authenticator with PKCE support (backend/pkg/gcp/auth.go)
  • Add OAuth HTTP handlers for login, callback, and feature detection (backend/pkg/auth/gcp.go)
  • Register OAuth routes: /gcp-auth/login, /gcp-auth/callback, /gcp-auth/enabled
  • Support for token refresh and caching
  • Environment variable configuration for OAuth credentials
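
As a rough illustration of the last three bullets, the sketch below reads OAuth credentials from the environment and exposes a feature-detection route. It is not the PR's code: the variable names `EXAMPLE_GCP_CLIENT_ID`, `EXAMPLE_GCP_CLIENT_SECRET`, and `EXAMPLE_GCP_REDIRECT_URL`, the `gcpOAuthConfig` type, and the `{"enabled": ...}` response shape are all assumptions made for the example. Only the `/gcp-auth/enabled` path comes from the description above.

```go
package main

import (
	"encoding/json"
	"net/http"
	"os"
)

// gcpOAuthConfig is a hypothetical holder for the OAuth client settings.
type gcpOAuthConfig struct {
	ClientID     string
	ClientSecret string
	RedirectURL  string
}

// loadGCPOAuthConfig reads the settings from placeholder environment
// variables. The real variable names used by the PR are not shown here.
func loadGCPOAuthConfig() gcpOAuthConfig {
	return gcpOAuthConfig{
		ClientID:     os.Getenv("EXAMPLE_GCP_CLIENT_ID"),
		ClientSecret: os.Getenv("EXAMPLE_GCP_CLIENT_SECRET"),
		RedirectURL:  os.Getenv("EXAMPLE_GCP_REDIRECT_URL"),
	}
}

// enabled reports whether enough configuration is present to offer the flow.
func (c gcpOAuthConfig) enabled() bool {
	return c.ClientID != "" && c.ClientSecret != ""
}

// enabledHandler answers feature-detection requests with a small JSON body
// (assumed shape) so a frontend can decide whether to show a login button.
func enabledHandler(cfg gcpOAuthConfig) http.HandlerFunc {
	return func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		_ = json.NewEncoder(w).Encode(map[string]bool{"enabled": cfg.enabled()})
	}
}

// newMux registers the feature-detection route on a fresh mux.
func newMux(cfg gcpOAuthConfig) *http.ServeMux {
	mux := http.NewServeMux()
	mux.Handle("/gcp-auth/enabled", enabledHandler(cfg))
	return mux
}
```

A server would call `newMux(loadGCPOAuthConfig())` and mount the result next to its other routes. Keeping the check behind one endpoint means the frontend never needs to see the credentials themselves.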

Frontend Changes

  • Add GCPLoginButton component for Google sign-in (frontend/src/components/cluster/GCPLoginButton.tsx)
  • Implement GKE cluster detection (frontend/src/lib/k8s/gke.ts)
  • Integrate OAuth flow into AuthChooser component
  • Add backend feature detection to conditionally show GCP login option
  • Disable auto-redirect to token page to allow users to see authentication options

Key Features

  • RFC 7636 compliant PKCE (Proof Key for Code Exchange) implementation
  • Base64url encoding without padding for code challenge
  • OAuth state parameter for CSRF protection
  • Automatic token refresh handling
  • GKE cluster detection based on server URL patterns
  • Comprehensive error handling and logging

Documentation

  • Comprehensive setup guide (docs/GCP_OAUTH_SETUP.md)
  • Implementation status and troubleshooting (docs/GCP_OAUTH_IMPLEMENTATION_STATUS.md)
  • Deployment instructions for GKE
  • RBAC configuration examples

Testing

  • Unit tests for GCP authenticator functions
  • Unit tests for GKE cluster detection
  • Component tests for GCPLoginButton

This implementation has been tested and verified to work with GKE clusters, including successful OAuth flow initiation and PKCE code challenge generation.


linux-foundation-easycla bot commented Nov 28, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

  • ✅ login: drduker / name: drduker (8da1788)

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: drduker
Once this PR has been reviewed and has the lgtm label, please assign sniok for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @drduker!

It looks like this is your first PR to kubernetes-sigs/headlamp 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/headlamp has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 28, 2025
@drduker drduker force-pushed the feature/gcp-oauth-authentication branch from 32982e2 to fc40cb0 Compare November 28, 2025 06:13
@illume illume requested a review from Copilot November 28, 2025 14:18
Contributor

Copilot AI left a comment

Pull request overview

This PR adds Google Cloud Platform OAuth 2.0 authentication support to Headlamp, providing a modern replacement for the deprecated GKE Identity Service. The implementation enables users to authenticate with their Google Cloud accounts when accessing Headlamp deployed on GKE clusters.

Key Changes:

  • Implements RFC 7636-compliant PKCE OAuth 2.0 flow with Google as the identity provider
  • Adds automatic GKE cluster detection based on server URL patterns
  • Integrates OAuth flow into the authentication chooser UI with conditional rendering

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 9 comments.

Summary per file (file: description)

  • frontend/src/lib/k8s/gke.ts: Implements GKE cluster detection and OAuth initiation utilities
  • frontend/src/lib/k8s/gke.test.ts: Comprehensive unit tests for GKE utilities
  • frontend/src/lib/k8s/cluster.ts: Adds optional server property to Cluster interface
  • frontend/src/components/cluster/GCPLoginButton.tsx: React component for Google sign-in button with conditional rendering
  • frontend/src/components/cluster/GCPLoginButton.test.tsx: Component tests for GCPLoginButton
  • frontend/src/components/authchooser/index.tsx: Integrates GCP OAuth option and disables auto-redirect to token page
  • backend/pkg/gcp/auth.go: Core OAuth 2.0 authenticator with PKCE, token refresh, and caching
  • backend/pkg/gcp/auth_test.go: Unit tests for GCP authenticator functions
  • backend/pkg/auth/gcp.go: HTTP handlers for OAuth login, callback, and token refresh flows
  • backend/pkg/config/config.go: Adds GCP OAuth configuration fields with validation
  • backend/cmd/server.go: Populates GCP OAuth configuration from config
  • backend/cmd/headlamp.go: Registers OAuth routes and clears in-cluster auth when GCP OAuth enabled
  • backend/go.mod: Adds cloud.google.com/go/compute/metadata dependency
  • backend/go.sum: Updates dependency checksums including golang.org/x/sys version bump
  • docs/GCP_OAUTH_GKE_SETUP.md: Comprehensive deployment and configuration guide

Contributor

@skoeva skoeva left a comment

Hi, thanks for looking into this.

I see the PR was generated with Claude, please make sure to thoroughly go through the Copilot comments, test your changes, and ensure tests pass before marking this PR ready for review.

@skoeva skoeva marked this pull request as draft December 1, 2025 16:04
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 1, 2025
@drduker
Author

drduker commented Dec 1, 2025

Hi, thanks for looking into this.

I see the PR was generated with Claude, please make sure to thoroughly go through the Copilot comments, test your changes, and ensure tests pass before marking this PR ready for review.

Yes, I was just trying to get it working, which it is. Not sure how much more time I want to put into this yet.

@skoeva
Contributor

skoeva commented Dec 1, 2025

No worries! That's good to know. Feel free to ping if you would like someone else to take this over

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 2, 2025
@illume
Contributor

illume commented Dec 2, 2025

Not sure how much more time I want to put into this yet.

Ok, no worries. Thanks for sharing anyway.

Let’s leave this open for a while longer. If someone else wants to pick this up they can :) Otherwise we can close this and it’s archived for anyone searching who may be interested.

@drduker drduker force-pushed the feature/gcp-oauth-authentication branch from fc40cb0 to 9f382d6 Compare December 4, 2025 19:10
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 4, 2025
@drduker drduker marked this pull request as ready for review December 4, 2025 19:11
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Dec 4, 2025
@drduker
Author

drduker commented Dec 4, 2025

@illume @skoeva - should have fixed the copilot issues and pipeline linting issues.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Dec 4, 2025
@drduker
Author

drduker commented Dec 4, 2025

Just signed the CLA thing.

@drduker drduker force-pushed the feature/gcp-oauth-authentication branch 2 times, most recently from 799c0b4 to df28d25 Compare December 6, 2025 03:47
@illume illume requested a review from Copilot December 6, 2025 18:00
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 28 out of 31 changed files in this pull request and generated 3 comments.

Files not reviewed (1)
  • frontend/package-lock.json: Language not supported

if cluster == "" {
	http.Error(w, "cluster parameter required", http.StatusBadRequest)
	return
}

Copilot AI Dec 6, 2025

The cluster parameter should be validated against validClusterNamePattern before storing it in a cookie. Currently, validation only happens in the callback handler, which means an attacker could inject malicious cluster names that get stored in cookies during the login flow. Add validation after line 50:

if !validClusterNamePattern.MatchString(cluster) {
	http.Error(w, "invalid cluster name format", http.StatusBadRequest)
	return
}
Suggested change
- }
+ }
+ if !validClusterNamePattern.MatchString(cluster) {
+ 	http.Error(w, "invalid cluster name format", http.StatusBadRequest)
+ 	return
+ }
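
To make the suggestion concrete, here is a self-contained sketch of validating the `cluster` query parameter before it is written into a cookie. The regular expression is an assumption for illustration only, since the PR's actual `validClusterNamePattern` is not shown in this thread, and `validClusterName` and `clusterFromQuery` are hypothetical helper names.

```go
package main

import (
	"net/http"
	"regexp"
)

// clusterNamePattern is an assumed allow-list: it starts with an
// alphanumeric character and continues with alphanumerics, '.', '_' or '-'.
var clusterNamePattern = regexp.MustCompile(`^[a-zA-Z0-9][a-zA-Z0-9._-]{0,252}$`)

// validClusterName reports whether name matches the assumed pattern.
func validClusterName(name string) bool {
	return clusterNamePattern.MatchString(name)
}

// clusterFromQuery extracts and validates the "cluster" query parameter.
// On failure it writes a 400 response and returns false, so the caller
// can return before setting any cookie.
func clusterFromQuery(w http.ResponseWriter, r *http.Request) (string, bool) {
	cluster := r.URL.Query().Get("cluster")
	if cluster == "" {
		http.Error(w, "cluster parameter required", http.StatusBadRequest)
		return "", false
	}
	if !validClusterName(cluster) {
		http.Error(w, "invalid cluster name format", http.StatusBadRequest)
		return "", false
	}
	return cluster, true
}
```

Rejecting the value at the login step means nothing unvalidated reaches the cookie, and the check in the callback handler becomes a second line of defense rather than the only one.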

spec:
  containers:
    - name: headlamp
      image: lucaspick/headlamp-gcp-oauth:v6 # Use your custom image

Copilot AI Dec 6, 2025

The documentation references a personal Docker image lucaspick/headlamp-gcp-oauth:v6. This should be updated to reference the official Headlamp image once this feature is merged, or provide instructions for users to build their own image. Consider using a placeholder like headlamp/headlamp:latest or add a note that users need to build from this branch.

Comment on lines 43 to 292
// HandleGCPAuthLogin initiates the GCP OAuth login flow for GKE clusters.
func HandleGCPAuthLogin(gcpAuth *gcp.GCPAuthenticator, baseURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		cluster := r.URL.Query().Get("cluster")
		if cluster == "" {
			http.Error(w, "cluster parameter required", http.StatusBadRequest)
			return
		}

		// Generate state token for CSRF protection
		state, err := gcp.GenerateRandomState()
		if err != nil {
			logger.Log(logger.LevelError, nil, err, "failed to generate state")
			http.Error(w, "failed to generate state", http.StatusInternalServerError)

			return
		}

		// Generate PKCE code verifier and challenge for enhanced security
		codeVerifier, err := gcp.GenerateCodeVerifier()
		if err != nil {
			logger.Log(logger.LevelError, nil, err, "failed to generate code verifier")
			http.Error(w, "failed to generate code verifier", http.StatusInternalServerError)

			return
		}

		codeChallenge := gcp.GenerateCodeChallenge(codeVerifier)

		secure := IsSecureContext(r)

		// Store state, cluster, and PKCE verifier in cookies for validation in callback
		setOAuthCookie(w, gcpOAuthStateCookie, state, secure)
		setOAuthCookie(w, gcpOAuthClusterCookie, cluster, secure)
		setOAuthCookie(w, gcpOAuthVerifierCookie, codeVerifier, secure)

		// Redirect to Google OAuth
		authURL := gcpAuth.GetAuthCodeURL(state, codeChallenge)

		logger.Log(logger.LevelInfo, map[string]string{
			"cluster": cluster,
		}, nil, "initiating GCP OAuth flow")

		http.Redirect(w, r, authURL, http.StatusFound)
	}
}

// gcpCallbackData holds validated data from the OAuth callback.
type gcpCallbackData struct {
	cluster      string
	codeVerifier string
	code         string
}

// validateGCPCallback validates the OAuth callback request and returns extracted data.
func validateGCPCallback(r *http.Request) (*gcpCallbackData, error) {
	// Validate state token (CSRF protection)
	stateCookie, err := r.Cookie(gcpOAuthStateCookie)
	if err != nil {
		return nil, fmt.Errorf("state cookie not found: %w", err)
	}

	stateParam := r.URL.Query().Get("state")
	if stateCookie.Value != stateParam {
		return nil, fmt.Errorf("state mismatch: cookie=%s, param=%s", stateCookie.Value, stateParam)
	}

	// Get cluster from cookie
	clusterCookie, err := r.Cookie(gcpOAuthClusterCookie)
	if err != nil {
		return nil, fmt.Errorf("cluster cookie not found: %w", err)
	}

	cluster := clusterCookie.Value
	if !validClusterNamePattern.MatchString(cluster) {
		return nil, fmt.Errorf("invalid cluster name format: %s", cluster)
	}

	// Check for OAuth errors
	if errParam := r.URL.Query().Get("error"); errParam != "" {
		errDesc := r.URL.Query().Get("error_description")
		return nil, fmt.Errorf("OAuth error: %s - %s", errParam, errDesc)
	}

	code := r.URL.Query().Get("code")
	if code == "" {
		return nil, fmt.Errorf("no code in request")
	}

	// Get PKCE code verifier (optional)
	codeVerifier := ""
	if verifierCookie, err := r.Cookie(gcpOAuthVerifierCookie); err == nil {
		codeVerifier = verifierCookie.Value
	}

	return &gcpCallbackData{
		cluster:      cluster,
		codeVerifier: codeVerifier,
		code:         code,
	}, nil
}

// HandleGCPAuthCallback handles the OAuth callback from Google.
func HandleGCPAuthCallback(gcpAuth *gcp.GCPAuthenticator, baseURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()

		data, err := validateGCPCallback(r)
		if err != nil {
			logger.Log(logger.LevelError, nil, err, "OAuth callback validation failed")
			http.Error(w, err.Error(), http.StatusBadRequest)

			return
		}

		token, err := gcpAuth.Exchange(ctx, data.code, data.codeVerifier)
		if err != nil {
			logger.Log(logger.LevelError, map[string]string{"cluster": data.cluster}, err, "failed to exchange code")
			http.Error(w, "failed to exchange token", http.StatusInternalServerError)

			return
		}

		gkeToken, err := gcpAuth.GetGKEAccessToken(ctx, token)
		if err != nil {
			logger.Log(logger.LevelError, map[string]string{"cluster": data.cluster}, err, "failed to get GKE token")
			http.Error(w, "failed to get GKE token", http.StatusInternalServerError)

			return
		}

		// Cache the refresh token (non-fatal if it fails)
		if token.RefreshToken != "" {
			if cacheErr := gcpAuth.CacheRefreshToken(ctx, data.cluster, gkeToken, token.RefreshToken); cacheErr != nil {
				logger.Log(logger.LevelError, map[string]string{"cluster": data.cluster}, cacheErr, "failed to cache refresh token")
			}
		}

		SetTokenCookie(w, r, data.cluster, gkeToken, baseURL)

		secure := IsSecureContext(r)
		clearOAuthCookie(w, gcpOAuthStateCookie, secure)
		clearOAuthCookie(w, gcpOAuthClusterCookie, secure)
		clearOAuthCookie(w, gcpOAuthVerifierCookie, secure)

		logger.Log(logger.LevelInfo, map[string]string{"cluster": data.cluster}, nil, "GCP OAuth flow completed")

		redirectURL := fmt.Sprintf("/#/c/%s", data.cluster)
		if baseURL != "" {
			redirectURL = "/" + baseURL + redirectURL
		}

		http.Redirect(w, r, redirectURL, http.StatusFound)
	}
}

// HandleGCPTokenRefresh handles token refresh requests for GKE clusters.
func HandleGCPTokenRefresh(gcpAuth *gcp.GCPAuthenticator, baseURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()

		cluster, token := ParseClusterAndToken(r)
		if cluster == "" || token == "" {
			http.Error(w, "cluster and token required", http.StatusBadRequest)
			return
		}

		// Get cached refresh token
		refreshToken, err := gcpAuth.GetCachedRefreshToken(ctx, cluster, token)
		if err != nil {
			logger.Log(logger.LevelError, map[string]string{
				"cluster": cluster,
			}, err, "failed to get cached refresh token")
			http.Error(w, "no refresh token available", http.StatusUnauthorized)

			return
		}

		// Refresh the token
		newToken, err := gcpAuth.RefreshToken(ctx, refreshToken)
		if err != nil {
			logger.Log(logger.LevelError, map[string]string{
				"cluster": cluster,
			}, err, "failed to refresh token")
			http.Error(w, "failed to refresh token", http.StatusInternalServerError)

			return
		}

		// Get new GKE access token
		newGKEToken, err := gcpAuth.GetGKEAccessToken(ctx, newToken)
		if err != nil {
			logger.Log(logger.LevelError, map[string]string{
				"cluster": cluster,
			}, err, "failed to get new GKE access token")
			http.Error(w, "failed to get new GKE token", http.StatusInternalServerError)

			return
		}

		// Cache the new refresh token if we got one (non-fatal if it fails)
		if newToken.RefreshToken != "" {
			if cacheErr := gcpAuth.CacheRefreshToken(ctx, cluster, newGKEToken, newToken.RefreshToken); cacheErr != nil {
				logger.Log(logger.LevelError, map[string]string{"cluster": cluster}, cacheErr, "failed to cache new refresh token")
			}
		}

		// Set new token in cookie
		SetTokenCookie(w, r, cluster, newGKEToken, baseURL)

		logger.Log(logger.LevelInfo, map[string]string{
			"cluster": cluster,
		}, nil, "token refreshed successfully")

		w.WriteHeader(http.StatusOK)
		_, _ = w.Write([]byte("token refreshed"))
	}
}

// setOAuthCookie sets a temporary cookie for OAuth flow state.
func setOAuthCookie(w http.ResponseWriter, name, value string, secure bool) {
	http.SetCookie(w, &http.Cookie{
		Name:     name,
		Value:    value,
		Path:     "/",
		MaxAge:   int(oauthFlowTimeout.Seconds()),
		HttpOnly: true,
		Secure:   secure,
		SameSite: http.SameSiteLaxMode,
	})
}

// clearOAuthCookie clears a cookie by setting it to expire immediately.
func clearOAuthCookie(w http.ResponseWriter, name string, secure bool) {
	http.SetCookie(w, &http.Cookie{
		Name:     name,
		Value:    "",
		Path:     "/",
		MaxAge:   -1,
		HttpOnly: true,
		Secure:   secure,
		SameSite: http.SameSiteLaxMode,
	})
}

Copilot AI Dec 6, 2025

The GCP OAuth HTTP handlers (HandleGCPAuthLogin, HandleGCPAuthCallback, HandleGCPTokenRefresh) lack unit test coverage. While the underlying GCPAuthenticator has tests in backend/pkg/gcp/auth_test.go, the HTTP handler functions should also have tests to verify request validation, error handling, cookie management, and redirect logic. Consider adding tests in a new file backend/pkg/auth/gcp_test.go.
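
One possible shape for such tests, sketched with the standard `net/http/httptest` package: the `loginHandler` below is a simplified, hypothetical stand-in for `HandleGCPAuthLogin` (it does not depend on `gcp.GCPAuthenticator`), the authorization URL is a placeholder, and `runLogin` is an illustrative helper rather than anything from the PR.

```go
package main

import (
	"net/http"
	"net/http/httptest"
)

// loginHandler is a simplified stand-in for the login handler: it rejects
// requests without a cluster parameter and otherwise redirects to authURL.
func loginHandler(authURL string) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Query().Get("cluster") == "" {
			http.Error(w, "cluster parameter required", http.StatusBadRequest)
			return
		}
		http.Redirect(w, r, authURL, http.StatusFound)
	}
}

// runLogin sends an in-memory GET request to the handler and returns the
// response status code and Location header, with no network involved.
func runLogin(target string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	rec := httptest.NewRecorder()
	loginHandler("https://accounts.example.test/auth").ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Location")
}
```

In a real `gcp_test.go`, calls like `runLogin("/gcp-auth/login")` would sit inside `func TestXxx(t *testing.T)` cases that assert on the status code, the `Location` header, and the cookies recorded on the response.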

@drduker drduker force-pushed the feature/gcp-oauth-authentication branch from df28d25 to 72ef722 Compare December 7, 2025 19:14
This implementation adds GCP OAuth 2.0 authentication to Headlamp, replacing
the deprecated Identity Service for GKE. Users can authenticate with their
Google Cloud account, and the authentication tokens are used to access
Kubernetes resources with proper RBAC.

Backend changes:
- New GCP authenticator package with RFC 7636-compliant PKCE support
- OAuth HTTP handlers for login, callback, and token refresh
- Configuration via environment variables
- Token caching and automatic refresh mechanisms
- Input validation to prevent injection attacks

Frontend changes:
- GCPLoginButton component for Google sign-in
- GKE cluster detection based on server URL patterns
- Integration into existing authentication chooser UI
- Comprehensive test coverage

Documentation:
- Complete setup guide for GKE deployments
- RBAC configuration examples
- Troubleshooting guide

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@drduker drduker force-pushed the feature/gcp-oauth-authentication branch from 72ef722 to 83732cd Compare December 7, 2025 19:48
drduker and others added 2 commits December 8, 2025 09:08
- Fix golangci-lint wsl errors in gcp_test.go by adding blank lines
  before assignments that were cuddled with non-assignments
- Restore accidentally removed redirect logic in AuthChooser that
  automatically redirects to the token page when a cluster requires
  token authentication without OIDC configured

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When GCP OAuth is configured, show the auth chooser dialog instead of
automatically redirecting to the token page. This allows users to choose
between Google Sign In and token authentication.

The redirect to token page is now conditional on GCP OAuth being disabled,
which preserves backward compatibility with e2e tests and non-GCP deployments.

Also updated GCPLoginButton to only show when GCP OAuth is explicitly enabled
via environment variable, not based on cluster type detection.
@drduker drduker force-pushed the feature/gcp-oauth-authentication branch from e6be69d to 7fd4a1b Compare December 8, 2025 19:44
Tests now mock isGCPOAuthEnabled to return true by default, and use
waitFor for async state updates since the component checks OAuth status
on mount.
@drduker drduker requested a review from skoeva December 9, 2025 06:15
@illume illume added this to the v0.40.0 milestone Dec 15, 2025

Labels

cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

4 participants