This repository contains the complete infrastructure and CI/CD pipeline for a take-home assessment.
It demonstrates a robust DevSecOps approach to deploying, securing, and managing a containerized application on Google Kubernetes Engine (GKE).
The project goes beyond a simple deployment; it's a narrative of building resilient, secure, and automated infrastructure from the ground up, including navigating and solving complex, real-world platform challenges.
The Juice Shop application is deployed and accessible with HTTPS at:
https://juiceshop.34.107.194.151.nip.io
Monitoring Portal:
https://grafana.34.107.194.151.nip.io/
(Note: This infrastructure may be torn down after the assessment period.)
The architecture is designed for security, automation, and scalability, with a clear flow from a code commit to a live, secured endpoint.
- A developer pushes code to the main branch on the GitHub repository.
- GitHub Actions is triggered, starting the CI/CD pipeline.
- The pipeline authenticates to Google Cloud securely using Workload Identity Federation (WIF).
- A Docker image is built and scanned for vulnerabilities using Trivy.
- If the scan passes, the image is pushed to a private Google Artifact Registry.
- The pipeline deploys the image to GKE using Helm.
- GKE Ingress provisions a Google-managed SSL certificate, routing traffic from a static IP.
- End-users access the application securely over HTTPS.
- NetworkPolicy is enabled with a default-deny rule.
- A live Grafana dashboard provides monitoring.
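
The "scan, then push only if the scan passes" step above can be sketched as a small gate over a Trivy JSON report. This is a minimal illustration, not the pipeline's actual implementation: the blocking severities and the sample report (including its placeholder CVE IDs) are assumptions, and only the `Results` / `Vulnerabilities` / `Severity` fields of Trivy's JSON output are relied on.

```python
# Assumed policy: block on HIGH and CRITICAL findings (not taken from the repo).
BLOCKING_SEVERITIES = {"HIGH", "CRITICAL"}


def blocking_findings(report, blocking=BLOCKING_SEVERITIES):
    """Return (target, vulnerability_id, severity) for every blocking finding."""
    findings = []
    for result in report.get("Results", []):
        # "Vulnerabilities" may be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in blocking:
                findings.append(
                    (result.get("Target"), vuln.get("VulnerabilityID"), vuln.get("Severity"))
                )
    return findings


def scan_passes(report):
    """The image may be pushed only when there are no blocking findings."""
    return not blocking_findings(report)


# Hypothetical report in the shape of Trivy's JSON output; IDs are placeholders.
sample_report = {
    "Results": [
        {
            "Target": "juice-shop (hypothetical image)",
            "Vulnerabilities": [
                {"VulnerabilityID": "CVE-0000-0001", "Severity": "LOW"},
                {"VulnerabilityID": "CVE-0000-0002", "Severity": "CRITICAL"},
            ],
        },
        {"Target": "package-lock.json", "Vulnerabilities": None},
    ]
}

if __name__ == "__main__":
    for target, vuln_id, severity in blocking_findings(sample_report):
        print(f"BLOCKING {severity}: {vuln_id} in {target}")
    print("push allowed" if scan_passes(sample_report) else "push blocked")
```

In a real workflow the same effect is usually achieved by letting the scanner return a non-zero exit code, so the job stops before the push step.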
The inspiration for this project comes directly from Axiler's mission:
To build "silent guardians for the digital frontier."
I deployed OWASP Juice Shop, a deliberately vulnerable app, to demonstrate proactive DevSecOps security.
This project showcases:
- Infrastructure as Code (IaC): Auditable, repeatable environments.
- CI/CD Automation: GitOps-style workflow from commit to production.
- Container Security: "Shift-left" scanning for vulnerabilities before deployment.
- Cloud-Native Security: Keyless authentication + managed TLS.
- Scalability: Auto-scaling managed Kubernetes clusters.
- Monitoring: Live Grafana + Prometheus dashboards.
| Category | Tool | Why I Chose It |
|---|---|---|
| Application | OWASP Juice Shop | Deliberately insecure application, well suited to demonstrating security controls. |
| Cloud Provider | Google Cloud Platform (GCP) | Free trial credits, mature GKE & WIF. |
| Orchestration | GKE | Managed, self-healing, auto-scaling, GCP-native integration. |
| Infrastructure as Code | Terraform | Industry-standard IaC, repeatable & auditable. |
| CI/CD | GitHub Actions | Native to repo, OIDC support, rich marketplace. |
| DNS Service | nip.io | Free wildcard DNS service; provides a valid, resolvable domain name for the application's public IP address. |
| Packaging | Helm | Version-controlled deployments, upgrades, rollbacks. |
| Containerization | Docker | Lightweight, portable, consistent environments. |
| Security Scanning | Trivy | Open-source, fast, CI/CD integrated scanning. |
| Authentication | Workload Identity Federation (WIF) | Keyless, secure, eliminates long-lived secrets. |
| Networking & HTTPS | GKE Ingress + Managed Certificates | Automated TLS, zero maintenance. |
| Monitoring Tools | Prometheus + Grafana | Open-source metrics collection and dashboards. |
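
To make the nip.io row concrete: nip.io resolves a hostname that embeds an IPv4 address to that address, which is why both URLs above work without registering a domain. The sketch below only illustrates the naming scheme; the helper names are hypothetical and no DNS lookup is performed.

```python
import ipaddress

NIP_IO_SUFFIX = ".nip.io"


def nip_io_host(name, ip):
    """Build a nip.io hostname such as 'juiceshop.34.107.194.151.nip.io'."""
    ipaddress.IPv4Address(ip)  # raises ValueError for a malformed address
    return f"{name}.{ip}{NIP_IO_SUFFIX}"


def embedded_ip(host):
    """Extract the IPv4 address that a dotted nip.io hostname points to."""
    if not host.endswith(NIP_IO_SUFFIX):
        raise ValueError(f"not a nip.io hostname: {host}")
    labels = host[: -len(NIP_IO_SUFFIX)].split(".")
    # The last four labels before the suffix form the dotted-quad address.
    return str(ipaddress.IPv4Address(".".join(labels[-4:])))


if __name__ == "__main__":
    static_ip = "34.107.194.151"  # the static IP used in the URLs above
    for app in ("juiceshop", "grafana"):
        host = nip_io_host(app, static_ip)
        print(host, "->", embedded_ip(host))
```

Because each hostname is distinct, a single static IP can serve several host-based Ingress rules.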
Building this pipeline was a multi-stage process that involved overcoming a series of realistic infrastructure and platform challenges. This journey highlights a core DevOps principle: build, test, and automate incrementally.
- GCP Quota Limits: The initial `terraform apply` failed due to a default SSD quota limit in the new GCP project. The fix was to explicitly define a smaller, cost-effective `pd-standard` disk for the GKE nodes, demonstrating resource management.
- Regional vs. Zonal Clusters: An early configuration created a regional GKE cluster, resulting in 6 nodes instead of the intended 2. I corrected the Terraform code to create a more efficient zonal cluster, showcasing an understanding of cloud architecture and cost control.
- Validation: Performed a full manual deployment using Helm (`helm install ...`) to confirm the GKE cluster was healthy and the Juice Shop application's Helm chart was correctly configured.
- Baseline: Established a "known good" state, making it easier to debug subsequent automation issues.
- The Initial `unauthorized_client` Error: The pipeline immediately failed with a WIF error, indicating the OIDC token from GitHub was rejected by GCP's attribute condition. This triggered a deep investigation into every component of the authentication chain.
- The "Zombie" Resource Contradiction: Debugging revealed a bizarre platform-level issue:
  - `gcloud ... pools delete` → Not Found
  - `gcloud ... pools create` → Already Exists
- The Solution – A Clean Slate: This behavior proved the issue was a resource state propagation problem within the GCP project. The only viable solution was to start fresh in a brand new GCP project and use completely unique names for the WIF components. This methodical approach to isolating and bypassing a platform-level bug was the key to moving forward.
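
For context on what an attribute condition checks, the sketch below mimics a condition of the form `assertion.repository == "<owner>/<repo>"` evaluated against the claims of a GitHub Actions OIDC token. It is an illustration only: the repository names are hypothetical, the real condition is written in CEL and evaluated by Google Cloud, and this is not a diagnosis of the failure described above.

```python
def attribute_condition_allows(claims, expected_repository):
    """Mimic an attribute condition that pins the token to one repository."""
    return claims.get("repository") == expected_repository


# Claims in the shape GitHub Actions issues; the values are hypothetical.
token_claims = {
    "iss": "https://token.actions.githubusercontent.com",
    "sub": "repo:example-org/juice-shop-infra:ref:refs/heads/main",
    "repository": "example-org/juice-shop-infra",
    "ref": "refs/heads/main",
}

if __name__ == "__main__":
    # Exact match: the token exchange would be allowed.
    print(attribute_condition_allows(token_claims, "example-org/juice-shop-infra"))
    # Any mismatch (here, a different repository name) rejects the token.
    print(attribute_condition_allows(token_claims, "example-org/other-repo"))
```

Comparing the decoded claims against the configured condition is a quick way to rule out a simple mismatch before suspecting the platform.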
- ImagePullBackOff: The first Helm deployment from the pipeline timed out. `kubectl describe pod` revealed the issue: GKE nodes lacked permission to pull images from the private Google Artifact Registry. Solved by granting the Artifact Registry Reader role to the GKE nodes' default service account.
- Pending Pods: The next run timed out with pods stuck in a Pending state. The root cause was insufficient CPU/memory resources. I enabled GKE cluster autoscaling, which lets the cluster add new nodes on demand and is the cloud-native answer to resource contention.
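
The Pending state follows from how scheduling works: a pod is placed only on a node whose unreserved CPU and memory cover the pod's requests. The sketch below models that check with hypothetical node sizes and requests (none of the numbers come from this cluster) and shows why adding a node, as the autoscaler does, unblocks the pod.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_millicores: int
    memory_mib: int


def fits(request, allocatable, already_requested):
    """True if the node's unreserved capacity covers the pod's requests."""
    free_cpu = allocatable.cpu_millicores - already_requested.cpu_millicores
    free_mem = allocatable.memory_mib - already_requested.memory_mib
    return request.cpu_millicores <= free_cpu and request.memory_mib <= free_mem


def schedulable_nodes(request, nodes):
    """Names of nodes that can host the pod; an empty list means Pending."""
    return [name for name, (alloc, used) in nodes.items() if fits(request, alloc, used)]


# Hypothetical numbers: two small nodes that are already nearly full.
pod_request = Resources(cpu_millicores=250, memory_mib=512)
nodes = {
    "node-a": (Resources(940, 2800), Resources(800, 2000)),
    "node-b": (Resources(940, 2800), Resources(900, 2600)),
}

if __name__ == "__main__":
    print("before scale-up:", schedulable_nodes(pod_request, nodes))
    # The autoscaler adds a node; only system pods are reserved on it so far.
    nodes["node-c"] = (Resources(940, 2800), Resources(100, 200))
    print("after scale-up:", schedulable_nodes(pod_request, nodes))
```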
- The Initial FailedNotVisible Error: The ManagedCertificate resource for Grafana was persistently failing its validation check. This indicated that the Google Cloud Load Balancer, created by the GKE Ingress, was unreachable from the public internet.
- A Multi-Layered Investigation: The troubleshooting journey involved methodically isolating and eliminating potential causes:
  - Zero-Trust Policies: I first suspected the new NetworkPolicy rules were blocking the CA's validation servers or GKE's health checkers. I created more permissive rules, but the error remained.
  - Ingress Conflicts: I discovered multiple issues with the Ingress setup: a "catch-all" rule on the Juice Shop Ingress was hijacking traffic, and a missing default-http-backend service was preventing the GKE controller from syncing any changes. I fixed both.
  - The "Two Load Balancers" Problem: After fixing the above, GKE created two separate, conflicting Application Load Balancers instead of merging the Ingress rules. The new one for Grafana was created without a public IP (frontend), confirming a deep-seated configuration conflict.
- The Final Solution – The Reverse Proxy Pattern: After exhausting all standard Ingress configurations, the definitive solution was to simplify the task for the GKE controller. I implemented a classic reverse proxy pattern:
  - A lightweight NGINX proxy Deployment was created in the default namespace.
  - The single, unified Ingress now only routes to services within its own namespace (juice-shop-service and grafana-proxy-service).
  - The NGINX proxy then handles the simple and reliable cross-namespace forwarding to the real Grafana service in the monitoring namespace.
This elegant solution bypassed the GKE controller's complex and problematic cross-namespace logic, providing a stable, secure, and scalable frontend for both applications. It's a testament to solving problems by moving up the stack and abstracting away platform-level complexities.
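
The routing this pattern produces can be modelled in a few lines. This is a sketch under assumptions: the hostnames and the two Ingress backends come from this README, but the name of the Grafana Service in the monitoring namespace (`grafana` here) is hypothetical, and the real forwarding is done by NGINX configuration rather than Python.

```python
# Host-based rules on the single Ingress; both backends live in its own namespace.
INGRESS_RULES = {
    "juiceshop.34.107.194.151.nip.io": "juice-shop-service",
    "grafana.34.107.194.151.nip.io": "grafana-proxy-service",
}

# Services that are proxies, mapped to their (service, namespace) upstream.
# "grafana" is a hypothetical Service name; "monitoring" is from the README.
PROXY_UPSTREAMS = {
    "grafana-proxy-service": ("grafana", "monitoring"),
}


def cluster_dns(service, namespace):
    """In-cluster DNS name of a Service."""
    return f"{service}.{namespace}.svc.cluster.local"


def resolve_backend(host, ingress_namespace="default"):
    """Follow a request from the Ingress host rule to the final Service."""
    service = INGRESS_RULES[host]  # KeyError for hosts the Ingress does not serve
    if service in PROXY_UPSTREAMS:
        upstream_service, upstream_namespace = PROXY_UPSTREAMS[service]
        return cluster_dns(upstream_service, upstream_namespace)
    return cluster_dns(service, ingress_namespace)


if __name__ == "__main__":
    for host in INGRESS_RULES:
        print(host, "->", resolve_backend(host))
```

The Ingress never references another namespace; only the proxy does, through ordinary cluster DNS.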
This multi-phase journey, from manual deployment to a fully automated and hardened pipeline, reflects a real-world DevOps workflow of iterative improvement and persistent problem-solving.
- Centralized Logging: Add EFK (Elasticsearch, Fluentd, Kibana) stack.
- GitOps Deployment: Use ArgoCD for a pull-based deployment model.
This project successfully demonstrates a complete, secure, and automated DevSecOps workflow on GKE. From writing infrastructure as code with Terraform to navigating complex authentication and networking bugs in a cloud-native environment, it showcases the persistence and deep technical knowledge required to build and maintain resilient systems. The result is a "silent guardian"—an automated pipeline that securely delivers applications to the digital frontier.
