devopsabcs-engineering/aks-governance

AKS Governance — ACME Inc. Kubernetes Governance Architecture

Strategic governance reference and runnable Proof of Concept for deploying and governing Azure Kubernetes Service (AKS) — and, where required, Azure Red Hat OpenShift (ARO) — for internal clients with least-privilege, landing-zone-aligned, GitOps-driven operations.

🌐 Language: English · Français

📄 Companion report: ACME_Kubernetes_Governance_Architecture_Report.docx



Terminology

Note

ODS (Offre de Service, "Service Offering"): the central ACME Inc. platform/self-service capability that provisions and governs Kubernetes (AKS, and ARO where required) on behalf of internal clients. Throughout this document, "ODS" refers to that central platform team and its deployment automation: for example, the entity granted scoped rights to deploy into workload subscriptions and the owner of centralized observability and governance.

Federated ODS — the recommended target shape of that service offering: the same central governance/tooling/observability hosted in platform subscriptions, with workload clusters federated into per-client landing-zone subscriptions. This contrasts with the concentrated single-subscription bootstrap, where everything lives in one subscription.

Other acronyms used below: AKS (Azure Kubernetes Service), ARO (Azure Red Hat OpenShift), CAPI/CAPZ (Cluster API / Cluster API Provider Azure), ASO (Azure Service Operator), GIA (ACME Inc. identity & access management / Gestion des identités et des accès).


Bottom line

Important

Strategic recommendation: move toward a landing-zone-aligned multi-subscription ODS (central governance/tooling, with workload clusters in client-aligned landing-zone subscriptions), and treat the single-subscription model only as a tactical bootstrap pattern if ACME Inc. needs a short path around current cross-subscription network / GIA friction.

That recommendation is grounded in ACME Inc.'s own ODS objectives: simplify AKS/ARO consumption, centralize observability/governance, support both client-oriented and mutualized service tiers, prefer AKS by default, and use ARO only for specific workload classes such as CP4D and MQ. It is also consistent with Microsoft's landing-zone guidance, which places shared connectivity/security services in platform subscriptions and workloads in application landing-zone subscriptions with centralized policy and hub-and-spoke connectivity.


Why a multi-subscription model is the best fit

Least privilege / GIA

ACME Inc. explicitly asked how to deploy AKS and ARO for internal clients with the minimum rights possible, called out the problematic Microsoft.Authorization/*/Write permission family, and asked for pre-provisioned landing zones plus custom roles. A multi-subscription landing-zone model is the cleanest way to enforce those boundaries because prerequisites can be pre-created by the platform team and ODS automation can be granted only scoped rights on the target workload subscriptions, in line with Azure RBAC least-privilege best practices.
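As an illustration of the least-privilege check described above, the following minimal sketch tests whether a custom role definition would grant any operation in the Microsoft.Authorization/*/Write family. It assumes the usual Azure RBAC semantics (NotActions are subtracted from Actions, wildcards match case-insensitively). The role contents and the helper name `granted_sensitive_operations` are hypothetical, and the list of sensitive operations is a representative sample, not an exhaustive one.

```python
from fnmatch import fnmatchcase

# Representative (not exhaustive) operations in the
# Microsoft.Authorization/*/Write permission family.
SENSITIVE_OPERATIONS = [
    "Microsoft.Authorization/roleAssignments/write",
    "Microsoft.Authorization/roleDefinitions/write",
    "Microsoft.Authorization/policyAssignments/write",
    "Microsoft.Authorization/locks/write",
]


def _matches(pattern, operation):
    # Azure action strings are compared case-insensitively; '*' is a wildcard.
    return fnmatchcase(operation.lower(), pattern.lower())


def granted_sensitive_operations(actions, not_actions=()):
    """Return the sensitive operations a role would effectively grant."""
    granted = []
    for operation in SENSITIVE_OPERATIONS:
        allowed = any(_matches(p, operation) for p in actions)
        excluded = any(_matches(p, operation) for p in not_actions)
        if allowed and not excluded:
            granted.append(operation)
    return granted


# Hypothetical custom role for the ODS deployment identity (assumption).
ods_deployer_actions = [
    "Microsoft.ContainerService/managedClusters/*",
    "Microsoft.Network/virtualNetworks/subnets/join/action",
    "Microsoft.Resources/deployments/*",
]

print(granted_sensitive_operations(ods_deployer_actions))  # prints []
```

A check of this kind could run in a pipeline before a custom role is published, so that a role granting authorization writes is rejected before it reaches a workload subscription.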

Why the single-subscription idea exists at all

The internal notes explicitly consider a one-subscription design because one managed identity can deploy multiple clusters in the same subscription, and because a cross-subscription model would otherwise introduce Palo Alto / inter-subscription communication complexity. That makes the single-sub option attractive operationally in the very short term, especially while the governance foundation is still forming.

Why single-sub should not be the end state

The same notes flag scale and boundary pressure — ACME Inc. already references 130–140 AKS clusters and the need to size VNets for large node counts and "predict Azure limits." Those limits are real and bounded: a single subscription is capped (for example, 5,000 AKS clusters per subscription and 5,000 nodes per cluster), and subscription-wide service limits apply to networking, compute, and identity objects alike. At that scale, VNet IP planning (Azure CNI addressing) and large-cluster best practices become first-class concerns. On the ARO side, control-plane scale-up can happen automatically while scale-down must be explicitly requested, with financial/operational implications that need governance and ownership clarity. This is exactly the blast-radius, quota, and accountability problem that multi-subscription boundaries are meant to contain.
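To make the VNet sizing concern concrete, here is a rough sketch of the address arithmetic for traditional Azure CNI (node subnet mode), where every pod receives a VNet IP. It assumes the commonly documented planning formula, (nodes + 1) + ((nodes + 1) × max pods per node), and Azure's five reserved addresses per subnet. The function names are illustrative, and other network modes such as CNI Overlay size differently.

```python
AZURE_RESERVED_IPS_PER_SUBNET = 5  # addresses Azure reserves in every subnet


def required_ips(node_count, max_pods_per_node=30, surge_nodes=1):
    """IPs needed for nodes and pods, including surge nodes used during upgrades."""
    nodes = node_count + surge_nodes
    return nodes + nodes * max_pods_per_node


def smallest_prefix(ip_count):
    """Longest IPv4 prefix (smallest subnet) that still holds ip_count usable IPs."""
    needed = ip_count + AZURE_RESERVED_IPS_PER_SUBNET
    host_bits = (needed - 1).bit_length()  # ceil(log2(needed)) without floats
    return 32 - host_bits


ips = required_ips(50)        # 51 nodes + 51 * 30 pods
print(ips)                    # prints 1581
print(smallest_prefix(ips))   # prints 21, i.e. a /21 or larger subnet
```

Running the same arithmetic for each planned cluster gives an early estimate of how much address space a subscription's VNets need before quota and peering decisions are made.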

Identity and Conditional Access realities

ACME Inc.'s internal notes on private AKS / Entra sign-in behavior conclude that IP allow-listing is not a reliable architectural control for AKS creation/authentication flows, and that the clean supported answer is managed identity / workload identity — not brittle source-IP assumptions (Conditional Access network conditions, private AKS clusters). That pushes the design toward pre-provisioned landing zones, managed identities, and scoped RBAC instead of broad human/operator privileges.


AKS vs ARO governance is not symmetric

Note

One of the most important findings in the internal material is that AKS and ARO cannot be governed as if they were identical Azure resource types.

An internal governance summary states that ARO surfaced through Azure Arc behaves as connectedClusters, while AKS is managedClusters, so AKS-targeted policy sets don't automatically apply to ARO the same way. In other words, "uniform policy coverage" across AKS and ARO is not the default product behavior and must be solved as an architecture/governance pattern, not a support fix.

That matters directly to the recommendation:

  • Use AKS-native governance controls (Azure Policy for Kubernetes, plus Kyverno) where they fit AKS best.
  • Use Arc / Kubernetes-native controls plus GitOps where ARO requires a different enforcement path.
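The coverage gap can be sketched as a simple resource-type comparison. The example below assumes that AKS clusters appear as Microsoft.ContainerService/managedClusters and that Arc-connected ARO clusters appear as Microsoft.Kubernetes/connectedClusters. The cluster names and the helper `uncovered_clusters` are hypothetical, and real policy evaluation involves more than a type match.

```python
AKS_TYPE = "Microsoft.ContainerService/managedClusters"
ARC_TYPE = "Microsoft.Kubernetes/connectedClusters"


def uncovered_clusters(clusters, policy_target_types):
    """Return names of clusters whose resource type no policy set targets."""
    targets = {t.lower() for t in policy_target_types}
    return [c["name"] for c in clusters if c["type"].lower() not in targets]


# Hypothetical inventory: one AKS cluster and one Arc-connected ARO cluster.
clusters = [
    {"name": "aks-client-a", "type": AKS_TYPE},
    {"name": "aro-cp4d", "type": ARC_TYPE},
]

# A policy set scoped only to AKS leaves the ARO cluster uncovered.
print(uncovered_clusters(clusters, [AKS_TYPE]))  # prints ['aro-cp4d']
```

An inventory check like this makes the asymmetry visible: covering ARO requires a separate policy path rather than an extension of the AKS-targeted set.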

This also lines up with the internal ARO thread where managed identity was described as non-negotiable, while the team documented gaps in ASO/Terraform support for ARO managed-identity cluster creation at the time and considered a temporary wrapper pipeline as "Plan C." Publicly, Microsoft now documents managed-identity ARO clusters as GA, including portal-based deployment, which materially improves the viability of a secure ARO target pattern (create an ARO cluster, ARO overview).


The two customer-proposed options

Option 1 — Single subscription

Best use: tactical bootstrap / temporary service-cell.

It reduces cross-subscription network friction, centralizes operations quickly, and simplifies the first implementation of Argo CD + management automation because the identity and networking blast radius are all inside one subscription. But it also concentrates quota risk (subscription limits, AKS limits), weakens tenant/workload isolation, complicates cost/showback separation, and gives you fewer native boundaries for least privilege over time — especially once the estate grows beyond a few controlled service tiers.

Option 2 — Management subscription + workload subscriptions

Best use: strategic architecture, especially once the landing-zone foundation is in place.

This aligns better with Azure landing zones, gives clearer ownership and policy boundaries, supports pre-provisioned prerequisites, and matches the ODS "client-oriented" deployment model more naturally. The main downside is that it requires more platform readiness up front: identity scoping, network peering / private DNS / firewall pathing, and a clean agreement on which team owns which prerequisite under GIA/security constraints.


Comparison synthesis

| Criterion | Single subscription | Multi-subscription | Advantage | Executive reading |
| --- | --- | --- | --- | --- |
| Least privilege | Low to medium | High | 🟢 Multi-sub | Rights can be scoped per client or per workload. |
| Network / Palo Alto | Simple | More complex | 🔵 Single-sub | Cross-subscription requires more network coordination. |
| Scalability | Limited | Strong | 🟢 Multi-sub | Better management of quotas, costs, and boundaries. |
| Governance | Centralized but concentrated | Centralized with better boundaries | 🟢 Multi-sub | Better balance between central control and isolation. |
| Time-to-value | Fast | Medium | 🔵 Single-sub | Good transition model, weaker end state. |

Tip

Reading: the multi-subscription option clearly wins on security, governance, and durability; the single-subscription option wins mainly on start-up simplicity.


Microsoft-aligned alternatives

Beyond the customer's original two options, the report includes two Microsoft-aligned patterns:

  1. Landing-zone-aligned federated ODS — central governance/tooling, shared connectivity/security subscriptions, and client/workload AKS/ARO clusters deployed into client-aligned application subscriptions with pre-provisioned prerequisites and scoped managed identities. See Azure landing zone design principles and the AKS baseline architecture.
  2. Fleet-governed distributed AKS operations — an AKS Fleet Manager overlay for multi-cluster namespace governance, quotas, RBAC, upgrades, resource placement, and staged Git-based deployment — paired with Arc-aware handling for ARO where feature parity differs.

How the design treats CAPI/CAPZ + Argo CD

For AKS, the design uses a clear split of responsibility:

For ARO, the design is more conservative:

  • Use Argo CD for add-ons, policy/config standardization, and workload delivery where appropriate.
  • Use a managed-identity-compatible ARO provisioning path (portal / ARM / Bicep / supported CLI) because the internal thread documented that ARO creation through the CAPZ/ASO/Terraform path was blocked by managed-identity support gaps at that time. Microsoft's docs now confirm managed identity GA for ARO.

Note

This repository's PoC also demonstrates how to keep Argo CD Synced against Kyverno's self-managed fields (CRD spec.conversion, ClusterPolicy admission defaults) via ignoreDifferences — a practical detail when running GitOps governance at scale.
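As a rough sketch of what such an ignoreDifferences entry could look like, the snippet below builds a fragment of an Argo CD Application as a Python dictionary and prints it as JSON. This is not the repository's actual configuration. The application name, the `argocd` namespace, and the `/spec/admission` pointer for ClusterPolicy are assumptions made for illustration, and a complete Application would also need `project`, `source`, and `destination`.

```python
import json


def kyverno_ignore_differences():
    """Illustrative ignoreDifferences entries for fields Kyverno manages itself."""
    return [
        {
            # CRD conversion settings populated at runtime.
            "group": "apiextensions.k8s.io",
            "kind": "CustomResourceDefinition",
            "jsonPointers": ["/spec/conversion"],
        },
        {
            # Assumed example of a defaulted ClusterPolicy field; adjust the
            # pointer to whatever fields actually show up as drift.
            "group": "kyverno.io",
            "kind": "ClusterPolicy",
            "jsonPointers": ["/spec/admission"],
        },
    ]


# Fragment only: name and namespace are hypothetical placeholders.
application = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Application",
    "metadata": {"name": "kyverno", "namespace": "argocd"},
    "spec": {"ignoreDifferences": kyverno_ignore_differences()},
}

print(json.dumps(application, indent=2))
```

Generating the entries from one function keeps the ignore rules identical across every Application or ApplicationSet template that deploys Kyverno.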


Executive one-liner

Important

Do not make the single-subscription design your destination. Use it only if you need a near-term bridge around today's cross-subscription constraints, but build toward a landing-zone-aligned multi-subscription ODS with managed identities, scoped RBAC, GitOps standardization, and AKS/ARO-specific governance paths.


Proof of Concept

A runnable CAPI/CAPZ + Kyverno + Argo CD governance PoC backs this report.

  • 📓 Operations runbook: docs/runbook.md — required GitHub secrets, the OIDC app registration, the aksgov-poc-teardown approval environment, the local-first script run order, and the customer-demo walkthrough.
  • ⚙️ Pipeline: .github/workflows/aksgov-poc-demo.yml.

What it provisions and demonstrates:

| Stage | What happens |
| --- | --- |
| Management cluster | A CAPI/CAPZ/ASO management AKS cluster + Argo CD + Kyverno (Bicep + clusterctl init). |
| Workload clusters | Two workload AKS clusters provisioned declaratively via CAPZ/ASO. |
| GitOps fan-out | Argo CD ApplicationSets install Kyverno and fan governance ClusterPolicy objects to every workload cluster. |
| Governance demo | A two-phase Audit → Enforce registry policy captures a real Kyverno PolicyReport, then blocks violating Pods; a minimum-Kubernetes-version policy is also demonstrated. |
| Evidence + wiki | CLI evidence is captured and published to the repository wiki. |
| Teardown | All Azure resources are removed behind a manual approval gate. |
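The two-phase Audit → Enforce behaviour of the registry policy can be modelled in a few lines. The sketch below is a simulation of the semantics only, not Kyverno code: in Audit mode a violating Pod is admitted and the violation is reported, while in Enforce mode it is rejected. The registry prefix `acmepoc.azurecr.io/` and the function name `evaluate_pod` are hypothetical.

```python
# Hypothetical allow-list; the PoC's real registry is not stated here.
ALLOWED_REGISTRY_PREFIXES = ("acmepoc.azurecr.io/",)


def evaluate_pod(images, mode, allowed=ALLOWED_REGISTRY_PREFIXES):
    """Return (admitted, violations) for a Pod's container images.

    mode is "Audit" (report only) or "Enforce" (block on violation).
    """
    if mode not in ("Audit", "Enforce"):
        raise ValueError(f"unknown mode: {mode}")
    violations = [image for image in images if not image.startswith(allowed)]
    admitted = not violations or mode == "Audit"
    return admitted, violations


pod_images = ["docker.io/library/nginx:1.27"]
print(evaluate_pod(pod_images, "Audit"))    # prints (True, ['docker.io/library/nginx:1.27'])
print(evaluate_pod(pod_images, "Enforce"))  # prints (False, ['docker.io/library/nginx:1.27'])
```

Starting in Audit mode lets the platform team collect violations as evidence before switching to Enforce, which is the order the demo follows.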

References

Tip

Quotas and limits are the load-bearing constraint behind the single-vs-multi-subscription decision — start with the first three links below.

Quotas & limits

Landing zones, network & architecture

Identity, RBAC & least privilege

Policy & governance enforcement

Azure Red Hat OpenShift (ARO)

Arc & multi-cluster fleet

Platform engineering, GitOps & Cluster API

