Enterprise Architecture
This guide covers architectural patterns and best practices for designing and managing large-scale Kubernetes deployments across AWS, Azure, and GCP.
Multi-Cluster Architecture Models
Hub and Spoke Model
The hub cluster centrally manages configuration, security policies, and observability for multiple spoke clusters.
                    ┌──────────────┐
                    │  Hub Cluster │
                    │  (Admin/Mgmt)│
                    └───────┬──────┘
                            │
          ┌─────────────────┼─────────────────┐
          │                 │                 │
   ┌──────▼──────┐   ┌──────▼──────┐   ┌──────▼──────┐
   │    Spoke    │   │    Spoke    │   │    Spoke    │
   │  (Workload) │   │  (Workload) │   │  (Workload) │
   └─────────────┘   └─────────────┘   └─────────────┘
Real-life example: A financial services organization runs regulated workloads in separate spoke clusters while the hub enforces unified governance.
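One way a hub could check that its spokes actually carry the centrally defined policies is to compare content digests. The sketch below is illustrative only: the policy names, the spoke names, and the idea that each spoke reports a digest per applied policy are assumptions, not part of any specific fleet tool.

```python
import hashlib
import json


def digest(policy: dict) -> str:
    """Stable SHA-256 digest of a policy document (key order does not matter)."""
    return hashlib.sha256(json.dumps(policy, sort_keys=True).encode()).hexdigest()


def find_drift(hub_policies: dict, spoke_state: dict) -> dict:
    """Return {spoke: [policy names that are missing or out of date]}.

    hub_policies: {policy_name: policy_document} held by the hub (hypothetical).
    spoke_state:  {spoke_name: {policy_name: digest_reported_by_spoke}} (hypothetical).
    """
    desired = {name: digest(doc) for name, doc in hub_policies.items()}
    drift = {}
    for spoke, applied in spoke_state.items():
        stale = sorted(name for name, d in desired.items() if applied.get(name) != d)
        if stale:
            drift[spoke] = stale
    return drift


HUB_POLICIES = {
    "deny-privileged": {"rule": "no privileged containers"},
    "require-labels": {"rule": "owner label required"},
}

SPOKE_STATE = {
    "spoke-payments": {name: digest(doc) for name, doc in HUB_POLICIES.items()},
    "spoke-trading": {"deny-privileged": digest(HUB_POLICIES["deny-privileged"])},
}

if __name__ == "__main__":
    print(find_drift(HUB_POLICIES, SPOKE_STATE))
```

In practice a GitOps controller usually performs this reconciliation; the sketch only shows the comparison the hub relies on.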
Multi-Regional Architecture
Independent cluster instances deployed across regions for data sovereignty and resilience.
┌───────────────────────┐   ┌───────────────────────┐   ┌───────────────────────┐
│    US-East Region     │   │     Europe Region     │   │      APAC Region      │
│                       │   │                       │   │                       │
│ ┌────────┐ ┌────────┐ │   │ ┌────────┐ ┌────────┐ │   │ ┌────────┐ ┌────────┐ │
│ │Prod    │ │Staging │ │   │ │Prod    │ │Staging │ │   │ │Prod    │ │Staging │ │
│ │Cluster │ │Cluster │ │   │ │Cluster │ │Cluster │ │   │ │Cluster │ │Cluster │ │
│ └────────┘ └────────┘ │   │ └────────┘ └────────┘ │   │ └────────┘ └────────┘ │
└───────────────────────┘   └───────────────────────┘   └───────────────────────┘
Real-life example: A global SaaS provider keeps customer data inside each region to meet residency requirements while offering the same service everywhere.
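The residency requirement can be sketched as a lookup from a customer's country to the regional cluster allowed to hold their data. Everything concrete here (the country-to-region table, the cluster endpoints, the function name) is a made-up illustration rather than a prescribed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionalCluster:
    region: str
    environment: str
    api_endpoint: str  # hypothetical endpoint, for illustration only


CLUSTERS = [
    RegionalCluster("us-east", "prod", "https://prod.us-east.example.internal"),
    RegionalCluster("us-east", "staging", "https://staging.us-east.example.internal"),
    RegionalCluster("europe", "prod", "https://prod.europe.example.internal"),
    RegionalCluster("europe", "staging", "https://staging.europe.example.internal"),
    RegionalCluster("apac", "prod", "https://prod.apac.example.internal"),
    RegionalCluster("apac", "staging", "https://staging.apac.example.internal"),
]

# Assumed residency rules: which region may store data for which country.
RESIDENCY = {"US": "us-east", "DE": "europe", "FR": "europe", "JP": "apac"}


def select_cluster(country_code: str, environment: str = "prod") -> RegionalCluster:
    """Pick the cluster that satisfies the residency rule for a country."""
    region = RESIDENCY.get(country_code.upper())
    if region is None:
        # Fail closed: never fall back to another region silently.
        raise LookupError(f"no residency rule for country {country_code!r}")
    for cluster in CLUSTERS:
        if cluster.region == region and cluster.environment == environment:
            return cluster
    raise LookupError(f"no {environment} cluster in region {region!r}")


if __name__ == "__main__":
    print(select_cluster("de").api_endpoint)
```

Failing closed on an unknown country is a deliberate choice in this sketch: routing to a default region could violate the sovereignty guarantee the model exists to provide.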
Cloud-Specific Implementation Patterns
AWS EKS Architecture
┌─────────────────────────────────────────────────────────┐
│                         Region                          │
│    ┌─────────────────┐           ┌─────────────────┐    │
│    │      AZ-1       │           │      AZ-2       │    │
│    │                 │           │                 │    │
│    │ ┌─────────────┐ │           │ ┌─────────────┐ │    │
│    │ │Worker Nodes │ │           │ │Worker Nodes │ │    │
│    │ └─────────────┘ │           │ └─────────────┘ │    │
│    │                 │           │                 │    │
│    └─────────────────┘           └─────────────────┘    │
│                                                         │
│ ┌─────────────────────────────────────────────────────┐ │
│ │                  EKS Control Plane                  │ │
│ └─────────────────────────────────────────────────────┘ │
│                                                         │
│ ┌─────────────────────────────────────────────────────┐ │
│ │  AWS ALB    AWS ECR    Route 53    CloudWatch Logs  │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Best Practices:
Use EKS add-ons for CNI, CoreDNS, and kube-proxy
Leverage AWS Load Balancer Controller for ALB/NLB integration
Use managed node groups, which are backed by EC2 Auto Scaling groups
Implement dedicated VPC endpoints for ECR, S3, and other AWS services
Map AWS IAM principals to Kubernetes RBAC using EKS access entries or the aws-auth ConfigMap
Real-life considerations:
ALB for ingress offers native integrations with AWS WAF and Shield
Use Cluster Autoscaler with multiple node groups for cost optimization
Karpenter typically provisions nodes faster than Cluster Autoscaler because it launches instances directly rather than resizing Auto Scaling groups
Azure AKS Architecture
┌─────────────────────────────────────────────────────────┐
│                         Region                          │
│    ┌─────────────────┐           ┌─────────────────┐    │
│    │      AZ-1       │           │      AZ-2       │    │
│    │                 │           │                 │    │
│    │ ┌─────────────┐ │           │ ┌─────────────┐ │    │
│    │ │ Node Pools  │ │           │ │ Node Pools  │ │    │
│    │ └─────────────┘ │           │ └─────────────┘ │    │
│    │                 │           │                 │    │
│    └─────────────────┘           └─────────────────┘    │
│                                                         │
│ ┌─────────────────────────────────────────────────────┐ │
│ │                  AKS Control Plane                  │ │
│ └─────────────────────────────────────────────────────┘ │
│                                                         │
│ ┌─────────────────────────────────────────────────────┐ │
│ │  Azure App Gtwy   Azure Cont.   Azure Monitor       │ │
│ │                   Registry      Container Insights  │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Best Practices:
Enable managed identity and RBAC integration
Implement Azure CNI networking for enterprise-scale deployments
Use separate node pools for system and application workloads
Configure CSI drivers for Azure Disk and File storage
Leverage Azure Policy for AKS
Real-life considerations:
Application Gateway Ingress Controller for WAF capabilities
Azure Container Registry with geo-replication for multi-region deployments
Use virtual nodes (backed by Azure Container Instances) for burst workloads
GCP GKE Architecture
┌─────────────────────────────────────────────────────────┐
│                         Region                          │
│    ┌─────────────────┐           ┌─────────────────┐    │
│    │     Zone A      │           │     Zone B      │    │
│    │                 │           │                 │    │
│    │ ┌─────────────┐ │           │ ┌─────────────┐ │    │
│    │ │ Node Pools  │ │           │ │ Node Pools  │ │    │
│    │ └─────────────┘ │           │ └─────────────┘ │    │
│    │                 │           │                 │    │
│    └─────────────────┘           └─────────────────┘    │
│                                                         │
│ ┌─────────────────────────────────────────────────────┐ │
│ │                  GKE Control Plane                  │ │
│ └─────────────────────────────────────────────────────┘ │
│                                                         │
│ ┌─────────────────────────────────────────────────────┐ │
│ │    Cloud Load     Artifact     Cloud Monitoring     │ │
│ │    Balancing      Registry     & Logging            │ │
│ └─────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
Best Practices:
Use GKE Autopilot when simplified operations matter more than node-level control
Enable node auto-provisioning on GKE Standard clusters
Implement Workload Identity for secure GCP API access
Configure Cloud NAT for private GKE clusters
Use Binary Authorization for supply chain security
Real-life considerations:
Multi-cluster ingress and service mesh with Cloud Service Mesh
GKE Enterprise for enhanced multi-cluster management
Container-Optimized OS for improved security posture
Multi-Cloud Kubernetes Architecture
For organizations operating across multiple clouds, these patterns enable consistent management:
Fleet Management Approach
┌─────────────────────────────────────────────────────────────┐
│                  Central Management Plane                   │
│                                                             │
│ ┌───────────────┐  ┌───────────────┐  ┌────────────────┐    │
│ │GitOps System  │  │Fleet Manager  │  │Centralized     │    │
│ │(Flux/ArgoCD)  │  │(e.g., Rancher)│  │Observability   │    │
│ └───────┬───────┘  └───────┬───────┘  └────────┬───────┘    │
└─────────┼──────────────────┼───────────────────┼────────────┘
          │                  │                   │
          ▼                  ▼                   ▼
  ┌───────────────┐  ┌───────────────┐   ┌───────────────┐
  │    AWS EKS    │  │   Azure AKS   │   │  Google GKE   │
  │   Clusters    │  │   Clusters    │   │   Clusters    │
  └───────────────┘  └───────────────┘   └───────────────┘
Implementation strategies:
Unified configuration repository with environment-specific overlays
Federation layer for cross-cluster service discovery
Standardized CRDs across all clusters
Central identity management with federation to cloud IAM systems
Common observability and alerting platform
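The first strategy, a shared base configuration with per-environment overlays, comes down to a recursive merge. The following is a minimal sketch of that idea in plain Python; tools such as Kustomize or Helm do this with far more features, and the `BASE` and `OVERLAYS` values are invented for illustration.

```python
import copy


def merge_overlay(base: dict, overlay: dict) -> dict:
    """Return a new dict: overlay values win, nested dicts are merged recursively."""
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical base settings shared by every cluster in the fleet.
BASE = {
    "replicas": 2,
    "resources": {"cpu": "500m", "memory": "512Mi"},
    "labels": {"app": "checkout"},
}

# Hypothetical per-cluster overlays stored next to the base in the same repository.
OVERLAYS = {
    "eks-prod": {"replicas": 6, "labels": {"cloud": "aws"}},
    "aks-prod": {"resources": {"memory": "1Gi"}, "labels": {"cloud": "azure"}},
    "gke-staging": {"replicas": 1, "labels": {"cloud": "gcp"}},
}


def render(cluster: str) -> dict:
    """Effective configuration for one cluster; unknown clusters get the base."""
    return merge_overlay(BASE, OVERLAYS.get(cluster, {}))


if __name__ == "__main__":
    for name in OVERLAYS:
        print(name, render(name))
```

Because every cluster renders from the same base, a change to `BASE` reaches the whole fleet, while cloud-specific differences stay confined to small overlays.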
Network Architecture Models
Multi-Tier Network Security Model
┌────────────────────────────────────────────┐
│                Ingress Tier                │
│  ┌──────────────────────────────────────┐  │
│  │WAF & DDoS Protection                 │  │
│  └────────────────┬─────────────────────┘  │
│                   │                        │
│  ┌────────────────▼─────────────────────┐  │
│  │API Gateway / Ingress Controller      │  │
│  └────────────────┬─────────────────────┘  │
└───────────────────┼────────────────────────┘
                    │
┌───────────────────▼────────────────────────┐
│                Service Mesh                │
│  ┌──────────────────────────────────────┐  │
│  │mTLS Encryption                       │  │
│  │Traffic Management                    │  │
│  │Service-to-Service Authorization      │  │
│  └────────────────┬─────────────────────┘  │
└───────────────────┼────────────────────────┘
                    │
┌───────────────────▼────────────────────────┐
│                Pod Security                │
│  ┌──────────────────────────────────────┐  │
│  │Pod Security Standards                │  │
│  │Network Policies                      │  │
│  │Runtime Security                      │  │
│  └──────────────────────────────────────┘  │
└────────────────────────────────────────────┘
Implementation components:
AWS: ALB + AWS Shield + AWS WAF + Istio or AWS App Mesh + Calico
Azure: App Gateway + Azure Firewall + Istio/Linkerd + Azure CNI + Calico
GCP: Cloud Load Balancing + Cloud Armor + Cloud Service Mesh (formerly Anthos Service Mesh) + Calico
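At the pod tier, the usual starting point is a default-deny NetworkPolicy per namespace plus narrow allow rules. The sketch below builds both as plain dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. The namespace names, policy names, and port are assumptions for illustration; enforcement depends on a CNI that implements NetworkPolicy, such as Calico.

```python
import json


def default_deny(namespace: str) -> dict:
    """NetworkPolicy that selects every pod and allows no ingress or egress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
    }


def allow_from_namespace(namespace: str, source_namespace: str, port: int) -> dict:
    """Allow TCP ingress on one port from all pods in another namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": f"allow-from-{source_namespace}", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            # Label that Kubernetes sets on every namespace.
                            "namespaceSelector": {
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": source_namespace
                                }
                            }
                        }
                    ],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespaces: "payments" workloads reachable only from "ingress".
    for manifest in (default_deny("payments"), allow_from_namespace("payments", "ingress", 8443)):
        print(json.dumps(manifest, indent=2))
```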
Storage Architecture Best Practices
Data-Intensive Workload Architecture
┌───────────────────────────────────────────────────────────┐
│              Stateful Application Deployment              │
│                                                           │
│ ┌─────────────────────────┐   ┌─────────────────────────┐ │
│ │       StatefulSet       │   │   Operator-managed DB   │ │
│ │                         │   │                         │ │
│ │ ┌─────────────┐         │   │ ┌─────────────────────┐ │ │
│ │ │PVC Templates│         │   │ │Custom Resource Def. │ │ │
│ │ └──────┬──────┘         │   │ └──────────┬──────────┘ │ │
│ └────────┼────────────────┘   └────────────┼────────────┘ │
│          │                                 │              │
│ ┌────────▼─────────────────────────────────▼────────────┐ │
│ │               Storage Class Abstraction               │ │
│ └───────────────────────────┬───────────────────────────┘ │
│                             │                             │
│ ┌───────────────────────────▼───────────────────────────┐ │
│ │          Cloud Provider Storage Integration           │ │
│ │                                                       │ │
│ │   AWS: EBS, EFS, FSx   Azure: Disk, Files   GCP: PD   │ │
│ └───────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────┘
Cloud-specific recommendations:
AWS: Use gp3 volumes for general workloads, io2 for high-performance databases
Azure: Use Premium SSD v2 when IOPS and throughput need to be adjusted independently of disk size
GCP: Use Regional Persistent Disks for high-availability storage
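These recommendations map onto one StorageClass per cloud, so that workloads only reference a class name. The sketch below emits such manifests as JSON. The class names are invented, and the provisioner names and parameters reflect the commonly documented CSI driver settings as best understood here; treat them as assumptions to check against each driver's documentation before use.

```python
import json


def storage_class(name: str, provisioner: str, parameters: dict) -> dict:
    """Build a StorageClass manifest with settings that suit multi-zone clusters."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": provisioner,
        "parameters": parameters,
        # Delay volume creation until a pod is scheduled, so the disk lands in its zone.
        "volumeBindingMode": "WaitForFirstConsumer",
        "allowVolumeExpansion": True,
    }


# Class names are hypothetical; parameters are assumptions to verify per driver.
STORAGE_CLASSES = {
    "aws": storage_class("general-gp3", "ebs.csi.aws.com", {"type": "gp3"}),
    "azure": storage_class(
        "premium-v2",
        "disk.csi.azure.com",
        {"skuName": "PremiumV2_LRS", "cachingMode": "None"},
    ),
    "gcp": storage_class(
        "regional-pd",
        "pd.csi.storage.gke.io",
        {"type": "pd-balanced", "replication-type": "regional-pd"},
    ),
}

if __name__ == "__main__":
    for cloud, manifest in STORAGE_CLASSES.items():
        print(f"# {cloud}")
        print(json.dumps(manifest, indent=2))
```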
Multi-Tenancy Models
Hard Multi-tenancy
Separate clusters for each tenant ensure complete isolation.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│Tenant A Cluster │ │Tenant B Cluster │ │Tenant C Cluster │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
         │                   │                   │
┌────────▼───────────────────▼───────────────────▼────────┐
│                Central Management Plane                 │
│  (Configuration, Monitoring, Security Policy, Billing)  │
└─────────────────────────────────────────────────────────┘
Soft Multi-tenancy
Namespace-based isolation within a shared cluster.
┌───────────────────────────────────────────────────────┐
│                    Shared Cluster                     │
│                                                       │
│  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐  │
│  │Namespace A  │   │Namespace B  │   │Namespace C  │  │
│  │(Tenant A)   │   │(Tenant B)   │   │(Tenant C)   │  │
│  │             │   │             │   │             │  │
│  │ ┌─────────┐ │   │ ┌─────────┐ │   │ ┌─────────┐ │  │
│  │ │Resource │ │   │ │Resource │ │   │ │Resource │ │  │
│  │ │Quotas   │ │   │ │Quotas   │ │   │ │Quotas   │ │  │
│  │ └─────────┘ │   │ └─────────┘ │   │ └─────────┘ │  │
│  │             │   │             │   │             │  │
│  │ ┌─────────┐ │   │ ┌─────────┐ │   │ ┌─────────┐ │  │
│  │ │Network  │ │   │ │Network  │ │   │ │Network  │ │  │
│  │ │Policies │ │   │ │Policies │ │   │ │Policies │ │  │
│  │ └─────────┘ │   │ └─────────┘ │   │ └─────────┘ │  │
│  └─────────────┘   └─────────────┘   └─────────────┘  │
└───────────────────────────────────────────────────────┘
Implementation tools:
Hierarchical namespace controller
Network policies with advanced CNI implementations
OPA Gatekeeper or Kyverno for policy enforcement
ResourceQuotas and LimitRanges
Pod Security Standards
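For soft multi-tenancy, several of these tools can be combined into a small per-tenant bundle: a namespace labeled for Pod Security Standards enforcement, a ResourceQuota, and a LimitRange. The sketch below generates that bundle as JSON manifests. The tenant names, the `tenant-` naming scheme, and the quota values are placeholders, not recommendations.

```python
import json


def tenant_manifests(tenant: str, cpu: str, memory: str, max_pods: int) -> list:
    """Namespace, ResourceQuota, and LimitRange for one tenant (illustrative values)."""
    ns = f"tenant-{tenant}"  # hypothetical naming convention
    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": ns,
            "labels": {
                "tenant": tenant,
                # Pod Security Admission label: enforce the "restricted" standard.
                "pod-security.kubernetes.io/enforce": "restricted",
            },
        },
    }
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": ns},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "pods": str(max_pods),
            }
        },
    }
    limits = {
        "apiVersion": "v1",
        "kind": "LimitRange",
        "metadata": {"name": "tenant-defaults", "namespace": ns},
        "spec": {
            "limits": [
                {
                    "type": "Container",
                    # Defaults applied to containers that declare no requests or limits.
                    "defaultRequest": {"cpu": "100m", "memory": "128Mi"},
                    "default": {"cpu": "500m", "memory": "512Mi"},
                }
            ]
        },
    }
    return [namespace, quota, limits]


if __name__ == "__main__":
    for manifest in tenant_manifests("a", cpu="8", memory="16Gi", max_pods=50):
        print(json.dumps(manifest, indent=2))
```

The LimitRange matters because a ResourceQuota on requests rejects pods that omit them; default requests keep tenant workloads schedulable without every team having to set values by hand.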
Control Plane Scaling Considerations
API Server Scaling
Maximum number of clusters (default quotas at the time of writing; confirm against each provider's current documentation):
AWS EKS: 100 clusters per region per account (soft limit)
Azure AKS: 1000 clusters per subscription (soft limit)
GCP GKE: 50 clusters per project (soft limit)
Maximum nodes per cluster (documented limits at the time of writing; confirm against each provider's current documentation):
AWS EKS: 5,000 nodes
Azure AKS: 5,000 nodes
GCP GKE: 15,000 nodes
API server recommendations:
Implement efficient watch caches
Use server-side filtering of list requests
Optimize etcd for large clusters
For clusters approaching or exceeding 5,000 nodes, plan control plane capacity with the provider or split workloads across additional clusters
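Server-side filtering means asking the API server for only the objects a client needs, in bounded pages, instead of listing everything and filtering locally. The sketch below only builds the request URL with the standard list parameters (`labelSelector`, `fieldSelector`, `limit`, `continue`); the API server address and selector values are placeholders, and authentication and the HTTP call itself are left out.

```python
from typing import Optional
from urllib.parse import urlencode


def list_pods_url(
    api_server: str,
    namespace: str,
    label_selector: Optional[str] = None,
    field_selector: Optional[str] = None,
    limit: int = 500,
    continue_token: Optional[str] = None,
) -> str:
    """Build a paginated, server-side filtered pod list URL."""
    params = {"limit": limit}
    if label_selector:
        params["labelSelector"] = label_selector
    if field_selector:
        params["fieldSelector"] = field_selector
    if continue_token:
        # Token returned in metadata.continue of the previous page.
        params["continue"] = continue_token
    return f"{api_server}/api/v1/namespaces/{namespace}/pods?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical API server address and selectors.
    print(
        list_pods_url(
            "https://k8s.example.internal",
            "payments",
            label_selector="app=checkout",
            field_selector="status.phase=Running",
        )
    )
```

Small pages with a `continue` token keep individual responses bounded, which matters most on large clusters where a single unfiltered list can be expensive for the API server and etcd.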
Disaster Recovery Architecture
Multi-Region Active-Passive Pattern
┌─────────────────────────┐            ┌─────────────────────────┐
│     Primary Region      │            │    Secondary Region     │
│                         │            │                         │
│ ┌─────────────────────┐ │            │ ┌─────────────────────┐ │
│ │ Active K8s Cluster  │ │            │ │ Passive K8s Cluster │ │
│ └──────────┬──────────┘ │            │ └──────────┬──────────┘ │
│            │            │            │            │            │
│ ┌──────────▼──────────┐ │            │ ┌──────────▼──────────┐ │
│ │Database Primary     ├─┼─Sync/Async─┼─┤Database Replica     │ │
│ └─────────────────────┘ │            │ └─────────────────────┘ │
└─────────────────────────┘            └─────────────────────────┘
             │                                      ▲
             │                                      │
             │           ┌───────────────┐          │
             └──────────►│  Global Load  ├──────────┘
                         │   Balancer    │
                         └───────────────┘
Recovery strategies:
Regular etcd snapshots with cross-region backup
GitOps-driven configuration ensures consistent redeployment
Stateful data replication with appropriate consistency models
DNS or global load balancer for traffic redirection
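The traffic redirection step usually hinges on a simple rule: fail over only after several consecutive failed health checks, so that a single blip does not move traffic. A minimal sketch of that rule follows; the class name, the threshold of three, and the choice to make failback manual are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class FailoverController:
    """Tracks primary health and decides which region should receive traffic."""

    failure_threshold: int = 3  # assumed value; tune to the health check interval
    active: str = "primary"
    consecutive_failures: int = 0

    def observe(self, primary_healthy: bool) -> str:
        """Record one health check result and return the region to route to."""
        if self.active == "secondary":
            # Failback is left as a manual, verified step in this sketch.
            return self.active
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.active = "secondary"
        return self.active


if __name__ == "__main__":
    controller = FailoverController()
    checks = [True, False, False, True, False, False, False, True]
    for healthy in checks:
        print(healthy, "->", controller.observe(healthy))
```

Keeping failback manual avoids flapping between regions while the database replica is being promoted or resynchronized.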
Cost Optimization Architecture
Cost-Efficient Node Design
┌───────────────────────────────────────────────────┐
│                Kubernetes Cluster                 │
│                                                   │
│  ┌────────────────────┐   ┌────────────────────┐  │
│  │  System Node Pool  │   │  General Workload  │  │
│  │  (On-demand)       │   │  (Spot/Preemptible)│  │
│  └────────────────────┘   └────────────────────┘  │
│                                                   │
│  ┌────────────────────┐   ┌────────────────────┐  │
│  │  Memory-Optimized  │   │  Compute-Optimized │  │
│  │  (Critical DBs)    │   │  (Batch Processing)│  │
│  └────────────────────┘   └────────────────────┘  │
└───────────────────────────────────────────────────┘
Cloud-specific recommendations:
AWS: Mix Spot Instances with On-Demand and Savings Plans
Azure: Use Spot VMs with AKS and Azure Reservations
GCP: Combine Spot VMs with Committed Use Discounts
Optimization techniques:
Cluster autoscaler with scale-down rules
Pod Priority and Preemption for critical workloads
Right-size deployments with the Vertical Pod Autoscaler (VPA)
Implement node auto-provisioning
Schedule non-critical batch jobs during off-peak hours
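To reason about the mix of on-demand and spot capacity, a rough blended-cost estimate is often enough. The sketch below computes one; the prices, discount, and node counts are invented inputs, and real spot discounts vary by instance type, region, and time.

```python
def blended_hourly_cost(
    nodes: int,
    on_demand_price: float,
    spot_discount: float,
    spot_fraction: float,
) -> float:
    """Hourly cost of a pool where a fraction of nodes run on spot capacity.

    spot_discount: fraction saved versus on-demand (0.6 means 60% cheaper).
    spot_fraction: share of nodes placed on spot capacity (0.0 to 1.0).
    """
    if not 0.0 <= spot_fraction <= 1.0:
        raise ValueError("spot_fraction must be between 0 and 1")
    if not 0.0 <= spot_discount <= 1.0:
        raise ValueError("spot_discount must be between 0 and 1")
    spot_nodes = round(nodes * spot_fraction)
    on_demand_nodes = nodes - spot_nodes
    spot_price = on_demand_price * (1.0 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


if __name__ == "__main__":
    # Hypothetical pool: 10 nodes at $0.20/hour on-demand, spot assumed 60% cheaper.
    all_on_demand = blended_hourly_cost(10, 0.20, 0.6, 0.0)
    half_spot = blended_hourly_cost(10, 0.20, 0.6, 0.5)
    print(f"on-demand only: ${all_on_demand:.2f}/h, half spot: ${half_spot:.2f}/h")
```

The estimate ignores interruption handling; workloads on spot nodes still need to tolerate eviction, which is why the system pool in the diagram stays on on-demand capacity.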