Google Kubernetes Engine (GKE)
Deploying and managing Google Kubernetes Engine (GKE) clusters
Google Kubernetes Engine (GKE) is Google Cloud's managed Kubernetes service that provides a secure, production-ready environment for deploying containerized applications. This guide focuses on practical deployment scenarios using Terraform and gcloud CLI.
Key Features
Autopilot: Fully managed Kubernetes experience with hands-off operations
Standard: More control over cluster configuration and node management
GKE Enterprise: Advanced multi-cluster management and governance features
Auto-scaling: Automatic scaling of node pools based on workload demand
Auto-upgrade: Automated Kubernetes version upgrades
Multi-zone/region: Deploy across zones/regions for high availability
VPC-native networking: Uses alias IP ranges for pod networking
Container-Optimized OS: A hardened, secure-by-default operating system for GKE nodes
Workload Identity: Secure access to Google Cloud services from pods
Deploying GKE with Terraform
Standard Cluster Deployment
```hcl
resource "google_container_cluster" "primary" {
  name                     = "my-gke-cluster"
  location                 = "us-central1-a"
  remove_default_node_pool = true
  initial_node_count       = 1

  # Enable Workload Identity
  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  # Network configuration
  network    = google_compute_network.vpc.name
  subnetwork = google_compute_subnetwork.subnet.name

  # VPC-native (alias IP) networking, using the subnet's secondary ranges
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }

  # Private cluster configuration
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = "172.16.0.0/28"
  }

  # Release channel for auto-upgrades
  release_channel {
    channel = "REGULAR"
  }

  # Maintenance window (weekends only)
  maintenance_policy {
    recurring_window {
      start_time = "2022-01-01T00:00:00Z"
      end_time   = "2022-01-02T00:00:00Z"
      recurrence = "FREQ=WEEKLY;BYDAY=SA,SU"
    }
  }
}

resource "google_container_node_pool" "primary_nodes" {
  name       = "primary-node-pool"
  location   = "us-central1-a"
  cluster    = google_container_cluster.primary.name
  node_count = 3

  management {
    auto_repair  = true
    auto_upgrade = true
  }

  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  node_config {
    machine_type = "e2-standard-4"
    disk_size_gb = 100
    disk_type    = "pd-standard"

    # Google recommends custom service accounts with minimal permissions
    service_account = google_service_account.gke_sa.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    # Enable Workload Identity on the node pool
    workload_metadata_config {
      mode = "GKE_METADATA"
    }

    labels = {
      env = "production"
    }
    tags = ["gke-node", "production"]
  }
}

resource "google_service_account" "gke_sa" {
  account_id   = "gke-service-account"
  display_name = "GKE Service Account"
}

resource "google_project_iam_member" "gke_sa_roles" {
  for_each = toset([
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
    "roles/artifactregistry.reader"
  ])
  role    = each.key
  member  = "serviceAccount:${google_service_account.gke_sa.email}"
  project = var.project_id
}

resource "google_compute_network" "vpc" {
  name                    = "gke-vpc"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "subnet" {
  name          = "gke-subnet"
  ip_cidr_range = "10.10.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.vpc.id

  secondary_ip_range {
    range_name    = "pods"
    ip_cidr_range = "10.20.0.0/16"
  }
  secondary_ip_range {
    range_name    = "services"
    ip_cidr_range = "10.30.0.0/16"
  }
}
```
Autopilot Cluster Deployment
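An Autopilot cluster needs far less configuration because Google manages the nodes. A minimal sketch (resource and cluster names are illustrative):

```hcl
resource "google_container_cluster" "autopilot" {
  name             = "my-autopilot-cluster"
  location         = "us-central1" # Autopilot clusters are regional
  enable_autopilot = true

  # Autopilot is always VPC-native; an empty block lets GKE pick the ranges
  ip_allocation_policy {}

  release_channel {
    channel = "REGULAR"
  }
}
```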
Deploying GKE with gcloud CLI
Creating a Standard Cluster
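A Standard cluster roughly equivalent to the Terraform example above can be created in one command; the cluster name, zone, and PROJECT_ID are placeholders:

```shell
gcloud container clusters create my-gke-cluster \
  --zone us-central1-a \
  --num-nodes 3 \
  --machine-type e2-standard-4 \
  --enable-ip-alias \
  --release-channel regular \
  --enable-autoscaling --min-nodes 1 --max-nodes 10 \
  --workload-pool PROJECT_ID.svc.id.goog
```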
Creating an Autopilot Cluster
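Autopilot clusters are regional and take almost no flags, since node management is Google's responsibility (names are illustrative):

```shell
gcloud container clusters create-auto my-autopilot-cluster \
  --region us-central1

# Fetch credentials so kubectl can talk to the cluster
gcloud container clusters get-credentials my-autopilot-cluster \
  --region us-central1
```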
Real-World Example: Deploying a Microservice Application
This example demonstrates deploying a complete microservices application to GKE:
Step 1: Create GKE infrastructure with Terraform
Step 2: Create Kubernetes manifests for the application
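A representative manifest for one of the services might look like the following; the image path, service name, ports, and health endpoint are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: us-central1-docker.pkg.dev/PROJECT_ID/my-repo/frontend:v1
        ports:
        - containerPort: 8080
        # Requests/limits keep the autoscalers and scheduler well informed
        resources:
          requests:
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
spec:
  selector:
    app: frontend
  ports:
  - port: 80
    targetPort: 8080
```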
Step 3: Create Deployment Pipeline (Cloud Build)
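A minimal cloudbuild.yaml could build, push, and deploy the image; the Artifact Registry path, manifest directory, and cluster name are placeholders:

```yaml
steps:
# Build the container image
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'us-central1-docker.pkg.dev/$PROJECT_ID/my-repo/frontend:$SHORT_SHA', '.']
# Push it to Artifact Registry
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'us-central1-docker.pkg.dev/$PROJECT_ID/my-repo/frontend:$SHORT_SHA']
# Roll it out to the cluster
- name: 'gcr.io/cloud-builders/gke-deploy'
  args:
  - run
  - --filename=k8s/
  - --image=us-central1-docker.pkg.dev/$PROJECT_ID/my-repo/frontend:$SHORT_SHA
  - --cluster=my-gke-cluster
  - --location=us-central1-a
```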
Best Practices
Security
Use private clusters with no public endpoint
Implement Workload Identity for pod-level access to Google Cloud resources
Apply the principle of least privilege for service accounts
Enable Binary Authorization for secure supply chain
Keep nodes and the control plane updated via release channels
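Wiring up Workload Identity takes one IAM binding plus an annotation on the Kubernetes ServiceAccount; the namespace and account names below are examples:

```shell
# Let the Kubernetes SA "my-ksa" in namespace "default" impersonate the Google SA
gcloud iam service-accounts add-iam-policy-binding \
  gke-service-account@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[default/my-ksa]"

# Annotate the Kubernetes SA so GKE maps it to the Google SA
kubectl annotate serviceaccount my-ksa \
  iam.gke.io/gcp-service-account=gke-service-account@PROJECT_ID.iam.gserviceaccount.com
```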
Reliability
Deploy across multiple zones/regions for high availability
Use Pod Disruption Budgets to ensure availability during maintenance
Implement proper health checks and readiness/liveness probes
Set appropriate resource requests and limits
Use node auto-provisioning to handle fluctuating workloads
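A Pod Disruption Budget, as mentioned above, can be as small as this (name and labels illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
spec:
  minAvailable: 2 # Keep at least 2 replicas up during node drains
  selector:
    matchLabels:
      app: frontend
```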
Cost Optimization
Use Autopilot for hands-off management and optimized costs
Leverage Spot VMs for batch or fault-tolerant workloads
Set up cluster autoscaler to scale nodes based on demand
Use horizontal pod autoscaling (HPA) based on CPU/memory/custom metrics
Use node selectors and taints/tolerations to place workloads on appropriate (e.g., Spot) node pools
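A sketch of a CPU-based HPA for the points above (the target Deployment name is illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # Scale out above 70% average CPU
```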
Monitoring and Logging
Enable Cloud Monitoring and Logging during cluster creation
Set up custom dashboards for cluster and application metrics
Create log-based alerts for critical issues
Use Cloud Trace and Profiler for application performance monitoring
Implement distributed tracing using OpenTelemetry
Common Issues and Troubleshooting
Networking Issues
Ensure pod CIDR ranges don't overlap with VPC subnets
Check firewall rules for control-plane-to-node and node-to-node communication
Verify kube-proxy is running correctly for service networking
Use Network Policy to control pod-to-pod traffic
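For the last point, a minimal Network Policy restricting ingress to backend pods to traffic from the frontend might look like this (labels illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-backend
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: frontend
```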
Performance Problems
Review pod resource settings (requests/limits)
Check for node resource exhaustion (CPU, memory)
Look for noisy neighbor issues on shared nodes
Monitor network throughput and latency
Deployment Failures
Verify service account permissions
Check image pull errors (registry access, image existence)
Examine pod events with kubectl describe pod
Review logs with kubectl logs or Cloud Logging
Scaling Issues
Ensure cluster autoscaler is properly configured
Check if pods have appropriate resource requests
Verify node resource availability
Look for pod affinity/anti-affinity conflicts
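For the scheduling-related points, a Pending pod's events usually state the reason directly; a quick triage (the pod name is a placeholder):

```shell
# Look for "Insufficient cpu/memory" or affinity conflicts under Events
kubectl describe pod PENDING_POD_NAME

# Cluster autoscaler decisions (scale-ups, scale-up failures) appear as events
kubectl get events --sort-by=.metadata.creationTimestamp | grep -i scale
```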