Deploying and managing Google Kubernetes Engine (GKE) clusters
Google Kubernetes Engine (GKE) is Google Cloud's managed Kubernetes service that provides a secure, production-ready environment for deploying containerized applications. This guide focuses on practical deployment scenarios using Terraform and the gcloud CLI.
Key Features
Autopilot: Fully managed Kubernetes experience with hands-off operations
Standard: More control over cluster configuration and node management
GKE Enterprise: Advanced multi-cluster management and governance features
Auto-scaling: Automatic scaling of node pools based on workload demand
Auto-upgrade: Automated Kubernetes version upgrades
Multi-zone/region: Deploy across zones/regions for high availability
VPC-native networking: Uses alias IP ranges for pod networking
Container-Optimized OS: Secure-by-default OS for GKE nodes
Workload Identity: Secure access to Google Cloud services from pods (see the sketch below)
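For example, wiring a pod to a Google service account with Workload Identity takes two steps. A minimal sketch, assuming a hypothetical namespace my-app, Kubernetes ServiceAccount my-ksa, and an existing Google service account gke-sa@PROJECT_ID.iam.gserviceaccount.com:

# Allow the Kubernetes ServiceAccount to impersonate the Google service account
gcloud iam service-accounts add-iam-policy-binding \
  gke-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:PROJECT_ID.svc.id.goog[my-app/my-ksa]"

# Annotate the Kubernetes ServiceAccount so GKE knows which GSA it maps to
kubectl annotate serviceaccount my-ksa \
  --namespace=my-app \
  iam.gke.io/gcp-service-account=gke-sa@PROJECT_ID.iam.gserviceaccount.com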
Standard Cluster Deployment
resource "google_container_cluster" "primary" {
name = "my-gke-cluster"
location = "us-central1-a"
remove_default_node_pool = true
initial_node_count = 1
# Enable Workload Identity
workload_identity_config {
workload_pool = "${var.project_id}.svc.id.goog"
}
# Network configuration
network = google_compute_network.vpc.name
subnetwork = google_compute_subnetwork.subnet.name
  # IP allocation policy for VPC-native (use the subnet's named secondary ranges)
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }
# Private cluster configuration
private_cluster_config {
enable_private_nodes = true
enable_private_endpoint = false
master_ipv4_cidr_block = "172.16.0.0/28"
}
# Release channel for auto-upgrades
release_channel {
channel = "REGULAR"
}
# Maintenance window
maintenance_policy {
recurring_window {
start_time = "2022-01-01T00:00:00Z"
end_time = "2022-01-02T00:00:00Z"
recurrence = "FREQ=WEEKLY;BYDAY=SA,SU"
}
}
}
resource "google_container_node_pool" "primary_nodes" {
name = "primary-node-pool"
location = "us-central1-a"
cluster = google_container_cluster.primary.name
  # With autoscaling enabled, set the starting size via initial_node_count
  initial_node_count = 3
management {
auto_repair = true
auto_upgrade = true
}
autoscaling {
min_node_count = 1
max_node_count = 10
}
node_config {
machine_type = "e2-standard-4"
disk_size_gb = 100
disk_type = "pd-standard"
# Google recommends custom service accounts with minimal permissions
service_account = google_service_account.gke_sa.email
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
# Enable workload identity on node pool
workload_metadata_config {
mode = "GKE_METADATA"
}
labels = {
env = "production"
}
tags = ["gke-node", "production"]
}
}
resource "google_service_account" "gke_sa" {
account_id = "gke-service-account"
display_name = "GKE Service Account"
}
resource "google_project_iam_member" "gke_sa_roles" {
for_each = toset([
"roles/logging.logWriter",
"roles/monitoring.metricWriter",
"roles/monitoring.viewer",
"roles/artifactregistry.reader"
])
role = each.key
member = "serviceAccount:${google_service_account.gke_sa.email}"
project = var.project_id
}
resource "google_compute_network" "vpc" {
name = "gke-vpc"
auto_create_subnetworks = false
}
resource "google_compute_subnetwork" "subnet" {
name = "gke-subnet"
ip_cidr_range = "10.10.0.0/16"
region = "us-central1"
network = google_compute_network.vpc.id
secondary_ip_range {
range_name = "pods"
ip_cidr_range = "10.20.0.0/16"
}
secondary_ip_range {
range_name = "services"
ip_cidr_range = "10.30.0.0/16"
}
}
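To provision everything above, the usual Terraform workflow applies. A sketch, assuming credentials are configured and var.project_id is set:

terraform init
terraform plan -out=tfplan
terraform apply tfplan

# Point kubectl at the new zonal cluster
gcloud container clusters get-credentials my-gke-cluster --zone=us-central1-a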
Autopilot Cluster Deployment
resource "google_container_cluster" "autopilot" {
name = "autopilot-cluster"
location = "us-central1"
# Enable Autopilot mode
enable_autopilot = true
# Network configuration
network = google_compute_network.vpc.name
subnetwork = google_compute_subnetwork.subnet.name
  # IP allocation policy for VPC-native (use the subnet's named secondary ranges)
  ip_allocation_policy {
    cluster_secondary_range_name  = "pods"
    services_secondary_range_name = "services"
  }
# Release channel (required for Autopilot)
release_channel {
channel = "REGULAR"
}
# Workload identity
workload_identity_config {
workload_pool = "${var.project_id}.svc.id.goog"
}
}
Deploying GKE with gcloud CLI
Creating a Standard Cluster
# Create VPC
gcloud compute networks create gke-vpc --subnet-mode=custom
# Create subnet
gcloud compute networks subnets create gke-subnet \
--network=gke-vpc \
--region=us-central1 \
--range=10.10.0.0/16 \
--secondary-range=pods=10.20.0.0/16,services=10.30.0.0/16
# Create service account
gcloud iam service-accounts create gke-sa --display-name="GKE Service Account"
# Assign roles
for role in roles/logging.logWriter roles/monitoring.metricWriter roles/monitoring.viewer roles/artifactregistry.reader
do
gcloud projects add-iam-policy-binding $(gcloud config get-value project) \
--member="serviceAccount:gke-sa@$(gcloud config get-value project).iam.gserviceaccount.com" \
--role="${role}"
done
# Create GKE cluster
gcloud container clusters create my-gke-cluster \
--zone=us-central1-a \
--network=gke-vpc \
--subnetwork=gke-subnet \
--cluster-secondary-range-name=pods \
--services-secondary-range-name=services \
--enable-ip-alias \
--enable-private-nodes \
--master-ipv4-cidr=172.16.0.0/28 \
--enable-master-global-access \
--no-enable-basic-auth \
--release-channel=regular \
--workload-pool=$(gcloud config get-value project).svc.id.goog \
--no-issue-client-certificate \
--num-nodes=1 \
--enable-autoscaling \
--min-nodes=1 \
--max-nodes=10 \
--machine-type=e2-standard-4 \
--disk-size=100 \
--disk-type=pd-standard \
--service-account=gke-sa@$(gcloud config get-value project).iam.gserviceaccount.com \
--scopes=https://www.googleapis.com/auth/cloud-platform \
--metadata=disable-legacy-endpoints=true \
--tags=gke-node,production \
--node-labels=env=production \
--enable-autoupgrade \
--enable-autorepair
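After creation completes, a quick way to verify the cluster is healthy:

# Confirm the cluster reached RUNNING state
gcloud container clusters describe my-gke-cluster --zone=us-central1-a --format="value(status)"

# Fetch credentials and check that nodes registered
gcloud container clusters get-credentials my-gke-cluster --zone=us-central1-a
kubectl get nodes -o wide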
Creating an Autopilot Cluster
# Create VPC and subnet (same as above)
# Create Autopilot cluster
gcloud container clusters create-auto autopilot-cluster \
--region=us-central1 \
--network=gke-vpc \
--subnetwork=gke-subnet \
--cluster-secondary-range-name=pods \
--services-secondary-range-name=services \
--enable-master-global-access \
--release-channel=regular \
--workload-pool=$(gcloud config get-value project).svc.id.goog
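Autopilot provisions node capacity on demand, so the simplest validation is to deploy something and watch pods get scheduled; hello-app is Google's public sample image:

gcloud container clusters get-credentials autopilot-cluster --region=us-central1

# Autopilot adds capacity to fit this deployment automatically
kubectl create deployment hello --image=us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
kubectl get pods -w   # pods stay Pending briefly while nodes are provisioned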
Real-World Example: Deploying a Microservices Application
This example demonstrates deploying a complete microservices application to GKE.
Step 1: Provision the GKE infrastructure with Terraform
# main.tf - GKE Infrastructure
provider "google" {
project = var.project_id
region = var.region
}
# VPC Network
resource "google_compute_network" "vpc" {
name = "microservices-vpc"
auto_create_subnetworks = false
}
# Subnet
resource "google_compute_subnetwork" "subnet" {
name = "microservices-subnet"
ip_cidr_range = "10.0.0.0/16"
region = var.region
network = google_compute_network.vpc.id
secondary_ip_range {
range_name = "pods"
ip_cidr_range = "192.168.0.0/16"
}
secondary_ip_range {
range_name = "services"
ip_cidr_range = "172.16.0.0/16"
}
}
# NAT Router and Gateway for private clusters
resource "google_compute_router" "router" {
name = "microservices-router"
region = var.region
network = google_compute_network.vpc.id
}
resource "google_compute_router_nat" "nat" {
name = "microservices-nat"
router = google_compute_router.router.name
region = var.region
nat_ip_allocate_option = "AUTO_ONLY"
source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}
# Service Account
resource "google_service_account" "gke_sa" {
account_id = "microservices-gke-sa"
display_name = "Microservices GKE Service Account"
}
# IAM roles for the Service Account
resource "google_project_iam_member" "gke_sa_roles" {
for_each = toset([
"roles/logging.logWriter",
"roles/monitoring.metricWriter",
"roles/monitoring.viewer",
"roles/artifactregistry.reader"
])
role = each.key
member = "serviceAccount:${google_service_account.gke_sa.email}"
project = var.project_id
}
# GKE Cluster
resource "google_container_cluster" "primary" {
name = "microservices-cluster"
location = var.region
# We create a separate node pool below
remove_default_node_pool = true
initial_node_count = 1
network = google_compute_network.vpc.name
subnetwork = google_compute_subnetwork.subnet.name
ip_allocation_policy {
cluster_secondary_range_name = "pods"
services_secondary_range_name = "services"
}
private_cluster_config {
enable_private_nodes = true
enable_private_endpoint = false
    # Must not overlap the subnet's primary or secondary ranges
    # (172.16.0.32/28 would collide with the 172.16.0.0/16 services range)
    master_ipv4_cidr_block = "172.17.0.0/28"
}
# Enable Binary Authorization
binary_authorization {
evaluation_mode = "PROJECT_SINGLETON_POLICY_ENFORCE"
}
# Enable Workload Identity
workload_identity_config {
workload_pool = "${var.project_id}.svc.id.goog"
}
# Release channel
release_channel {
channel = "REGULAR"
}
}
# Node Pools
resource "google_container_node_pool" "general" {
name = "general"
location = var.region
cluster = google_container_cluster.primary.name
autoscaling {
min_node_count = 1
max_node_count = 5
}
management {
auto_repair = true
auto_upgrade = true
}
node_config {
machine_type = "e2-standard-4"
disk_size_gb = 100
disk_type = "pd-standard"
service_account = google_service_account.gke_sa.email
oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
workload_metadata_config {
mode = "GKE_METADATA"
}
labels = {
role = "general"
}
}
}
# Create a dedicated node pool for database workloads
resource "google_container_node_pool" "database" {
name = "database"
location = var.region
cluster = google_container_cluster.primary.name
autoscaling {
min_node_count = 1
max_node_count = 3
}
management {
auto_repair = true
auto_upgrade = true
}
node_config {
machine_type = "e2-highmem-4"
disk_size_gb = 200
disk_type = "pd-ssd"
service_account = google_service_account.gke_sa.email
oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
workload_metadata_config {
mode = "GKE_METADATA"
}
labels = {
role = "database"
}
taint {
key = "workloadType"
value = "database"
effect = "NO_SCHEDULE"
}
}
}
# Artifact Registry (for storing container images)
resource "google_artifact_registry_repository" "repo" {
provider = google-beta
location = var.region
repository_id = "microservices"
format = "DOCKER"
# Encryption using CMEK (Customer-Managed Encryption Keys)
kms_key_name = google_kms_crypto_key.artifact_key.id
}
# KMS Key for encrypting Artifact Registry
resource "google_kms_key_ring" "keyring" {
name = "microservices-keyring"
location = var.region
}
resource "google_kms_crypto_key" "artifact_key" {
name = "artifact-key"
key_ring = google_kms_key_ring.keyring.id
}
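One caveat: creating the CMEK-encrypted repository fails unless the Artifact Registry service agent can use the key, and that grant is not shown in the Terraform above. A sketch with gcloud, assuming var.region is us-central1 and PROJECT_NUMBER is your project's number:

# Grant the Artifact Registry service agent permission to use the KMS key
gcloud kms keys add-iam-policy-binding artifact-key \
  --keyring=microservices-keyring \
  --location=us-central1 \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-artifactregistry.iam.gserviceaccount.com" \
  --role="roles/cloudkms.cryptoKeyEncrypterDecrypter"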
Step 2: Create Kubernetes manifests for the application
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: microservices
labels:
istio-injection: enabled
---
# frontend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: frontend
namespace: microservices
spec:
replicas: 3
selector:
matchLabels:
app: frontend
template:
metadata:
labels:
app: frontend
spec:
containers:
- name: frontend
image: us-central1-docker.pkg.dev/PROJECT_ID/microservices/frontend:latest
ports:
- containerPort: 8080
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 200m
memory: 256Mi
livenessProbe:
httpGet:
path: /healthz
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /readiness
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
serviceAccountName: frontend-sa
---
# frontend-service.yaml
apiVersion: v1
kind: Service
metadata:
name: frontend
namespace: microservices
spec:
selector:
app: frontend
ports:
- port: 80
targetPort: 8080
type: ClusterIP
---
# backend.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: backend-api
namespace: microservices
spec:
replicas: 3
selector:
matchLabels:
app: backend-api
template:
metadata:
labels:
app: backend-api
spec:
containers:
- name: backend-api
image: us-central1-docker.pkg.dev/PROJECT_ID/microservices/backend:latest
ports:
- containerPort: 8081
env:
- name: DB_HOST
valueFrom:
configMapKeyRef:
name: app-config
key: db_host
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: db-credentials
key: password
resources:
requests:
cpu: 250m
memory: 512Mi
limits:
cpu: 500m
memory: 1Gi
serviceAccountName: backend-sa
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchExpressions:
- key: app
operator: In
values:
- backend-api
topologyKey: "kubernetes.io/hostname"
---
# database.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: database
namespace: microservices
spec:
serviceName: database
replicas: 1
selector:
matchLabels:
app: database
template:
metadata:
labels:
app: database
spec:
containers:
- name: database
image: us-central1-docker.pkg.dev/PROJECT_ID/microservices/postgres:13
ports:
- containerPort: 5432
env:
- name: POSTGRES_USER
valueFrom:
secretKeyRef:
name: db-credentials
key: username
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
name: db-credentials
key: password
- name: POSTGRES_DB
value: app
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
nodeSelector:
role: database
tolerations:
- key: workloadType
operator: Equal
value: database
effect: NoSchedule
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: "premium-rwo"
resources:
requests:
storage: 100Gi
---
# ingress.yaml (using Ingress-NGINX controller)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: microservices-ingress
namespace: microservices
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: frontend
port:
number: 80
- host: api.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: backend-api
port:
number: 80
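The Deployments reference frontend-sa and backend-sa, which are not defined in the manifests above and must exist before the pods schedule. A minimal sketch; the Workload Identity annotation is optional and assumes the IAM binding pattern shown earlier:

kubectl create serviceaccount frontend-sa --namespace=microservices
kubectl create serviceaccount backend-sa --namespace=microservices

# Optional: let backend pods act as the Google service account
kubectl annotate serviceaccount backend-sa --namespace=microservices \
  iam.gke.io/gcp-service-account=microservices-gke-sa@PROJECT_ID.iam.gserviceaccount.com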
Step 3: Create Deployment Pipeline (Cloud Build)
# cloudbuild.yaml
steps:
# Build the container images
- name: 'gcr.io/cloud-builders/docker'
args: ['build', '-t', 'us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/frontend:${_VERSION}', './frontend']
- name: 'gcr.io/cloud-builders/docker'
args: ['build', '-t', 'us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/backend:${_VERSION}', './backend']
# Push the container images to Artifact Registry
- name: 'gcr.io/cloud-builders/docker'
args: ['push', 'us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/frontend:${_VERSION}']
- name: 'gcr.io/cloud-builders/docker'
args: ['push', 'us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/backend:${_VERSION}']
# Deploy to GKE
- name: 'gcr.io/cloud-builders/kubectl'
args:
- 'apply'
- '-f'
- 'kubernetes/namespace.yaml'
env:
- 'CLOUDSDK_COMPUTE_REGION=us-central1'
- 'CLOUDSDK_CONTAINER_CLUSTER=microservices-cluster'
# Create secrets idempotently (dry-run + apply). A pipe cannot be passed as a
# kubectl argument, so run the commands in a shell and fetch credentials first
- name: 'gcr.io/cloud-builders/kubectl'
  entrypoint: 'bash'
  args:
    - '-c'
    - |
      gcloud container clusters get-credentials microservices-cluster --region=us-central1
      kubectl create secret generic db-credentials \
        --namespace=microservices \
        --from-literal=username=admin \
        --from-literal=password='${_DB_PASSWORD}' \
        --dry-run=client -o yaml | kubectl apply -f -
# Update Kubernetes manifests with the new image version
# (there is no official sed builder image, so use stock ubuntu)
- name: 'ubuntu'
  args:
    - 'sed'
    - '-i'
    - 's|us-central1-docker.pkg.dev/PROJECT_ID/microservices/frontend:latest|us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/frontend:${_VERSION}|g'
    - 'kubernetes/frontend.yaml'
- name: 'ubuntu'
  args:
    - 'sed'
    - '-i'
    - 's|us-central1-docker.pkg.dev/PROJECT_ID/microservices/backend:latest|us-central1-docker.pkg.dev/${PROJECT_ID}/microservices/backend:${_VERSION}|g'
    - 'kubernetes/backend.yaml'
# Apply the Kubernetes manifests
- name: 'gcr.io/cloud-builders/kubectl'
args:
- 'apply'
- '-f'
- 'kubernetes/.'
env:
- 'CLOUDSDK_COMPUTE_REGION=us-central1'
- 'CLOUDSDK_CONTAINER_CLUSTER=microservices-cluster'
substitutions:
_VERSION: '1.0.0'
_DB_PASSWORD: 'changeme' # Should be set via Cloud Build triggers or Secret Manager
options:
dynamic_substitutions: true
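A hedged example of running the pipeline manually; in practice, attach it to a trigger and pull _DB_PASSWORD from Secret Manager rather than a substitution:

gcloud builds submit . \
  --config=cloudbuild.yaml \
  --substitutions=_VERSION=1.0.1,_DB_PASSWORD="$(openssl rand -base64 20)"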
Best Practices
Security
Use private clusters; disable the public control-plane endpoint where operationally feasible
Implement Workload Identity for pod-level access to Google Cloud resources
Apply the principle of least privilege for service accounts
Enable Binary Authorization for secure supply chain
Keep nodes and the control plane up to date via release channels
Reliability
Deploy across multiple zones/regions for high availability
Use Pod Disruption Budgets to ensure availability during maintenance (see the sketch after this list)
Implement proper health checks and readiness/liveness probes
Set appropriate resource requests and limits
Use node auto-provisioning to handle fluctuating workloads
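For instance, a PodDisruptionBudget for the frontend Deployment from the example above (the threshold is an assumption; tune it per service):

# Keep at least 2 of the 3 frontend replicas up during voluntary disruptions
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: frontend-pdb
  namespace: microservices
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: frontend
EOF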
Cost Optimization
Use Autopilot for hands-off management and optimized costs
Leverage Spot VMs for batch or fault-tolerant workloads
Set up cluster autoscaler to scale nodes based on demand
Use horizontal pod autoscaling (HPA) based on CPU/memory/custom metrics (see the sketch after this list)
Use node selectors and taints so workloads land on appropriately sized nodes
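A quick HPA sketch for the frontend Deployment (the 70% CPU target is an assumption):

# Scale frontend between 3 and 10 replicas on average CPU utilization
kubectl autoscale deployment frontend \
  --namespace=microservices \
  --cpu-percent=70 --min=3 --max=10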
Monitoring and Logging
Enable Cloud Monitoring and Logging during cluster creation
Set up custom dashboards for cluster and application metrics
Create log-based alerts for critical issues
Use Cloud Trace and Profiler for application performance monitoring
Implement distributed tracing using OpenTelemetry
Common Issues and Troubleshooting
Networking Issues
Ensure pod CIDR ranges don't overlap with VPC subnets
Check firewall rules for master-to-node and node-to-node communication
Verify kube-proxy is running correctly for service networking
Use Network Policy to control pod-to-pod traffic, as in the sketch after this list
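For example, a policy that only admits traffic to the backend API from the frontend (a sketch; it only takes effect if network policy enforcement, e.g. GKE Dataplane V2, is enabled on the cluster):

kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: microservices
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes: ["Ingress"]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend
      ports:
        - port: 8081
EOF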
Performance Issues
Review pod resource settings (requests/limits)
Check for node resource exhaustion (CPU, memory)
Look for noisy neighbor issues on shared nodes
Monitor network throughput and latency
Deployment Failures
Verify service account permissions
Check image pull errors (registry access, image existence)
Examine pod events with kubectl describe pod
Review logs with kubectl logs or Cloud Logging (see the example below)
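A typical triage sequence (POD_NAME is a placeholder):

# Scheduling and container events for a failing pod
kubectl describe pod POD_NAME -n microservices

# Logs, including the previous container if it is crash-looping
kubectl logs POD_NAME -n microservices --previous

# Recent events across the namespace, newest last
kubectl get events -n microservices --sort-by=.lastTimestamp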
Scaling Issues
Ensure cluster autoscaler is properly configured
Check if pods have appropriate resource requests
Verify node resource availability
Look for pod affinity/anti-affinity conflicts
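Some commands that help diagnose stuck scale-ups (kubectl top relies on the metrics server, which GKE runs by default):

# Unschedulable pods show what the autoscaler is (or isn't) reacting to
kubectl get events -A --field-selector reason=FailedScheduling

# Compare node utilization against reserved capacity
kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated resources"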