Cloud Storage

Deploying and managing Google Cloud Storage for object storage

Google Cloud Storage is a globally unified, scalable, and highly durable object storage service for storing and accessing any amount of data. It provides industry-leading availability, performance, security, and management features.

Key Features

  • Global Accessibility: Access data from anywhere in the world

  • Scalability: Store and retrieve any amount of data at any time

  • Durability: 11 9's (99.999999999%) durability for stored objects

  • Storage Classes: Standard, Nearline, Coldline, and Archive storage tiers

  • Object Versioning: Maintain history and recover from accidental deletions

  • Object Lifecycle Management: Automatically transition and delete objects

  • Strong Consistency: Read-after-write and list consistency

  • Customer-Managed Encryption Keys (CMEK): Control encryption keys

  • Object Hold and Retention Policies: Enforce compliance requirements

  • VPC Service Controls: Add security perimeter around sensitive data

Cloud Storage Classes

  • Standard: High-performance, frequent access. No minimum storage duration. Typical use cases: website content, active data, mobile apps.

  • Nearline: Low-frequency access. 30-day minimum storage duration. Typical use cases: data accessed less than once a month.

  • Coldline: Very low-frequency access. 90-day minimum storage duration. Typical use cases: data accessed less than once a quarter.

  • Archive: Data archiving, online backup. 365-day minimum storage duration. Typical use cases: long-term archive, disaster recovery.
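
For example, a bucket that holds backups read at most once a month can be created directly in the Nearline class. A minimal Terraform sketch (the bucket name is hypothetical):

resource "google_storage_bucket" "backups" {
  name          = "my-backups-bucket" # hypothetical; bucket names must be globally unique
  location      = "US"
  storage_class = "NEARLINE" # objects are billed for a minimum of 30 days
}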

Deploying Cloud Storage with Terraform

Basic Bucket Creation

resource "google_storage_bucket" "static_assets" {
  name          = "my-static-assets-bucket"
  location      = "US"
  storage_class = "STANDARD"
  
  labels = {
    environment = "production"
    department  = "engineering"
  }
  
  # Enable versioning for recovery
  versioning {
    enabled = true
  }
  
  # Use uniform bucket-level access (recommended)
  uniform_bucket_level_access = true
  
  # Public access prevention (recommended security setting)
  public_access_prevention = "enforced"
}

# Grant access to a service account
resource "google_storage_bucket_iam_member" "viewer" {
  bucket = google_storage_bucket.static_assets.name
  role   = "roles/storage.objectViewer"
  member = "serviceAccount:my-service-account@my-project.iam.gserviceaccount.com"
}
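
It can also be useful to export the bucket's gs:// URL so other modules or scripts can reference it; a small sketch using the bucket's exported url attribute:

output "static_assets_bucket_url" {
  description = "gs:// URL of the static assets bucket"
  value       = google_storage_bucket.static_assets.url
}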

Advanced Configuration with Lifecycle Policies

resource "google_storage_bucket" "data_lake" {
  name          = "my-datalake-bucket"
  location      = "US-CENTRAL1"
  storage_class = "STANDARD"
  
  # Enable versioning
  versioning {
    enabled = true
  }
  
  # Enable object lifecycle management
  lifecycle_rule {
    condition {
      age = 30  # days
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }
  
  lifecycle_rule {
    condition {
      age = 90  # days
    }
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
  }
  
  lifecycle_rule {
    condition {
      age = 365  # days
    }
    action {
      type          = "SetStorageClass"
      storage_class = "ARCHIVE"
    }
  }
  
  # Delete non-current versions 30 days after they are superseded
  lifecycle_rule {
    condition {
      days_since_noncurrent_time = 30
      with_state                 = "ARCHIVED"  # non-current versions only
    }
    action {
      type = "Delete"
    }
  }
  
  # Use Customer-Managed Encryption Key (CMEK)
  encryption {
    default_kms_key_name = google_kms_crypto_key.bucket_key.id
  }
  
  # Other security settings
  uniform_bucket_level_access = true
  public_access_prevention = "enforced"
}

# Create KMS key for CMEK
resource "google_kms_key_ring" "storage_keyring" {
  name     = "storage-keyring"
  location = "us-central1"
}

resource "google_kms_crypto_key" "bucket_key" {
  name     = "bucket-key"
  key_ring = google_kms_key_ring.storage_keyring.id
}

# Grant Cloud Storage service account access to use KMS key
data "google_storage_project_service_account" "gcs_account" {}

resource "google_kms_crypto_key_iam_binding" "crypto_key_binding" {
  crypto_key_id = google_kms_crypto_key.bucket_key.id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  
  members = [
    "serviceAccount:${data.google_storage_project_service_account.gcs_account.email_address}",
  ]
}
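
Note that creating a bucket with default_kms_key_name can fail if the Cloud Storage service agent has not yet been granted access to the key. One way to sequence this (a sketch, added to the existing data_lake bucket resource above rather than declared as a new resource) is an explicit depends_on:

resource "google_storage_bucket" "data_lake" {
  # ... the arguments shown in the example above ...

  # Ensure the KMS grant exists before the bucket is created
  depends_on = [google_kms_crypto_key_iam_binding.crypto_key_binding]
}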

Static Website Hosting Configuration

resource "google_storage_bucket" "website" {
  name          = "my-static-website-bucket"
  location      = "US"
  storage_class = "STANDARD"
  
  # Enable website serving
  website {
    main_page_suffix = "index.html"
    not_found_page   = "404.html"
  }
  
  # Set CORS configuration
  cors {
    origin          = ["https://example.com"]
    method          = ["GET", "HEAD", "OPTIONS"]
    response_header = ["Content-Type", "Access-Control-Allow-Origin"]
    max_age_seconds = 3600
  }
  
  # Allow Terraform to delete the bucket even if it still contains objects
  force_destroy = true
}

# Make objects publicly readable
resource "google_storage_bucket_iam_member" "public_read" {
  bucket = google_storage_bucket.website.name
  role   = "roles/storage.objectViewer"
  member = "allUsers"
}

# Upload index page
resource "google_storage_bucket_object" "index" {
  name   = "index.html"
  bucket = google_storage_bucket.website.name
  source = "./website/index.html"
  
  # Set content type
  content_type = "text/html"
}

# Upload 404 page
resource "google_storage_bucket_object" "not_found" {
  name   = "404.html"
  bucket = google_storage_bucket.website.name
  source = "./website/404.html"
  
  content_type = "text/html"
}
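
The bucket website endpoint serves custom domains over HTTP only; to serve the site over HTTPS on your own domain, a common pattern is to front the bucket with an external HTTPS load balancer. A minimal sketch of the backend piece (the load balancer frontend, URL map, and certificate are omitted; the resource name is hypothetical):

resource "google_compute_backend_bucket" "website_backend" {
  name        = "static-website-backend"
  bucket_name = google_storage_bucket.website.name
  enable_cdn  = true # optionally cache static assets with Cloud CDN
}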

Managing Cloud Storage with gsutil

Basic Bucket Commands

# Create a bucket
gsutil mb -l us-central1 gs://my-bucket

# List buckets
gsutil ls

# List objects in a bucket
gsutil ls gs://my-bucket/

# Get bucket metadata
gsutil ls -L -b gs://my-bucket

# Enable bucket versioning
gsutil versioning set on gs://my-bucket

# Set default storage class
gsutil defstorageclass set NEARLINE gs://my-bucket

Object Operations

# Upload file(s)
gsutil cp file.txt gs://my-bucket/

# Upload directory
gsutil cp -r ./local-dir gs://my-bucket/dir/

# Upload with specific content type
gsutil -h "Content-Type:text/html" cp index.html gs://my-bucket/

# Download file(s)
gsutil cp gs://my-bucket/file.txt ./

# Download directory
gsutil cp -r gs://my-bucket/dir/ ./local-dir/

# Move/Rename objects
gsutil mv gs://my-bucket/old-name.txt gs://my-bucket/new-name.txt

# Delete object
gsutil rm gs://my-bucket/file.txt

# Delete all objects in a bucket
gsutil rm gs://my-bucket/**

# Delete bucket and all its contents
gsutil rm -r gs://my-bucket

Access Control

# Make object public
gsutil acl ch -u AllUsers:R gs://my-bucket/file.txt

# Set bucket-level IAM policy
gsutil iam ch serviceAccount:my-service@my-project.iam.gserviceaccount.com:objectViewer gs://my-bucket

# Get IAM policy
gsutil iam get gs://my-bucket

# Set uniform bucket-level access (recommended)
gsutil ubla set on gs://my-bucket

# Enforce public access prevention
gsutil pap set enforced gs://my-bucket

Lifecycle Management

# Create a lifecycle policy JSON file
cat > lifecycle.json << EOF
{
  "lifecycle": {
    "rule": [
      {
        "action": {
          "type": "SetStorageClass",
          "storageClass": "NEARLINE"
        },
        "condition": {
          "age": 30,
          "matchesStorageClass": ["STANDARD"]
        }
      },
      {
        "action": {
          "type": "Delete"
        },
        "condition": {
          "age": 365
        }
      }
    ]
  }
}
EOF

# Apply lifecycle policy to bucket
gsutil lifecycle set lifecycle.json gs://my-bucket

# View current lifecycle policy
gsutil lifecycle get gs://my-bucket
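
The same policy expressed in Terraform looks like the sketch below (the bucket is hypothetical; the condition block's matches_storage_class corresponds to matchesStorageClass in the JSON above):

resource "google_storage_bucket" "lifecycle_example" {
  name     = "my-lifecycle-example-bucket" # hypothetical
  location = "US"

  lifecycle_rule {
    condition {
      age                   = 30
      matches_storage_class = ["STANDARD"]
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }

  lifecycle_rule {
    condition {
      age = 365
    }
    action {
      type = "Delete"
    }
  }
}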

Real-World Example: Multi-Region Data Lake Architecture

This example demonstrates a complete data lake architecture using Cloud Storage:

Architecture Overview

  1. Landing Zone: Raw data ingestion bucket

  2. Processing Zone: Data transformation and staging

  3. Curated Zone: Processed, high-quality data

  4. Archive Zone: Long-term, cold storage

Terraform Implementation

provider "google" {
  project = var.project_id
  region  = "us-central1"
}

# Create VPC with private access
resource "google_compute_network" "data_lake_network" {
  name                    = "data-lake-network"
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "data_lake_subnet" {
  name          = "data-lake-subnet"
  ip_cidr_range = "10.0.0.0/16"
  region        = "us-central1"
  network       = google_compute_network.data_lake_network.id
  
  # Enable Google Private Access
  private_ip_google_access = true
}

# Create VPC Service Controls perimeter
resource "google_access_context_manager_service_perimeter" "data_perimeter" {
  parent = "accessPolicies/${google_access_context_manager_access_policy.data_policy.name}"
  name   = "accessPolicies/${google_access_context_manager_access_policy.data_policy.name}/servicePerimeters/data_lake_perimeter"
  title  = "Data Lake Perimeter"
  
  status {
    resources = ["projects/${var.project_id}"]
    restricted_services = ["storage.googleapis.com"]
    
    ingress_policies {
      ingress_from {
        identities = [
          "serviceAccount:${google_service_account.data_processor.email}",
        ]
      }
      ingress_to {
        resources = ["*"]
        operations {
          service_name = "storage.googleapis.com"
          method_selectors {
            method = "*"
          }
        }
      }
    }
  }
}

resource "google_access_context_manager_access_policy" "data_policy" {
  parent = "organizations/${var.organization_id}"
  title  = "Data Lake Access Policy"
}

# Service Account for data processing
resource "google_service_account" "data_processor" {
  account_id   = "data-processor"
  display_name = "Data Lake Processing Service Account"
}

# KMS for encryption
resource "google_kms_key_ring" "data_lake_keyring" {
  name     = "data-lake-keyring"
  location = "global"
}

resource "google_kms_crypto_key" "data_lake_key" {
  name     = "data-lake-key"
  key_ring = google_kms_key_ring.data_lake_keyring.id
  
  # Rotation settings
  rotation_period = "7776000s" # 90 days
  
  # Protect against destruction
  lifecycle {
    prevent_destroy = true
  }
}

# Grant KMS access to service account
resource "google_kms_crypto_key_iam_binding" "data_lake_key_binding" {
  crypto_key_id = google_kms_crypto_key.data_lake_key.id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  
  members = [
    "serviceAccount:${data.google_storage_project_service_account.gcs_account.email_address}",
  ]
}

data "google_storage_project_service_account" "gcs_account" {}

# Create buckets for the data lake zones
resource "google_storage_bucket" "landing_zone" {
  name          = "${var.project_id}-landing-zone"
  location      = "US"
  storage_class = "STANDARD"
  
  # Security settings
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"
  
  # Set CMEK encryption
  encryption {
    default_kms_key_name = google_kms_crypto_key.data_lake_key.id
  }
  
  # Lifecycle policies
  lifecycle_rule {
    condition {
      age = 7
    }
    action {
      type = "Delete"
    }
  }
  
  # Prevent Terraform from destroying this bucket (and its data) with terraform destroy
  lifecycle {
    prevent_destroy = true
  }
  
  # Logging configuration
  logging {
    log_bucket        = google_storage_bucket.logs.name
    log_object_prefix = "landing-zone"
  }
}

resource "google_storage_bucket" "processing_zone" {
  name          = "${var.project_id}-processing-zone"
  location      = "US"
  storage_class = "STANDARD"
  
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"
  
  encryption {
    default_kms_key_name = google_kms_crypto_key.data_lake_key.id
  }
  
  # Transition to Nearline after 30 days
  lifecycle_rule {
    condition {
      age = 30
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }
  
  # Delete after 60 days
  lifecycle_rule {
    condition {
      age = 60
    }
    action {
      type = "Delete"
    }
  }
  
  logging {
    log_bucket        = google_storage_bucket.logs.name
    log_object_prefix = "processing-zone"
  }
}

resource "google_storage_bucket" "curated_zone" {
  name          = "${var.project_id}-curated-zone"
  location      = "US"
  storage_class = "STANDARD"
  
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"
  
  # Enable versioning for data protection
  versioning {
    enabled = true
  }
  
  encryption {
    default_kms_key_name = google_kms_crypto_key.data_lake_key.id
  }
  
  # Lifecycle management
  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type          = "SetStorageClass"
      storage_class = "NEARLINE"
    }
  }
  
  lifecycle_rule {
    condition {
      age = 365
    }
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
  }
  
  # Delete non-current versions 30 days after they are superseded
  lifecycle_rule {
    condition {
      days_since_noncurrent_time = 30
      with_state                 = "ARCHIVED"
    }
    action {
      type = "Delete"
    }
  }
  
  logging {
    log_bucket        = google_storage_bucket.logs.name
    log_object_prefix = "curated-zone"
  }
}

resource "google_storage_bucket" "archive_zone" {
  name          = "${var.project_id}-archive-zone"
  location      = "US"
  storage_class = "ARCHIVE"
  
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"
  
  # Retention policy for compliance: objects cannot be deleted or overwritten before the period expires
  retention_policy {
    retention_period = 31536000 # 1 year in seconds
  }
  
  encryption {
    default_kms_key_name = google_kms_crypto_key.data_lake_key.id
  }
  
  logging {
    log_bucket        = google_storage_bucket.logs.name
    log_object_prefix = "archive-zone"
  }
}

# Create bucket for access logs
resource "google_storage_bucket" "logs" {
  name          = "${var.project_id}-access-logs"
  location      = "US"
  storage_class = "STANDARD"
  
  uniform_bucket_level_access = true
  public_access_prevention    = "enforced"
  
  # Set lifecycle for logs
  lifecycle_rule {
    condition {
      age = 90
    }
    action {
      type          = "SetStorageClass"
      storage_class = "COLDLINE"
    }
  }
  
  lifecycle_rule {
    condition {
      age = 365
    }
    action {
      type = "Delete"
    }
  }
}

# IAM permissions for the buckets
resource "google_storage_bucket_iam_binding" "landing_zone_writer" {
  bucket = google_storage_bucket.landing_zone.name
  role   = "roles/storage.objectCreator"
  
  members = [
    "serviceAccount:${google_service_account.data_ingestion.email}",
  ]
}

resource "google_storage_bucket_iam_binding" "processing_zone_reader" {
  bucket = google_storage_bucket.landing_zone.name
  role   = "roles/storage.objectViewer"
  
  members = [
    "serviceAccount:${google_service_account.data_processor.email}",
  ]
}

resource "google_storage_bucket_iam_binding" "processing_zone_writer" {
  bucket = google_storage_bucket.processing_zone.name
  role   = "roles/storage.objectAdmin"
  
  members = [
    "serviceAccount:${google_service_account.data_processor.email}",
  ]
}

resource "google_storage_bucket_iam_binding" "curated_zone_writer" {
  bucket = google_storage_bucket.curated_zone.name
  role   = "roles/storage.objectAdmin"
  
  members = [
    "serviceAccount:${google_service_account.data_processor.email}",
  ]
}

resource "google_storage_bucket_iam_binding" "curated_zone_viewer" {
  bucket = google_storage_bucket.curated_zone.name
  role   = "roles/storage.objectViewer"
  
  members = [
    "serviceAccount:${google_service_account.data_analyst.email}",
    "group:data-analysts@example.com",
  ]
}

resource "google_storage_bucket_iam_binding" "archive_zone_writer" {
  bucket = google_storage_bucket.archive_zone.name
  role   = "roles/storage.objectAdmin"
  
  members = [
    "serviceAccount:${google_service_account.data_processor.email}",
  ]
}

# Additional service accounts
resource "google_service_account" "data_ingestion" {
  account_id   = "data-ingestion"
  display_name = "Data Ingestion Service Account"
}

resource "google_service_account" "data_analyst" {
  account_id   = "data-analyst"
  display_name = "Data Analyst Service Account"
}

# Notification configuration for new file arrivals
resource "google_storage_notification" "landing_zone_notification" {
  bucket         = google_storage_bucket.landing_zone.name
  payload_format = "JSON_API_V1"
  topic          = google_pubsub_topic.landing_zone_notifications.id
  event_types    = ["OBJECT_FINALIZE"]
}

resource "google_pubsub_topic" "landing_zone_notifications" {
  name = "landing-zone-notifications"
}

resource "google_pubsub_topic_iam_binding" "landing_zone_publisher" {
  topic   = google_pubsub_topic.landing_zone_notifications.name
  role    = "roles/pubsub.publisher"
  members = [
    "serviceAccount:${data.google_storage_project_service_account.gcs_account.email_address}",
  ]
}
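
The configuration above references var.project_id and var.organization_id; matching variable declarations might look like this (descriptions are illustrative):

variable "project_id" {
  description = "Project that hosts the data lake buckets"
  type        = string
}

variable "organization_id" {
  description = "Numeric organization ID used for the Access Context Manager policy"
  type        = string
}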

Data Lifecycle Automation Script

# data_lifecycle.py
from google.cloud import storage
import datetime
import logging

def move_processed_data(event, context):
    """Cloud Function triggered by Pub/Sub to move processed data."""
    # Get bucket and object details from the notification attributes
    bucket_name = event['attributes']['bucketId']
    object_name = event['attributes']['objectId']

    if not object_name.endswith('.processed'):
        return

    # Initialize storage client
    storage_client = storage.Client()

    # Fetch the source blob; get_blob() loads object metadata from the server
    source_bucket = storage_client.bucket(bucket_name)
    source_blob = source_bucket.get_blob(object_name)
    if source_blob is None:
        logging.warning(f"{object_name} no longer exists in {bucket_name}")
        return

    # Determine the target bucket from the object's data_type metadata
    object_metadata = source_blob.metadata or {}
    data_type = object_metadata.get('data_type', 'unknown')

    # Bucket names follow the "<project_id>-landing-zone" pattern used in the Terraform config
    project_prefix = bucket_name.removesuffix('-landing-zone')
    base_name = object_name.replace('.processed', '')

    if data_type == 'report':
        dest_bucket_name = f"{project_prefix}-curated-zone"
        dest_path = f"reports/{datetime.datetime.now().strftime('%Y/%m/%d')}/{base_name}"
    elif data_type == 'archive':
        dest_bucket_name = f"{project_prefix}-archive-zone"
        dest_path = f"{datetime.datetime.now().strftime('%Y/%m')}/{base_name}"
    else:
        dest_bucket_name = f"{project_prefix}-curated-zone"
        dest_path = f"other/{base_name}"

    # Copy to the destination bucket (object metadata is preserved by the copy)
    dest_bucket = storage_client.bucket(dest_bucket_name)
    source_bucket.copy_blob(source_blob, dest_bucket, dest_path)

    # Delete the original only after a successful copy
    source_blob.delete()

    logging.info(f"Moved {object_name} to {dest_bucket_name}/{dest_path}")
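
One way to deploy this function so it fires on the landing-zone notifications is a Pub/Sub-triggered Cloud Function. The sketch below is illustrative rather than part of the configuration above; the source archive path and function name are hypothetical:

# Zip archive containing data_lifecycle.py and a requirements.txt with google-cloud-storage
resource "google_storage_bucket_object" "function_source" {
  name   = "sources/data-lifecycle.zip"
  bucket = google_storage_bucket.logs.name # any private bucket can hold the source archive
  source = "./functions/data-lifecycle.zip"
}

resource "google_cloudfunctions_function" "data_lifecycle" {
  name        = "data-lifecycle"
  runtime     = "python310"
  entry_point = "move_processed_data"

  source_archive_bucket = google_storage_bucket.logs.name
  source_archive_object = google_storage_bucket_object.function_source.name

  # Invoke the function for every message published by the landing-zone notification
  event_trigger {
    event_type = "google.pubsub.topic.publish"
    resource   = google_pubsub_topic.landing_zone_notifications.name
  }

  service_account_email = google_service_account.data_processor.email
}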

Best Practices

  1. Bucket Naming and Organization

    • Choose globally unique, DNS-compliant names

    • Use consistent naming conventions

    • Organize objects with clear prefix hierarchy

    • Consider regional requirements for data storage

  2. Security

    • Enable uniform bucket-level access

    • Use VPC Service Controls for sensitive data

    • Apply appropriate IAM roles with least privilege

    • Enforce public access prevention

    • Use CMEK for regulated data

    • Enable object holds for compliance

  3. Cost Optimization

    • Choose appropriate storage classes for data access patterns

    • Implement lifecycle policies for automatic transitions

    • Combine many small files into composite objects to reduce per-request overhead

    • Monitor usage with Cloud Monitoring

    • Consider requester pays for shared datasets (see the sketch after this list)

  4. Performance

    • Store frequently accessed data in regions close to users

    • Use parallel composite uploads for large files

    • Avoid small, frequent operations

    • Use signed URLs for temporary access

    • Implement connection pooling in applications

  5. Data Management

    • Enable object versioning for critical data

    • Configure access logs for audit trails

    • Use object metadata for classification

    • Set up notifications for bucket events

    • Implement retention policies for compliance
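
For shared datasets where consumers should cover download costs, Requester Pays can be enabled on the bucket. A minimal sketch (the bucket name is hypothetical):

resource "google_storage_bucket" "shared_dataset" {
  name          = "my-shared-dataset-bucket" # hypothetical
  location      = "US"
  storage_class = "STANDARD"

  # Requesters are billed for operations and network egress instead of the bucket owner
  requester_pays              = true
  uniform_bucket_level_access = true
}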

Common Issues and Troubleshooting

Access Denied Errors

  • Verify IAM permissions and roles

  • Check for VPC Service Controls blocking access

  • Ensure service accounts have proper permissions

  • Validate CMEK access for encrypted buckets

  • Check organization policies for restrictions

Performance Issues

  • Review network configuration for private Google access

  • Ensure proper region selection for proximity to users

  • Monitor request rates and throttling

  • Check object naming patterns for hotspots

  • Optimize upload/download processes

Cost Management

  • Review storage distribution across classes

  • Check lifecycle policies for effectiveness

  • Monitor large, unnecessary object versions

  • Watch for unexpected egress charges

  • Verify requester-pays configuration

Data Management

  • Validate versioning is working as expected

  • Check retention policy effectiveness

  • Monitor object holds and legal holds

  • Verify notification configurations

  • Ensure backups are properly configured
