Bigtable
Google Cloud Bigtable is GCP’s fully managed, scalable, and high-performance wide-column NoSQL database service. It is designed for large analytical and operational workloads, such as time-series data, IoT, financial data, and user analytics.
Key Features
Massive scalability: Handles petabytes of data and millions of reads/writes per second
Low latency: Consistent single-digit millisecond response times
Fully managed: No servers to manage; Google handles patching, and autoscaling and backups are built-in features you can enable
HBase API compatibility: Migrate HBase workloads with minimal changes
Seamless GCP integration: Works with Dataflow, Dataproc, BigQuery, and more
Replication: Multi-region replication for high availability
Architecture Overview
Table: Contains rows, each identified by a unique row key
Column families: Group related columns for storage and performance tuning
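Cells: The intersection of a row and a column; each cell can hold multiple timestamped versions of a value
Clusters: The compute resources (nodes) of an instance; each cluster lives in a single zone, and an instance can have clusters in multiple zones or regions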
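The data model above can be pictured as a nested map: row key, then column family, then column qualifier, then a list of timestamped cell versions. The following is a minimal in-memory sketch of that idea only. `ToyTable` and its methods are hypothetical names for illustration, not part of the Bigtable client library, and the sketch ignores clusters, storage, and garbage collection.

```python
class ToyTable:
    """In-memory toy model of the Bigtable data model (not the real client API)."""

    def __init__(self, column_families):
        # Column families are fixed up front, as they are in a table schema.
        self.column_families = set(column_families)
        # row key -> family -> qualifier -> [(timestamp_micros, value), ...]
        self.rows = {}

    def set_cell(self, row_key, family, qualifier, value, timestamp_micros):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        versions = (
            self.rows.setdefault(row_key, {})
            .setdefault(family, {})
            .setdefault(qualifier, [])
        )
        versions.append((timestamp_micros, value))
        # Keep versions newest first, the order in which cells are read back.
        versions.sort(key=lambda cell: cell[0], reverse=True)

    def read_latest(self, row_key, family, qualifier):
        # Index 0 is the most recent version of the cell.
        return self.rows[row_key][family][qualifier][0][1]

    def scan(self, start_key, end_key):
        """Return row keys in sorted order, start inclusive, end exclusive."""
        return [key for key in sorted(self.rows) if start_key <= key < end_key]


table = ToyTable(["profile", "activity"])
table.set_cell(b"user#1234", "profile", b"name", b"Alice", 1_000)
table.set_cell(b"user#1234", "profile", b"name", b"Alicia", 2_000)
print(table.read_latest(b"user#1234", "profile", b"name"))  # b'Alicia'
```

Because rows are kept in sorted order by row key, the choice of row key decides which rows sit next to each other and which can be read together in one range scan.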
Common Use Cases
Time-series data (IoT, monitoring, financial ticks)
Real-time analytics and personalization
Large-scale graph or recommendation engines
User profile and event data storage
Example: Deploying Bigtable with Terraform
resource "google_bigtable_instance" "main" {
  name          = "my-bigtable-instance"
  instance_type = "PRODUCTION"

  cluster {
    cluster_id   = "my-bigtable-cluster"
    zone         = "us-central1-b"
    num_nodes    = 3
    storage_type = "SSD"
  }
}

resource "google_bigtable_table" "users" {
  name          = "users"
  instance_name = google_bigtable_instance.main.name

  column_family {
    family = "profile"
  }

  column_family {
    family = "activity"
  }
}
Example: Writing and Reading Data (Python)
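from google.cloud import bigtable

# admin=True is only required for admin operations (e.g. creating tables);
# it is harmless for the data reads and writes below.
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-bigtable-instance")
table = instance.table("users")

# Write a row: all set_cell calls are sent as one atomic mutation of this row
direct_row = table.direct_row("user#1234")
direct_row.set_cell("profile", "name", "Alice")
direct_row.set_cell("activity", "last_login", "2024-06-01T12:00:00Z")
direct_row.commit()

# Read the row back; read_row returns None if the row does not exist
row = table.read_row("user#1234")
if row is not None:
    # Cell values are bytes, and versions are listed newest first
    print(row.cells["profile"][b"name"][0].value.decode("utf-8"))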
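Reading a single row generalizes to reading every row that shares a key prefix, such as all `user#` rows. The sketch below is one possible way to do that, not the only one. `prefix_range` and `scan_prefix` are hypothetical helper names, and `table` is assumed to be any object with a `read_rows(start_key=..., end_key=...)` method, such as the `table` object from the example above.

```python
from typing import Optional, Tuple


def prefix_range(prefix: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (start_key, end_key) covering every row key that starts with prefix.

    end_key is exclusive. It is None when no upper bound exists,
    i.e. when the prefix is empty or consists only of 0xFF bytes.
    """
    end = bytearray(prefix)
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while end and end[-1] == 0xFF:
        end.pop()
    if not end:
        return prefix, None
    # Incrementing the last byte gives the smallest key after the prefix range.
    end[-1] += 1
    return prefix, bytes(end)


def scan_prefix(table, prefix: bytes):
    """Read all rows whose key starts with prefix (assumes a read_rows method)."""
    start_key, end_key = prefix_range(prefix)
    return table.read_rows(start_key=start_key, end_key=end_key)


print(prefix_range(b"user#"))  # (b'user#', b'user$')
```

With the client from the example above, `for row in scan_prefix(table, b"user#"): print(row.row_key)` would iterate over the matching rows. This only works because rows are stored sorted by key, which is one reason the key layout matters so much.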
Best Practices
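Row key design: Distribute writes evenly to avoid hotspots, and avoid keys that begin with a timestamp or sequential ID (hashed or salted prefixes spread writes, at the cost of ordered range scans across prefixes)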
Column family planning: Group columns with similar access patterns
Monitor performance: Use Cloud Monitoring to track CPU utilization, storage utilization, and latency
Backup and restore: Use scheduled backups for disaster recovery
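The row key advice above can be sketched as a small key-building function. This is an illustration under stated assumptions, not a recommended schema: `salted_row_key`, `NUM_BUCKETS`, the `device_id` field, and the `bucket#device#timestamp` layout are all hypothetical choices made for this example.

```python
import hashlib

# Hypothetical bucket count; more buckets spread writes across more key ranges.
NUM_BUCKETS = 16


def salted_row_key(device_id: str, timestamp_iso: str) -> bytes:
    """Build a row key of the form '<bucket>#<device_id>#<timestamp>'.

    The bucket is derived from a hash of the device ID, so writes from
    different devices land in different key ranges instead of one hot range.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    bucket = digest[0] % NUM_BUCKETS
    return f"{bucket:02d}#{device_id}#{timestamp_iso}".encode("utf-8")


key = salted_row_key("device-42", "2024-06-01T12:00:00Z")
print(key)
```

The bucket is deterministic for a given device, so all rows of one device stay contiguous and can still be read with a single prefix scan. The trade-off is that a scan across all devices in time order now needs one scan per bucket.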