Observability Patterns

Distributed Tracing

OpenTelemetry Edge Configuration

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: edge-collector
spec:
  mode: daemonset
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
            
    processors:
      batch:
        timeout: 1s
        send_batch_size: 1024
      memory_limiter:
        check_interval: 1s
        limit_mib: 100
        
    exporters:
      otlp/central:
        endpoint: central-collector:4317
        sending_queue:
          enabled: true
          num_consumers: 10
          queue_size: 5000
        retry_on_failure:
          enabled: true
          initial_interval: 5s
          max_interval: 30s
          max_elapsed_time: 300s
          
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [memory_limiter, batch]
          exporters: [otlp/central]

Metrics Collection

Edge Prometheus Setup

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: edge-prometheus
spec:
  replicas: 1
  retention: 24h
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage
        resources:
          requests:
            storage: 10Gi
  serviceMonitorSelector:
    matchLabels:
      app: edge-service
  resources:
    requests:
      memory: 400Mi
    limits:
      memory: 800Mi

Log Aggregation

Vector Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: vector-config
data:
  vector.toml: |
    [sources.edge_logs]
      type = "kubernetes_logs"
      auto_partial_merge = true
      
    [transforms.edge_parser]
      type = "remap"
      inputs = ["edge_logs"]
      source = '''
        structured = parse_json!(.message)
        . = merge(., structured)
      '''
      
    [sinks.edge_storage]
      type = "loki"
      inputs = ["edge_parser"]
      endpoint = "http://loki:3100"
      encoding.codec = "json"
      labels = {app = "{{ kubernetes.pod_name }}", namespace = "{{ kubernetes.namespace }}"}

Best Practices

  1. Data Collection

    • Local buffering

    • Batch processing

    • Compression

    • Priority handling

  2. Resource Management

    • Storage optimization

    • Network efficiency

    • CPU/Memory limits

    • Retention policies

  3. Visualization

    • Real-time dashboards

    • Latency tracking

    • Error monitoring

    • Health status

  4. Edge Analytics

    • Pattern detection

    • Anomaly identification

    • Performance trends

    • Capacity planning

Last updated