Docker Image Security
Enterprise Docker Image Security Policy
Container Image Lifecycle Management & Security Framework
Table of Contents
Executive Summary
Governance and Ownership Model
Base Image Creation and Management
Container Registry Management
Security Scanning and Vulnerability Management
License Compliance and Open Source Management
Image Lifecycle Management
Best Practices and Technical Standards
Implementation Guidance
Assessment and Continuous Improvement
Appendices
1. Executive Summary
Container security represents a critical component of modern infrastructure protection. Unlike traditional virtual machines, containers share the host kernel, making isolation boundaries more permeable and security concerns more nuanced. A compromised container image can serve as a persistent attack vector, embedded with malicious code that propagates across development, staging, and production environments.
This policy establishes enterprise-wide standards for container image security, addressing the full lifecycle from base image selection through runtime deployment. The policy recognizes that container security is not a point-in-time assessment but rather a continuous process requiring automated tooling, clear ownership, and regular updates.
Purpose and Scope
This policy applies to all container images used within the organization, regardless of deployment target (Kubernetes, Docker Swarm, ECS, Cloud Run, etc.). It covers:
Base operating system images maintained centrally
Language runtime images (Python, Node.js, Java, Go, etc.)
Application-specific images built by development teams
Third-party images imported from external registries
Utility and tooling images used in CI/CD pipelines
The policy does not cover virtual machine images, serverless function packages (Lambda, Cloud Functions), or legacy application deployment methods.
Key Objectives
Reduce Attack Surface: Minimize the number of packages, libraries, and services included in container images. Each additional component represents a potential vulnerability. Our baseline Ubuntu image contains 88 packages versus 280 in the standard ubuntu:latest image.
Establish Clear Accountability: Define unambiguous ownership for each layer of the container image stack. When CVE-2024-12345 is discovered in OpenSSL, there should be no question about who is responsible for patching base images versus application dependencies.
Enable Rapid Response: Security vulnerabilities can be announced at any time. Our infrastructure must support building, testing, and deploying patched images within hours, not days or weeks.
Maintain Compliance: Track all software components, licenses, and versions to meet regulatory requirements (SOC 2, ISO 27001, GDPR) and avoid legal exposure from license violations.
Support Developer Velocity: Security should not become a bottleneck. Automated scanning, clear base images, and self-service tools enable developers to build securely without waiting for security team approvals.
2. Governance and Ownership Model
2.1 Organizational Structure
The traditional "throw it over the wall" model fails for container security. Development teams cannot rely solely on a central security team, and security teams cannot review every application deployment. Instead, we implement a shared responsibility model with clear boundaries.
2.1.1 Platform Engineering Team
Primary Responsibilities:
Base Image Curation and Maintenance: Platform Engineering owns the "golden images" that serve as the foundation for all application containers. This includes:
Selecting upstream base images from trusted sources
Applying security hardening configurations
Removing unnecessary packages and services
Installing common tooling and certificates
Configuring non-root users and proper file permissions
Maintaining multiple versions to support different application needs
Example base image inventory:
Security Baseline Definition: Platform Engineering defines what "secure by default" means for the organization. This includes technical controls like:
Mandatory non-root execution (UID >= 10000)
Read-only root filesystem where feasible
Dropped capabilities (NET_RAW, SYS_ADMIN, etc.)
No setuid/setgid binaries
Minimal installed packages (documented exceptions only)
Security-focused default configurations
Vulnerability Response for Base Layers: When vulnerabilities affect base OS packages or language runtimes, Platform Engineering owns the response:
Assess impact and exploitability
Build patched base images
Test for breaking changes
Publish updated images with clear release notes
Notify consuming teams
Track adoption and follow up on stragglers
Registry Operations: Platform Engineering manages the container registry infrastructure:
High availability configuration
Backup and disaster recovery
Access control and authentication
Image replication across regions
Storage optimization and garbage collection
Audit logging and compliance reporting
2.1.2 Application Development Teams
Primary Responsibilities:
Application Layer Security: Development teams own everything they add on top of base images:
Application source code and binaries
Application dependencies (npm packages, pip packages, Maven artifacts, Go modules)
Application configuration files
Secrets management (though secrets should never be in images)
Custom scripts and utilities
Application-specific system configurations
Dependency Management: Teams must actively maintain their dependency trees:
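A minimal sketch of routine dependency hygiene, assuming npm, pip-audit, and govulncheck are available in the relevant ecosystems (tool choice varies by team):

```bash
# JavaScript: report known-vulnerable and outdated dependencies
npm audit --audit-level=high
npm outdated

# Python: audit pinned requirements (requires the pip-audit package)
pip-audit -r requirements.txt

# Go: list known vulnerabilities reachable from your code
govulncheck ./...
```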
Vulnerability Remediation: When scans identify vulnerabilities in application dependencies:
Assess whether the vulnerability affects the application (not all CVEs are exploitable in every context)
Update the vulnerable dependency to a patched version
Test the application thoroughly (breaking changes may have been introduced)
Rebuild and redeploy the image
Document the remediation in the ticket system
Image Rebuilds: When Platform Engineering releases updated base images, development teams must:
Update the FROM line in Dockerfiles
Rebuild application images
Run integration tests
Deploy updated images through standard deployment pipelines
This typically happens monthly for routine updates and within days for critical security patches.
2.1.3 Security Team
Primary Responsibilities:
Policy Definition and Enforcement: The Security team defines the security requirements that Platform Engineering and Development teams must implement. This includes:
Vulnerability severity thresholds (no critical CVEs in production)
Allowed base image sources (Docker Hub verified publishers, Red Hat, etc.)
Prohibited packages and configurations (telnet, FTP, debug symbols in production)
Scanning frequency and tool requirements
Exception process and approval workflow
Security Assessment and Validation: The Security team validates that policies are effective:
Penetration testing of container images and runtime environments
Security architecture reviews of container platforms
Audit of base image hardening configurations
Review of scanning tool configurations and coverage
Analysis of vulnerability trends and response times
Threat Intelligence Integration: Security maintains awareness of the threat landscape:
Monitoring security mailing lists and CVE databases
Analyzing proof-of-concept exploits for applicability
Coordinating disclosure of internally-discovered vulnerabilities
Providing context on vulnerability severity and exploitability
Incident Response: When security incidents involve containers:
Leading forensic analysis of compromised containers
Coordinating response across Platform Engineering and Development teams
Identifying root causes and recommending preventive measures
Documenting incidents for lessons learned
2.2 Shared Responsibility Model
Container images are composed of layers, each with different ownership and security obligations.
Layer-by-Layer Breakdown
Base OS Layer (Platform Engineering Responsibility)
This layer includes the operating system packages and core utilities. For an Ubuntu-based image, this includes:
libc6, libssl3, libcrypto, and other core libraries
bash, sh, coreutils
Package managers (apt, dpkg)
System configuration files in /etc
When a vulnerability like CVE-2024-XXXXX affects libssl3, Platform Engineering must:
Monitor for the updated package from Ubuntu
Build a new base image with the patched package
Test that existing applications remain functional
Release the updated base image
Notify teams to rebuild
Runtime Layer (Platform Engineering Responsibility)
Language runtimes and frameworks maintained by Platform Engineering:
Python interpreter and standard library
Node.js runtime and built-in modules
OpenJDK JVM and class libraries
Go runtime
System-level dependencies these runtimes need
Example: When a vulnerability is discovered in the Node.js HTTP parser, Platform Engineering updates the Node.js base images across all maintained versions (Node 18, 20, 22) and publishes new images.
Application Dependencies (Development Team Responsibility)
Third-party libraries and packages installed by application teams:
npm packages (express, lodash, axios)
Python packages (django, flask, requests)
Java dependencies (spring-boot, hibernate, jackson)
Go modules (gin, gorm)
Example: When CVE-2024-YYYYY is discovered in the lodash npm package, the development team must:
Update package.json to specify a patched version
Run npm audit to verify the fix
Test the application with the updated dependency
Rebuild and redeploy the image
Application Code (Development Team Responsibility)
Custom code written by the organization:
Application logic and business rules
API endpoints and handlers
Database queries and data access
Authentication and authorization code
Configuration management
Security concerns include:
Injection vulnerabilities (SQL, command, XSS)
Broken authentication and session management
Sensitive data exposure
Security misconfigurations
Insecure deserialization
Boundary Cases and Escalation
Some security issues span multiple layers and require coordination:
Example 1: Upstream Package Delayed. A critical vulnerability is discovered in Python 3.11.7, but the patch won't be released by the Python maintainers for several days. Platform Engineering must decide whether to:
Wait for the official patch (safest but slower)
Backport the patch manually (faster but requires expertise)
Switch to an alternative Python distribution (complex migration)
This decision requires input from Security (risk assessment) and Development teams (impact assessment).
Example 2: Vulnerability in a Shared Dependency. OpenSSL is used by both the base OS and application dependencies. A vulnerability is discovered that affects specific usage patterns. Platform Engineering patches the OS-level OpenSSL, but some applications bundle OpenSSL statically. Coordination is needed to identify and remediate all instances.
Example 3: Zero-Day Exploitation. An actively exploited zero-day vulnerability is discovered in a widely used package. The Security team must:
Immediately assess blast radius (which images and deployments affected)
Coordinate emergency patching or mitigation
Potentially take affected services offline temporarily
Fast-track patches through testing and deployment
3. Base Image Creation and Management
3.1 Base Image Selection Criteria
Selecting the right base image is the most important security decision in the container image lifecycle. A poor choice creates technical debt that compounds over time.
3.1.1 Approved Base Image Sources
Official Docker Hub Images (Verified Publishers)
Docker Hub's verified publisher program provides some assurance of image authenticity and maintenance. However, not all official images meet enterprise security standards.
Approved:
ubuntu:22.04 – Widely used, well-documented, extensive package ecosystem
alpine:3.19 – Minimal attack surface, small size, but uses musl libc (compatibility concerns)
python:3.11-slim – Official Python builds with minimal OS layers
node:20-alpine – Official Node.js on an Alpine base
postgres:16-alpine – Official PostgreSQL builds
Prohibited:
ubuntu:latest – Unpredictable, changes without warning, breaks reproducibility
debian:unstable – Unstable by definition, not suitable for production
Any image without a verified publisher badge
Red Hat Universal Base Images (UBI)
Red Hat provides UBI images that are freely redistributable and receive enterprise-grade security support:
Benefits:
Predictable release cycle aligned with RHEL
Security errata published promptly
Compliance with enterprise Linux standards
Support available through Red Hat
Drawbacks:
Larger image size than Alpine
Fewer packages available than Debian/Ubuntu
Requires Red Hat-compatible tooling
Google Distroless Images
Distroless images contain only the application and runtime dependencies, removing package managers, shells, and system utilities:
Benefits:
Minimal attack surface (no shell for attackers to use)
Smallest possible image size
Reduced vulnerability count
Forces proper multi-stage builds
Drawbacks:
Debugging requires external tools (ephemeral containers, kubectl debug)
Cannot install packages in running containers
Limited to statically-linked binaries or specific language runtimes
Steeper learning curve for developers
Chainguard Images
Chainguard provides hardened, minimal images with strong supply chain security:
Benefits:
Updated daily with latest patches
Minimal CVE count
SBOM provided for every image
Signed with Sigstore for verification
Drawbacks:
Requires account for private registry access
Less community documentation than official images
Breaking changes possible with frequent updates
3.1.2 Selection Evaluation Criteria
Security Posture Assessment
Before approving a base image, Platform Engineering must evaluate:
Current Vulnerability Count: Use multiple scanners to establish baseline
Update Frequency: Review the image's update history
Security Response Time: Research how quickly security issues are addressed
Review CVE databases for past vulnerabilities
Check mailing lists for security announcements
Examine GitHub issues for security-related bugs
Validate that security fixes are backported to older versions
Provenance and Supply Chain: Verify image authenticity
Maintenance Commitment Analysis
Evaluate the long-term viability of the base image:
Support Lifecycle: Understand the support timeline
Ubuntu LTS: 5 years standard support, 10 years with ESM
Debian: ~5 years per major release
Alpine: ~2 years per minor release
RHEL/UBI: 10 years full support
Vendor Commitment: Assess the organization behind the image
Is there a commercial entity providing support?
Is the project community-driven (risk of maintainer burnout)?
Are security updates contractually guaranteed?
Deprecation Policy: Understand end-of-life procedures
Size and Efficiency Evaluation
Image size affects:
Storage costs in registries
Network transfer time during deployment
Pod startup time in Kubernetes
Cache efficiency in CI/CD pipelines
Compare alternatives:
Analyze layer composition:
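As an illustration, candidate sizes and layer composition can be compared with standard Docker tooling (dive is an optional third-party layer explorer; image names are examples):

```bash
# Compare candidate base image sizes
for img in ubuntu:22.04 alpine:3.19 gcr.io/distroless/base-debian12; do docker pull "$img"; done
docker images --format '{{.Repository}}:{{.Tag}}\t{{.Size}}'

# Inspect how each layer contributes to the total size
docker history --no-trunc ubuntu:22.04

# Optional: interactive layer-by-layer exploration
dive ubuntu:22.04
```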
License Compliance Review
Ensure all components use acceptable licenses:
Flag problematic licenses:
AGPL (requires source disclosure for network services)
GPL-3.0 with certain interpretations (patent retaliation clauses)
Proprietary licenses requiring explicit approval
Commons Clause and similar source-available licenses
3.2 Image Hardening Standards
3.2.1 Non-Root User Configuration
Containers should never run as root (UID 0). This limits the impact of container escapes and follows the principle of least privilege.
Implementation Pattern:
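A minimal sketch for a Debian/Ubuntu-based image; the user name and UID are illustrative:

```dockerfile
FROM ubuntu:22.04

# Create an unprivileged user with a UID above 10000 and no login shell
RUN groupadd --gid 10001 app && \
    useradd --uid 10001 --gid 10001 --create-home --shell /usr/sbin/nologin app

WORKDIR /app
COPY --chown=app:app . /app

# All subsequent instructions and the running container use the non-root user
USER 10001:10001
```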
Validation:
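The effective user can then be checked from image metadata and at runtime (the registry path is a placeholder):

```bash
# Confirm the image metadata declares a non-root user
docker inspect --format '{{.Config.User}}' registry.example.com/base/ubuntu:22.04

# Confirm the runtime UID (should print a UID >= 10000, never 0)
docker run --rm registry.example.com/base/ubuntu:22.04 id -u
```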
3.2.2 Minimal Package Set
Every package in an image is a potential vulnerability. Remove everything not strictly required.
Analysis Technique:
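As a sketch, installed packages can be enumerated directly from the image to drive this analysis (commands differ between Debian/Ubuntu and Alpine; the registry path is a placeholder):

```bash
# Debian/Ubuntu-based images: list installed packages, largest first
docker run --rm ubuntu:22.04 dpkg-query -W -f='${Installed-Size}\t${Package}\n' | sort -rn | head -20

# Alpine-based images: list installed packages with versions
docker run --rm alpine:3.19 apk info -v

# Snapshot the package list for comparison against the previous base image release
docker run --rm registry.example.com/base/ubuntu:22.04 dpkg-query -W > packages-new.txt
```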
Package Audit Process:
Prohibited Packages:
Never include in production images:
Shells beyond /bin/sh (bash, zsh, fish)
Text editors (vim, nano, emacs)
Network utilities (telnet, ftp, netcat)
Debuggers (gdb, strace, ltrace)
Compilers (gcc, clang unless required at runtime)
Version control (git, svn)
Package manager databases (can be removed post-install)
Example hardened Dockerfile:
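A sketch of what such a Dockerfile might look like; the package selection is illustrative and will differ per image:

```dockerfile
FROM ubuntu:22.04

# Install only required runtime packages, then remove the package index
RUN apt-get update && \
    apt-get install -y --no-install-recommends ca-certificates tzdata && \
    rm -rf /var/lib/apt/lists/*

# Strip setuid/setgid bits to block privilege-escalation paths
RUN find / -xdev -perm /6000 -type f -exec chmod a-s {} + || true

# Create and switch to an unprivileged user
RUN useradd --uid 10001 --no-create-home --shell /usr/sbin/nologin app
USER 10001
```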
3.2.3 Read-Only Root Filesystem
Making the root filesystem read-only prevents attackers from modifying system files or installing persistence mechanisms.
Implementation:
Kubernetes Deployment:
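A minimal sketch of the relevant Pod settings, with a writable emptyDir for the paths the application genuinely needs (names and image reference are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: registry.example.com/apps/example:1.2.3
          securityContext:
            readOnlyRootFilesystem: true
            allowPrivilegeEscalation: false
            runAsNonRoot: true
          volumeMounts:
            - name: tmp
              mountPath: /tmp
      volumes:
        - name: tmp
          emptyDir: {}
```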
Testing:
3.2.4 Capability Dropping
Linux capabilities allow fine-grained control over privileges. Drop all capabilities and add back only what's needed.
Common capabilities to always drop:
CAP_SYS_ADMIN – Mount filesystems, load kernel modules
CAP_NET_RAW – Create raw sockets (ping, traceroute)
CAP_SYS_PTRACE – Debug processes
CAP_SYS_MODULE – Load kernel modules
CAP_DAC_OVERRIDE – Bypass file permissions
CAP_CHOWN – Change file ownership
CAP_SETUID / CAP_SETGID – Change process UID/GID
Verification:
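Capabilities can be dropped at run time and the effective set verified from inside the container (image names are placeholders):

```bash
# Run with all capabilities dropped, adding back only what is required
docker run --rm --cap-drop=ALL --cap-add=NET_BIND_SERVICE registry.example.com/apps/example:1.2.3

# Verify the effective capability set of the process (all zeros when everything is dropped)
docker run --rm --cap-drop=ALL alpine:3.19 sh -c 'grep CapEff /proc/self/status'
```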
3.2.5 Security Metadata and Labels
Embed security-relevant metadata in images for automated policy enforcement and audit:
Query labels programmatically:
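For illustration, labels are set at build time and read back with docker inspect; the org.opencontainers.* keys are standard OCI annotations, while the com.example.* key and all values are hypothetical:

```dockerfile
LABEL org.opencontainers.image.source="https://git.example.com/platform/base-images" \
      org.opencontainers.image.version="1.4.2" \
      org.opencontainers.image.base.name="ubuntu:22.04" \
      com.example.security.scan-date="2024-05-01"
```

```bash
docker inspect --format '{{ index .Config.Labels "org.opencontainers.image.version" }}' \
  registry.example.com/base/ubuntu:22.04
```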
3.3 Image Build Process
3.3.1 Reproducible Builds
Builds must be reproducible: given the same inputs, produce bit-for-bit identical outputs. This enables verification and prevents supply chain attacks.
Techniques for Reproducibility:
Verification:
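One sketch of a practical check: pin all inputs (base image digest, package versions), build twice, and compare layer digests. Full bit-for-bit reproducibility also requires pinning timestamps (for example via SOURCE_DATE_EPOCH, which recent BuildKit releases support):

```bash
# Pin the base image by digest in the Dockerfile, e.g.:
#   FROM ubuntu:22.04@sha256:<digest>

# Build the same inputs twice
docker build -t repro-test:a .
docker build --no-cache -t repro-test:b .

# Compare layer digests; identical lists indicate a reproducible build
docker inspect --format '{{json .RootFS.Layers}}' repro-test:a > layers-a.json
docker inspect --format '{{json .RootFS.Layers}}' repro-test:b > layers-b.json
diff layers-a.json layers-b.json && echo "reproducible"
```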
3.3.2 Multi-Stage Builds
Use multi-stage builds to separate build dependencies from runtime dependencies:
Benefits:
Build stage can include compilers, dev tools (500MB+)
Runtime stage contains only the binary (5-10MB)
Smaller attack surface (no build tools in production)
Faster deployment (smaller images to transfer)
Advanced Multi-Stage Pattern:
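A sketch of the pattern for a Go service: compile in a full toolchain image, then copy only the binary into a distroless runtime image (module paths are illustrative):

```dockerfile
# --- Build stage: full toolchain, never shipped to production ---
FROM golang:1.21 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /out/server ./cmd/server

# --- Runtime stage: only the binary, on a distroless base that runs as non-root ---
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=build /out/server /server
ENTRYPOINT ["/server"]
```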
3.3.3 Build Caching Strategy
Optimize Docker layer caching to speed up builds:
BuildKit Advanced Caching:
Build with BuildKit:
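A sketch using BuildKit cache mounts so dependency downloads are cached across builds (the pip cache path is specific to Python; other ecosystems have their own cache directories):

```dockerfile
# syntax=docker/dockerfile:1
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
# The pip download cache persists between builds without being baked into a layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt
COPY . .
```

```bash
DOCKER_BUILDKIT=1 docker build --progress=plain -t app:dev .
```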
3.3.4 SBOM Generation During Build
Generate Software Bill of Materials as part of the build process:
Or generate during CI/CD:
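For example, Syft can produce an SPDX SBOM in the pipeline, and Cosign can optionally attach it to the image as an attestation (the image name is a placeholder):

```bash
# Generate an SPDX JSON SBOM for the built image
syft registry.example.com/apps/example:1.2.3 -o spdx-json > sbom.spdx.json

# Optionally attach the SBOM to the image as a signed attestation
cosign attest --predicate sbom.spdx.json --type spdxjson \
  registry.example.com/apps/example:1.2.3
```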
4. Container Registry Management
4.1 Registry Architecture
A production-grade container registry requires more than just image storage. It needs security controls, high availability, and integration with scanning tools.
4.1.1 Registry Selection Criteria
Harbor (Recommended for On-Premise)
Harbor is an open-source registry with enterprise features:
Features we leverage:
Role-based access control with LDAP/OIDC integration
Integrated Trivy scanning
Content signing with Notary
Image replication for disaster recovery
Webhook notifications for CI/CD integration
Retention policies for storage management
Audit logging of all operations
AWS ECR (Recommended for AWS Deployments)
For AWS-native deployments, ECR provides tight integration:
Lifecycle policy example:
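An illustrative policy that expires untagged images after 14 days and caps the number of retained release tags:

```json
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Expire untagged images after 14 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 14
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Keep only the 50 most recent release-tagged images",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["v"],
        "countType": "imageCountMoreThan",
        "countNumber": 50
      },
      "action": { "type": "expire" }
    }
  ]
}
```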
4.1.2 Registry Organization and Naming
Repository Structure:
Naming Conventions:
Tag Strategy:
4.2 Access Control and Authentication
4.2.1 RBAC Configuration
Harbor Project-Level Permissions:
Kubernetes Service Account:
4.2.2 Automated Credential Rotation
4.3 Image Promotion Workflow
4.3.1 Automated Quality Gates
4.3.2 Policy Enforcement with OPA
5. Security Scanning and Vulnerability Management
5.1 Scanning Tools and Integration
5.1.1 Trivy Deep Dive
Trivy is our primary scanner due to its speed, accuracy, and broad coverage.
Installation and Configuration:
Basic Scanning:
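Illustrative invocations; the thresholds mirror the severity policy above and the image name is a placeholder:

```bash
# Scan an image and fail the build if any HIGH or CRITICAL vulnerabilities are found
trivy image --severity HIGH,CRITICAL --exit-code 1 registry.example.com/apps/example:1.2.3

# Ignore vulnerabilities that have no published fix yet
trivy image --severity HIGH,CRITICAL --ignore-unfixed --exit-code 1 registry.example.com/apps/example:1.2.3

# Produce a machine-readable report for archiving
trivy image --format json --output trivy-report.json registry.example.com/apps/example:1.2.3
```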
Advanced Usage:
CI/CD Integration:
5.1.2 Grype for Validation
Grype provides a second opinion on vulnerabilities using different data sources:
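A sketch of the equivalent Grype invocation used for the second opinion (image name is a placeholder):

```bash
# Scan the same image with Grype; fail on high or critical findings
grype registry.example.com/apps/example:1.2.3 --fail-on high -o json > grype-report.json
```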
Why Use Multiple Scanners:
Different scanners have different vulnerability databases and detection heuristics:
Trivy uses its own database aggregated from NVD, Red Hat, Debian, Alpine, etc.
Grype uses Anchore's feed service with additional vulnerability data
Snyk has proprietary vulnerability data from security research
Clair uses data directly from distro security teams
A vulnerability might appear in one scanner days before others, or might be a false positive in one but not another.
5.1.3 Snyk for Developer Integration
Snyk provides IDE integration and developer-friendly workflows:
IDE Integration:
Pre-commit Hook:
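One option is a plain git pre-commit hook that runs Snyk's container test; the image tag and threshold are illustrative, and this assumes the developer has already authenticated with snyk auth:

```bash
#!/bin/sh
# .git/hooks/pre-commit — scan the Dockerfile and locally built image before committing
set -e
if [ -f Dockerfile ]; then
  snyk container test myapp:dev --file=Dockerfile --severity-threshold=high
fi
```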
5.2 Scanning Frequency and Triggers
5.2.1 Build-Time Scanning
Every image must be scanned before pushing to the registry:
5.2.2 Registry Continuous Scanning
Harbor automatically scans images on schedule:
Scheduled rescanning finds newly-discovered vulnerabilities:
5.2.3 Runtime Scanning
Scan running containers to detect runtime modifications or configuration drift:
Falco Runtime Detection:
5.3 Vulnerability Severity Classification
5.3.1 CVSS Scoring Context
Not all high CVSS scores mean immediate risk. Context matters:
5.3.2 Exploitability Assessment
Not all CVEs are exploitable in your specific context:
Automated Exploitability Checks:
5.4 Vulnerability Response Process
5.4.1 Automated Notification System
5.4.2 Remediation Workflow
6. License Compliance and Open Source Management
6.1 License Scanning Implementation
6.1.1 SBOM Generation
SBOM Structure:
6.1.2 License Policy Enforcement
Integration in CI/CD:
6.2 License Compliance Database
Track all licenses across the organization:
6.3 SBOM Management
6.3.1 Storing and Retrieving SBOMs
6.3.2 SBOM Comparison for Updates
7. Image Lifecycle Management
7.1 Semantic Versioning Implementation
7.1.1 Version Tagging Strategy
7.1.2 Automated Version Bumping
7.2 Automated Update System
7.2.1 Dependency Update Automation
7.2.2 Base Image Update Notification
7.3 Image Deprecation Process
7.3.1 Deprecation Metadata
7.3.2 Automated Deprecation Detection
8. Best Practices and Technical Standards
8.1 Advanced Dockerfile Patterns
8.1.1 Distroless Migration
8.1.2 Argument and Secret Handling
8.1.3 Effective Layer Caching
8.2 Runtime Security Configurations
8.2.1 Pod Security Standards
8.2.2 NetworkPolicy Implementation
9. Implementation Guidance
9.1 Infrastructure Setup
9.1.1 Harbor Installation with High Availability
Install Harbor:
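A sketch using the official Harbor Helm chart; the hostname and values shown are placeholders, and a production install needs TLS, external storage, and HA settings from a full values file:

```bash
helm repo add harbor https://helm.goharbor.io
helm repo update

helm install harbor harbor/harbor \
  --namespace harbor --create-namespace \
  --set expose.ingress.hosts.core=registry.example.com \
  --set externalURL=https://registry.example.com \
  --set persistence.enabled=true
```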
9.1.2 Scanning Infrastructure Setup
9.2 Base Image Build Pipeline
9.3 Admission Control
10. Assessment and Continuous Improvement
10.1 Security Metrics Dashboard
10.2 Continuous Improvement Feedback Loop
11. Appendices
11.1 Appendix A: Dockerfile Template Library
A.1 Python Application
A.2 Node.js Application
A.3 Go Application
11.2 Appendix B: CI/CD Integration Examples
B.1 GitLab CI
B.2 GitHub Actions
See earlier example in section 9.2
B.3 Jenkins Pipeline
10. Developer Guidelines: Anti-Patterns and Best Practices
10.1 Introduction: Using Base Images Correctly
Base images are designed to provide a secure, consistent foundation for applications. However, developers can inadvertently undermine this foundation through common anti-patterns. This section provides clear guidance on what NOT to do, and how to properly use base images to maintain security and operational consistency.
The Golden Rule: Treat base images as immutable building blocks. Add your application on top, but never modify the base layer security configurations.
10.2 Critical Anti-Patterns to Avoid
10.2.1 Anti-Pattern: Running as Root in Application Layer
❌ WRONG: Switching back to root after base image sets non-root user
Why this is dangerous:
Completely negates the security hardening in the base image
Container runs with root privileges, allowing attackers full system access
Violates security policies and will fail compliance scans
Defeats the purpose of using a hardened base image
✅ CORRECT: Stay as non-root user, install dependencies properly
If you absolutely need system packages:
10.2.2 Anti-Pattern: Installing Unnecessary System Packages
❌ WRONG: Installing everything "just in case"
Why this is wrong:
Adds 100+ MB to image size unnecessarily
Introduces dozens of potential vulnerabilities
vim, bash, openssh are debug tools that shouldn't be in production
sudo in a container makes no sense
build-base not needed at runtime
Security impact:
Each package is a potential CVE entry point
Attackers have more tools available if they compromise the container
Larger attack surface to maintain and patch
✅ CORRECT: Minimal runtime dependencies only
Result:
Image size: 450MB → 180MB
Zero unnecessary packages
No debug tools for attackers to abuse
Faster deployments and startup
10.2.3 Anti-Pattern: Modifying Base Image Security Configurations
❌ WRONG: Changing file permissions, adding capabilities, modifying system configs
Why this is dangerous:
chmod 777 allows any user to write anywhere (security nightmare)
chmod +s (setuid) allows privilege escalation attacks
Adding sudo defeats non-root user security
Violates least privilege principle
What happens:
Security scans will flag these violations
Kubernetes Pod Security Standards will reject the pod
Creates security incidents waiting to happen
✅ CORRECT: Work within the security model
If your application truly needs to write outside /app:
10.2.4 Anti-Pattern: Embedding Secrets in Images
❌ WRONG: Secrets in Dockerfile or build arguments
Why this is catastrophic:
Secrets are baked into image layers permanently
Anyone with registry access can extract secrets
docker history shows all build arguments
Image layers are cached and may be widely distributed
Secrets can't be rotated without rebuilding image
Real attack scenario:
✅ CORRECT: Use proper secret management
Build command:
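With BuildKit, a secret can be mounted only for the build step that needs it and is never written to a layer (the .npmrc example is illustrative):

```dockerfile
# syntax=docker/dockerfile:1
FROM node:20-alpine
WORKDIR /app
COPY package*.json ./
# The secret is available only during this RUN step and never stored in the image
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci --omit=dev
COPY . .
```

```bash
DOCKER_BUILDKIT=1 docker build --secret id=npmrc,src=$HOME/.npmrc -t myapp:1.0.0 .
```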
Runtime secrets - use environment variables or secret stores:
Or use a secret manager:
10.2.5 Anti-Pattern: Using 'latest' or Unpinned Versions
❌ WRONG: Unpredictable base image versions
Why this is problematic:
'latest' tag can point to different images tomorrow
Builds are not reproducible
Can't roll back to previous version reliably
Team members may build different images from same Dockerfile
Production and development may run different code
Real scenario:
✅ CORRECT: Pin exact versions with digests
requirements.txt with hashes:
Generate hashes automatically:
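A sketch using pip-tools to produce a hash-locked requirements.txt, with the base image pinned by digest in the same spirit (the digest is a placeholder):

```bash
# Produce a fully pinned, hash-locked requirements.txt from requirements.in
pip install pip-tools
pip-compile --generate-hashes requirements.in -o requirements.txt

# Install, refusing anything whose hash does not match
pip install --require-hashes -r requirements.txt
```

```dockerfile
# Pin the base image to an immutable digest, not just a tag
FROM python:3.11-slim@sha256:<digest>
```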
When to update base images:
10.2.6 Anti-Pattern: Bloated Application Images
❌ WRONG: Copying entire project directory
What gets copied (unintentionally):
.git/ directory (10+ MB, contains entire history)
node_modules/ from developer's machine
.env files with local secrets
test/ directory with test fixtures
docs/ directory
.vscode/, .idea/ IDE configurations
*.log files
build artifacts from local builds
Result:
Image size: 800 MB instead of 200 MB
Potential secret leakage
Inconsistent builds (using local node_modules)
Longer build and deployment times
✅ CORRECT: Use .dockerignore and selective COPY
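A sketch of a typical .dockerignore plus selective COPY; adjust the entries per project:

```text
# .dockerignore
.git
node_modules
.env
*.log
test/
docs/
.vscode/
.idea/
dist/
```

```dockerfile
# Copy only what the build actually needs
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
COPY src/ ./src/
```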
Result:
Image size: 800 MB → 185 MB
No secrets or unnecessary files
Reproducible builds
Faster deployments
10.2.7 Anti-Pattern: Ignoring Base Image Updates
❌ WRONG: Never updating base images
Why this is dangerous:
Accumulating security vulnerabilities
Missing performance improvements
Using deprecated/unsupported software
Compliance violations
Technical debt grows exponentially
What happens:
Security team flags your image with critical CVEs
You're forced to do emergency update during incident
Update is now complex (18 months of changes)
Application breaks due to multiple breaking changes
Weekend spent firefighting instead of gradual updates
✅ CORRECT: Regular base image updates
Establish update cadence:
Response to platform team notifications:
Developer response:
10.3 Best Practices for Using Base Images
10.3.1 Multi-Stage Builds for Clean Production Images
The pattern:
Benefits:
Build stage: 850 MB (with gcc, build tools)
Runtime stage: 180 MB (only runtime dependencies)
No build tools for attackers to abuse
Faster deployments and pod startup
10.3.2 Proper Dependency Management
Pin everything:
Lock files are mandatory:
package-lock.json for npm
yarn.lock for Yarn
poetry.lock for Poetry
Cargo.lock for Rust
go.mod and go.sum for Go
Always commit lock files to git!
10.3.3 Efficient Layer Caching
Order matters:
Cache invalidation example:
10.3.4 Health Checks and Observability
Add proper health checks:
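A minimal sketch using Docker's HEALTHCHECK instruction; the /healthz endpoint, port, and use of wget are assumptions about the application and image contents (distroless images should rely on Kubernetes probes instead):

```dockerfile
# Marks the container unhealthy if the endpoint stops responding
HEALTHCHECK --interval=30s --timeout=3s --start-period=10s --retries=3 \
  CMD wget -q --spider http://localhost:8080/healthz || exit 1
```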
Implement health endpoint in application:
10.3.5 Proper Logging Configuration
Log to stdout/stderr, not files:
Problems:
Log files grow indefinitely, filling up container disk
Can't view logs with kubectl logs or docker logs
Logs lost when container restarts
Need to mount volumes just for logs
Structured logging (even better):
10.4 Testing Your Images
10.4.1 Local Testing Before Push
Always test locally first:
10.4.2 Verify Base Image Compliance
Check that you're using approved base image:
10.5 Common Developer Questions
Q: "The base image doesn't have the package I need. What do I do?"
Option 1: Check if it's really needed at runtime
Option 2: Request it be added to base image
Create a ticket with platform team:
Option 3: Use a specialized base image
If it's unique to your team:
Q: "My application needs to write files. How do I do that with read-only filesystem?"
Use designated writable locations:
Or use Kubernetes volumes:
Q: "Can I use a different base image for local development vs production?"
No. Use the same base image everywhere.
Why:
"Works on my machine" problems
Different vulnerabilities in dev vs prod
Inconsistent behavior
Defeats purpose of containers
✅ CORRECT - Same image everywhere:
10.6 Pre-Commit Checklist
Before committing Dockerfile changes, verify:
10.7 Getting Help
When you're stuck:
Check Documentation: https://docs.company.com/base-images/
Slack Channel: #base-images-support
Office Hours: Every Tuesday 2-3pm
Create Ticket: For feature requests or bugs
Emergency: Page platform-engineering (P1 issues only)
What to include when asking for help:
11. The Container Base Images Team: Structure and Operations
11.1 The Case for Centralization
Organizations that successfully scale container adoption almost universally adopt a centralized approach to base image management. This isn't merely an operational convenience—it's a strategic necessity driven by several factors that become more critical as container usage grows.
11.1.1 The Cost of Decentralization
When individual development teams maintain their own base images, organizations face compounding problems:
Knowledge Fragmentation: Security expertise gets diluted across teams. A critical CVE affecting OpenSSL requires coordination across dozens of teams, each maintaining its own fork of Ubuntu or Alpine. Response time is measured in weeks instead of hours.
Redundant Effort: Ten teams building Node.js images means ten teams researching the same security hardening, ten teams implementing non-root users, and ten teams fighting the same Dockerfile caching issues. Multiply this across Python, Java, Go, and other runtimes.
Inconsistent Security Posture: Team A's images drop all capabilities and use distroless. Team B's images run as root with a full Ubuntu install. Both are "approved" because there's no central standard. Incident responders waste hours understanding each team's custom security model.
Scale Problems: With 100 development teams each maintaining three images, that's 300 images to track, scan, and update. When a critical vulnerability drops, coordinating remediation across 100 teams is organizational chaos.
Compliance Nightmares: Auditors ask, "How many of your container images have critical vulnerabilities?" The answer: "We don't know; each team manages their own." SOC 2, ISO 27001, and PCI-DSS audits become far more complex.
11.1.2 The Benefits of Centralization
Industry leaders like Netflix, Google, and Spotify have demonstrated that centralizing base image management delivers measurable benefits:
Netflix uses centralized base images created by their Aminator tool, enabling them to launch three million containers per week with consistent security and operational standards across all workloads.
Single Source of Truth: One team maintains the "golden images" that serve as the foundation for all applications. When CVE-2024-12345 hits, one team patches it once, and all consuming teams rebuild. Response time: hours, not weeks.
Expert Focus: A dedicated team develops deep expertise in container security, operating system hardening, supply chain security, and vulnerability management. This expertise is difficult to maintain when spread across application teams focused on business logic.
Consistent Security: All images follow the same hardening standards: non-root users, minimal packages, dropped capabilities, signed SBOMs. Security tooling knows what to expect, and incident response is streamlined because all images follow known patterns.
Economies of Scale: One team maintaining 20 well-crafted base images serves 100 application teams building 500+ application images. The cost of the base images team is amortized across the entire engineering organization.
Faster Developer Onboarding: New developers don't need to learn Dockerfile best practices, security hardening, or vulnerability management. They start FROM an approved base and focus on application code.
Audit Simplicity: "How many critical vulnerabilities in base images?" Answer: "Zero; we have automated scanning with blocking gates." "How do you track software licenses?" Answer: "Every base image has a signed SBOM in our registry."
11.2 Team Structure and Composition
The Container Base Images team (often called Platform Engineering, Developer Experience, or Golden Images team) typically sits at the intersection of infrastructure, security, and developer productivity. The exact structure varies based on organization size, but follows common patterns.
11.2.1 Core Team Roles
Platform Engineering Lead (Technical Lead)
This role owns the strategic direction and technical decisions for the base images program.
Responsibilities:
Define base image strategy and roadmap
Establish security and operational standards
Make technology choices (which base OS, scanning tools, registry platform)
Resolve conflicts between security requirements and developer needs
Represent base images in architecture reviews and security forums
Own relationships with security, compliance, and development leadership
Technical profile:
Deep expertise in containers, Linux, and cloud platforms
Strong security background (CVE analysis, threat modeling)
Experience with large-scale infrastructure (1000+ hosts)
Understanding of software development workflows and pain points
Ability to design systems for 100+ consuming teams
Typical background: Senior infrastructure engineer, former SRE/DevOps lead, or security engineer with platform experience.
Container Platform Engineers (2-4 engineers)
These are the hands-on builders who create, maintain, and improve base images.
Responsibilities:
Build and maintain base images for different runtimes (Python, Node.js, Java, Go)
Implement security hardening (minimal packages, non-root, capabilities)
Automate image builds with CI/CD pipelines
Integrate scanning tools (Trivy, Grype, Syft)
Generate and sign SBOMs
Manage the container registry infrastructure
Respond to security vulnerabilities in base images
Write documentation and runbooks
Provide technical support to development teams
Technical profile:
Strong Linux system administration skills
Proficiency with Docker, Kubernetes, and container runtimes
Scripting and automation (Python, Bash, Go)
CI/CD expertise (GitHub Actions, GitLab CI, Jenkins)
Security tooling experience (vulnerability scanners, SBOM generators)
Typical background: DevOps engineers, infrastructure engineers, or developers with strong ops experience.
Security Engineer (Dedicated or Shared, 0.5-1 FTE)
This role ensures base images meet security standards and responds to vulnerabilities.
Responsibilities:
Define security requirements for base images
Review and approve security hardening configurations
Triage vulnerability scan results
Assess exploitability and business impact of CVEs
Coordinate security incident response for container issues
Conduct security audits of base images
Stay current on container security threats and best practices
Provide security training to platform engineers
Technical profile:
Container security expertise (image scanning, runtime security, admission control)
Vulnerability management experience
Understanding of attack vectors and exploit techniques
Familiarity with compliance frameworks (SOC 2, ISO 27001, PCI-DSS)
Ability to communicate risk to both technical and non-technical audiences
Typical background: Application security engineer, infrastructure security engineer, or security architect.
Developer Experience Engineer (Optional, 0.5-1 FTE)
This role focuses on making base images easy to use and understand for development teams.
Responsibilities:
Create comprehensive documentation and tutorials
Develop example applications demonstrating base image usage
Provide office hours and Slack support
Gather feedback from development teams
Create metrics dashboards showing base image adoption
Run training sessions and workshops
Advocate for developer needs in base image design
Build CLI tools and plugins to simplify common workflows
Technical profile:
Strong technical writing and communication skills
Understanding of developer workflows and pain points
Ability to translate technical concepts for different audiences
Basic to intermediate container knowledge
User research and feedback analysis skills
Typical background: Developer advocate, technical writer, or developer with strong communication skills.
11.2.2 Extended Team and Stakeholders
The base images team doesn't work in isolation. Success requires close collaboration with multiple groups:
Security Team Partnership
The security team provides:
Security requirements and standards
Threat intelligence and vulnerability context
Security audits and penetration testing
Incident response coordination
Compliance requirements interpretation
Integration points:
Weekly sync on new vulnerabilities and remediation status
Monthly security reviews of base images
Quarterly security audits and penetration tests
Joint incident response for container security issues
Security team has read access to base image repositories
Security team receives automated notifications of failed security scans
Application Development Teams (The Customers)
Development teams consume base images and provide feedback:
Use base images as FROM in their Dockerfiles
Report bugs and request new features
Provide feedback on documentation and usability
Participate in beta testing of new base image versions
Attend office hours and training sessions
Communication channels:
Dedicated Slack channel (#base-images-support)
Monthly office hours (Q&A session)
Quarterly all-hands presentation on roadmap and updates
Email distribution list for critical announcements
Self-service documentation portal
Compliance and Legal Teams
These teams ensure base images meet regulatory and legal requirements:
Review license compliance for all included packages
Validate SBOM generation and accuracy
Ensure audit trail for all base image changes
Approve exception requests for non-standard licenses
Participate in external audits (SOC 2, ISO 27001)
Integration points:
Automated SBOM delivery for all base images
Quarterly compliance review meetings
Annual audit preparation and support
License approval workflow integration
Cloud Infrastructure Team
The infrastructure team provides the foundation:
Container registry infrastructure (Harbor, ECR, ACR)
CI/CD platform (Jenkins, GitLab, GitHub Actions)
Monitoring and observability platform
Backup and disaster recovery
Network connectivity and access control
Shared responsibilities:
Registry capacity planning and scaling
Performance optimization
Incident response for registry outages
Cost optimization for storage and bandwidth
11.2.3 Team Scaling Model
Team size scales based on organization size and container adoption:
Small Organization (< 50 developers)
1 Platform Engineering Lead (50% time)
1-2 Platform Engineers
Security Engineer (shared resource, 25% time)
Supports: 5-10 base images, 50-100 application images
Medium Organization (50-500 developers)
1 Platform Engineering Lead (full time)
2-3 Platform Engineers
1 Security Engineer (dedicated, shared with AppSec)
1 Developer Experience Engineer (50% time)
Supports: 15-25 base images, 200-500 application images
Large Organization (500+ developers)
1 Platform Engineering Lead
4-6 Platform Engineers (may specialize by runtime or OS)
1-2 Security Engineers (dedicated)
1 Developer Experience Engineer
1 Site Reliability Engineer (focused on registry operations)
Supports: 30+ base images, 1000+ application images
Netflix's Titus platform team, which manages container infrastructure for the entire company, enables over 10,000 long-running service containers and launches three million containers per week, demonstrating how a focused platform team can support massive scale.
11.3 Responsibilities and Accountability
Clear ownership prevents gaps and duplication. The base images team owns specific layers of the container stack.
11.3.1 What the Base Images Team Owns
Base Operating System Images
Complete responsibility for OS-level base images:
Ubuntu 22.04, Alpine 3.19, Red Hat UBI 9
OS package selection and minimization
Security hardening (sysctl, file permissions, user configuration)
OS vulnerability patching and updates
OS-level compliance (CIS benchmarks, DISA STIGs)
Example: When CVE-2024-XXXX affects glibc in Ubuntu 22.04, the base images team:
Assesses impact (which base images affected, exploitability)
Builds patched base images
Tests for breaking changes
Publishes updated images
Notifies all consuming teams
Tracks adoption and follows up
Language Runtime Images
Complete responsibility for language runtime base images:
Python 3.11, Node.js 20, OpenJDK 21, Go 1.21, .NET 8
Runtime installation and configuration
Runtime security hardening
Runtime vulnerability patching
Best practice examples and documentation
Example: When a vulnerability affects the Node.js HTTP parser, the base images team:
Updates Node.js runtime in all supported versions (Node 18, 20, 22)
Rebuilds and tests base images
Updates documentation with migration notes
Publishes updated images with detailed changelogs
Notifies teams via Slack and email
Image Build Infrastructure
Complete responsibility for the build and publishing pipeline:
CI/CD pipelines for automated builds
Build environment security and compliance
Image signing infrastructure (Cosign, Notary)
SBOM generation automation
Image promotion workflows
Build reproducibility
Registry Infrastructure and Governance
Complete responsibility for the container registry:
Registry infrastructure (Harbor, ECR, ACR deployment)
High availability and disaster recovery
Access control and authentication
Image replication across regions
Storage optimization and garbage collection
Registry monitoring and alerting
Backup and restore procedures
Security Scanning and Vulnerability Management
Complete responsibility for base layer vulnerability management:
Vulnerability scanning infrastructure (Trivy, Grype, Clair)
Scan result analysis and triage
Base layer vulnerability remediation
Security advisory publication
Vulnerability metrics and reporting
Documentation and Developer Support
Complete responsibility for enabling teams to use base images:
Comprehensive usage documentation
Best practices guides
Migration guides for version updates
Troubleshooting guides
Example applications and templates
Office hours and support channels
Training materials and workshops
11.3.2 What the Base Images Team Does NOT Own
Clear boundaries prevent scope creep and confusion.
Application Code and Business Logic
Application teams own:
All application source code
Application-specific logic and features
Application configuration
Application testing and quality assurance
The base images team provides the platform; application teams build on it.
Application Dependencies
Application teams own:
Python packages installed via pip (requirements.txt)
Node.js packages installed via npm (package.json)
Java dependencies from Maven/Gradle
Go modules
Any other application-level dependencies
When a vulnerability exists in Flask, Django, Express, or Spring Boot, the application team must update those dependencies. The base images team may provide guidance, but does not own the remediation.
Application-Specific System Packages
Application teams own packages they add for application needs:
Database clients (postgresql-client, mysql-client)
Media processing libraries (ffmpeg, imagemagick)
Specialized utilities (wkhtmltopdf, pandoc)
The base images team provides minimal base images; application teams add what they specifically need.
Runtime Configuration
Application teams own:
Environment variables and configuration files
Application-specific security policies
Resource limits and requests
Health check endpoints
Logging and monitoring configuration
The base images team provides sensible defaults; application teams customize for their needs.
Kubernetes Manifests and Deployment
Application teams own:
Deployment YAML files
Service definitions
Ingress configurations
ConfigMaps and Secrets
Network policies
Pod security contexts
The base images team may provide best practice examples, but does not own production deployments.
11.3.3 Shared Responsibilities
Some areas require coordination between teams.
Image Rebuilds After Base Updates
Shared responsibility model:
Base Images Team: Publishes updated base images with detailed release notes
Application Teams: Rebuilds their images using updated base within SLA
Both: Coordinate testing and rollout to minimize disruption
SLA example:
Critical vulnerabilities: Application teams must rebuild within 7 days
High vulnerabilities: Application teams must rebuild within 30 days
Routine updates: Application teams should rebuild monthly
Incident Response
Shared responsibility based on incident type:
Container runtime vulnerabilities (runC, containerd): Base Images Team leads
Base OS vulnerabilities: Base Images Team leads
Application vulnerabilities: Application Team leads
Configuration issues: Application Team leads, Base Images Team advises
Registry outages: Infrastructure Team leads, Base Images Team supports
Security Audits and Compliance
Shared responsibility:
Base Images Team: Provides evidence for base image security controls
Application Teams: Provides evidence for application-level controls
Security Team: Conducts audits and validates controls
Compliance Team: Interprets requirements and coordinates audits
11.4 Cross-Team Collaboration Models
Effective collaboration is what makes centralized base images work. Different organizations adopt different models.
11.4.1 Platform-as-a-Product Model
Platform engineering teams treat the platform as a product rather than a project, providing clear guidance to other teams on how to interact via collaboration or self-service interfaces.
In this model, base images are a product with customers (development teams).
Product Management Approach
The base images team acts as a product team:
Maintains a public roadmap of planned features and improvements
Collects feature requests through structured process
Prioritizes work based on customer impact
Conducts user research and feedback sessions
Measures success through adoption metrics and satisfaction scores
Example roadmap:
Self-Service First
Developers should be able to use base images without tickets or approvals:
Comprehensive documentation answers 90% of questions
Example applications demonstrate common patterns
Automated tools (CLI, IDE plugins) simplify workflows
Clear error messages guide developers to solutions
When developers need help:
Check documentation and examples (self-service)
Ask in Slack channel (peer support)
Attend office hours (group support)
Create a ticket (last resort)
Feedback Loops
Regular mechanisms for gathering feedback:
Quarterly surveys measuring satisfaction and pain points
Monthly office hours for Q&A and feedback
Dedicated Slack channel monitored by team
Embedded engineer rotations (team member temporarily joins app team)
Retrospectives after major incidents or changes
SLAs and Commitments
The base images team makes explicit commitments:
Critical vulnerability patches: Published within 24 hours
High vulnerability patches: Published within 7 days
Feature requests: Initial response within 3 business days
Support questions: Response within 1 business day
Registry uptime: 99.9% availability
11.4.2 Embedded Engineer Model
Some organizations embed platform engineers temporarily with application teams.
How It Works
A platform engineer spends 2-4 weeks embedded with an application team:
Sits with the team (physically or virtually)
Participates in standups and planning
Helps migrate applications to approved base images
Identifies pain points and improvement opportunities
Provides training and knowledge transfer
Brings learnings back to platform team
Benefits:
Deep understanding of real developer workflows
Trust building between platform and application teams
Accelerated adoption of base images
Identification of documentation gaps
Real-world testing of platform features
Example rotation schedule:
Week 1-2: Embedded with Team A (payments team)
Week 3-4: Embedded with Team B (recommendations team)
Week 5-6: Back on platform team, incorporating learnings
Repeat with different teams quarterly
11.4.3 Guild or Center of Excellence Model
Team Topologies emphasizes collaboration and community models where platform teams establish communities of practice to share knowledge and standards across the organization.
A Container Guild brings together representatives from multiple teams.
Guild Structure
Meets monthly or quarterly
Members: Representatives from base images team + app teams
Rotating chair from application teams
Open to all interested engineers
Guild Responsibilities
Review and approve base image roadmap
Share knowledge and best practices across teams
Identify common pain points and solutions
Evangelize base images within their teams
Provide feedback on proposals before implementation
Help prioritize feature requests
Example Guild Activities
Lightning talks: Teams share how they use base images
Working groups: Tackle specific problems (multi-arch, air-gapped deployments)
RFC reviews: Comment on proposed changes to base images
Show and tell: Demonstrations of new features
Post-mortem reviews: Learn from incidents together
11.5 Collaboration with Security Team
The relationship with the security team is critical. Done wrong, it creates friction and slow-downs. Done right, it enables speed with confidence.
11.5.1 Security Partnership Model
Security as Enabler, Not Gatekeeper
Modern security teams enable safe velocity rather than blocking releases:
Provide automated tools (scanners, policies) rather than manual reviews
Define clear requirements rather than case-by-case approvals
Offer self-service compliance checks rather than ticket queues
Build guard rails rather than gates
Traditional (Slow):
Modern (Fast):
Joint Ownership of Security Standards
Base Images Team and Security Team collaborate to define standards:
Base Images Team proposes technical implementation
Security Team defines security requirements
Both teams iterate until requirements can be met practically
Security Team audits, Base Images Team implements
Both teams share accountability for security outcomes
Example collaboration on "non-root requirement":
11.5.2 Integration Points
Weekly Vulnerability Triage
Regular sync between Base Images Team and Security Team:
Review new CVEs affecting base images
Assess exploitability and business impact
Prioritize remediation work
Coordinate communication to application teams
Meeting structure (30 minutes):
Review critical CVEs from past week (10 min)
Update status on in-progress remediations (5 min)
Discuss upcoming security changes (10 min)
Review metrics: CVE count, MTTR, compliance rate (5 min)
Quarterly Security Audits
Security Team conducts comprehensive audits:
Review all base images for compliance with security standards
Penetration testing of container runtime environment
Audit of build pipeline security
Review of access controls and authentication
Validate SBOM accuracy and completeness
Output: Audit report with findings and recommendations Follow-up: Base Images Team addresses findings with defined timeline
Joint Incident Response
When container security incidents occur:
Security Team leads investigation and coordination
Base Images Team provides technical expertise on containers
Both teams participate in incident response calls
Base Images Team implements technical remediation
Security Team coordinates communication with stakeholders
Both teams participate in post-incident review
Shared Metrics Dashboard
Real-time dashboard visible to both teams:
Number of base images and application images
CVE count by severity across all images
Mean time to remediation for vulnerabilities
Percentage of images in compliance
Number of images with signed SBOMs
Registry availability and performance
Both teams use same metrics for decision-making and prioritization.
11.5.3 Security Team's Role in Base Images
What Security Team Provides
Security Requirements Definition:
"No critical or high CVEs in production"
"All images must run as non-root"
"All images must have signed SBOM"
"Images must follow CIS benchmarks"
Threat Intelligence:
Context on new vulnerabilities (exploitability, active exploitation)
Information on attack techniques targeting containers
Updates on regulatory requirements affecting containers
Security Tooling Expertise:
Recommendations on scanning tools
Configuration of security policies
Integration with SIEM and SOAR platforms
Audit and Compliance:
Interpretation of compliance requirements
Evidence collection for audits
Attestation of security controls
What Security Team Does NOT Own
Technical Implementation:
Security defines "run as non-root"
Base Images Team implements it in Dockerfiles
Day-to-Day Operations:
Security defines scanning requirements
Base Images Team operates scanners and triages results
Developer Support:
Security defines security training content
Base Images Team delivers training and provides ongoing support
11.6 Governance and Decision Making
Clear governance prevents conflicts and ensures alignment.
11.6.1 Decision Authority
Base Images Team Has Authority Over:
Which base operating systems to support (Ubuntu vs Alpine vs RHEL)
Which language runtimes and versions to provide
Technical implementation details (specific hardening techniques)
Build pipeline and tooling choices
Release schedule and versioning scheme
Registry infrastructure decisions
Security Team Has Authority Over:
Security requirements and standards
Acceptable vulnerability thresholds
Exception approvals for security policy violations
Incident response procedures
Compliance interpretation
Joint Decision Making Required For:
Adding new base image types that deviate from standards
Changes to security scanning thresholds
Major architectural changes affecting security
Exception processes and approval workflows
Application Teams Have Authority Over:
Which approved base image to use for their application
When to rebuild images after base updates (within SLA)
Application-specific configuration and dependencies
11.6.2 RFC (Request for Comments) Process
For significant changes, teams use an RFC process:
The RFC is reviewed by:
Security Team (security implications)
Relevant application teams (usability)
Infrastructure team (registry capacity)
Platform engineering leadership (strategic fit)
Approval requires: Security sign-off + majority support from stakeholders
11.6.3 Exception Process
Sometimes teams need exceptions from standard policies.
When Exceptions Are Needed
Legacy application cannot run on approved base images
Regulatory requirement demands specific OS version not yet supported
Performance requirement necessitates specific optimization
Time-bound workaround while permanent solution is developed
Exception Request Process
11.7 Prerequisites for Centralization
Successfully centralizing base image management requires organizational prerequisites.
11.7.1 Executive Sponsorship
Centralization will disrupt existing workflows. Executive support is essential.
What Leadership Must Provide
Mandate and Authority:
Clear statement that all teams will use centralized base images
Authority for base images team to set standards
Backing when teams push back on changes
Budget for team headcount and tooling
Example executive communication:
What Leadership Must NOT Do
Undermine the base images team when teams complain
Allow individual teams to opt out without valid reason
Cut budget or headcount before the program is mature
Set unrealistic timelines without consulting the team
11.7.2 Organizational Readiness
Cultural Readiness:
Teams must accept that not every team needs custom base images
Willingness to adopt shared standards over team-specific preferences
Trust in platform team to make good technical decisions
Commitment to collaboration over silos
Technical Readiness:
Container registry infrastructure in place
CI/CD pipelines capable of building images
Monitoring and logging infrastructure
Vulnerability scanning tools available
Basic container knowledge across engineering organization
Process Readiness:
Defined software development lifecycle
Incident response procedures
Change management process
Security review process
11.7.3 Initial Investment
Starting a base images program requires upfront investment in tooling, infrastructure, and team resources.
Tooling and Infrastructure
Container Registry:
Harbor, JFrog Artifactory, or cloud provider registry
High availability setup
Backup and disaster recovery configuration
Geographic replication for distributed teams
Security Scanning:
Trivy, Grype, Snyk, or commercial alternatives
Integration with CI/CD and registry
Continuous scanning infrastructure
Vulnerability database maintenance
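To make the scanning integration concrete, the following is a minimal sketch of how a base image build could be gated on scan results using the Trivy CLI before the image is pushed to the registry; the registry host, image name, tag, and severity gate are illustrative assumptions rather than organizational standards.

```bash
#!/usr/bin/env bash
# Minimal sketch: scan a freshly built base image with Trivy before pushing.
# The registry host and image tag are hypothetical examples.
set -euo pipefail

IMAGE="registry.example.com/base/ubuntu-22.04:2024.05.01"

docker build -t "${IMAGE}" .

# Fail the build (non-zero exit) only on Critical/High findings with available fixes,
# matching a risk-based gate rather than blocking on every low-severity CVE.
trivy image \
  --severity CRITICAL,HIGH \
  --ignore-unfixed \
  --exit-code 1 \
  "${IMAGE}"

docker push "${IMAGE}"
```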
CI/CD Platform:
GitHub Actions, GitLab CI, Jenkins, or alternatives
Build capacity for image builds
Pipeline templates and automation
Integration with registry and scanning tools
Monitoring and Observability:
Prometheus, Grafana, ELK stack, or alternatives
Metrics collection for base images
Alerting infrastructure
Dashboards for adoption and health metrics
SBOM and Signing Infrastructure:
Syft or CycloneDX for SBOM generation
Cosign or Notary for image signing
Key management infrastructure
Verification systems
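The sketch below illustrates one way SBOM generation and image signing could be wired together with Syft and Cosign; the image reference and key file paths are hypothetical placeholders, not the organization's actual locations.

```bash
#!/usr/bin/env bash
# Minimal sketch: generate an SBOM with Syft, sign the image, and attach the SBOM
# as a signed attestation with Cosign. Image reference and key paths are hypothetical.
set -euo pipefail

IMAGE="registry.example.com/base/python-3.11:2024.05.01"

# Produce a CycloneDX SBOM for the image
syft "${IMAGE}" -o cyclonedx-json > sbom.cdx.json

# Sign the image with a key managed by the base images team
cosign sign --key cosign.key "${IMAGE}"

# Attach the SBOM as a signed attestation so consumers can verify provenance
cosign attest --key cosign.key --type cyclonedx --predicate sbom.cdx.json "${IMAGE}"

# Consumers verify the signature before use
cosign verify --key cosign.pub "${IMAGE}"
```

Publishing the verification key (or adopting keyless signing via Sigstore) allows application teams and admission controllers to verify images before deployment.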
Team Headcount
Year 1 (Foundation):
1 Platform Engineering Lead (full time)
2 Platform Engineers (full time)
1 Security Engineer (50% time, shared)
Total: 3.5 FTE
Year 2 (Scaling):
Add 1-2 Platform Engineers
Add Developer Experience Engineer (50% time)
Increase Security Engineer to 75% time
Total: 5-6 FTE
Implementation Timeline
Month 1-2: Hire team, set up infrastructure
Month 3-4: Create first base images, establish processes
Month 5-6: Pilot with 2-3 friendly application teams
Month 7-9: Iterate based on feedback, expand to more teams
Month 10-12: General availability, mandate for new applications
Year 2: Migrate existing applications, achieve critical mass
11.8 Success Metrics
Track these metrics to measure program success.
11.8.1 Security Metrics
Primary Security KPIs
| KPI | Target | Current | Trend |
|---|---|---|---|
| Critical CVEs in base images | 0 | 0 | ✅ Stable |
| High CVEs in base images | < 5 | 3 | ⬇️ Improving |
| Mean time to patch (Critical) | < 24 hours | 18 hours | ✅ Meeting target |
| Mean time to patch (High) | < 7 days | 5 days | ✅ Meeting target |
| % images with signed SBOM | 100% | 98% | ⬆️ Improving |
| % production images compliant | > 95% | 92% | ⬆️ Improving |
Secondary Security Metrics
Number of security exceptions granted
Average age of security exceptions
Security audit findings (trend over time)
Security incidents related to containers
Time from vulnerability disclosure to patch availability
11.8.2 Adoption Metrics
| Metric | Target | Current |
|---|---|---|
| % teams using approved base images | 100% | 87% |
| % production images from approved bases | 100% | 94% |
| Number of application images built | - | 487 |
| Number of active base images | - | 18 |
| Average rebuild frequency (days) | < 30 | 22 |
11.8.3 Operational Metrics
| Metric | Target | Current |
|---|---|---|
| Registry uptime | 99.9% | 99.95% |
| Average build time (base images) | < 10 min | 7 min |
| Average image size | < 200 MB | 156 MB |
| Storage costs per image | - | $0.12/month |
| Pull success rate | > 99.5% | 99.8% |
11.8.4 Developer Experience Metrics
| Metric | Target | Current |
|---|---|---|
| Developer satisfaction score | > 4/5 | 4.2/5 |
| Documentation helpfulness | > 4/5 | 3.8/5 |
| Support ticket resolution time | < 2 days | 1.5 days |
| Office hours attendance | - | 12 avg |
| Time to onboard new team | < 1 week | 4 days |
11.9 Common Pitfalls and How to Avoid Them
Learn from organizations that struggled with centralization.
11.9.1 The "Ivory Tower" Problem
The "Set and Forget" mistake involves failing to update images regularly, leaving vulnerabilities unaddressed, and creating larger risk when maintenance eventually occurs. This leads to developer frustration and shadow IT workarounds.
The Mistake
Base images team becomes disconnected from real developer needs:
Makes decisions without consulting development teams
Prioritizes security over usability without compromise
Ignores feedback from application teams
Operates in a silo with minimal communication
The Result
Developers work around base images (shadow IT)
Low adoption and resistance to mandates
Friction between platform and application teams
Base images team viewed as blocker, not enabler
How to Avoid
Embed platform engineers with application teams regularly
Hold monthly office hours for Q&A and feedback
Include application team representatives in RFC reviews
Measure and track developer satisfaction
Make pragmatic trade-offs between security and usability
Celebrate teams that successfully migrate to base images
11.9.2 The "Boiling the Ocean" Problem
The Mistake
Trying to create perfect base images for every possible use case:
50 different base image variants
Support for every language version ever released
Every possible configuration option exposed
Attempting to satisfy every feature request
The Result
Overwhelming maintenance burden
Slow iteration and feature delivery
Analysis paralysis on decisions
Team burnout
How to Avoid
Start with 3-5 most common base images (Ubuntu, Python, Node.js)
Support only N and N-1 versions of language runtimes
Focus on 80% use case, make exceptions for the 20%
Say "no" to feature requests that benefit only one team
Regular deprecation of unused base images
Clear criteria for adding new base images
11.9.3 The "Perfect Security" Problem
The Mistake
Demanding perfect security at the expense of everything else:
Zero vulnerabilities required (including low/medium)
Blocking all deployments for minor security findings
No exception process, even for valid edge cases
Months-long security reviews for new base images
The Result
Developers circumvent security controls
Business velocity grinds to halt
Security team viewed as blocker
Constant escalations to leadership
How to Avoid
Risk-based approach: prioritize critical and high CVEs
Clear SLAs: critical within 24h, high within 7 days
Exception process with defined criteria
Measure security improvements, not perfection
Automated controls instead of manual reviews (see the sketch after this list)
Security team as consultants, not gatekeepers
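As an illustration of automated, risk-based controls, the sketch below gates a build only on fixable Critical/High findings and routes accepted risks through a reviewed, time-bound ignore file; the image name, file paths, CVE placeholder, and ticket reference are assumptions for illustration only.

```bash
#!/usr/bin/env bash
# Minimal sketch of an automated, risk-based gate: block only on Critical/High CVEs
# and route accepted findings through a reviewed, time-bound ignore file instead of
# manual security review of every build. Paths and image name are hypothetical.
set -euo pipefail

IMAGE="registry.example.com/app/payments-api:latest"

# .trivyignore holds approved exceptions (one CVE ID per line), each added through
# the exception process and removed when the exception expires, e.g.:
#   CVE-2024-XXXXX   # placeholder ID; approved until a stated date, tracked in a ticket
trivy image \
  --severity CRITICAL,HIGH \
  --ignore-unfixed \
  --ignorefile .trivyignore \
  --exit-code 1 \
  "${IMAGE}"
```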
11.9.4 The "Big Bang Migration" Problem
The Mistake
Mandating all teams migrate immediately:
6-month hard deadline for 100 teams
No grandfathering for legacy applications
Insufficient support for teams during migration
Underestimating complexity of migrations
The Result
Overwhelmed support channels
Missed deadlines and leadership frustration
Poor quality migrations done under pressure
Developer resentment
How to Avoid
Phased rollout: pilot → friendly teams → general availability → mandate
Mandate for new applications, gradual migration for existing
Dedicated migration support (embedded engineers)
Document common migration patterns
Celebrate successful migrations
Realistic timelines (12-18 months for large organizations)
11.10 Case Study: Implementing a Base Images Team
The following is a fictional but realistic example based on common patterns.
Organization Profile
Size: 300 developers across 40 application teams
Platform: AWS with Kubernetes (EKS)
Current state: Teams maintain their own Dockerfiles, mix of Ubuntu/Alpine/random bases
Pain points: 47 critical CVEs across production images, inconsistent security, slow vulnerability response
Phase 1: Foundation (Months 1-3)
Team Formation
Hired Platform Engineering Lead (Sarah) from previous SRE role
Assigned two DevOps engineers (Mike and Priya) to platform team
Security engineer (Tom) allocated 50% time from AppSec team
Infrastructure Setup
Deployed Harbor on EKS for container registry
Integrated Trivy for vulnerability scanning
Set up GitHub Actions for automated image builds
Configured Slack channel #base-images-support
Initial Base Images
Created 5 base images:
Ubuntu 22.04 (minimal)
Python 3.11 (slim)
Node.js 20 (alpine)
OpenJDK 21 (slim)
Go 1.21 (alpine)
Each with:
Non-root user (UID 10001)
Minimal package set
Security hardening
Signed SBOM
Comprehensive documentation
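A consuming team could spot-check these guarantees before adopting an image; the sketch below assumes a hypothetical registry path and public key location.

```bash
#!/usr/bin/env bash
# Minimal sketch of how a consuming team might spot-check a published base image.
# Image reference and public key path are illustrative, not actual registry paths.
set -euo pipefail

IMAGE="registry.example.com/base/python-3.11:latest"

# Confirm the default user is the documented non-root UID (expects "10001")
docker run --rm "${IMAGE}" id -u

# Verify the image signature and its SBOM attestation before adopting it
cosign verify --key cosign.pub "${IMAGE}"
cosign verify-attestation --key cosign.pub --type cyclonedx "${IMAGE}"
```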
Phase 2: Pilot (Months 4-6)
Selected Pilot Teams
Team A: New greenfield application (easy win)
Team B: Mature Node.js service (real-world test)
Team C: Python data pipeline (batch workload)
Pilot Results
Team A:
Migrated in 2 days
Faster builds due to pre-cached layers
Positive feedback on documentation
Team B:
Found bug in Node.js base image (missing SSL certificates)
Fixed in 1 day, updated docs
40% reduction in image size (450MB → 270MB)
Team C:
Required custom Python packages
Created tutorial for adding packages to base image
Successful migration after minor tweaks
Learnings
Documentation needed more examples
Support response time critical during migration
Teams need migration guide tailored to their stack
Phase 3: Expansion (Months 7-12)
Expanded Base Image Catalog
Added 8 more base images based on demand, including:
.NET 8
Ruby 3.2
PHP 8.3
Rust 1.75
Nginx (static file serving)
Plus distroless variants for production
Scaled Support
Added Developer Experience Engineer (Lisa, 50% time)
Created 15 example applications showing migration patterns
Started monthly office hours (avg 15 attendees)
Embedded engineer program (2-week rotations)
Adoption Progress
25 teams migrated (62% of teams)
156 application images using approved bases
Zero critical CVEs in base images
98% of teams satisfied with base images
Phase 4: Mandate and Scale (Year 2)
Executive Mandate
CTO announcement:
All new applications must use approved base images (effective immediately)
Existing applications: 12-month migration timeline
Exceptions require CISO approval
Full Team
Platform Engineering Lead (Sarah)
3 Platform Engineers (Mike, Priya, Jun)
Security Engineer (Tom, 75% time)
Developer Experience Engineer (Lisa, full time)
Results After 18 Months
Security Improvements:
Critical CVEs in production: 47 → 0
High CVEs in production: 123 → 8
Mean time to patch critical: 14 days → 18 hours
All images have signed SBOMs
Operational Improvements:
Average image size: 320MB → 180MB
Average build time: 15 min → 8 min
Registry storage efficiency improved significantly
Adoption:
39 of 40 teams using approved base images (98%)
1 legacy team with approved exception
487 application images on approved bases
Zero security exceptions in past 6 months
Developer Experience:
Satisfaction score: 4.2/5
92% would recommend to other teams
89% say base images make them more productive
Impact:
Security incident reduction: 80% fewer container-related incidents
Engineering time saved: Significant reduction in redundant work
Faster time to production for new apps: 2-3 days faster
The program demonstrated clear value through improved security posture, operational efficiency, and developer productivity.
12. References and Further Reading
12.1 Industry Standards and Frameworks
NIST (National Institute of Standards and Technology)
NIST Special Publication 800-190: Application Container Security Guide
https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-190.pdf
Comprehensive guidance on container security threats and countermeasures
NIST Special Publication 800-53: Security and Privacy Controls
https://csrc.nist.gov/publications/detail/sp/800-53/rev-5/final
Defines baseline configurations and security controls for information systems
CIS (Center for Internet Security)
CIS Docker Benchmark
https://www.cisecurity.org/benchmark/docker
Security configuration guidelines for Docker containers
CIS Kubernetes Benchmark
https://www.cisecurity.org/benchmark/kubernetes
Hardening standards for Kubernetes deployments
OWASP (Open Web Application Security Project)
OWASP Docker Security Cheat Sheet
https://cheatsheetseries.owasp.org/cheatsheets/Docker_Security_Cheat_Sheet.html
Practical security guidance for Docker containers
OWASP Kubernetes Security Cheat Sheet
https://cheatsheetseries.owasp.org/cheatsheets/Kubernetes_Security_Cheat_Sheet.html
Security best practices for Kubernetes
CNCF (Cloud Native Computing Foundation)
Software Supply Chain Best Practices
https://github.com/cncf/tag-security/blob/main/supply-chain-security/supply-chain-security-paper/CNCF_SSCP_v1.pdf
Comprehensive guide to securing the software supply chain
12.2 Container Security Tools Documentation
Vulnerability Scanning
Trivy Documentation
https://aquasecurity.github.io/trivy/
Official documentation for Trivy vulnerability scanner
Grype Documentation
https://github.com/anchore/grype
Anchore Grype vulnerability scanner documentation
Snyk Container Documentation
https://docs.snyk.io/products/snyk-container
Snyk's container security scanning platform
Clair Documentation
https://quay.github.io/clair/
Static analysis of vulnerabilities in containers
SBOM Generation
Syft Documentation
https://github.com/anchore/syft
SBOM generation tool from Anchore
CycloneDX Specification
https://cyclonedx.org/
SBOM standard format specification
SPDX Specification
https://spdx.dev/
Software Package Data Exchange standard
Image Signing and Verification
Cosign Documentation
https://docs.sigstore.dev/cosign/overview/
Container image signing and verification
Notary Project
https://notaryproject.dev/
Content signing and verification framework
Sigstore Documentation
https://www.sigstore.dev/
Improving software supply chain security
12.3 Container Registries
Harbor
Harbor Documentation
https://goharbor.io/docs/
Open source container registry with security scanning
Harbor GitHub Repository
https://github.com/goharbor/harbor
Source code and issue tracking
Cloud Provider Registries
AWS Elastic Container Registry (ECR)
https://docs.aws.amazon.com/ecr/
Amazon's container registry service
Azure Container Registry (ACR)
https://docs.microsoft.com/en-us/azure/container-registry/
Microsoft Azure container registry
Google Artifact Registry
https://cloud.google.com/artifact-registry/docs
Google Cloud's artifact management service
JFrog Artifactory
Artifactory Documentation
https://www.jfrog.com/confluence/display/JFROG/JFrog+Artifactory
Universal artifact repository manager
12.4 Base Image Sources
Official Docker Images
Docker Hub Official Images
https://hub.docker.com/search?q=&type=image&image_filter=official
Curated set of Docker repositories
Vendor-Specific Base Images
Red Hat Universal Base Images (UBI)
https://www.redhat.com/en/blog/introducing-red-hat-universal-base-image
Free redistributable container base images
Google Distroless Images
https://github.com/GoogleContainerTools/distroless
Minimal container images from Google
Chainguard Images
https://www.chainguard.dev/chainguard-images
Hardened, minimal container images with daily updates
Canonical Ubuntu Images
https://hub.docker.com/_/ubuntu
Official Ubuntu container images
Amazon Linux Container Images
https://gallery.ecr.aws/amazonlinux/amazonlinux
Amazon's Linux distribution for containers
12.5 Industry Case Studies and Best Practices
Netflix
Netflix Open Source
https://netflix.github.io/
Netflix's open source projects and container platform
Titus: Netflix Container Management Platform
https://netflix.github.io/titus/
Documentation for Netflix's container orchestration system
"The Evolution of Container Usage at Netflix"
https://netflixtechblog.com/the-evolution-of-container-usage-at-netflix-3abfc096781b
Netflix Technology Blog article on container adoption
"Titus: Introducing Containers to the Netflix Cloud"
https://queue.acm.org/detail.cfm?id=3158370
ACM Queue article detailing Netflix's container journey
Docker and Platform Engineering
"Building Stronger, Happier Engineering Teams with Team Topologies"
https://www.docker.com/blog/building-stronger-happier-engineering-teams-with-team-topologies/
Docker's approach to organizing engineering teams
Docker Engineering Careers
https://www.docker.com/careers/engineering/
Insights into Docker's engineering team structure
Google Cloud
"Base Images Overview"
https://cloud.google.com/software-supply-chain-security/docs/base-images
Google's approach to base container images
HashiCorp
"Creating a Multi-Cloud Golden Image Pipeline"
https://www.hashicorp.com/en/blog/multicloud-golden-image-pipeline-terraform-cloud-hcp-packer
Enterprise approach to golden image management
Red Hat
"What is a Golden Image?"
https://www.redhat.com/en/topics/linux/what-is-a-golden-image
Comprehensive explanation of golden image concepts
"Automate VM Golden Image Management with OpenShift"
https://developers.redhat.com/articles/2025/06/03/automate-vm-golden-image-management-openshift
Technical implementation of golden image automation
12.6 Platform Engineering Resources
Team Topologies
Team Topologies Website
https://teamtopologies.com/
Framework for organizing business and technology teams
"Team Topologies" by Matthew Skelton and Manuel Pais
Book: https://teamtopologies.com/book
Foundational resource for platform team structure
Platform Engineering Team Structure
"How to Build a Platform Engineering Team" (Spacelift)
https://spacelift.io/blog/how-to-build-a-platform-engineering-team
Guide to building and structuring platform teams
"Platform Engineering Team Structure" (Puppet)
https://www.puppet.com/blog/platform-engineering-teams
DevOps skills and roles for platform engineering
"What is a Platform Engineering Team?" (Harness)
https://www.harness.io/harness-devops-academy/what-is-a-platform-engineering-team
Overview of platform engineering team responsibilities
"Platform Engineering Roles and Responsibilities" (Loft Labs)
https://www.vcluster.com/blog/platform-engineering-roles-and-responsibilities-building-scalable-reliable-and-secure-platform
Detailed breakdown of platform engineering roles
"What Does a Platform Engineer Do?" (Spacelift)
https://spacelift.io/blog/what-is-a-platform-engineer
Role definition and responsibilities
"The Platform Engineer Role Explained" (Splunk)
https://www.splunk.com/en_us/blog/learn/platform-engineer-role-responsibilities.html
Comprehensive guide to platform engineering
12.7 Golden Images and Base Image Management
Concepts and Best Practices
"What is Golden Image?" (NinjaOne)
https://www.ninjaone.com/it-hub/remote-access/what-is-golden-image/
Detailed explanation with NIST references
"A Guide to Golden Images" (SmartDeploy)
https://www.smartdeploy.com/blog/guide-to-golden-images/
Best practices for creating and managing golden images
"What are Golden Images?" (Parallels)
https://www.parallels.com/glossary/golden-images/
Definition and use cases
"What is Golden Image?" (TechTarget)
https://www.techtarget.com/searchitoperations/definition/golden-image
Technical definition and explanation
Implementation Guides
"DevOps Approach to Build Golden Images in AWS"
https://medium.com/@sudhir_thakur/devops-approach-to-build-golden-images-in-aws-part-1-d44588a46d6
Practical implementation guide for AWS environments
"Create an Azure Virtual Desktop Golden Image"
https://learn.microsoft.com/en-us/azure/virtual-desktop/set-up-golden-image
Microsoft's approach to golden images in Azure
12.8 Container Security Research and Analysis
Vulnerability Management
Common Vulnerabilities and Exposures (CVE)
https://cve.mitre.org/
Official CVE database
National Vulnerability Database (NVD)
https://nvd.nist.gov/
U.S. government repository of vulnerability data
Security Scanning Best Practices
"Why Golden Images Still Matter" (Chainguard)
https://www.chainguard.dev/unchained/why-golden-images-still-matter-and-how-to-secure-them-with-chainguard
Modern approach to golden image security and management
12.9 Kubernetes and Container Orchestration
Kubernetes Documentation
Kubernetes Security Best Practices
https://kubernetes.io/docs/concepts/security/
Official Kubernetes security documentation
Pod Security Standards
https://kubernetes.io/docs/concepts/security/pod-security-standards/
Kubernetes pod security policies
Policy Enforcement
Kyverno Documentation
https://kyverno.io/docs/
Kubernetes-native policy management
Open Policy Agent (OPA)
https://www.openpolicyagent.org/docs/latest/
Policy-based control for cloud native environments
Gatekeeper Documentation
https://open-policy-agent.github.io/gatekeeper/website/docs/
OPA constraint framework for Kubernetes
12.10 CI/CD and Automation
GitHub Actions
GitHub Actions Documentation
https://docs.github.com/en/actions
CI/CD automation with GitHub
Aqua Security Trivy Action
https://github.com/aquasecurity/trivy-action
GitHub Action for Trivy scanning
GitLab CI
GitLab CI/CD Documentation
https://docs.gitlab.com/ee/ci/
Continuous integration and delivery with GitLab
Jenkins
Jenkins Documentation
https://www.jenkins.io/doc/
Open source automation server
BuildKit
BuildKit Documentation
https://github.com/moby/buildkit
Concurrent, cache-efficient, and Dockerfile-agnostic builder
12.11 Books and Publications
Container Security
"Container Security" by Liz Rice
O'Reilly Media, 2020
Comprehensive guide to container security fundamentals
"Kubernetes Security and Observability" by Brendan Creane and Amit Gupta
O'Reilly Media, 2021
Security practices for Kubernetes environments
Platform Engineering
"Team Topologies" by Matthew Skelton and Manuel Pais
IT Revolution Press, 2019
Organizing business and technology teams for fast flow
"Building Secure and Reliable Systems" by Google
O'Reilly Media, 2020
Best practices for designing, implementing, and maintaining systems
DevOps and Infrastructure
"The Phoenix Project" by Gene Kim, Kevin Behr, and George Spafford
IT Revolution Press, 2013
Novel about IT, DevOps, and helping your business win
"The DevOps Handbook" by Gene Kim, Jez Humble, Patrick Debois, and John Willis
IT Revolution Press, 2016
How to create world-class agility, reliability, and security
12.12 Community and Forums
Container Community
CNCF Slack
https://slack.cncf.io/
Cloud Native Computing Foundation community discussions
Docker Community Forums
https://forums.docker.com/
Official Docker community support
Kubernetes Slack
https://kubernetes.slack.com/
Kubernetes community discussions
Security Communities
Cloud Native Security Slack
Part of CNCF Slack workspace
Dedicated security discussions
r/kubernetes (Reddit)
https://www.reddit.com/r/kubernetes/
Community discussions and support
r/docker (Reddit)
https://www.reddit.com/r/docker/
Docker community discussions
12.13 Training and Certification
Container Security Training
Kubernetes Security Specialist (CKS)
https://training.linuxfoundation.org/certification/certified-kubernetes-security-specialist/
Official Kubernetes security certification
Docker Certified Associate
https://training.mirantis.com/certification/dca-certification-exam/
Docker platform certification
Cloud Provider Certifications
AWS Certified DevOps Engineer
https://aws.amazon.com/certification/certified-devops-engineer-professional/
AWS DevOps practices and container services
Google Professional Cloud DevOps Engineer
https://cloud.google.com/certification/cloud-devops-engineer
Google Cloud DevOps and container expertise
Microsoft Certified: Azure Solutions Architect Expert
https://docs.microsoft.com/en-us/certifications/azure-solutions-architect/
Azure infrastructure and container services
12.14 Compliance and Regulatory Resources
Compliance Frameworks
SOC 2 Compliance
https://www.aicpa.org/interestareas/frc/assuranceadvisoryservices/aicpasoc2report.html
Service Organization Control 2 reporting
ISO 27001
https://www.iso.org/isoiec-27001-information-security.html
Information security management standard
PCI DSS
https://www.pcisecuritystandards.org/
Payment Card Industry Data Security Standard
GDPR Resources
GDPR Official Text
https://gdpr-info.eu/
General Data Protection Regulation documentation
12.15 Additional Technical Resources
Multi-Platform Builds
Docker Multi-Platform Images
https://docs.docker.com/build/building/multi-platform/
Building images for multiple architectures
Image Optimization
Docker Best Practices
https://docs.docker.com/develop/dev-best-practices/
Official Docker development best practices
Dockerfile Best Practices
https://docs.docker.com/develop/develop-images/dockerfile_best-practices/
Writing efficient and secure Dockerfiles
Container Runtimes
containerd Documentation
https://containerd.io/docs/
Industry-standard container runtime
CRI-O Documentation
https://cri-o.io/
Lightweight container runtime for Kubernetes
13. Document Control
Version History
| Version | Author | Description |
|---|---|---|
| 1.0 | Platform Engineering Team | Initial comprehensive policy release with technical details and implementation guidance |
Review and Approval
Platform Engineering Lead
Security Team Lead
Chief Information Security Officer
Review Schedule
This policy will be reviewed and updated:
Quarterly Review: Technical standards and tool recommendations
Annual Review: Complete policy review including governance and processes
Event-Driven Review: When significant security incidents occur or new threats emerge
Next Scheduled Review:
This document represents the current state of container security best practices and will evolve as technologies and threats change.
Last updated