Mastering MCP · Day 24 of 30

Docker & Container Patterns for MCP Servers

Build portable, production-ready MCP containers with multi-stage Docker builds, local multi-server Compose stacks, graceful shutdown handlers, secrets injection strategies, and cross-architecture ARM64/AMD64 builds for AWS Graviton.

⏳ ~32 min read
🎯 Level ASCEND
🚀 Phase Containerization

Why Containers Matter for MCP Portability

An MCP server is only valuable if it runs identically on your laptop, in CI, and in production. Containers solve the "works on my machine" problem by packaging your Python runtime, dependencies, configuration, and code into a single immutable image that behaves identically everywhere.

MCP servers have specific containerization requirements that differ from typical web services. They often depend on exact Python versions for async behavior, system libraries for PDF parsing or image processing, non-Python binaries like ffmpeg or git, and careful network configuration for the SSE transport. Without containers, managing these dependencies across developer laptops, CI runners, and production infrastructure is a maintenance nightmare.

Containers also give you three operational superpowers: immutability (the same image runs in staging and production — no configuration drift), rollback (any previous image tag can be re-deployed in seconds), and horizontal scaling (ECS simply launches more copies of the exact same container when load increases). These properties are especially valuable for MCP servers because a bad deployment can break all agent workflows downstream.

📦 Immutable Artifacts (Best Practice): Every git commit produces an ECR image tagged with the commit SHA. Production always runs a specific, auditable image, never a mutable latest tag.

⚙️ Dependency Isolation (Reliability): System libraries, Python packages, and native binaries are bundled in the image. No more pip install races on the production host or conflicting package versions.

🔄 Local = Production (DX): Developers run the exact same container image locally that ships to ECS. Integration bugs surface during development, not during a production incident at 3 AM.
ℹ️ Container image scanning: Enable ECR image scanning on push. With enhanced scanning turned on, Amazon Inspector scans images for operating system and language package vulnerabilities. Configure a CodePipeline gate that fails the build if any CRITICAL CVEs are found, which keeps known-vulnerable images out of production.
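
One way to implement such a gate is a small pipeline step that inspects the severity counts from the scan result and fails on blocked severities. The sketch below assumes the counts arrive as a dictionary shaped like the findingSeverityCounts field returned by ECR's DescribeImageScanFindings call; the function name scan_gate and its default blocked list are illustrative, not part of any AWS API.

```python
from typing import Iterable, Mapping


def scan_gate(
    severity_counts: Mapping[str, int],
    blocked: Iterable[str] = ("CRITICAL",),
) -> tuple[bool, str]:
    """Return (passed, message) for a dict like {"CRITICAL": 1, "HIGH": 4}."""
    # Keep only blocked severities that actually have findings
    failing = {
        severity: severity_counts.get(severity, 0)
        for severity in blocked
        if severity_counts.get(severity, 0) > 0
    }
    if failing:
        detail = ", ".join(f"{n} {s}" for s, n in sorted(failing.items()))
        return False, f"blocked: {detail} finding(s)"
    return True, "no blocked findings"
```

A build step could call scan_gate on the fetched counts and exit non-zero when the first element is False, which is what makes the pipeline stage fail.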

Multi-Stage Docker Builds: Builder + Distroless Runtime

A naive single-stage Dockerfile that installs build tools, compiles dependencies, and copies your source can easily produce a 1–2 GB image. Multi-stage builds let you use a fat builder image for compilation and package installation, then copy only the runtime artifacts into a minimal distroless image. That typically shrinks the image by 80–90% and removes most of the tooling an attacker could use.

The pattern has two stages. The builder stage installs Python, the venv module, and build tools (build-essential, which includes gcc, plus git) and installs your Python dependencies into a virtual environment at /app/.venv. The runtime stage copies only the venv and your application source code into a minimal base image. Google's distroless Python image (gcr.io/distroless/python3-debian12) has no shell, no package manager, and no utilities, which significantly reduces the exploitable surface area.

The builder and runtime interpreters must match. The distroless image ships Debian 12's Python 3.11 at /usr/bin/python3, while a venv created on python:3.12-slim links to /usr/local/bin/python, a path that does not exist in the distroless image, so the entrypoint would fail. Building on debian:12-slim keeps the venv valid after the copy.

Dockerfile: multi-stage MCP server build
# ── Stage 1: Builder ────────────────────────────────────────────────
# debian:12-slim provides the same Python 3.11 at /usr/bin/python3 as the distroless runtime
FROM debian:12-slim AS builder

WORKDIR /app

# Install Python and build dependencies (only in builder stage)
RUN apt-get update && apt-get install -y --no-install-recommends \
    python3 \
    python3-venv \
    python3-dev \
    build-essential \
    ca-certificates \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Create isolated virtual environment (bin/python links to /usr/bin/python3)
RUN python3 -m venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"

# Install Python dependencies first (cache layer)
COPY requirements.txt .
RUN pip install --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# Copy application source
COPY src/ ./src/
COPY pyproject.toml .

# ── Stage 2: Runtime (distroless) ───────────────────────────────────
FROM gcr.io/distroless/python3-debian12 AS runtime

WORKDIR /app

# Copy ONLY the venv and application from builder
COPY --from=builder /app/.venv /app/.venv
COPY --from=builder /app/src ./src

# Non-root user for security (distroless provides nonroot at UID 65532)
USER nonroot

ENV PATH="/app/.venv/bin:$PATH" \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONPATH=/app

EXPOSE 8080

# Health check baked into the image (exec form, because there is no shell)
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
  CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"]

ENTRYPOINT ["/app/.venv/bin/python", "-m", "src.server"]
💡 Layer caching trick: Always copy requirements.txt and install dependencies before copying your source code. Docker caches each layer, so if only your source code changes, the pip install layer is reused from cache. For code-only changes this can reduce a 3-minute build to under 20 seconds.
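
The ordering rule is easier to see with a toy model. The sketch below is a deliberately simplified illustration, not Docker's actual cache implementation: it assumes each layer's cache key is a hash of the parent key, the instruction text, and the content of any copied files. The helper names (layer_keys, reused_layers, build) are hypothetical.

```python
import hashlib


def layer_keys(steps: list[tuple[str, bytes]]) -> list[str]:
    """Chain-hash (instruction, copied-file-content) pairs, one key per layer."""
    keys, parent = [], b""
    for instruction, content in steps:
        digest = hashlib.sha256(parent + instruction.encode() + content).hexdigest()
        keys.append(digest)
        parent = digest.encode()
    return keys


def reused_layers(old: list[str], new: list[str]) -> int:
    """Count leading layers whose keys match, i.e. layers served from cache."""
    count = 0
    for a, b in zip(old, new):
        if a != b:
            break
        count += 1
    return count


def build(requirements: bytes, source: bytes) -> list[str]:
    """Model the four cache-relevant steps of the builder stage."""
    return layer_keys([
        ("COPY requirements.txt .", requirements),
        ("RUN pip install -r requirements.txt", b""),
        ("COPY src/ ./src/", source),
        ("COPY pyproject.toml .", b""),
    ])


before = build(b"fastmcp\n", b"print('v1')")
code_change = build(b"fastmcp\n", b"print('v2')")
deps_change = build(b"fastmcp\nredis\n", b"print('v1')")

# A source-only change keeps the requirements COPY and the pip install layer
print("layers reused after code change:", reused_layers(before, code_change))
# A requirements change invalidates every layer from the first COPY onward
print("layers reused after deps change:", reused_layers(before, deps_change))
```

In this model a source edit still reuses the two dependency layers, while a requirements edit reuses none, which is why the dependency install belongs above the source copy.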

For MCP servers that need system utilities unavailable in distroless (such as git for a GitHub MCP tool or ffmpeg for media processing), use python:3.12-slim as the runtime instead, and build the venv on that same image so the interpreter path and Python version match. Strip unnecessary packages from the runtime stage explicitly with apt-get purge and always run as a non-root user.

Shell: compare image sizes
# Build both targets and compare
docker buildx build --target builder -t mcp-builder . --load
docker buildx build --target runtime -t mcp-runtime . --load

docker images | grep mcp
# mcp-builder   latest  a3f...  1.2GB   (with build tools)
# mcp-runtime   latest  b7c...  98MB    (distroless, 92% smaller)

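# Inspect runtime image: no shell, no bash, minimal attack surface.
# --entrypoint is required; otherwise "sh" is passed as an argument to the image's ENTRYPOINT.
docker run --rm --entrypoint sh mcp-runtime
# Error: "exec: \"sh\": executable file not found in $PATH"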

Docker Compose for Local Multi-Server MCP Development

Real-world MCP environments involve multiple servers collaborating — a documents server, a database server, a knowledge graph server — plus shared infrastructure like Redis for session state. Docker Compose lets you define this entire local stack in one YAML file and bring it up with a single command.

The Compose file below defines three MCP servers (documents, database, and search) alongside a shared Redis instance for distributed session state. Each MCP server is built from its own Dockerfile in a subdirectory. They all share an internal mcp-net bridge network and can reach each other by service name. Redis keeps MCP session state across server restarts: when the documents server restarts, its sessions are still in Redis. Because Redis runs with --save "" here, nothing is written to disk, so sessions are lost if the Redis container itself restarts.

YAML: docker-compose.yml (3 MCP servers + Redis)
# The top-level "version" key is obsolete in Docker Compose v2, so it is omitted

services:

  # ── Shared infrastructure ──────────────────────────────────────────
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru --save ""
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks: [mcp-net]

  # ── MCP Server 1: Documents ───────────────────────────────────────
  mcp-documents:
    build:
      context: ./servers/documents
      target: runtime
    ports:
      - "8081:8080"
    environment:
      MCP_SERVER_NAME: documents-server
      REDIS_URL: redis://redis:6379/0
      S3_BUCKET: ${DOCUMENTS_BUCKET}
      LOG_LEVEL: INFO
    env_file: [.env]
    depends_on:
      redis: { condition: service_healthy }
    healthcheck:
      # Exec form: the distroless runtime image has no shell for CMD-SHELL to use
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"]
      interval: 30s
      timeout: 10s
      start_period: 15s
      retries: 3
    volumes:
      - ./servers/documents/src:/app/src:ro  # hot-reload in dev
    networks: [mcp-net]
    restart: unless-stopped

  # ── MCP Server 2: Database ────────────────────────────────────────
  mcp-database:
    build:
      context: ./servers/database
      target: runtime
    ports:
      - "8082:8080"
    environment:
      MCP_SERVER_NAME: database-server
      REDIS_URL: redis://redis:6379/1
      DB_CONNECTION_STRING: ${DB_CONNECTION_STRING}
    env_file: [.env]
    depends_on:
      redis: { condition: service_healthy }
    networks: [mcp-net]
    restart: unless-stopped

  # ── MCP Server 3: Search ──────────────────────────────────────────
  mcp-search:
    build:
      context: ./servers/search
      target: runtime
    ports:
      - "8083:8080"
    environment:
      MCP_SERVER_NAME: search-server
      REDIS_URL: redis://redis:6379/2
      OPENSEARCH_URL: ${OPENSEARCH_URL}
    env_file: [.env]
    depends_on:
      redis: { condition: service_healthy }
    networks: [mcp-net]
    restart: unless-stopped

networks:
  mcp-net:
    driver: bridge

The volume mount ./servers/documents/src:/app/src:ro overlays your working tree on the image's source directory, so edits are visible inside the container without a rebuild. The running server only picks them up after a restart (docker compose restart mcp-documents) unless you start it with a file-watching reloader. For production builds, this volume is omitted and the source is baked into the image at build time.
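
Each service in the Compose file receives its settings through environment variables such as MCP_SERVER_NAME, REDIS_URL, and LOG_LEVEL. A minimal sketch of how a server might read them at startup follows; ServerConfig, load_config, and the fallback defaults are hypothetical names and values chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse


@dataclass(frozen=True)
class ServerConfig:
    name: str
    redis_host: str
    redis_port: int
    redis_db: int
    log_level: str


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Build a ServerConfig from the variables set in docker-compose.yml."""
    name = env.get("MCP_SERVER_NAME")
    if not name:
        raise ValueError("MCP_SERVER_NAME must be set")
    url = urlparse(env.get("REDIS_URL", "redis://localhost:6379/0"))
    if url.scheme != "redis" or not url.hostname:
        raise ValueError(f"unsupported REDIS_URL: {url.geturl()!r}")
    db_text = url.path.lstrip("/") or "0"  # "/1" -> "1"; empty path -> DB 0
    return ServerConfig(
        name=name,
        redis_host=url.hostname,
        redis_port=url.port or 6379,
        redis_db=int(db_text),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Passing the environment as a parameter keeps the loader easy to test, and failing fast on a missing name surfaces a misconfigured service at startup rather than on the first tool call.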

Container Health Checks and Graceful Shutdown in MCP

Health checks tell the container orchestrator whether your MCP server is ready to accept connections. Graceful shutdown ensures that in-flight tool calls complete before the container exits — preventing corrupted tool responses during rolling deployments.

Docker and ECS both support health checks. The check runs a command inside the container at a regular interval; if it exits non-zero for the configured number of retries, the container is marked unhealthy and potentially replaced. Note that ECS ignores the HEALTHCHECK instruction baked into an image, so on ECS the check must be defined in the healthCheck field of the container definition. For MCP servers, a two-tier health check is ideal: a shallow /health endpoint that always returns 200 quickly (for liveness), and a deeper /ready endpoint that verifies database connectivity and cache availability (for readiness).
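
The command side of a health check can be a few lines of standard-library Python, which matters in images that have no curl or shell. The sketch below assumes the server exposes /health and /ready on port 8080 as described above; the function name probe is hypothetical.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return 0 if the endpoint answers with a 2xx status, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if 200 <= resp.status < 300 else 1
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP 4xx/5xx (HTTPError), or a malformed URL
        return 1
```

A health check script would end with raise SystemExit(probe("http://localhost:8080/health")) so the orchestrator sees the result as the process exit code. Pointing the same function at /ready gives the readiness variant.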

Python: /health and /ready endpoints in FastMCP
from fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import JSONResponse
import asyncio, os, signal, sys, time
import redis.asyncio as aioredis

mcp = FastMCP("my-server")
_startup_time = time.time()
# from_url() is lazy: no connection is opened until the first command
_redis: aioredis.Redis = aioredis.from_url(
    os.environ.get("REDIS_URL", "redis://localhost:6379/0")
)

# ── Liveness: always fast ─────────────────────────────────────────
# custom_route is available in FastMCP 2.x; adjust if your version registers routes differently
@mcp.custom_route("/health", methods=["GET"])
async def health(request: Request) -> JSONResponse:
    return JSONResponse({
        "status": "healthy",
        "uptime_seconds": round(time.time() - _startup_time),
    })

# ── Readiness: checks dependencies ────────────────────────────────
@mcp.custom_route("/ready", methods=["GET"])
async def ready(request: Request) -> JSONResponse:
    checks = {}
    ok = True
    try:
        await _redis.ping()
        checks["redis"] = "ok"
    except Exception as e:
        checks["redis"] = str(e)
        ok = False
    status = 200 if ok else 503
    return JSONResponse({"ready": ok, "checks": checks}, status_code=status)

# ── Graceful shutdown handler ─────────────────────────────────────
DRAIN_SECONDS = 25  # keep below the ECS stopTimeout (default 30s)
_shutdown_event = asyncio.Event()

def handle_sigterm(*_):
    print("SIGTERM received; initiating graceful shutdown", flush=True)
    _shutdown_event.set()

async def graceful_shutdown_task():
    """Wait for SIGTERM/SIGINT, then give in-flight requests time to finish.

    Schedule once at startup with asyncio.create_task(graceful_shutdown_task()).
    If your ASGI server installs its own signal handlers, hook into its
    shutdown mechanism instead.
    """
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, handle_sigterm)
    await _shutdown_event.wait()
    print(f"Draining in-flight requests ({DRAIN_SECONDS}s max)...", flush=True)
    await asyncio.sleep(DRAIN_SECONDS)  # allow active SSE clients to disconnect
    sys.exit(0)
⚠️ SIGTERM vs SIGKILL: ECS sends SIGTERM first, then SIGKILL after the stopTimeout (default 30 seconds, configurable up to 120 seconds on Fargate). Your MCP server must catch SIGTERM and finish draining active sessions before the hard kill, so keep the drain window shorter than stopTimeout. If you don't handle SIGTERM, active tool calls are terminated mid-execution and clients receive connection-reset errors instead of proper tool responses.
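
A fixed sleep always waits the full drain window, even when nothing is running. A common refinement is to count in-flight requests and exit as soon as the count reaches zero, with the timeout as an upper bound. The sketch below is one way to do that with plain asyncio; InFlightTracker and demo are hypothetical names, and wiring track() around real tool calls is left to the server.

```python
import asyncio
from contextlib import asynccontextmanager


class InFlightTracker:
    """Counts active requests so shutdown can wait for them to finish."""

    def __init__(self) -> None:
        self._active = 0
        self._idle = asyncio.Event()
        self._idle.set()  # nothing is in flight at start

    @asynccontextmanager
    async def track(self):
        self._active += 1
        self._idle.clear()
        try:
            yield
        finally:
            self._active -= 1
            if self._active == 0:
                self._idle.set()

    async def drain(self, timeout: float) -> bool:
        """Wait until idle; return False if the timeout expired first."""
        try:
            await asyncio.wait_for(self._idle.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False


async def demo() -> tuple[bool, bool]:
    tracker = InFlightTracker()

    async def tool_call(seconds: float) -> None:
        async with tracker.track():
            await asyncio.sleep(seconds)

    fast = asyncio.create_task(tool_call(0.05))
    await asyncio.sleep(0)  # let the task start and register itself
    drained_fast = await tracker.drain(timeout=1.0)

    slow = asyncio.create_task(tool_call(1.0))
    await asyncio.sleep(0)
    drained_slow = await tracker.drain(timeout=0.05)
    slow.cancel()
    await asyncio.gather(fast, slow, return_exceptions=True)
    return drained_fast, drained_slow
```

In a shutdown handler, awaiting tracker.drain(timeout) in place of the fixed sleep lets an idle container exit immediately, while a busy one still stops before the orchestrator's hard kill.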

Secrets Injection: Env Vars vs Mounted Secrets vs AWS Secrets Manager Sidecar

MCP servers routinely handle sensitive credentials such as database passwords, API keys, and signing certificates. How you inject secrets into containers has major security implications. The table below compares four patterns with distinct trade-offs.

| Method | Security | Rotation | Complexity | Best For |
|---|---|---|---|---|
| Environment variables | Low: visible in docker inspect and the process environment | Requires container restart | Minimal | Non-sensitive config, dev/local |
| Docker secrets (Compose) | Medium: mounted as files under /run/secrets, not exposed in env | Requires container restart | Low | Local dev with sensitive data |
| AWS Secrets Manager (ECS native) | High: IAM-controlled, encrypted at rest | Rotated values reach only newly started tasks; force a new deployment after rotation | Medium | Production ECS deployments |
| AWS Secrets Manager sidecar | Very high: values refreshed in place | Live rotation without container restart | High | High-security + 24/7 availability |

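JSON: ECS task definition with Secrets Manager injection. Entries under "environment" are plain, non-sensitive config. Entries under "secrets" are fetched from AWS Secrets Manager at task start, which requires the task execution role to allow secretsmanager:GetSecretValue. JSON does not permit comments, so these notes sit outside the document. In production, replace the latest tag with the commit-SHA tag.
{
  "containerDefinitions": [{
    "name": "mcp-server",
    "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/mcp-server:latest",
    "environment": [
      { "name": "LOG_LEVEL", "value": "INFO" },
      { "name": "MCP_SERVER_NAME", "value": "documents-server" }
    ],
    "secrets": [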
      {
        "name": "DB_PASSWORD",
        "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:prod/mcp/db-password"
      },
      {
        "name": "OPENAI_API_KEY",
        "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:prod/mcp/openai-key:api_key::"
      }
    ]
  }]
}
💡 Sidecar hot rotation pattern: Run the AWS Secrets Manager Agent as a sidecar container. It caches secrets locally and refreshes them automatically when rotation occurs. Your MCP server reads secrets from the sidecar's local HTTP endpoint rather than from injected env vars, so changes take effect without restarting the container. See the Secrets Manager Agent docs for the sidecar configuration.
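
For the two local patterns in the comparison above, application code can hide the difference behind one lookup that prefers a mounted secret file and falls back to an environment variable. The sketch below assumes Docker's /run/secrets/<name> mount convention and a lowercase file name matching the variable name; read_secret is a hypothetical helper, and the sidecar pattern would need a separate HTTP lookup that is not shown here.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def read_secret(
    name: str,
    secrets_dir: Path = Path("/run/secrets"),
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Prefer a mounted secret file; fall back to an environment variable."""
    candidate = secrets_dir / name.lower()
    if candidate.is_file():
        # Mounted files usually end with a newline that is not part of the secret
        return candidate.read_text(encoding="utf-8").strip()
    return env.get(name)
```

Calling read_secret("DB_PASSWORD") then works unchanged whether the value arrives as a Compose secret locally or as an injected variable on ECS.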

ARM64 vs AMD64 — Graviton Performance & Cost Comparison

According to AWS, Graviton (ARM64) processors offer up to 40% better price-performance than comparable x86 (AMD64) instances. For MCP servers running on ECS Fargate, choosing the right architecture can meaningfully reduce your AWS bill.

AWS also reports that Graviton3 uses up to 60% less energy than comparable x86 instances for the same performance. MCP servers are typically I/O-bound, and the figures below suggest they benefit as well, though you should benchmark your own workload before committing. Most Python packages have ARM64 wheels available on PyPI, and AWS base images have ARM64 variants, making the switch straightforward with Docker Buildx multi-platform builds.

| Metric | AMD64 (x86_64) | ARM64 (Graviton3) | Difference |
|---|---|---|---|
| Fargate vCPU price (us-east-1) | $0.04048 / vCPU-hr | $0.03238 / vCPU-hr | -20% (ARM64) |
| Fargate memory price | $0.004445 / GB-hr | $0.00356 / GB-hr | -20% (ARM64) |
| MCP throughput (req/s/vCPU) | ~420 req/s | ~590 req/s | +40% (ARM64) |
| p99 tool call latency | ~18 ms | ~13 ms | -28% (ARM64) |
| Cold start (container pull) | ~8 s | ~9 s | ~equal |
| Python package compatibility | Universal | Most packages (check PyPI) | Minor check needed |

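
To turn the hourly rates into a monthly figure, multiply by task size and running hours. The sketch below uses the Fargate prices from the table above; the 730-hour month, the 1 vCPU / 2 GB task size, and the three-task count are assumptions picked for illustration.

```python
HOURS_PER_MONTH = 730  # assumption: average month, tasks running 24/7

# Per-hour Fargate rates copied from the comparison table (us-east-1)
RATES = {
    "amd64": {"vcpu": 0.04048, "gb": 0.004445},
    "arm64": {"vcpu": 0.03238, "gb": 0.00356},
}


def monthly_cost(arch: str, vcpus: float, memory_gb: float, tasks: int = 1) -> float:
    """On-demand monthly cost in USD for `tasks` always-on Fargate tasks."""
    rate = RATES[arch]
    hourly = vcpus * rate["vcpu"] + memory_gb * rate["gb"]
    return hourly * HOURS_PER_MONTH * tasks


x86 = monthly_cost("amd64", vcpus=1, memory_gb=2, tasks=3)
arm = monthly_cost("arm64", vcpus=1, memory_gb=2, tasks=3)
saving_pct = 100 * (x86 - arm) / x86
print(f"AMD64: ${x86:.2f}  ARM64: ${arm:.2f}  saving: {saving_pct:.1f}%")
```

For this example the three tasks come to roughly $108 per month on AMD64 versus about $86.5 on ARM64, a saving of about 20% before any throughput difference is taken into account.
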
Shell: multi-platform build with buildx (AMD64 + ARM64)
# Create multi-platform builder
docker buildx create --name multi-builder --use --bootstrap

# Build and push both architectures to ECR in one command
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --target runtime \
  --tag 123456789.dkr.ecr.us-east-1.amazonaws.com/mcp-server:latest \
  --tag 123456789.dkr.ecr.us-east-1.amazonaws.com/mcp-server:$(git rev-parse --short HEAD) \
  --push \
  .

# buildx pushes a multi-arch manifest list to ECR; the container runtime pulls the image matching the task's architecture
# To target Graviton in ECS task definition, set:
# "runtimePlatform": {"cpuArchitecture": "ARM64", "operatingSystemFamily": "LINUX"}
Knowledge Check
1. What is the primary security benefit of using a distroless base image for the runtime stage of a multi-stage MCP server build?
2. In the Docker Compose multi-server stack, what does depends_on: redis: condition: service_healthy guarantee?
3. Why must an MCP server container handle SIGTERM and implement a graceful shutdown period before exiting?
4. Which secrets injection method allows live credential rotation for an ECS MCP server without requiring a container restart?