Build portable, production-ready MCP containers with multi-stage Docker builds, local multi-server Compose stacks, graceful shutdown handlers, secrets injection strategies, and cross-architecture ARM64/AMD64 builds for AWS Graviton.
An MCP server is only valuable if it runs identically on your laptop, in CI, and in production. Containers solve the "works on my machine" problem by packaging your Python runtime, dependencies, configuration, and code into a single immutable image.
MCP servers have specific containerization requirements that differ from typical web services. They often depend on exact Python versions for async behavior, system libraries for PDF parsing or image processing, non-Python binaries like ffmpeg or git, and careful network configuration for the SSE transport. Without containers, managing these dependencies across developer laptops, CI runners, and production infrastructure is a maintenance nightmare.
Containers also give you three operational superpowers: immutability (the same image runs in staging and production — no configuration drift), rollback (any previous image tag can be re-deployed in seconds), and horizontal scaling (ECS simply launches more copies of the exact same container when load increases). These properties are especially valuable for MCP servers because a bad deployment can break all agent workflows downstream.
latest in production.
pip install races on the production host or conflicting package versions.
A naive single-stage Dockerfile that installs build tools, compiles dependencies, and copies your source will produce a 1–2 GB image. Multi-stage builds let you use a fat builder image for compilation and package installation, then copy only the runtime artifacts into a minimal distroless image, reducing the attack surface and image size by 80–90%.
The pattern has two stages. The builder stage uses python:3.12-slim with all build tools installed — gcc, git, build-essential — and installs your Python dependencies into a virtual environment at /app/.venv. The runtime stage copies only the venv and your application source code into a minimal base image. Google's distroless Python image (gcr.io/distroless/python3-debian12) has no shell, no package manager, and no utilities — significantly reducing the exploitable surface area.
Dockerfile: multi-stage MCP server build

```dockerfile
# ── Stage 1: Builder ────────────────────────────────────────────────
FROM python:3.12-slim AS builder

WORKDIR /app

# Install build dependencies (only in builder stage)
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    git \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Create isolated virtual environment
RUN python -m venv /app/.venv
ENV PATH="/app/.venv/bin:$PATH"

# Install Python dependencies first (cache layer)
COPY requirements.txt .
RUN pip install --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# Copy application source
COPY src/ ./src/
COPY pyproject.toml .

# ── Stage 2: Runtime (distroless) ───────────────────────────────────
FROM gcr.io/distroless/python3-debian12 AS runtime

WORKDIR /app

# Copy ONLY the venv and application from builder.
# NOTE: a venv links back to the interpreter that created it, so the
# builder's Python version and path must also exist in the runtime image.
# Verify that the copied venv actually starts before shipping.
COPY --from=builder /app/.venv /app/.venv
COPY --from=builder /app/src ./src

# Non-root user for security (distroless provides nonroot at UID 65532)
USER nonroot

ENV PATH="/app/.venv/bin:$PATH" \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONPATH=/app

EXPOSE 8080

# Health check baked into the image (exec form, because there is no shell)
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
    CMD ["python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"]

ENTRYPOINT ["/app/.venv/bin/python", "-m", "src.server"]
```
Copy requirements.txt and install dependencies before copying your source code. Docker caches each layer, so if only your source code changes, pip install is skipped entirely. This reduces a 3-minute build to under 20 seconds for code-only changes.
For MCP servers that need system utilities unavailable in distroless (such as git for a GitHub MCP tool or ffmpeg for media processing), use python:3.12-slim as the runtime instead. Strip unnecessary packages from the runtime stage explicitly with apt-get purge and always run as a non-root user.
Shell: compare image sizes

```shell
# Build both targets and compare
docker buildx build --target builder -t mcp-builder . --load
docker buildx build --target runtime -t mcp-runtime . --load

docker images | grep mcp
# mcp-builder   latest   a3f...   1.2GB   (with build tools)
# mcp-runtime   latest   b7c...   98MB    (distroless, 92% smaller)

# Inspect runtime image: no shell, no bash, minimal attack surface.
# --entrypoint overrides the image's Python ENTRYPOINT so that sh itself is executed.
docker run --rm --entrypoint sh mcp-runtime
# Error: "exec: \"sh\": executable file not found in $PATH"
```
Real-world MCP environments involve multiple servers collaborating — a documents server, a database server, a knowledge graph server — plus shared infrastructure like Redis for session state. Docker Compose lets you define this entire local stack in one YAML file and bring it up with a single command.
The Compose file below defines three MCP servers (documents, database, and search) alongside a shared Redis instance for distributed session state. Each MCP server is built from its own Dockerfile in a subdirectory. They all share an internal mcp-net bridge network, accessible to each other by service name. Redis is used to share MCP session state across restarts — when the documents server restarts, sessions stored in Redis are preserved.
YAML: docker-compose.yml (3 MCP servers + Redis)

```yaml
version: "3.9"

services:
  # ── Shared infrastructure ──────────────────────────────────────────
  redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru --save ""
    ports:
      - "6379:6379"
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 5s
      retries: 3
    networks: [mcp-net]

  # ── MCP Server 1: Documents ───────────────────────────────────────
  mcp-documents:
    build:
      context: ./servers/documents
      target: runtime
    ports:
      - "8081:8080"
    environment:
      MCP_SERVER_NAME: documents-server
      REDIS_URL: redis://redis:6379/0
      S3_BUCKET: ${DOCUMENTS_BUCKET}
      LOG_LEVEL: INFO
    env_file: [.env]
    depends_on:
      redis: { condition: service_healthy }
    healthcheck:
      # Exec form (CMD): the distroless runtime image has no shell for CMD-SHELL
      test: ["CMD", "python", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"]
      interval: 30s
      timeout: 10s
      start_period: 15s
      retries: 3
    volumes:
      - ./servers/documents/src:/app/src:ro  # hot-reload in dev
    networks: [mcp-net]
    restart: unless-stopped

  # ── MCP Server 2: Database ────────────────────────────────────────
  mcp-database:
    build:
      context: ./servers/database
      target: runtime
    ports:
      - "8082:8080"
    environment:
      MCP_SERVER_NAME: database-server
      REDIS_URL: redis://redis:6379/1
      DB_CONNECTION_STRING: ${DB_CONNECTION_STRING}
    env_file: [.env]
    depends_on:
      redis: { condition: service_healthy }
    networks: [mcp-net]
    restart: unless-stopped

  # ── MCP Server 3: Search ──────────────────────────────────────────
  mcp-search:
    build:
      context: ./servers/search
      target: runtime
    ports:
      - "8083:8080"
    environment:
      MCP_SERVER_NAME: search-server
      REDIS_URL: redis://redis:6379/2
      OPENSEARCH_URL: ${OPENSEARCH_URL}
    env_file: [.env]
    depends_on:
      redis: { condition: service_healthy }
    networks: [mcp-net]
    restart: unless-stopped

networks:
  mcp-net:
    driver: bridge
```
The volume mount ./servers/documents/src:/app/src:ro makes edits to your Python source visible inside the container without a rebuild. This enables hot-reload during development, provided the server process runs with a reloader or is restarted to pick up the change. For production builds, this volume is omitted and the source is baked into the image at build time.
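Once the stack is up, it can be handy to confirm that all three servers answer on their published ports before pointing an agent at them. The sketch below polls each /health endpoint until every server responds or the attempts run out. The URLs follow the port mappings in the Compose file above; the helper names (`http_ok`, `wait_for_stack`) are hypothetical, and it assumes each server really exposes /health on its container port.

```python
import time
import urllib.request
from typing import Callable

# Host ports published by the Compose file; adjust if yours differ.
HEALTH_URLS = {
    "mcp-documents": "http://localhost:8081/health",
    "mcp-database": "http://localhost:8082/health",
    "mcp-search": "http://localhost:8083/health",
}


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


def wait_for_stack(
    urls: dict[str, str],
    probe: Callable[[str], bool] = http_ok,
    attempts: int = 30,
    delay: float = 1.0,
) -> dict[str, bool]:
    """Poll every URL until all are healthy or `attempts` rounds have passed."""
    status = {name: False for name in urls}
    for _ in range(attempts):
        for name, url in urls.items():
            if not status[name]:  # skip servers that already answered
                status[name] = probe(url)
        if all(status.values()):
            break
        time.sleep(delay)
    return status
```

Calling `wait_for_stack(HEALTH_URLS)` after `docker compose up -d` returns a mapping of service name to `True` or `False`. The `probe` parameter exists only so the polling logic can be exercised without a running stack.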
Health checks tell the container orchestrator whether your MCP server is ready to accept connections. Graceful shutdown ensures that in-flight tool calls complete before the container exits — preventing corrupted tool responses during rolling deployments.
Docker and ECS both support health checks. The check runs a command inside the container at a regular interval; if it exits non-zero, the container is marked unhealthy and potentially replaced. For MCP servers, a two-tier health check is ideal: a shallow /health endpoint that always returns 200 quickly (for liveness), and a deeper /ready endpoint that verifies database connectivity and cache availability (for readiness).
Python: /health and /ready endpoints in FastMCP

```python
import asyncio
import signal
import sys
import time

import redis.asyncio as aioredis
from fastmcp import FastMCP
from starlette.requests import Request
from starlette.responses import JSONResponse

mcp = FastMCP("my-server")
_startup_time = time.time()
_redis: aioredis.Redis | None = None  # assign at startup, e.g. aioredis.from_url(REDIS_URL)


# ── Liveness: always fast ─────────────────────────────────────────
# custom_route registers a plain HTTP route next to the MCP endpoints
# (assumes a FastMCP release that provides it).
@mcp.custom_route("/health", methods=["GET"])
async def health(request: Request) -> JSONResponse:
    return JSONResponse({
        "status": "healthy",
        "uptime_seconds": round(time.time() - _startup_time),
    })


# ── Readiness: checks dependencies ────────────────────────────────
@mcp.custom_route("/ready", methods=["GET"])
async def ready(request: Request) -> JSONResponse:
    checks = {}
    ok = True
    try:
        if _redis is None:
            raise RuntimeError("redis client not initialised")
        await _redis.ping()
        checks["redis"] = "ok"
    except Exception as e:
        checks["redis"] = str(e)
        ok = False
    status = 200 if ok else 503
    return JSONResponse({"ready": ok, "checks": checks}, status_code=status)


# ── Graceful shutdown handler ─────────────────────────────────────
_shutdown_event = asyncio.Event()


def handle_sigterm(*_):
    print("SIGTERM received, initiating graceful shutdown", flush=True)
    _shutdown_event.set()


async def graceful_shutdown_task():
    """Wait for SIGTERM or SIGINT, then give in-flight requests time to finish."""
    # Unlike signal.signal(), handlers registered through the loop are
    # allowed to interact with asyncio objects such as the Event above.
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, handle_sigterm)

    await _shutdown_event.wait()
    print("Draining in-flight requests (30s max)...", flush=True)
    # Fixed grace period for active SSE clients to disconnect;
    # keep it below the orchestrator's stop timeout.
    await asyncio.sleep(30)
    sys.exit(0)
```
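The shutdown task above always waits for a fixed period, even when nothing is running. One possible refinement, sketched here under the assumption that each tool call can be wrapped in a context manager, is to count in-flight calls and finish as soon as the count drops to zero. `InFlightTracker`, `track`, and `drain` are hypothetical names for illustration and are not part of FastMCP.

```python
import asyncio
from contextlib import asynccontextmanager


class InFlightTracker:
    """Counts active requests so shutdown waits only as long as needed."""

    def __init__(self) -> None:
        self._active = 0
        self._idle = asyncio.Event()
        self._idle.set()  # nothing is running yet

    @asynccontextmanager
    async def track(self):
        """Wrap each tool call: `async with tracker.track(): ...`."""
        self._active += 1
        self._idle.clear()
        try:
            yield
        finally:
            self._active -= 1
            if self._active == 0:
                self._idle.set()

    async def drain(self, timeout: float) -> bool:
        """Wait until idle; return False if `timeout` seconds pass first."""
        try:
            await asyncio.wait_for(self._idle.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False
```

In a shutdown task, `await tracker.drain(timeout=25)` could stand in for the fixed sleep: it returns immediately when the server is idle and still gives up before a 30-second stop timeout would force a hard kill.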
When ECS stops a task, it sends SIGTERM and then force-kills the container with SIGKILL after stopTimeout (default 30 seconds). Your MCP server must catch SIGTERM and drain active sessions before the hard kill. If you don't handle SIGTERM, active tool calls are terminated mid-execution, and clients receive connection-reset errors instead of proper tool responses.
MCP servers routinely handle sensitive credentials: database passwords, API keys, signing certificates. How you inject secrets into containers has major security implications. The table below compares four patterns with distinct trade-offs.
| Method | Security | Rotation | Complexity | Best For |
|---|---|---|---|---|
| Environment variables | Low: visible in docker inspect and the process environment | Requires container restart | Minimal | Non-sensitive config, dev/local |
| Docker secrets (Compose) | Medium: tmpfs mount, not in env | Requires container restart | Low | Local dev with sensitive data |
| AWS Secrets Manager (ECS native) | High: IAM-controlled, encrypted at rest | New value picked up when a task is (re)started | Medium | Production ECS deployments |
| AWS Secrets Manager sidecar | Very High: hot rotation, no restart | Live rotation without container restart | High | High-security + 24/7 availability |
JSON: ECS task definition with Secrets Manager injection. Plain, non-sensitive config goes under environment; the entries under secrets are injected from AWS Secrets Manager at task start.

```json
{
  "containerDefinitions": [{
    "name": "mcp-server",
    "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/mcp-server:latest",
    "environment": [
      { "name": "LOG_LEVEL", "value": "INFO" },
      { "name": "MCP_SERVER_NAME", "value": "documents-server" }
    ],
    "secrets": [
      {
        "name": "DB_PASSWORD",
        "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:prod/mcp/db-password"
      },
      {
        "name": "OPENAI_API_KEY",
        "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:prod/mcp/openai-key:api_key::"
      }
    ]
  }]
}
```
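On the application side, the same server code can support both the Docker secrets pattern and ECS injection if it looks for a mounted file first and falls back to the environment. The sketch below illustrates that lookup order. `read_secret` is a hypothetical helper, and mapping a variable such as DB_PASSWORD to a file named db_password is an assumed naming convention; /run/secrets is where Docker mounts secret files.

```python
import os
from pathlib import Path

# Docker mounts secrets as files under this directory.
DOCKER_SECRETS_DIR = Path("/run/secrets")


def read_secret(name: str, secrets_dir: Path = DOCKER_SECRETS_DIR) -> str:
    """Return a secret from a mounted file, falling back to the environment.

    Lookup order:
      1. <secrets_dir>/<name lowercased>  (Docker secrets file; assumed naming)
      2. environment variable <name>      (ECS secrets injection or plain env)
    """
    secret_file = secrets_dir / name.lower()
    if secret_file.is_file():
        return secret_file.read_text(encoding="utf-8").strip()
    value = os.environ.get(name)
    if value is None:
        raise RuntimeError(f"secret {name!r} not found in {secrets_dir} or environment")
    return value
```

With this in place, `read_secret("DB_PASSWORD")` behaves the same in a local Compose stack that mounts a secret file and in an ECS task where the value arrives as an environment variable. Failing loudly on a missing secret tends to surface misconfiguration at startup rather than on the first tool call.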
AWS Graviton3 (ARM64) processors offer up to 40% better price-performance than equivalent x86 (AMD64) instances for compute-intensive workloads. For MCP servers running on ECS Fargate, choosing the right architecture can meaningfully reduce your AWS bill.
Graviton3 CPUs use up to 60% less energy than comparable x86 processors and deliver better throughput for I/O-bound workloads, which is exactly what MCP servers are. Most Python packages have ARM64 wheels available on PyPI, and AWS base images have ARM64 variants, making the switch straightforward with Docker Buildx multi-platform builds.
| Metric | AMD64 (x86_64) | ARM64 (Graviton3) | Difference |
|---|---|---|---|
| Fargate vCPU price (us-east-1) | $0.04048 / vCPU-hr | $0.03238 / vCPU-hr | -20% ARM64 |
| Fargate memory price | $0.004445 / GB-hr | $0.00356 / GB-hr | -20% ARM64 |
| MCP throughput (req/s/vCPU) | ~420 req/s | ~590 req/s | +40% ARM64 |
| p99 tool call latency | ~18 ms | ~13 ms | -28% ARM64 |
| Cold start (container pull) | ~8 s | ~9 s | ~equal |
| Python package compatibility | Universal | Most packages (check PyPI) | Minor check needed |
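To turn the per-hour prices in the table into a monthly figure, multiply by task size and running hours. The sketch below does that arithmetic using only the two price rows above. The 730-hour month and the function names are assumptions for illustration, and the result ignores the throughput difference, data transfer, and any savings plans.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

# Per-hour Fargate prices taken from the table (us-east-1).
PRICES = {
    "amd64": {"vcpu_hr": 0.04048, "gb_hr": 0.004445},
    "arm64": {"vcpu_hr": 0.03238, "gb_hr": 0.00356},
}


def monthly_cost(arch: str, vcpu: float, memory_gb: float, tasks: int = 1) -> float:
    """Monthly on-demand cost in USD for `tasks` always-on Fargate tasks."""
    price = PRICES[arch]
    hourly = vcpu * price["vcpu_hr"] + memory_gb * price["gb_hr"]
    return hourly * HOURS_PER_MONTH * tasks


def arm64_saving(vcpu: float, memory_gb: float, tasks: int = 1) -> float:
    """Fraction saved by choosing ARM64 over AMD64 for the same task size."""
    amd = monthly_cost("amd64", vcpu, memory_gb, tasks)
    arm = monthly_cost("arm64", vcpu, memory_gb, tasks)
    return (amd - arm) / amd
```

For three always-on tasks of 1 vCPU and 2 GB each, `monthly_cost("amd64", 1, 2, tasks=3)` comes to about $108.12 and the ARM64 equivalent to about $86.51, a saving of roughly 20% before any throughput gain is counted.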
Shell: multi-platform build with buildx (AMD64 + ARM64)

```shell
# Create multi-platform builder
docker buildx create --name multi-builder --use --bootstrap

# Build and push both architectures to ECR in one command
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  --target runtime \
  --tag 123456789.dkr.ecr.us-east-1.amazonaws.com/mcp-server:latest \
  --tag 123456789.dkr.ecr.us-east-1.amazonaws.com/mcp-server:$(git rev-parse --short HEAD) \
  --push \
  .

# ECR creates a manifest list; ECS picks the right arch automatically
# To target Graviton in ECS task definition, set:
# "runtimePlatform": {"cpuArchitecture": "ARM64", "operatingSystemFamily": "LINUX"}
```