Day 23 — Production MCP Deployment on AWS

§ 01 — Architecture Decision

ECS Fargate vs AWS Lambda

Before writing a single line of infrastructure code, you must pick your compute substrate. MCP servers have unique characteristics — long-lived SSE connections, stateful session maps, and streaming responses — that make this choice non-trivial. Lambda's 15-minute timeout, cold start penalty, and lack of native SSE support make ECS Fargate the overwhelmingly better choice for most production MCP workloads.

The MCP Streamable HTTP transport (the production-grade transport introduced to replace the legacy SSE transport) lets the server answer a request with a long-lived SSE stream and lets the client hold a stream open for server-initiated messages. While a stream is open, both sides keep the TCP socket alive. AWS Lambda's execution model is fundamentally at odds with this: each Lambda invocation is an independent function call with its own execution context, and Lambda can stream a response only in RESPONSE_STREAM invoke mode (for example through a Lambda Function URL), which adds complexity and is still bound by the 15-minute limit.

ECS Fargate, by contrast, runs your containerized server as a long-lived process. A single Fargate task can hold thousands of concurrent SSE connections, keep session maps in memory across requests, and stream data with no cold-start latency. The task stays warm between requests; the only startup cost is the container pull and boot when a task is first launched.

Lambda does make sense for a specific niche: MCP servers that use only request/response tool calls with no streaming and no SSE, where the stateless nature of Lambda is an asset rather than a liability. If your MCP server only exposes tools (no streaming resources, no long-running subscriptions), and your traffic pattern is bursty with long idle periods, Lambda can deliver significant cost savings. But this is the exception, not the rule.

Dimension	ECS Fargate	AWS Lambda	Winner
Cold Start	Container pull: 5–15 s (once per task)	Init: 200 ms–3 s per new execution environment	Fargate (warm always)
Cost Model	Per-second, min 1 min. ~$0.04/vCPU-hr	Per-request + per-GB-s. Free tier 1M req/mo	Lambda for low volume
Max Duration	Unlimited (process lifecycle)	15 minutes hard limit	Fargate
SSE Support	Native — persistent TCP connection	Requires Function URL + RESPONSE_STREAM	Fargate
Concurrency	Thousands of connections per task	1 request per execution context	Fargate for SSE workloads
Scaling Speed	New task: ~60–90 s to become healthy	Near-instant (provisioned concurrency available)	Lambda for spiky traffic
Ops Overhead	Dockerfile, task def, service, ALB, SG	Just a ZIP / container image	Lambda for simplicity
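
To make the Cost Model row concrete, the following sketch compares one always-on Fargate task with Lambda at a given request volume. The prices are assumptions (approximate us-east-1 x86 list prices, free tier ignored) and the function names are hypothetical; check current AWS pricing before relying on the result.

```python
HOURS_PER_MONTH = 730

# Assumed list prices (us-east-1, x86). Verify against current AWS pricing.
FARGATE_VCPU_HOUR = 0.04048
FARGATE_GB_HOUR = 0.004445
LAMBDA_PER_MILLION_REQUESTS = 0.20
LAMBDA_PER_GB_SECOND = 0.0000166667


def fargate_monthly_cost(vcpu: float, memory_gb: float, tasks: int = 1) -> float:
    """Cost of keeping `tasks` Fargate tasks running for a whole month."""
    hourly = vcpu * FARGATE_VCPU_HOUR + memory_gb * FARGATE_GB_HOUR
    return hourly * HOURS_PER_MONTH * tasks


def lambda_monthly_cost(requests: int, avg_duration_s: float, memory_gb: float) -> float:
    """Request charge plus compute charge for the given monthly volume."""
    request_cost = requests / 1_000_000 * LAMBDA_PER_MILLION_REQUESTS
    compute_cost = requests * avg_duration_s * memory_gb * LAMBDA_PER_GB_SECOND
    return request_cost + compute_cost


def break_even_requests(vcpu: float, memory_gb: float,
                        avg_duration_s: float, lambda_memory_gb: float) -> int:
    """Monthly request count at which Lambda costs as much as one Fargate task."""
    per_request = (LAMBDA_PER_MILLION_REQUESTS / 1_000_000
                   + avg_duration_s * lambda_memory_gb * LAMBDA_PER_GB_SECOND)
    return int(fargate_monthly_cost(vcpu, memory_gb) / per_request)
```

Below the break-even volume Lambda is cheaper; above it, a warm Fargate task wins on cost as well as on connection handling. The model ignores ALB, NAT, and data transfer charges, which apply to both options in different proportions.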

🔧

Choose ECS Fargate When...

Your MCP server uses SSE or Streamable HTTP transport; sessions last longer than 15 minutes; you need in-memory session state shared across requests; your server handles many concurrent connections; you need zero cold-start latency for interactive AI experiences.

Recommended for most MCP servers

Choose Lambda When...

Your MCP server is purely request/response (tools only, no streaming); traffic is bursty with long idle periods; you want to avoid container management overhead; the 15-minute limit is acceptable; cost per request is the primary driver.

Stateless tools-only servers

ℹ️

Hybrid architecture: Some teams deploy a Lambda "router" that handles MCP session initialization and short-lived tool calls, while routing long-lived SSE connections to Fargate tasks. This adds complexity but can optimize cost for workloads where the majority of requests are short tool calls with occasional long streaming sessions.

§ 02 — Dockerizing Your MCP Server

Multi-Stage Build for Production Images

A production Docker image for an MCP server should be minimal, reproducible, and secure. Multi-stage builds let you install build-time dependencies (compilers, dev tools) in the first stage and copy only the resulting artefacts into a lean runtime stage, dramatically reducing image size and attack surface.

For Python MCP servers built with fastmcp, the approach is: install dependencies into a virtual environment in the build stage, then copy the venv into a clean python:3.12-slim image. The runtime image has no pip, no build tools, and no unnecessary packages — just Python, your venv, and your application code. This typically produces images of 120–180 MB vs 600+ MB for naive single-stage builds.

The .dockerignore file is as important as the Dockerfile. Every file you exclude from the build context reduces the time Docker spends hashing and uploading files to the daemon. More importantly, it prevents secrets (.env files, credentials, SSH keys) from accidentally landing in your image layers, where anyone who can pull the image can extract them from the layer contents.

Run your container as a non-root user. The example below creates a user named mcpuser with UID 1001. This is a mandatory security control for any regulated environment and a strong best practice everywhere else. ECS Fargate supports non-root containers natively, with no performance penalty.

Set PYTHONDONTWRITEBYTECODE=1 and PYTHONUNBUFFERED=1 as environment variables in the Dockerfile. The first prevents Python from writing .pyc bytecode files into the container filesystem (saves space and avoids permission issues in read-only filesystems). The second forces stdout and stderr to be unbuffered, which is critical for CloudWatch Logs to capture log lines in real time rather than after a buffer flush.

Dockerfile — multi-stage production build
# ── Stage 1: Build ─────────────────────────────────────────────────────
FROM python:3.12-slim AS builder

WORKDIR /build

# Install system build deps
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential curl && \
    rm -rf /var/lib/apt/lists/*

# Create isolated venv
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Install dependencies first (layer cache)
COPY requirements.txt .
RUN pip install --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt

# Copy application source
COPY src/ ./src/

# ── Stage 2: Runtime ───────────────────────────────────────────────────
FROM python:3.12-slim AS runtime

LABEL org.opencontainers.image.source="https://github.com/your-org/mcp-server"
LABEL org.opencontainers.image.description="Production MCP Server"

# Security: run as non-root
RUN groupadd -g 1001 mcpgroup && \
    useradd -u 1001 -g mcpgroup -s /bin/sh -m mcpuser

# Copy venv from builder
COPY --from=builder /opt/venv /opt/venv
COPY --from=builder /build/src /app/src

WORKDIR /app
USER mcpuser

ENV PATH="/opt/venv/bin:$PATH"
ENV PYTHONDONTWRITEBYTECODE="1"
ENV PYTHONUNBUFFERED="1"
ENV PORT="8080"
ENV HOST="0.0.0.0"

EXPOSE 8080

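# Health check for local Docker runs. ECS ignores the image's HEALTHCHECK
# and uses the healthCheck block in the task definition instead.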
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"

ENTRYPOINT ["python", "-m", "src.server"]

.dockerignore
# Version control
.git
.gitignore
.gitattributes

# Python artifacts
__pycache__
*.pyc
*.pyo
*.pyd
.Python
*.egg-info
dist/
build/
.eggs/

# Virtual environments
.venv
venv
env

# Secrets — NEVER in image layers
.env
.env.*
*.pem
*.key
secrets/
credentials/

# Testing / CI
tests/
.pytest_cache/
.coverage
htmlcov/
.tox/

# IDE / editor
.vscode/
.idea/
*.swp
*.swo

# Documentation
docs/
*.md
LICENSE

# Docker itself
Dockerfile*
docker-compose*

Shell — build and push to ECR
#!/bin/bash
set -euo pipefail  # stop on the first failed command or unset variable

# Set your values
AWS_ACCOUNT_ID="123456789012"
AWS_REGION="us-east-1"
REPO_NAME="mcp-server"
IMAGE_TAG="$(git rev-parse --short HEAD)"
ECR_URI="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com/${REPO_NAME}"

# Authenticate Docker to ECR
aws ecr get-login-password --region "$AWS_REGION" | \
  docker login --username AWS --password-stdin \
  "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_REGION}.amazonaws.com"

# Build for ARM64 (Fargate Graviton cost savings) and push
docker buildx build \
  --platform linux/arm64 \
  --target runtime \
  --tag "${ECR_URI}:${IMAGE_TAG}" \
  --tag "${ECR_URI}:latest" \
  --push \
  .

# Start a vulnerability scan of the pushed image before deploy
aws ecr start-image-scan \
  --repository-name "$REPO_NAME" \
  --image-id imageTag="$IMAGE_TAG"

echo "Pushed: ${ECR_URI}:${IMAGE_TAG}"

💡

Use Graviton (ARM64) for Fargate tasks. Graviton-based Fargate tasks offer up to 20% cost savings vs x86 Fargate tasks with comparable or better performance for Python workloads. Build with --platform linux/arm64 and set runtimePlatform.cpuArchitecture: ARM64 in your task definition. Pure-Python fastmcp code runs identically on both architectures; dependencies with native extensions need arm64 wheels or a compiler in the build stage.

§ 03 — ECS Fargate Task & Service Definition

Task Definition, Service & Service Connect

An ECS task definition is the blueprint for your container: CPU, memory, image, environment variables, log configuration, and health checks. The ECS service is the long-running manager that ensures the desired number of tasks are always running, replaces unhealthy tasks, and integrates with the ALB for traffic routing.

For MCP servers, allocate at least 0.5 vCPU and 1 GB memory per task to start. Each SSE connection holds a small in-memory session object, but the primary memory pressure comes from your tool implementations — if any tool loads large datasets or caches responses, size accordingly. A common production starting point is 1 vCPU / 2 GB, scaling out horizontally rather than vertically. Use Application Auto Scaling with a target tracking policy on ECS service CPU utilization (target 60%) to add or remove tasks automatically.
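
The target tracking policy described above can be sketched with the Application Auto Scaling API. This is a minimal sketch, not a definitive setup: the helper names, the policy name, the cooldown values, and the min/max task counts are assumptions, while the cluster and service names follow the examples later in this section. The parameters are built as plain dictionaries so they can be inspected before anything is sent to AWS.

```python
def cpu_target_tracking(cluster: str, service: str, min_tasks: int, max_tasks: int,
                        target_cpu: float = 60.0) -> tuple[dict, dict]:
    """Build kwargs for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
    }
    target = {**common, "MinCapacity": min_tasks, "MaxCapacity": max_tasks}
    policy = {
        **common,
        "PolicyName": f"{service}-cpu-target-tracking",  # hypothetical name
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_cpu,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
            },
            "ScaleOutCooldown": 60,   # assumed: react quickly to load
            "ScaleInCooldown": 300,   # assumed: remove tasks slowly
        },
    }
    return target, policy


def apply_scaling(client, target: dict, policy: dict) -> None:
    """`client` is expected to be boto3.client("application-autoscaling")."""
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

A call such as apply_scaling(boto3.client("application-autoscaling"), *cpu_target_tracking("mcp-production", "mcp-server-svc", 2, 10)) would register the policy. Keep in mind that scaling in stops tasks that may still hold live sessions, so a conservative scale-in cooldown is usually the safer default for MCP workloads.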

Secrets management is critical. Never bake credentials (database passwords, API keys, OpenAI keys) into your Docker image or task definition environment variables in plaintext. Use secrets in the task definition pointing to AWS Secrets Manager ARNs or SSM Parameter Store SecureString parameters. ECS injects these as environment variables at task launch time, and they never appear in CloudWatch Logs or the ECS console in plaintext.
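
On the application side, the injected secrets arrive as ordinary environment variables. The sketch below shows one way to read them at startup and fail fast if any are missing. The Settings class and load_settings function are hypothetical; the variable names match the task definition shown later in this section.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_SECRETS = ("DATABASE_URL", "OPENAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    database_url: str
    openai_api_key: str
    log_level: str = "INFO"

    def __repr__(self) -> str:
        # Mask secret values so an accidental log line does not leak them.
        return (f"Settings(database_url=***, openai_api_key=***, "
                f"log_level={self.log_level!r})")


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read ECS-injected secrets; raise at startup if any are absent."""
    missing = [name for name in REQUIRED_SECRETS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required secrets: {', '.join(missing)}")
    return Settings(
        database_url=env["DATABASE_URL"],
        openai_api_key=env["OPENAI_API_KEY"],
        log_level=env.get("LOG_LEVEL", "INFO"),
    )
```

Failing at startup means a misconfigured task never passes its health check, so ECS keeps the previous healthy tasks in service instead of routing traffic to a server that cannot reach its dependencies.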

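Service Connect (the modern replacement for Service Discovery) enables ECS services in the same namespace to call each other by a short DNS name without going through the ALB. This is useful if your MCP server calls other internal microservices: configure those as Service Connect clients and your MCP server as both client and server in the same namespace. Service Connect has historically required the rolling-update (ECS) deployment controller, so confirm in the current ECS documentation that it is supported alongside the CODE_DEPLOY controller used in the service definition below before combining the two.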

JSON — ECS Task Definition (register-task-definition)
{
  "family": "mcp-server",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "runtimePlatform": {
    "cpuArchitecture": "ARM64",
    "operatingSystemFamily": "LINUX"
  },
  "executionRoleArn": "arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789012:role/mcp-server-task-role",
  "containerDefinitions": [
    {
      "name": "mcp-server",
      "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/mcp-server:latest",
      "portMappings": [
        {
          "containerPort": 8080,
          "hostPort": 8080,
          "protocol": "tcp",
          "name": "mcp-http",
          "appProtocol": "http"
        }
      ],
      "environment": [
        { "name": "PORT", "value": "8080" },
        { "name": "HOST", "value": "0.0.0.0" },
        { "name": "LOG_LEVEL", "value": "INFO" },
        { "name": "AWS_DEFAULT_REGION", "value": "us-east-1" }
      ],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789012:secret:mcp/database-url"
        },
        {
          "name": "OPENAI_API_KEY",
          "valueFrom": "arn:aws:ssm:us-east-1:123456789012:parameter/mcp/openai-api-key"
        }
      ],
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/mcp-server",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs",
          "awslogs-create-group": "true"
        }
      },
      "healthCheck": {
        "command": ["CMD-SHELL", "python -c \"import urllib.request; urllib.request.urlopen('http://localhost:8080/health')\""],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 15
      },
      "readonlyRootFilesystem": true,
      "linuxParameters": {
        "initProcessEnabled": true
      }
    }
  ]
}

JSON — ECS Service Definition (create-service)
{
  "cluster": "mcp-production",
  "serviceName": "mcp-server-svc",
  "taskDefinition": "mcp-server",
  "desiredCount": 2,
  "launchType": "FARGATE",
  "deploymentController": {
    "type": "CODE_DEPLOY"
  },
  "networkConfiguration": {
    "awsvpcConfiguration": {
      "subnets": [
        "subnet-0a1b2c3d4e5f60001",
        "subnet-0a1b2c3d4e5f60002"
      ],
      "securityGroups": ["sg-0mcp1234567890abc"],
      "assignPublicIp": "DISABLED"
    }
  },
  "loadBalancers": [
    {
      "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/mcp-blue/abc123",
      "containerName": "mcp-server",
      "containerPort": 8080
    }
  ],
  "serviceConnectConfiguration": {
    "enabled": true,
    "namespace": "mcp-production.local",
    "services": [
      {
        "portName": "mcp-http",
        "discoveryName": "mcp-server",
        "clientAliases": [{ "port": 8080, "dnsName": "mcp-server" }]
      }
    ]
  }
}

Terraform — Security Group for MCP tasks
resource "aws_security_group" "mcp_tasks" {
  name        = "mcp-server-tasks"
  description = "MCP server ECS tasks — inbound from ALB only"
  vpc_id      = aws_vpc.main.id

  # Allow inbound ONLY from the ALB security group
  ingress {
    from_port       = 8080
    to_port         = 8080
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
    description     = "MCP HTTP from ALB"
  }

  # Allow all outbound (for AWS API calls, external tools)
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = { Name = "mcp-tasks", Environment = "production" }
}

⚠️

Never use assignPublicIp: ENABLED in production. Deploy Fargate tasks into private subnets with assignPublicIp: DISABLED. Use a NAT Gateway for outbound internet access (to call external APIs from your tools) and VPC Endpoints for AWS service access (Secrets Manager, S3, ECR) to keep traffic on the AWS backbone and avoid NAT Gateway data processing charges.

§ 04 — ALB Configuration for SSE Transport

Application Load Balancer Tuning for Streaming

The Application Load Balancer sits between your MCP clients and ECS tasks. Its default settings are tuned for short-lived HTTP requests, not the long-lived connections that SSE requires. Without careful tuning, you will see mysterious connection drops every 60 seconds — the ALB's default idle timeout terminating your streaming connections.

The most critical setting is the ALB idle timeout. The default is 60 seconds, meaning the ALB closes any connection on which no data has flowed for 60 seconds. MCP SSE sessions may have periods of silence between client requests. Set the idle timeout to at least 300 seconds (5 minutes) for interactive AI sessions, and up to 3600 seconds (1 hour) for long-running agentic workflows; the ALB maximum is 4000 seconds. The corresponding keepalive_timeout on your MCP server's HTTP server must be slightly longer than the ALB idle timeout so that the server never closes a connection the ALB still considers open, which clients would see as intermittent 502 errors.
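
An idle timeout only fires when no bytes flow, so a complementary technique (an addition to the settings above, not something the transport requires) is to emit SSE comment lines as heartbeats during quiet periods. SSE clients ignore lines that start with a colon. The sketch below wraps any async event stream; the with_heartbeat name and the interval value are assumptions, and how you plug it into your server depends on the framework.

```python
import asyncio
from typing import AsyncIterator

HEARTBEAT = ": keepalive\n\n"  # SSE comment line, ignored by clients


async def with_heartbeat(events: AsyncIterator[str],
                         interval_s: float) -> AsyncIterator[str]:
    """Yield events unchanged, inserting a heartbeat after each idle interval."""
    iterator = events.__aiter__()
    pending = asyncio.ensure_future(iterator.__anext__())
    try:
        while True:
            done, _ = await asyncio.wait({pending}, timeout=interval_s)
            if not done:
                # Nothing arrived within the interval: keep the socket busy.
                yield HEARTBEAT
                continue
            try:
                event = pending.result()
            except StopAsyncIteration:
                return  # upstream finished, end the stream
            yield event
            pending = asyncio.ensure_future(iterator.__anext__())
    finally:
        pending.cancel()  # no-op if the future already completed
```

Choosing an interval comfortably below the ALB idle timeout (for example 30 seconds against a 300-second timeout) keeps connections alive even if the timeout is later lowered, and it also stops intermediate proxies with shorter timeouts from dropping the stream.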

Target group stickiness (session affinity) is essential for MCP Streamable HTTP transport. The session map (mapping session IDs to in-memory state) lives in one specific Fargate task. If two requests from the same MCP client are routed to different tasks, the second task will not have the session, and the request fails with a 404 or session-not-found error. Enable duration-based stickiness with a cookie TTL that matches your expected session duration. Cookie-based stickiness only works if the MCP client stores the cookie set by the ALB on the first response and sends it back on later requests, so verify that your clients do.
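
The failure mode is easy to reproduce in miniature. In this illustrative sketch (the TaskSessionStore class is hypothetical and stands in for a server's in-memory session map), two stores represent two Fargate tasks behind the same load balancer.

```python
import uuid


class TaskSessionStore:
    """In-memory session map owned by a single Fargate task."""

    def __init__(self, task_name: str) -> None:
        self.task_name = task_name
        self._sessions: dict[str, dict] = {}

    def initialize(self) -> str:
        """Create a session and return its ID, as an initialize request would."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {"task": self.task_name}
        return session_id

    def handle(self, session_id: str) -> int:
        """Return the HTTP status this task would answer with."""
        return 200 if session_id in self._sessions else 404


task_a = TaskSessionStore("task-a")
task_b = TaskSessionStore("task-b")

session_id = task_a.initialize()
status_same_task = task_a.handle(session_id)   # routed back to the owning task
status_other_task = task_b.handle(session_id)  # routed elsewhere: unknown session
```

Stickiness keeps every request on the task_a path. The alternative is to move session state into a shared store so that any task can serve any session, at the cost of an extra dependency.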

The target group's deregistration delay controls how long the ALB waits before stopping traffic to a deregistering target (e.g., a task being replaced during a deployment). The default is 300 seconds. For MCP servers with active SSE sessions, this gives in-flight sessions 300 seconds to complete before the task is stopped. Pair this with ECS's stopTimeout in the task definition to give the container time to drain gracefully after it receives SIGTERM: send a final SSE event: close to all connected clients, then shut down. On Fargate, stopTimeout is capped at 120 seconds, so the container has at most two minutes to finish draining once ECS signals it.
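
The drain step can be sketched as follows, assuming each connected client is fed from its own asyncio queue. ConnectionRegistry and install_sigterm_handler are hypothetical helpers, and the payload of the close event is an assumption; adapt both to however your server tracks open streams.

```python
import asyncio
import signal

CLOSE_EVENT = "event: close\ndata: {}\n\n"


class ConnectionRegistry:
    """Tracks one outbound queue per connected SSE client."""

    def __init__(self) -> None:
        self._queues: set = set()
        self.draining = False

    def connect(self) -> asyncio.Queue:
        if self.draining:
            raise RuntimeError("server is draining; refusing new sessions")
        queue: asyncio.Queue = asyncio.Queue()
        self._queues.add(queue)
        return queue

    def disconnect(self, queue: asyncio.Queue) -> None:
        self._queues.discard(queue)

    async def drain(self) -> int:
        """Send a final close event to every client; return how many were notified."""
        self.draining = True
        notified = 0
        for queue in list(self._queues):
            await queue.put(CLOSE_EVENT)
            notified += 1
        self._queues.clear()
        return notified


def install_sigterm_handler(loop: asyncio.AbstractEventLoop,
                            registry: ConnectionRegistry) -> None:
    """ECS sends SIGTERM, waits stopTimeout seconds, then sends SIGKILL."""
    loop.add_signal_handler(
        signal.SIGTERM, lambda: loop.create_task(registry.drain())
    )
```

Setting draining before notifying clients matters: it stops new sessions from attaching to a task that is about to exit, so clients that reconnect are routed to a healthy task instead.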

Terraform — ALB, Target Groups & HTTPS Listener
# ── Application Load Balancer ─────────────────────────────────────────
resource "aws_lb" "mcp" {
  name               = "mcp-server-alb"
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb.id]
  subnets            = [aws_subnet.public_a.id, aws_subnet.public_b.id]
  idle_timeout       = 3600  # 1 hour — supports long SSE sessions

  access_logs {
    bucket  = aws_s3_bucket.alb_logs.bucket
    prefix  = "mcp-alb"
    enabled = true
  }
}

# ── Blue Target Group (active) ────────────────────────────────────────
resource "aws_lb_target_group" "blue" {
  name                       = "mcp-blue"
  port                       = 8080
  protocol                   = "HTTP"
  vpc_id                     = aws_vpc.main.id
  target_type                = "ip"
  deregistration_delay       = 300  # allow sessions to drain
  slow_start                 = 30   # ramp traffic to new tasks over 30 s

  health_check {
    path                = "/health"
    protocol            = "HTTP"
    healthy_threshold   = 2
    unhealthy_threshold = 3
    interval            = 30
    timeout             = 5
    matcher             = "200"
  }

  stickiness {
    type            = "lb_cookie"
    cookie_duration = 86400  # 24 hours — sticky sessions for SSE
    enabled         = true
  }
}

# ── Green Target Group (for blue/green deployments) ───────────────────
resource "aws_lb_target_group" "green" {
  name                 = "mcp-green"
  port                 = 8080
  protocol             = "HTTP"
  vpc_id               = aws_vpc.main.id
  target_type          = "ip"
  deregistration_delay = 300
  slow_start           = 30

  health_check {
    path              = "/health"
    healthy_threshold = 2
    interval          = 30
    matcher           = "200"
  }

  stickiness {
    type            = "lb_cookie"
    cookie_duration = 86400
    enabled         = true
  }
}

# ── HTTPS Listener with ACM Certificate ──────────────────────────────
resource "aws_lb_listener" "https" {
  load_balancer_arn = aws_lb.mcp.arn
  port              = 443
  protocol          = "HTTPS"
  ssl_policy        = "ELBSecurityPolicy-TLS13-1-2-2021-06"
  certificate_arn   = aws_acm_certificate_validation.mcp.certificate_arn

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.blue.arn
  }
}

# ── HTTP -> HTTPS redirect ────────────────────────────────────────────
resource "aws_lb_listener" "http_redirect" {
  load_balancer_arn = aws_lb.mcp.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type = "redirect"
    redirect {
      port        = "443"
      protocol    = "HTTPS"
      status_code = "HTTP_301"
    }
  }
}

💡

ACM certificate auto-renewal: Use aws_acm_certificate with validation_method = "DNS" and the aws_acm_certificate_validation resource to automate certificate issuance and renewal. ACM certificates are free and renew automatically 60 days before expiry — there is no reason to bring your own TLS certificate for ALB-terminated HTTPS.

§ 05 — Observability

CloudWatch Structured Logging, X-Ray & Alarms

A production MCP server without observability is a black box. You need structured logs to search and correlate events, distributed traces to understand latency across tool call chains, and alarms to wake you up when things go wrong. CloudWatch and X-Ray provide all three natively without adding third-party agents to your containers.

Structured logging means emitting JSON log lines instead of plain text. Every log line includes a timestamp, log level, trace ID, session ID, tool name, and any domain-specific fields. This structure lets CloudWatch Logs Insights queries answer questions like "what was the p99 latency for the search_documents tool in the last hour?" in seconds, without parsing freeform text. The Python structlog library provides excellent structured logging with minimal boilerplate.

AWS X-Ray traces the full lifecycle of a request through your MCP server. Run the X-Ray daemon as a sidecar in your ECS task: add an xray-daemon container that receives trace segments over UDP on port 2000 and forwards them to the X-Ray service. Your application code uses aws-xray-sdk to create segments and subsegments around tool calls, database queries, and external API calls. The resulting service map in the X-Ray console shows exactly where latency is coming from.

CloudWatch Alarms monitor the metrics that matter most for an MCP server: error rate (5XX responses from the ALB), p99 latency (ALB TargetResponseTime), and active connection count. Set alarm thresholds based on your SLA requirements and route alerts to an SNS topic that pages your on-call engineer. Composite alarms let you combine multiple conditions — for example, alert only when both error rate is elevated AND latency is high, avoiding false positives from brief transient errors.
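
The composite alarm mentioned above can be sketched with the CloudWatch PutCompositeAlarm API. The helper name, the composite alarm name, and the SNS topic ARN are assumptions; the two child alarm names match the CloudFormation snippet later in this section.

```python
def composite_alarm_params(name: str, child_alarms: list, sns_topic_arn: str) -> dict:
    """Build kwargs for cloudwatch.put_composite_alarm.

    The rule fires only when every child alarm is in ALARM state at once.
    """
    rule = " AND ".join(f'ALARM("{alarm}")' for alarm in child_alarms)
    return {
        "AlarmName": name,
        "AlarmDescription": "Page only when errors and latency are both elevated",
        "AlarmRule": rule,
        "AlarmActions": [sns_topic_arn],
        "ActionsEnabled": True,
    }


params = composite_alarm_params(
    "mcp-server-degraded",  # hypothetical composite alarm name
    ["mcp-server-high-error-rate", "mcp-server-high-latency"],
    "arn:aws:sns:us-east-1:123456789012:on-call",  # placeholder topic ARN
)
# To create it: boto3.client("cloudwatch").put_composite_alarm(**params)
```

When a composite alarm does the paging, the child alarms usually have their own AlarmActions removed or pointed at a lower-urgency channel, otherwise the on-call engineer is paged by both.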

Python — structured logging with structlog + X-Ray tracing
import functools
import logging
import time

import structlog
from aws_xray_sdk.core import xray_recorder, patch_all
from fastmcp import FastMCP

# ── X-Ray configuration ───────────────────────────────────────────────
xray_recorder.configure(
    service="mcp-server",
    daemon_address="127.0.0.1:2000",  # X-Ray daemon sidecar
    context_missing="LOG_ERROR",
    sampling=True,
)
# Auto-patch boto3, requests, etc.
patch_all()

# ── Structured logging setup ──────────────────────────────────────────
def setup_logging():
    structlog.configure(
        processors=[
            structlog.contextvars.merge_contextvars,
            structlog.processors.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            # Render exc_info=True as a traceback string in the JSON output.
            structlog.processors.format_exc_info,
            # stdlib.add_logger_name is omitted: PrintLogger has no name attribute.
            structlog.processors.JSONRenderer(),  # emit JSON for CloudWatch
        ],
        wrapper_class=structlog.make_filtering_bound_logger(logging.INFO),
        logger_factory=structlog.PrintLoggerFactory(),
    )

setup_logging()
log = structlog.get_logger()

# ── Tracing middleware for MCP tools ──────────────────────────────────
def traced_tool(tool_name: str):
    """Decorator: wraps a tool handler with X-Ray subsegment + structured log."""
    def decorator(fn):
        # Preserve the original signature so the tool schema is built from
        # the real parameters rather than from *args/**kwargs.
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            start = time.monotonic()
            # Only populated if the tool itself accepts a session_id argument.
            session_id = kwargs.get("session_id", "unknown")

            with xray_recorder.in_subsegment(f"tool.{tool_name}") as subsegment:
                # subsegment can be None when no parent segment is active.
                if subsegment is not None:
                    subsegment.put_metadata("input", str(kwargs), namespace="mcp")
                log.info(
                    "tool.invoked",
                    tool=tool_name,
                    session_id=session_id,
                )
                try:
                    result = await fn(*args, **kwargs)
                    elapsed_ms = (time.monotonic() - start) * 1000
                    log.info(
                        "tool.succeeded",
                        tool=tool_name,
                        session_id=session_id,
                        duration_ms=round(elapsed_ms, 2),
                    )
                    return result
                except Exception as e:
                    elapsed_ms = (time.monotonic() - start) * 1000
                    if subsegment is not None:
                        subsegment.add_error_flag()
                    log.error(
                        "tool.failed",
                        tool=tool_name,
                        session_id=session_id,
                        duration_ms=round(elapsed_ms, 2),
                        error=str(e),
                        exc_info=True,
                    )
                    raise
        return wrapper
    return decorator

# ── Example tool with tracing ─────────────────────────────────────────
mcp = FastMCP("mcp-server")

@mcp.tool()
@traced_tool("search_documents")
async def search_documents(query: str, limit: int = 10) -> list[dict]:
    """Search the document store."""
    # ... implementation
    return []

JSON — CloudWatch Alarms via CloudFormation snippet
{
  "MCPHighErrorRate": {
    "Type": "AWS::CloudWatch::Alarm",
    "Properties": {
      "AlarmName": "mcp-server-high-error-rate",
      "AlarmDescription": "MCP server returned more than 10 target 5XX responses in 5 minutes",
      "Namespace": "AWS/ApplicationELB",
      "MetricName": "HTTPCode_Target_5XX_Count",
      "Dimensions": [
        { "Name": "LoadBalancer", "Value": { "Fn::GetAtt": ["MCPAlb", "LoadBalancerFullName"] } }
      ],
      "Statistic": "Sum",
      "Period": 300,
      "EvaluationPeriods": 1,
      "Threshold": 10,
      "ComparisonOperator": "GreaterThanThreshold",
      "TreatMissingData": "notBreaching",
      "AlarmActions": [{ "Ref": "OnCallSNSTopic" }]
    }
  },
  "MCPHighLatency": {
    "Type": "AWS::CloudWatch::Alarm",
    "Properties": {
      "AlarmName": "mcp-server-high-latency",
      "AlarmDescription": "MCP server p99 latency exceeds 5 seconds",
      "Namespace": "AWS/ApplicationELB",
      "MetricName": "TargetResponseTime",
      "Dimensions": [
        { "Name": "LoadBalancer", "Value": { "Fn::GetAtt": ["MCPAlb", "LoadBalancerFullName"] } }
      ],
      "ExtendedStatistic": "p99",
      "Period": 60,
      "EvaluationPeriods": 3,
      "Threshold": 5,
      "ComparisonOperator": "GreaterThanThreshold",
      "TreatMissingData": "notBreaching",
      "AlarmActions": [{ "Ref": "OnCallSNSTopic" }]
    }
  }
}

JSON — CloudWatch Dashboard definition snippet
{
  "widgets": [
    {
      "type": "metric", "width": 12, "height": 6,
      "properties": {
        "title": "MCP Tool Invocations & Error Rate",
        "metrics": [
          ["AWS/ApplicationELB", "RequestCount", "LoadBalancer", "mcp-server-alb", { "label": "Total Requests" }],
          ["AWS/ApplicationELB", "HTTPCode_Target_5XX_Count", "LoadBalancer", "mcp-server-alb", { "label": "5XX Errors", "color": "#ef4444" }]
        ],
        "period": 60, "stat": "Sum", "view": "timeSeries"
      }
    },
    {
      "type": "metric", "width": 12, "height": 6,
      "properties": {
        "title": "ALB Target Response Time (p50 / p99)",
        "metrics": [
          ["AWS/ApplicationELB", "TargetResponseTime", "LoadBalancer", "mcp-server-alb", { "stat": "p50", "label": "p50" }],
          ["...", { "stat": "p99", "label": "p99", "color": "#fbbf24" }]
        ],
        "period": 60, "view": "timeSeries", "yAxis": { "left": { "label": "Seconds" } }
      }
    },
    {
      "type": "metric", "width": 12, "height": 6,
      "properties": {
        "title": "ECS Task CPU & Memory Utilization",
        "metrics": [
          ["AWS/ECS", "CPUUtilization", "ServiceName", "mcp-server-svc", "ClusterName", "mcp-production", { "label": "CPU %" }],
          ["AWS/ECS", "MemoryUtilization", "ServiceName", "mcp-server-svc", "ClusterName", "mcp-production", { "label": "Memory %", "color": "#a78bfa" }]
        ],
        "period": 60, "stat": "Average", "view": "timeSeries"
      }
    }
  ]
}

💡

CloudWatch Logs Insights query for tool latency: Use this query to find slow tool calls from structured logs:

fields @timestamp, tool, duration_ms | filter ispresent(tool) | stats avg(duration_ms) as avg_ms, pct(duration_ms, 99) as p99_ms by tool | sort p99_ms desc

Run it over the /ecs/mcp-server log group to immediately see which tools are your latency bottlenecks.
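
The same query can be run from a script or a scheduled job through the CloudWatch Logs API. This is a sketch under stated assumptions: run_tool_latency_query is a hypothetical helper, the one-hour lookback and the polling limits are arbitrary defaults, and logs_client is expected to be boto3.client("logs").

```python
import time

TOOL_LATENCY_QUERY = (
    "fields @timestamp, tool, duration_ms "
    "| filter ispresent(tool) "
    "| stats avg(duration_ms) as avg_ms, pct(duration_ms, 99) as p99_ms by tool "
    "| sort p99_ms desc"
)


def run_tool_latency_query(logs_client, log_group: str = "/ecs/mcp-server",
                           lookback_s: int = 3600, poll_s: float = 1.0,
                           max_polls: int = 30) -> list:
    """Start the Insights query, poll until it completes, return rows as dicts."""
    now = int(time.time())
    query_id = logs_client.start_query(
        logGroupName=log_group,
        startTime=now - lookback_s,
        endTime=now,
        queryString=TOOL_LATENCY_QUERY,
    )["queryId"]

    for _ in range(max_polls):
        response = logs_client.get_query_results(queryId=query_id)
        status = response["status"]
        if status == "Complete":
            # Each row is a list of {"field": ..., "value": ...} cells.
            return [{cell["field"]: cell["value"] for cell in row}
                    for row in response["results"]]
        if status in ("Failed", "Cancelled", "Timeout"):
            raise RuntimeError(f"Logs Insights query ended with status {status}")
        time.sleep(poll_s)
    raise TimeoutError("Logs Insights query did not complete in time")
```

Values come back as strings, so convert avg_ms and p99_ms with float() before comparing them against a latency budget.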

§ 06 — CI/CD Pipeline

CodePipeline + CodeBuild + Blue/Green Deployment

Manual deployments are error-prone and slow. A fully automated CI/CD pipeline builds your Docker image, runs tests, pushes to ECR, and deploys to ECS using a blue/green strategy that allows instant rollback and zero-downtime releases — all triggered by a git push to your main branch.

The pipeline has four stages. Source watches your CodeCommit or GitHub repository for changes and triggers the pipeline on every push to main. Build runs CodeBuild: installs dependencies, runs unit and integration tests, builds the Docker image, pushes it to ECR, and outputs an imageDetail.json artefact. Test optionally runs a second CodeBuild project that deploys to a staging environment and runs smoke tests. Deploy uses CodeDeploy's blue/green ECS deployment to shift traffic from the old (blue) task set to the new (green) task set.

CodeDeploy blue/green for ECS works as follows: it launches a replacement (green) task set running the new image, registers those tasks with the green target group, runs your optional test hook (a Lambda that hits your /health and smoke-test endpoints), and then shifts the ALB listener from blue to green. The old blue tasks stay running for a configurable bake time (e.g., 1 hour), during which you can trigger an immediate rollback with a single API call if you detect issues. After the bake time, the old tasks are terminated.
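
The rollback call during the bake window can be sketched as below. rollback_deployment is a hypothetical helper, and codedeploy_client is expected to be boto3.client("codedeploy"); the deployment ID comes from the pipeline execution or the CodeDeploy console.

```python
def rollback_deployment(codedeploy_client, deployment_id: str) -> str:
    """Stop an in-progress blue/green deployment and route traffic back to blue.

    autoRollbackEnabled=True asks CodeDeploy to roll back to the previous
    revision instead of only halting the deployment.
    """
    response = codedeploy_client.stop_deployment(
        deploymentId=deployment_id,
        autoRollbackEnabled=True,
    )
    return response.get("status", "Unknown")
```

The call only has an effect while the deployment is still in progress, which includes the bake time before the blue tasks are terminated. Once the deployment has completed, rolling back means redeploying the previous task definition revision.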

The buildspec.yml defines the CodeBuild build steps. It caches the pip download directory between builds to speed up dependency installation. It tags the image with both the git commit SHA (for immutable references) and latest (for convenience). The imageDetail.json output artifact is the handshake between CodeBuild and CodeDeploy — it tells CodeDeploy which image to deploy.

YAML — buildspec.yml (CodeBuild)
version: "0.2"

env:
  variables:
    AWS_DEFAULT_REGION: "us-east-1"
    ECR_REPOSITORY_NAME: "mcp-server"
    CONTAINER_NAME: "mcp-server"
  parameter-store:
    AWS_ACCOUNT_ID: "/codebuild/aws-account-id"

phases:
  install:
    runtime-versions:
      python: "3.12"
    commands:
      - pip install --upgrade pip
      - pip install -r requirements-dev.txt

  pre_build:
    commands:
      # Authenticate to ECR
      - |
        aws ecr get-login-password --region $AWS_DEFAULT_REGION | \
          docker login --username AWS --password-stdin \
          "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com"
      # Set image tags
      - IMAGE_TAG=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c1-8)
      - ECR_URI="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com/${ECR_REPOSITORY_NAME}"
      # Run tests
      - pytest tests/unit/ -v --tb=short
      - pytest tests/integration/ -v --tb=short -m "not slow"

  build:
    commands:
      # Build for ARM64 (Graviton) and push to ECR
      - docker buildx build
          --platform linux/arm64
          --target runtime
          --tag "${ECR_URI}:${IMAGE_TAG}"
          --tag "${ECR_URI}:latest"
          --push
          .
      # Create imageDetail.json for the CodeDeploy ECS blue/green action,
      # which expects a single object with an ImageURI key
      - printf '{"ImageURI":"%s"}' "${ECR_URI}:${IMAGE_TAG}" > imageDetail.json

  post_build:
    commands:
      # Trigger ECR image vulnerability scan
      - |
        aws ecr start-image-scan \
          --repository-name "${ECR_REPOSITORY_NAME}" \
          --image-id imageTag="${IMAGE_TAG}" || true
      - echo "Build complete. Image ${ECR_URI}:${IMAGE_TAG}"

artifacts:
  files:
    - imageDetail.json
    - appspec.yaml
    - taskdef.json

cache:
  paths:
    - /root/.cache/pip/**/*

YAML — appspec.yaml (CodeDeploy ECS blue/green)
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: "<TASK_DEFINITION>"
        LoadBalancerInfo:
          ContainerName: "mcp-server"
          ContainerPort: 8080
        PlatformVersion: "LATEST"
        NetworkConfiguration:
          AwsvpcConfiguration:
            Subnets:
              - "subnet-0a1b2c3d4e5f60001"
              - "subnet-0a1b2c3d4e5f60002"
            SecurityGroups:
              - "sg-0mcp1234567890abc"
            AssignPublicIp: "DISABLED"

Hooks:
  - BeforeAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:mcp-smoke-test"
  - AfterAllowTraffic: "arn:aws:lambda:us-east-1:123456789012:function:mcp-post-deploy-verify"

Python — Lambda smoke test hook (BeforeAllowTraffic)
import boto3, urllib.request, json, os

codedeploy = boto3.client("codedeploy")

def handler(event, context):
    deployment_id = event["DeploymentId"]
    lifecycle_event_hook_execution_id = event["LifecycleEventHookExecutionId"]

    try:
        # The green target group serves on a test listener port during BeforeAllowTraffic
        test_endpoint = os.environ["TEST_ENDPOINT"]  # e.g. http://internal-alb:8081

        # Health check
        req = urllib.request.urlopen(f"{test_endpoint}/health", timeout=10)
        health = json.loads(req.read())
        assert health["status"] == "healthy", f"Health check failed: {health}"

        # MCP protocol smoke test
        init_payload = json.dumps({
            "jsonrpc": "2.0", "id": 1, "method": "initialize",
            "params": {"protocolVersion": "2024-11-05", "capabilities": {},
                        "clientInfo": {"name": "smoke-test", "version": "1.0"}}
        }).encode()
        req2 = urllib.request.Request(
            f"{test_endpoint}/mcp",
            data=init_payload,
            headers={
                "Content-Type": "application/json",
                # Streamable HTTP clients must accept both JSON and SSE responses
                "Accept": "application/json, text/event-stream",
            },
        )
        resp = urllib.request.urlopen(req2, timeout=15)
        # Assumes the server answers initialize with a plain JSON body, not an SSE stream
        init_result = json.loads(resp.read())
        assert "result" in init_result, f"MCP initialize failed: {init_result}"

        # All checks passed — signal success to CodeDeploy
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=deployment_id,
            lifecycleEventHookExecutionId=lifecycle_event_hook_execution_id,
            status="Succeeded"
        )
    except Exception as e:
        print(f"Smoke test FAILED: {e}")
        # Signal failure — CodeDeploy will abort and keep traffic on blue
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=deployment_id,
            lifecycleEventHookExecutionId=lifecycle_event_hook_execution_id,
            status="Failed"
        )
        raise
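
The hook above parses the initialize response as a plain JSON body. A Streamable HTTP server may instead answer a POST with `Content-Type: text/event-stream` and deliver the JSON-RPC message inside an SSE `data:` field. The helper below is a minimal sketch that accepts either form; `parse_mcp_response` is a hypothetical name, and it assumes the whole body has already been read, which only works if the server closes the stream after sending its response.

```python
import json


def parse_mcp_response(content_type: str, body: bytes) -> dict:
    """Return the first JSON-RPC message from a JSON or SSE response body."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    text = body.decode("utf-8")

    if media_type == "application/json":
        return json.loads(text)

    if media_type == "text/event-stream":
        data_lines = []
        for line in text.splitlines():
            if line.startswith("data:"):
                value = line[len("data:"):]
                # SSE permits one optional space after the colon
                data_lines.append(value[1:] if value.startswith(" ") else value)
            elif line == "" and data_lines:
                break  # a blank line terminates the first event
        if not data_lines:
            raise ValueError("no data field found in SSE body")
        # Multiple data lines within one event are joined with newlines
        return json.loads("\n".join(data_lines))

    raise ValueError(f"unexpected Content-Type: {content_type!r}")
```

In the hook, this would replace the direct `json.loads(resp.read())` call, with `resp.headers.get("Content-Type", "")` passed as the first argument. If the server keeps the stream open, reading line by line until the first complete event is the safer variant.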

Git push triggers CodePipeline

A merge to main triggers the Source stage. CodePipeline fetches the source artifact from CodeCommit or GitHub and starts the Build stage.

CodeBuild: test, build, push

Unit and integration tests run. If they pass, Docker builds the multi-stage image for ARM64, pushes it to ECR with the commit SHA tag, and outputs imageDetail.json.

CodeDeploy launches green task set

CodeDeploy creates a new ECS task set from the new image, registers its tasks with the green target group, and waits until every task passes the target group health checks.

BeforeAllowTraffic smoke test hook

The Lambda smoke test fires against the green task set via the test listener. It verifies the /health endpoint and the MCP initialize handshake. Failure aborts the deployment — traffic stays on blue.

ALB listener shifts to green

CodeDeploy updates the HTTPS listener rule to forward traffic to the green target group. New MCP sessions go to the new version. Existing sessions on blue continue until their connection closes or the bake period ends.

Blue tasks deregistered after bake period

After the configured bake time (default: 1 hour), blue tasks are deregistered and terminated. The green deployment becomes the new blue. You can trigger instant rollback at any point during the bake window.
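
The stages above can be observed from a script by polling the deployment status until it reaches a terminal state. The sketch below is one hedged way to do that with boto3's CodeDeploy `get_deployment` call; the function name, poll interval, and poll limit are illustrative assumptions, and the client is injectable so the logic can be exercised without AWS credentials.

```python
import time

# Statuses after which a CodeDeploy deployment no longer changes
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def wait_for_deployment(deployment_id, client=None, poll_seconds=15.0,
                        max_polls=480, sleep=time.sleep):
    """Poll CodeDeploy until the deployment reaches a terminal status."""
    if client is None:
        # Imported lazily so the function can be tested with a stub client
        import boto3
        client = boto3.client("codedeploy")

    for _ in range(max_polls):
        info = client.get_deployment(deploymentId=deployment_id)["deploymentInfo"]
        status = info["status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(poll_seconds)

    raise TimeoutError(f"deployment {deployment_id} still running after {max_polls} polls")
```

With the defaults, the loop gives up after about two hours, which leaves room for a one-hour bake period; adjust both values to match your deployment group settings.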

ℹ️

Rollback in 30 seconds: During the bake window after a blue/green deployment, run aws deploy stop-deployment --deployment-id d-XXXXXXXX --auto-rollback-enabled to shift traffic back to the blue task set instantly. No redeployment required — the blue tasks are still running and registered with the original target group.
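
The same rollback can be issued from Python, for example from an alarm handler or an operations script. This is a minimal sketch built on boto3's CodeDeploy `stop_deployment` call with `autoRollbackEnabled=True`; the wrapper name `rollback_deployment` is hypothetical, and the client is passed in so the call can be tested against a stub.

```python
def rollback_deployment(deployment_id, client=None):
    """Stop an in-flight deployment and ask CodeDeploy to roll back to blue."""
    if client is None:
        # Imported lazily so the function can be tested with a stub client
        import boto3
        client = boto3.client("codedeploy")

    response = client.stop_deployment(
        deploymentId=deployment_id,
        autoRollbackEnabled=True,  # mirrors the CLI's --auto-rollback-enabled flag
    )
    # The API reports whether the stop request is pending or already applied
    return response.get("status", ""), response.get("statusMessage", "")
```

A call such as `rollback_deployment("d-XXXXXXXX")` returns the stop status and message; confirming that traffic is back on blue still requires checking the deployment or the listener afterwards.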

Knowledge Check

4 questions · Production Deployment checkpoint

1. Why is ECS Fargate preferred over AWS Lambda for hosting MCP servers that use SSE transport?

2. What ALB setting must be increased from its 60-second default to prevent SSE connections from being dropped during quiet periods?

3. Why is target group stickiness (sticky sessions) required for MCP Streamable HTTP servers deployed behind an ALB?

4. In a CodeDeploy blue/green ECS deployment, what happens if the BeforeAllowTraffic Lambda hook signals failure?

