Day 19 | 55 min | Level 3: Ascend | Transport
Streaming Responses with SSE Transport
stdio ties your server to one local process and one client. SSE pushes progress updates and results over HTTP the moment they're ready: critical for long-running tools, real-time data feeds, and LLM-backed tools. This day covers SSE transport from internals to production deployment.
User experience is determined by perceived latency, not actual latency. A tool that sends its first progress update in 200 ms feels faster than one that stays silent for 3 seconds and then returns everything, even if the total data is identical. SSE streaming in MCP is how you build that experience.
MCP supports multiple transport mechanisms. Choosing the right one affects your architecture, security model, and developer experience. Here's a decision matrix:
| Criterion | stdio | SSE (HTTP) | Streamable HTTP |
|---|---|---|---|
| Use case | Local tools, CLI | Remote, multi-user | Remote, multi-user (replaces SSE from spec 2025-03-26) |
| Network access | None (local only) | HTTP/HTTPS | HTTP/HTTPS |
| Streaming | Over local stdin/stdout | Server push on a long-lived SSE stream | Optional SSE stream per request |
| Auth support | OS-level (file perms) | HTTP headers/OAuth | HTTP headers/OAuth |
| Multiple clients | One process per client | Many concurrent clients | Many concurrent clients |
| Deployment | Same machine | ECS/K8s (long-lived connections) | ECS/Lambda/K8s |
Rule of thumb
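Use stdio for developer tools on a local machine (Claude Desktop). Use SSE for anything deployed to a server, shared with multiple users, or needing real-time streaming updates. For new remote servers, also check whether your SDK and clients support Streamable HTTP, which replaces SSE as the standard HTTP transport from the 2025-03-26 spec revision onward.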
SSE Protocol Internals
Server-Sent Events (SSE) is a W3C standard built on top of HTTP. The client opens a persistent HTTP connection and the server pushes text frames down it: one-directional, text-only, and automatically reconnecting. MCP uses SSE for the server-to-client stream, and a separate HTTP POST endpoint for client-to-server messages.
The Radio Station Analogy
SSE is like tuning into a radio station. You connect once (the HTTP request) and then just listen: the station (server) pushes whatever it wants to broadcast (events) whenever it wants. You don't need to ask "any new songs?" every few seconds. If the signal drops, your radio automatically tries to reconnect and, if the station supports it, picks up where it left off. In MCP, the radio station is your tool running on the server, and the songs are its progress updates and results.
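```text
# SSE wire format: what actually goes over the HTTP connection
# Each event is one or more "field: value" lines; a blank line (\n\n) terminates it.

# MCP's HTTP+SSE transport first tells the client where to POST its requests
# (the session_id value is illustrative)
event: endpoint
data: /messages/?session_id=abc123

# While a tool runs: progress notifications (JSON-RPC notifications carry no id)
event: message
data: {"jsonrpc":"2.0","method":"notifications/progress","params":{"progressToken":1,"progress":1,"total":3}}

event: message
data: {"jsonrpc":"2.0","method":"notifications/progress","params":{"progressToken":1,"progress":2,"total":3}}

# When the tool finishes: exactly one response for request id 1
event: message
data: {"jsonrpc":"2.0","id":1,"result":{"content":[{"type":"text","text":"Done!"}]}}

# Client reconnection: the client echoes the last id in a Last-Event-ID header
# to tell the server where to resume
id: event-42
data: {"chunk": 42, ...}

# Keepalive (prevents proxy timeouts): a comment line every 15s
: keepalive
```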
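To make the framing rules concrete, here is a minimal Python sketch of an SSE frame encoder. FastMCP and the MCP SDKs do this framing for you; format_sse_event and format_sse_comment are hypothetical helper names, and the sketch assumes payloads use plain \n line breaks.

```python
import json
from typing import Optional


def format_sse_event(data: str, event: Optional[str] = None,
                     event_id: Optional[str] = None) -> str:
    """Serialize one SSE event: optional id/event fields, then one data line per payload line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A multi-line payload becomes several data: lines; clients rejoin them with "\n"
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    # The extra "\n" produces the blank line that terminates the event
    return "\n".join(lines) + "\n\n"


def format_sse_comment(text: str = "keepalive") -> str:
    """A line starting with ':' is a comment: clients ignore it, but proxies see traffic."""
    return f": {text}\n\n"


if __name__ == "__main__":
    result = {"jsonrpc": "2.0", "id": 1,
              "result": {"content": [{"type": "text", "text": "Done!"}]}}
    # Compact separators keep the JSON on one line, so it fits in a single data: field
    print(format_sse_event(json.dumps(result, separators=(",", ":")), event="message"), end="")
    print(format_sse_comment(), end="")
```

Serializing JSON-RPC messages onto a single line is the simplest choice: each message then maps to exactly one data: field.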
Building an SSE MCP Server
FastMCP makes SSE transport trivially easy to enable. Change one line and you have a full SSE server. The hard parts are deployment (covered in Day 23) and keeping long connections alive through load balancers and proxies.
```python
import asyncio

from fastmcp import Context, FastMCP

mcp = FastMCP("StreamingServer")


@mcp.tool()
async def analyze_large_dataset(dataset_id: str, ctx: Context) -> str:
    """Long-running analysis that streams progress updates."""
    steps = [
        "Loading dataset from S3...",
        "Validating schema...",
        "Running statistical analysis...",
        "Generating visualizations...",
        "Writing report...",
    ]
    results = []
    for i, step in enumerate(steps, 1):
        await asyncio.sleep(0.5)  # simulates real work
        results.append(f"[{i}/{len(steps)}] {step} ✓")
        # Progress and log notifications reach the client over the SSE stream as they happen
        await ctx.report_progress(progress=i, total=len(steps))
        await ctx.info(f"[{i}/{len(steps)}] {step}")
    # The tool result itself is one response, sent when the function returns
    results.append(f"Analysis complete for {dataset_id}!")
    return "\n".join(results)


# Start as SSE server: one argument changes from stdio
if __name__ == "__main__":
    mcp.run(transport="sse", host="0.0.0.0", port=8080)
```
ALB timeout setting
AWS Application Load Balancer has a 60-second idle timeout by default. For streaming tools that might pause between chunks, increase the idle timeout to 300s in your ALB settings, or send keepalive comments (: ping) every 30 seconds to keep the connection alive.
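One way to send those keepalive comments without touching tool code is to wrap the outgoing frame stream. The sketch below is a hypothetical asyncio helper (with_keepalive is not a FastMCP or SDK API); it assumes frames are already-encoded SSE strings and emits ": ping" whenever the source has been quiet for interval seconds. Choose an interval comfortably below the ALB idle timeout.

```python
import asyncio
from typing import AsyncIterator

_DONE = object()  # sentinel marking the end of the source stream


async def _next_or_done(iterator: AsyncIterator[str]):
    """Await the next frame, translating end-of-stream into the _DONE sentinel."""
    try:
        return await iterator.__anext__()
    except StopAsyncIteration:
        return _DONE


async def with_keepalive(source: AsyncIterator[str],
                         interval: float = 30.0) -> AsyncIterator[str]:
    """Re-yield SSE frames from source, adding a ': ping' comment after each idle interval."""
    iterator = source.__aiter__()
    pending = asyncio.ensure_future(_next_or_done(iterator))
    try:
        while True:
            # asyncio.wait() leaves the pending task running on timeout
            done, _ = await asyncio.wait({pending}, timeout=interval)
            if not done:
                yield ": ping\n\n"
                continue
            frame = pending.result()
            if frame is _DONE:
                return
            yield frame
            pending = asyncio.ensure_future(_next_or_done(iterator))
    finally:
        pending.cancel()  # stop waiting on the source if the client disconnects
```

The design uses asyncio.wait rather than asyncio.wait_for because wait_for cancels the awaited task on timeout, which would also tear down the source generator mid-stream.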
Streaming Long-Running Tool Results
The most powerful use of SSE in MCP is streaming progress from genuinely long-running operations: file processing, AI model inference, database migrations, report generation. Instead of making the user wait 30 seconds with no feedback, stream progress at each major milestone.
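```python
import asyncio

import boto3
from fastmcp import Context, FastMCP

mcp = FastMCP("ProductionStreamer")
s3 = boto3.client("s3")
bedrock = boto3.client("bedrock-runtime")  # used by the real summarization step (omitted)


def _list_keys(bucket: str, prefix: str) -> list[str]:
    """List every object key under a prefix, following pagination."""
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


@mcp.tool()
async def summarize_documents(ctx: Context, s3_bucket: str, prefix: str) -> str:
    """Summarize all documents in an S3 prefix, reporting progress as it goes."""
    # Step 1: list objects (immediate feedback)
    await ctx.info(f"Scanning s3://{s3_bucket}/{prefix}...")
    # boto3 is blocking; run it in a thread so the event loop keeps serving other streams
    keys = await asyncio.to_thread(_list_keys, s3_bucket, prefix)
    await ctx.info(f"Found {len(keys)} documents. Starting analysis...")

    # Step 2: process each document, reporting progress after each one
    lines = []
    for i, key in enumerate(keys, 1):
        await ctx.info(f"[{i}/{len(keys)}] Processing {key}...")
        # Fetch and summarize (simplified: a real implementation calls Bedrock here)
        obj = await asyncio.to_thread(s3.get_object, Bucket=s3_bucket, Key=key)
        body = await asyncio.to_thread(obj["Body"].read)
        lines.append(f"{key}: {len(body)} bytes read")
        await ctx.report_progress(progress=i, total=len(keys))

    lines.append(f"Complete! Processed {len(keys)} documents.")
    return "\n".join(lines)
```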
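FastMCP builds the underlying protocol messages for you, but it helps to see what actually travels down the stream during a run like this. The sketch below assumes the MCP spec's notifications/progress schema (progressToken, progress, optional total, and an optional message field in newer spec revisions); the helper names are hypothetical. The progress token comes from the client's request (its _meta.progressToken), so servers typically send progress only when the client supplied one.

```python
from typing import List, Optional, Union

ProgressToken = Union[str, int]


def progress_notification(token: ProgressToken, progress: float,
                          total: Optional[float] = None,
                          message: Optional[str] = None) -> dict:
    """An MCP notifications/progress message: a JSON-RPC notification, so it has no id."""
    params: dict = {"progressToken": token, "progress": progress}
    if total is not None:
        params["total"] = total
    if message is not None:
        params["message"] = message  # optional field in newer spec revisions
    return {"jsonrpc": "2.0", "method": "notifications/progress", "params": params}


def tool_result(request_id: int, text: str) -> dict:
    """The single JSON-RPC response that ends a tools/call request."""
    return {"jsonrpc": "2.0", "id": request_id,
            "result": {"content": [{"type": "text", "text": text}], "isError": False}}


def messages_for_run(request_id: int, token: ProgressToken, keys: List[str]) -> List[dict]:
    """Everything the server sends for one tool call: N progress notifications, then 1 result."""
    msgs = [progress_notification(token, i, total=len(keys), message=f"Processed {key}")
            for i, key in enumerate(keys, 1)]
    msgs.append(tool_result(request_id, f"Complete! Processed {len(keys)} documents."))
    return msgs


if __name__ == "__main__":
    import json
    for msg in messages_for_run(7, "tok-1", ["a.txt", "b.txt"]):
        print(json.dumps(msg))
```

The shape to remember: many notifications without an id, then exactly one response carrying the request's id.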
Client-Side SSE Consumption
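When you're building a custom client (a web app, a monitoring dashboard, a custom Claude integration), you need to consume SSE streams directly. The browser EventSource API and the Node.js eventsource package make this straightforward, with one caveat: browser EventSource cannot attach custom headers such as Authorization, so authenticate with cookies or use a fetch()-based SSE client.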
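```javascript
// Browser: consuming an MCP SSE stream from a web app.
// Browser EventSource cannot send custom headers (e.g. Authorization), so this
// assumes cookie-based auth; use a fetch()-based SSE client for bearer tokens.
const sse = new EventSource('https://api.yourdomain.com/mcp/sse', { withCredentials: true });

let postUrl = null;

// MCP's first event names the endpoint for client-to-server JSON-RPC POSTs
sse.addEventListener('endpoint', (event) => {
  postUrl = new URL(event.data, sse.url); // POST initialize, tools/call, ... here
});

// Server messages arrive as "message" events, which onmessage receives
sse.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.method === 'notifications/progress') {
    // Intermediate progress while a tool runs
    const { progress, total } = msg.params;
    document.getElementById('status').textContent = `${progress} / ${total ?? '?'}`;
  } else if (msg.result?.content) {
    // Final tool result: exactly one response per request id
    for (const item of msg.result.content) {
      if (item.type === 'text') {
        document.getElementById('output').textContent += item.text;
      }
    }
    showCompletion(); // app-specific UI hook
  }
};

sse.onerror = (err) => {
  // EventSource reconnects automatically; this fires on each dropped connection
  console.warn('SSE reconnecting...', err);
};
```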
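A non-browser client, such as a Python monitoring script, has to parse the stream itself because chunks from the network can split an event anywhere. The sketch below is a hypothetical incremental parser that follows the WHATWG event-stream rules for the fields used in this lesson (data, event, id, and ":" comments); it assumes \n or \r\n line endings and ignores the retry field. It keeps last_event_id so a reconnect can send it back in the Last-Event-ID header.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SSEEvent:
    event: str
    data: str
    id: Optional[str]


class SSEParser:
    """Incremental SSE parser: feed() raw text chunks, get back completed events."""

    def __init__(self) -> None:
        self._buffer = ""              # text not yet terminated by a newline
        self._data: List[str] = []     # data: lines of the event being built
        self._event = ""               # event: field ("" means the default "message")
        self.last_event_id: Optional[str] = None  # send as Last-Event-ID on reconnect

    def feed(self, chunk: str) -> List[SSEEvent]:
        self._buffer += chunk
        events: List[SSEEvent] = []
        # Process only complete lines; a partial line waits for the next chunk
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            line = line.rstrip("\r")  # tolerate \r\n line endings
            if line == "":
                # Blank line: dispatch the event, but only if it carried data
                if self._data:
                    events.append(SSEEvent(self._event or "message",
                                           "\n".join(self._data), self.last_event_id))
                self._data, self._event = [], ""
            elif line.startswith(":"):
                pass  # comment line, e.g. ": ping" keepalives
            else:
                field, _, value = line.partition(":")
                if value.startswith(" "):
                    value = value[1:]  # one leading space after the colon is dropped
                if field == "data":
                    self._data.append(value)
                elif field == "event":
                    self._event = value
                elif field == "id":
                    self.last_event_id = value  # persists across events, per the spec
                # other fields (such as retry) are ignored in this sketch
        return events
```

Each JSON-RPC message can then be decoded with json.loads(event.data) and routed the same way as in the browser example: progress notifications by method, the final result by id.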
Error Handling & Reconnection Logic
SSE connections drop: network hiccups, server restarts, ALB timeouts. You need idempotent tool operations (safe to retry), event IDs for resuming streams, and smart backoff logic on the client.
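A minimal sketch of the client half: exponential backoff with "full jitter" (each wait is a random time between zero and an exponentially growing ceiling, so many clients dropped by the same deploy don't reconnect in lockstep), plus the headers for a resuming reconnect. The function names, defaults, and bearer-token header are assumptions for illustration; plug them into whatever HTTP client you use.

```python
import random
from typing import Dict, Iterator, Optional


def backoff_delays(base: float = 0.5, cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> Iterator[float]:
    """Exponential backoff with full jitter: wait uniform(0, min(cap, base * 2**attempt))."""
    rng = rng or random.Random()
    attempt = 0
    while True:
        ceiling = min(cap, base * (2 ** attempt))
        yield rng.uniform(0, ceiling)
        attempt = min(attempt + 1, 32)  # bound the exponent so the float never overflows


def reconnect_headers(token: str, last_event_id: Optional[str] = None) -> Dict[str, str]:
    """Headers for (re)opening the stream; Last-Event-ID asks the server to resume."""
    headers = {"Accept": "text/event-stream", "Authorization": f"Bearer {token}"}
    if last_event_id is not None:
        headers["Last-Event-ID"] = last_event_id
    return headers


# Usage outline (open_stream and last_id are hypothetical):
#   delays = backoff_delays()
#   while True:
#       try:
#           open_stream(reconnect_headers(token, last_id))  # returns when the stream ends
#           delays = backoff_delays()  # we had a healthy connection: restart the schedule
#       except ConnectionError:
#           time.sleep(next(delays))
```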
A data analytics company runs 50+ MCP servers on ECS Fargate behind an ALB. Each server handles 200 concurrent SSE connections. ALB timeout is set to 300s. Each server sends : ping every 25s to prevent idle disconnection. CloudWatch monitors connection counts ā auto-scaling kicks in at 150 concurrent connections per task. Result: zero dropped streams in normal operation, graceful reconnect on deploys.
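On the server side, resuming from an event ID only works if the server still holds the events the client missed. The sketch below is a minimal in-memory version (ReplayBuffer is a hypothetical name) that assigns increasing numeric IDs and replays everything after the client's Last-Event-ID. It assumes single-line data payloads and a single process: with many tasks behind one ALB, as in the deployment above, a reconnect can land on a different task, so you would need sticky sessions or a shared store for the buffer.

```python
from collections import deque
from itertools import count
from typing import Deque, List, Optional, Tuple


class ReplayBuffer:
    """Keep the last maxlen events so a reconnecting client can resume after Last-Event-ID."""

    def __init__(self, maxlen: int = 1000) -> None:
        self._events: Deque[Tuple[int, str]] = deque(maxlen=maxlen)
        self._ids = count(1)

    @staticmethod
    def _frame(event_id: int, data: str) -> str:
        return f"id: {event_id}\ndata: {data}\n\n"  # assumes data has no newlines

    def append(self, data: str) -> str:
        """Store an event and return its wire-format frame with the next numeric id."""
        event_id = next(self._ids)
        self._events.append((event_id, data))
        return self._frame(event_id, data)

    def replay_after(self, last_event_id: Optional[str]) -> Optional[List[str]]:
        """Frames newer than last_event_id; None means the gap can't be filled."""
        if last_event_id is None:
            return []  # fresh connection: nothing to replay
        try:
            last = int(last_event_id)
        except ValueError:
            return None  # an id this server never issued
        if self._events and last < self._events[0][0] - 1:
            return None  # some missed events were already evicted
        return [self._frame(i, d) for i, d in self._events if i > last]
```

When replay_after returns None, the honest response is to start a fresh stream and let the client re-issue its request, which is where idempotent tool operations pay off.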
Knowledge Check: Day 19
4 questions on SSE transport and streaming
QUESTION 01 / 04
Which transport should you choose for a multi-user MCP server deployed on AWS ECS that needs real-time streaming?
A. stdio: it's the default and most compatible
B. SSE (HTTP): designed for remote, multi-user streaming deployments
C. WebSocket: more powerful than SSE
D. gRPC: lowest latency
Answer: B. SSE transport is the right choice for remote multi-user deployments (newer MCP spec revisions define Streamable HTTP as its successor). stdio is local-only. The MCP spec doesn't define WebSocket or gRPC transports.
QUESTION 02 / 04
Why do SSE MCP servers need keepalive comments sent periodically?
A. To increase throughput
B. To prevent AWS ALB and proxy servers from closing idle connections due to timeout
C. To authenticate each event
D. To compress the stream
Answer: B. AWS ALB has a 60-second idle timeout by default. If no data is sent for 60 seconds, the connection is closed. Keepalive pings (SSE comment lines such as ": ping") keep the connection active even during tool processing delays.
QUESTION 03 / 04
In FastMCP, how do you stream intermediate progress from a long-running tool?
A. Return a list of strings
B. Await ctx.report_progress() (and ctx.info()) inside the tool as work completes
C. Call ctx.send_stream()
D. Write to a shared queue
Answer: B. Progress and log calls on the Context object are sent to the client as MCP notifications while the tool runs; over SSE they travel down the open stream immediately. The tool's return value is still delivered once, as the final JSON-RPC response, because MCP has no chunked tool result.
QUESTION 04 / 04
What is the purpose of event IDs in SSE streams?
A. Authentication tokens for each event
B. Sequence numbers that allow clients to resume a broken stream from the last received event
C. Database primary keys for event storage
D. Rate limiting identifiers
Answer: B. When a client reconnects, it sends the last ID it received in the Last-Event-ID header. A server that keeps recent events can skip the ones already delivered and resume from the next one, giving you resumable, reliable streaming.
Up Next: Day 20
Multi-Server Orchestration & Composition
Route requests across multiple specialized MCP servers and compose their tools into a unified interface.