The next-generation MCP transport introduced in spec 2025-03-26 — a single endpoint that handles everything: synchronous responses, real-time SSE streams, resumability via event IDs, and request batching.
The Model Context Protocol needs a transport layer to move JSON-RPC messages between client and server. The choice of transport determines latency, scalability, proxy compatibility, and resilience. The first generation used stdio — pipes between processes — which is simple and perfect for local tools but impossible to expose over a network. The second generation added HTTP+SSE: clients posted messages to one endpoint while a long-lived Server-Sent Events connection carried responses back. This enabled networked MCP servers but required two separate URL paths, caused significant headaches with reverse proxies (which buffer SSE by default), and offered zero resilience to disconnects. WebSocket support arrived as an unofficial pattern — it solved the two-endpoint problem but introduced stateful upgrade handshakes that break load balancers and serverless platforms entirely.
Streamable HTTP, introduced in MCP spec 2025-03-26, is the clean break the ecosystem needed. A single /mcp endpoint handles all traffic. The server decides response-by-response whether to reply with application/json (synchronous) or text/event-stream (streaming). Every SSE event carries an id: field, and clients resume from exactly where they left off after a disconnect. Proxies work out-of-the-box for synchronous calls; only streaming responses need proxy_buffering off. Serverless functions handle the common case of non-streaming calls without long-lived connections.
For most requests the server responds with application/json and closes immediately. Only when the server needs to push progress events, or the client subscribes to notifications, does it switch to text/event-stream. This keeps the common case compatible with serverless functions.

The entire MCP protocol flows through /mcp. There is no separate SSE endpoint as in the old transport. For every POST request the server decides at runtime whether the response needs streaming. If it does, the server opens a text/event-stream response and sends events. If not, it returns a plain application/json object and closes. Your infrastructure therefore only has to handle long-lived connections for operations that actually need them.
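To make the streaming half concrete, the sketch below shows what one SSE frame with an id: field looks like on the wire. The names `SseEvent`, `formatSseEvent`, and `createEventIdGenerator` are illustrative and not part of the MCP SDK, which does this framing for you; the `evt-N` ID scheme is just one possible choice.

```typescript
// Minimal sketch of SSE framing with event IDs (all names are illustrative).
interface SseEvent {
  id: string;   // value the client echoes back in Last-Event-ID
  data: string; // serialized JSON-RPC message
}

// Serialize one event. Every line of a multi-line payload needs its own
// "data:" prefix, and a blank line terminates the event.
function formatSseEvent(event: SseEvent): string {
  const dataLines = event.data
    .split('\n')
    .map((line) => `data: ${line}`)
    .join('\n');
  return `id: ${event.id}\n${dataLines}\n\n`;
}

// Per-session monotonically increasing IDs.
function createEventIdGenerator(prefix = 'evt'): () => string {
  let counter = 0;
  return () => `${prefix}-${++counter}`;
}

const nextId = createEventIdGenerator();
const frame = formatSseEvent({
  id: nextId(),
  data: JSON.stringify({ jsonrpc: '2.0', id: 1, result: { ok: true } }),
});
```

A counter per session keeps IDs ordered, which makes "everything after ID X" cheap to answer on reconnect.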
POST /mcp: the client sends every JSON-RPC message as a POST. The server replies with Content-Type: application/json for synchronous responses, or opens Content-Type: text/event-stream for streaming. The session ID is issued in the Mcp-Session-Id response header on initialize, then echoed back as a request header on all subsequent calls.

GET /mcp: to receive server-initiated messages, the client sends a GET /mcp request with Accept: text/event-stream and its session ID. The server responds with an SSE stream and pushes notifications/resources/updated, notifications/tools/list_changed, or custom events as they occur. This is the "subscription channel" for a session.

DELETE /mcp: to end a session, clients send DELETE /mcp with their session ID. The server closes open SSE connections, cleans up in-memory state, removes the session from the store, and returns 204 No Content. Without this, servers must rely on timeouts to detect abandoned sessions, so always implement DELETE.

Resumability: when resumability is enabled, every SSE event carries an id: field containing a UUID or monotonically increasing counter. When a connection drops, the client reconnects and sends the Last-Event-ID HTTP header with the last event ID it received. The server then replays all events that occurred after that ID, so no messages are lost as long as they are still in the server's event buffer. This is the single biggest differentiator over HTTP+SSE.

Security: the Mcp-Session-Id header is a correlation handle, not a bearer token. Always require a separate Authorization: Bearer … header validated on every request. Treat session IDs as opaque, random, unguessable identifiers: use randomUUID(), never sequential integers.

The StreamableHTTPServerTransport class from the MCP SDK handles the SSE plumbing, event ID generation, and the content-type decision for you. Your job is to handle the routing logic: route to an existing session's transport if a session ID header is present, or create a new transport and MCP server for fresh initialize requests. The session store is simply a Map<string, StreamableHTTPServerTransport>; for production you'll replace this with Redis (see Section 12).
// streamable-http-server.ts
import express from 'express';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { isInitializeRequest } from '@modelcontextprotocol/sdk/types.js';
import { randomUUID } from 'crypto';
import { z } from 'zod';
const app = express();
app.use(express.json());
// In-memory session store: sessionId → transport
// Production: replace with Redis-backed store
const sessions = new Map<string, StreamableHTTPServerTransport>();
// Helper: create and wire up a fresh McpServer for each session
function createMcpServer(): McpServer {
const server = new McpServer({
name: 'streamable-demo',
version: '1.0.0',
});
// Tool with progress streaming
server.tool(
'long_running_task',
{ duration: z.number().describe('Total duration in ms') },
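async ({ duration }, { _meta, sendNotification }) => {
const progressToken = _meta?.progressToken;
for (let i = 0; i <= 100; i += 10) {
if (progressToken !== undefined) {
// sendNotification ties the event to this request, so it travels on the
// same POST response stream as the final result
await sendNotification({
method: 'notifications/progress',
params: { progressToken, progress: i, total: 100 },
});
}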
await new Promise(r => setTimeout(r, duration / 10));
}
return {
content: [{ type: 'text', text: 'Task complete!' }],
};
}
);
return server;
}
// ── POST /mcp — all client-to-server messages ──────────────────────────────
app.post('/mcp', async (req, res) => {
const sessionId = req.headers['mcp-session-id'] as string | undefined;
let transport: StreamableHTTPServerTransport;
if (sessionId && sessions.has(sessionId)) {
// Existing session — reuse transport
transport = sessions.get(sessionId)!;
} else if (!sessionId && isInitializeRequest(req.body)) {
// New session — create transport + server pair
const newSessionId = randomUUID();
transport = new StreamableHTTPServerTransport({
sessionIdGenerator: () => newSessionId,
onsessioninitialized: (id) => {
sessions.set(id, transport);
console.log(`Session created: ${id}`);
},
});
const server = createMcpServer();
await server.connect(transport);
} else {
// No session ID and not an initialize request — reject
res.status(400).json({ error: 'Missing Mcp-Session-Id or not an initialize request' });
return;
}
// Delegate to the transport — it handles content-type negotiation,
// SSE framing, event IDs, and JSON-RPC response formatting
await transport.handleRequest(req, res, req.body);
});
// ── GET /mcp — server-initiated notifications (subscriptions) ─────────────
app.get('/mcp', async (req, res) => {
const sessionId = req.headers['mcp-session-id'] as string | undefined;
if (!sessionId) { res.status(400).json({ error: 'Mcp-Session-Id required' }); return; }
const transport = sessions.get(sessionId);
if (!transport) { res.status(404).json({ error: 'Session not found' }); return; }
// Opens an SSE stream for server → client notifications
await transport.handleRequest(req, res);
});
// ── DELETE /mcp — graceful session termination ────────────────────────────
app.delete('/mcp', async (req, res) => {
const sessionId = req.headers['mcp-session-id'] as string | undefined;
if (!sessionId) { res.status(400).json({ error: 'Mcp-Session-Id required' }); return; }
const transport = sessions.get(sessionId);
if (transport) {
await transport.close();
sessions.delete(sessionId);
console.log(`Session closed: ${sessionId}`);
}
res.status(204).end();
});
app.listen(3000, () => console.log('MCP server listening on :3000'));
The key design decision is the routing logic inside POST /mcp. When a request arrives without a session ID and the body is an initialize message, a brand new StreamableHTTPServerTransport is instantiated and a fresh McpServer is connected to it. The SDK calls onsessioninitialized once the session ID has been generated while handling that initialize request, at which point you store the transport by session ID. All subsequent requests for that session route directly to the stored transport via transport.handleRequest().
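The routing rule can be isolated as a pure function, which makes it easy to unit-test without Express. This is a sketch under assumptions: `decidePostRoute` and `RouteDecision` are hypothetical names, and unlike the handler above it separates the two reject cases, answering 404 for an unknown session ID (the status the transport spec uses to tell a client to re-initialize) and 400 when the header is missing.

```typescript
// Hypothetical helper: decide what POST /mcp should do with a request.
type RouteDecision =
  | { action: 'reuse'; sessionId: string }
  | { action: 'create' }
  | { action: 'reject'; status: number; reason: string };

function decidePostRoute(
  sessionId: string | undefined,
  isInitialize: boolean,
  knownSessions: ReadonlySet<string>,
): RouteDecision {
  if (sessionId !== undefined) {
    // A session ID was supplied: it must refer to a live session.
    return knownSessions.has(sessionId)
      ? { action: 'reuse', sessionId }
      : { action: 'reject', status: 404, reason: 'Unknown or expired session' };
  }
  // No session ID: only an initialize request may create a session.
  if (isInitialize) return { action: 'create' };
  return { action: 'reject', status: 400, reason: 'Mcp-Session-Id header required' };
}
```

In the Express handler you would call it with `req.headers['mcp-session-id']`, `isInitializeRequest(req.body)`, and the keys of the session map, then act on the returned decision.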
This example creates one McpServer instance per session. This is the safest approach: session-level state (such as authenticated user context) stays isolated. You can share a single McpServer across sessions if your tools are truly stateless, but you must be careful not to leak state between sessions.

On the client side, the SDK provides StreamableHTTPClientTransport, which handles session ID negotiation, reconnection with Last-Event-ID, and the GET subscription channel. From the application layer, using it looks nearly identical to any other transport.

The client transport constructor takes the server URL and an optional options object. The most important option is requestInit, an object passed to every fetch call, which is where you inject authentication headers. After you call client.connect(transport), the SDK sends the initialize handshake via POST, stores the returned session ID, and attaches it to all future requests automatically. You never handle session IDs manually in client code.
// streamable-http-client.ts
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';
import { ResourceUpdatedNotificationSchema } from '@modelcontextprotocol/sdk/types.js';

// Create transport pointing at the server URL
const transport = new StreamableHTTPClientTransport(
  new URL('http://localhost:3000/mcp'),
  {
    // Merged into every fetch() call: use for auth headers, custom headers, etc.
    requestInit: {
      headers: {
        Authorization: `Bearer ${process.env.API_TOKEN}`,
      },
    },
  }
);
const client = new Client({ name: 'my-client', version: '1.0.0' });

// connect() sends initialize, stores session ID automatically
await client.connect(transport);

// List available tools
const { tools } = await client.listTools();
console.log('Tools:', tools.map(t => t.name));

// Register notification handlers before triggering anything that emits them.
// Handlers are keyed by the SDK's notification schemas, not by a bare method name.
client.setNotificationHandler(ResourceUpdatedNotificationSchema, (notification) => {
  console.log('Resource updated:', notification.params.uri);
});

// Call a tool that streams progress. Passing onprogress makes the SDK attach a
// progress token to the request and route matching notifications to the callback.
const result = await client.callTool(
  { name: 'long_running_task', arguments: { duration: 5000 } },
  undefined, // use the default result schema
  {
    onprogress: ({ progress, total }) => {
      process.stdout.write(`\rProgress: ${progress}/${total ?? '?'}`);
    },
  }
);
console.log('Result:', result.content);

// Subscribe to resource updates; notifications arrive on the GET /mcp SSE channel
await client.subscribeResource({ uri: 'metrics://cpu' });

// Graceful shutdown: ask the server to drop the session (DELETE /mcp), then close
await transport.terminateSession();
await client.close();
The requestInit option is passed directly to the Fetch API, so any valid RequestInit property works: custom headers, credentials mode, referrer policy, timeouts via AbortSignal. For mTLS client certificate authentication in Node.js, you'd pass a custom dispatcher via undici rather than requestInit.
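Session handling: the transport reads Mcp-Session-Id from the initialize response and silently attaches it to every subsequent POST, GET, and DELETE. Zero configuration required in application code.

Reconnection: when a stream drops, the transport reconnects and sends Last-Event-ID to resume from the last received event ID.

Notifications: server-initiated notifications arrive on the GET stream and are dispatched to the handlers you register with setNotificationHandler. Progress events from streaming tool calls arrive on the POST response stream.

Shutdown: ending the session sends DELETE /mcp, closes the GET SSE connection, and releases all local state. In SDK versions that expose transport.terminateSession(), call it before client.close(), because close() on its own may only tear down the local connection. Always end sessions explicitly to avoid orphaned server sessions.

The mechanism is straightforward but requires discipline: every SSE event the server sends MUST include an id: field. This can be a UUID, a UUID prefixed with a sequence number, or simply a monotonically increasing integer per session. When the client reconnects, it sends a Last-Event-ID HTTP header set to the last ID it received. A browser's EventSource does this on its own; fetch-based clients, including the SDK transport, set the header explicitly. The server checks this header and replays all events that occurred after that ID.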
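The client half of this mechanism amounts to remembering the last id: seen on the stream. The sketch below is a simplified illustration, not the SDK's implementation: `SseResumeTracker` is a hypothetical class that parses decoded SSE text, keeps the latest event ID, and produces the header for the next reconnect. It ignores the `event:` and `retry:` fields and assumes a `\r\n` pair is never split across chunks.

```typescript
// Hypothetical client-side tracker for SSE resumption.
interface ParsedSseEvent {
  id?: string;
  data: string;
}

class SseResumeTracker {
  private buffer = '';
  lastEventId: string | undefined;

  // Feed a decoded text chunk; returns every event completed by this chunk.
  push(chunk: string): ParsedSseEvent[] {
    this.buffer += chunk.replace(/\r\n/g, '\n');
    const blocks = this.buffer.split('\n\n');
    this.buffer = blocks.pop() ?? ''; // keep the trailing partial event
    const events: ParsedSseEvent[] = [];
    for (const block of blocks) {
      let id: string | undefined;
      const data: string[] = [];
      for (const line of block.split('\n')) {
        if (line.startsWith('id:')) id = line.slice(3).trim();
        else if (line.startsWith('data:')) data.push(line.slice(5).replace(/^ /, ''));
      }
      if (data.length === 0) continue; // comment or keep-alive block
      if (id !== undefined) this.lastEventId = id;
      events.push({ id, data: data.join('\n') });
    }
    return events;
  }

  // Headers to send when reopening the stream after a drop.
  resumeHeaders(): Record<string, string> {
    return this.lastEventId ? { 'Last-Event-ID': this.lastEventId } : {};
  }
}
```

Only completed events advance `lastEventId`, so an event cut off mid-transfer is requested again after the reconnect instead of being skipped.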
The SDK handles event ID generation and the Last-Event-ID header on the client side automatically. What you need to provide on the server side is an event store — a per-session buffer of recent events that can be queried by "give me everything after event ID X".
// event-store.ts
interface StoredEvent {
id: string;
data: string;
timestamp: number;
}
export class ResumableEventStore {
// Map from sessionId → ordered list of events
private events = new Map<string, StoredEvent[]>();
private TTL = 5 * 60 * 1000; // Keep events for 5 minutes
store(sessionId: string, eventId: string, data: string): void {
const list = this.events.get(sessionId) ?? [];
list.push({ id: eventId, data, timestamp: Date.now() });
// Prune events older than TTL to bound memory usage
const cutoff = Date.now() - this.TTL;
this.events.set(
sessionId,
list.filter(e => e.timestamp > cutoff)
);
}
getEventsAfter(sessionId: string, lastEventId: string): StoredEvent[] {
const list = this.events.get(sessionId) ?? [];
const idx = list.findIndex(e => e.id === lastEventId);
// If lastEventId not found (too old, pruned), return empty array
// Server should respond 400 in this case — client must restart
if (idx === -1) return [];
return list.slice(idx + 1);
}
clearSession(sessionId: string): void {
this.events.delete(sessionId);
}
}
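// Integration into the transport layer:
// StreamableHTTPServerTransport accepts an eventStore option, but the SDK defines
// its own EventStore interface for it (storeEvent and replayEventsAfter at the
// time of writing), so ResumableEventStore needs a thin adapter.
// toSdkEventStore is a hypothetical helper you would write for your SDK version.
const store = new ResumableEventStore();
const transport: StreamableHTTPServerTransport = new StreamableHTTPServerTransport({
sessionIdGenerator: () => randomUUID(),
onsessioninitialized: (id) => sessions.set(id, transport),
eventStore: toSdkEventStore(store), // SDK stores every sent event and replays on reconnect
});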
The event store TTL policy is a tradeoff between memory and resilience. A 5-minute TTL means clients can survive a 5-minute network outage and resume seamlessly. Beyond that, events are pruned and the client must restart the operation. For truly critical operations you can persist events to Redis with a longer TTL, accepting higher storage costs. The ResumableEventStore interface is designed so you can swap in any backend.
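One weakness of returning an empty array for a pruned ID is that the caller cannot tell "nothing new since that event" from "that event is gone". The variant below is a sketch, not the SDK's interface: `ReplayBuffer` and `ReplayResult` are hypothetical names, and the injectable clock exists only to make TTL behavior testable.

```typescript
// Hypothetical replay buffer that reports an unknown cursor explicitly.
interface BufferedEvent {
  id: string;
  data: string;
  timestamp: number;
}

type ReplayResult =
  | { ok: true; events: BufferedEvent[] }
  | { ok: false; reason: 'unknown-event-id' };

class ReplayBuffer {
  private events = new Map<string, BufferedEvent[]>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Append an event and drop anything older than the TTL.
  append(sessionId: string, id: string, data: string): void {
    const cutoff = this.now() - this.ttlMs;
    const kept = (this.events.get(sessionId) ?? []).filter((e) => e.timestamp > cutoff);
    kept.push({ id, data, timestamp: this.now() });
    this.events.set(sessionId, kept);
  }

  // Events strictly after lastEventId, or an explicit failure if it was pruned.
  replayAfter(sessionId: string, lastEventId: string): ReplayResult {
    const list = this.events.get(sessionId) ?? [];
    const idx = list.findIndex((e) => e.id === lastEventId);
    if (idx === -1) return { ok: false, reason: 'unknown-event-id' };
    return { ok: true, events: list.slice(idx + 1) };
  }
}
```

With this shape, a handler can resume silently on `ok: true` with zero events and reserve the error response for `ok: false`.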
1. The server sends id: evt-42\ndata: {...}\n\n. The client receives it, records evt-42 as the last-known ID, and processes the event normally.
2. The connection drops. The client reconnects and sends Last-Event-ID: evt-42 in the request headers.
3. The server calls store.getEventsAfter(sessionId, 'evt-42') and writes the returned events to the new response stream before resuming live events. The client receives all missed events in order.
4. If getEventsAfter returns an empty array because the event was pruned (TTL expired), respond with HTTP 400 instead of silently starting a new stream from the current position. This tells the client that it cannot resume and must restart the operation from scratch, which prevents silent data gaps.

Batching is most valuable when a single AI operation needs data from several independent sources simultaneously. Instead of three sequential round-trips, each with full HTTP overhead plus potential SSE stream setup, a single POST delivers all three requests, the server fans them out internally, and results arrive in one response. Measured end-to-end, this typically cuts latency by 40–60% for multi-source queries. Batching belongs to the 2025-03-26 revision of the spec; later revisions removed JSON-RPC batching, so confirm which protocol version your client and server negotiate before relying on it.
The server decides whether to respond with a JSON array or an SSE stream based on whether any of the batched requests requires streaming. If all requests are synchronous (no progress events, no streaming tools), the response is a plain application/json array. If at least one request opens a stream, the entire batch response becomes text/event-stream and each result arrives as a separate SSE event tagged with the corresponding JSON-RPC id.
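// batch-client.ts: low-level batch via raw fetch
// (built by hand; do not rely on the SDK client to coalesce concurrent calls)
const sessionId = transport.sessionId; // exposed by SDK after connect()
if (!sessionId) throw new Error('Call client.connect(transport) before batching');
const token = process.env.API_TOKEN; // same token the transport was configured with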
const batch = [
{
jsonrpc: '2.0', id: 1,
method: 'tools/call',
params: { name: 'get_user', arguments: { id: '123' } }
},
{
jsonrpc: '2.0', id: 2,
method: 'tools/call',
params: { name: 'get_orders', arguments: { userId: '123' } }
},
{
jsonrpc: '2.0', id: 3,
method: 'resources/read',
params: { uri: 'config://app' }
}
];
const response = await fetch('http://localhost:3000/mcp', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Mcp-Session-Id': sessionId,
'Authorization': `Bearer ${token}`,
'Accept': 'application/json, text/event-stream',
},
body: JSON.stringify(batch),
});
const contentType = response.headers.get('content-type') ?? '';
if (contentType.includes('text/event-stream')) {
// At least one request is streaming — parse individual SSE events
// Each event carries one JSON-RPC response with the original id field
for await (const event of parseSSEStream(response.body!)) {
const rpc = JSON.parse(event.data);
console.log(`Response for request #${rpc.id}:`, rpc.result ?? rpc.error);
}
} else {
// All synchronous — plain JSON array response
const results = await response.json() as Array<{ id: number; result?: any; error?: any }>;
for (const r of results) {
console.log(`Response for request #${r.id}:`, r.result ?? r.error);
}
}
// Helper to async-iterate an SSE ReadableStream
async function* parseSSEStream(body: ReadableStream<Uint8Array>) {
const reader = body.pipeThrough(new TextDecoderStream()).getReader();
let buffer = '';
while (true) {
const { done, value } = await reader.read();
if (done) break;
buffer += value;
const parts = buffer.split('\n\n');
buffer = parts.pop() ?? '';
for (const part of parts) {
const dataLine = part.split('\n').find(l => l.startsWith('data:'));
if (dataLine) yield { data: dataLine.slice(5).trim() };
}
}
}
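The loop above only logs responses. To hand each result back to the caller that asked for it, match responses to requests by their JSON-RPC id, since a server may answer a batch in any order. A minimal sketch, assuming the 2025-03-26 batch format; `correlateBatch` and the two interfaces are illustrative names, not SDK types.

```typescript
// Illustrative JSON-RPC shapes (a subset of the real message types).
interface RpcRequest {
  jsonrpc: '2.0';
  id: number | string;
  method: string;
  params?: unknown;
}

interface RpcResponse {
  jsonrpc: '2.0';
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
}

interface CorrelatedCall {
  request: RpcRequest;
  response: RpcResponse | undefined; // undefined if the server never answered
}

// Pair every request with its response, preserving request order.
function correlateBatch(requests: RpcRequest[], responses: RpcResponse[]): CorrelatedCall[] {
  const byId = new Map<number | string, RpcResponse>();
  for (const response of responses) byId.set(response.id, response);
  return requests.map((request) => ({ request, response: byId.get(request.id) }));
}
```

The same function works for both branches: collect the parsed SSE events into an array first, or pass the JSON array directly.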
Concurrent calls are a separate matter from batching. Issuing client.callTool() or client.readResource() concurrently with Promise.all() runs the requests in parallel, but do not assume the SDK coalesces them into a single batch POST; check the behavior of the SDK version you use, and build the batch by hand as shown above when you need one round-trip.

Progress reporting works through a token: the client sends a progressToken with the tool call, and the server sends back notifications/progress events in real time. Streamable HTTP makes this elegant: progress events flow on the same POST response SSE stream as the final result.

Under HTTP+SSE, progress notifications had to flow on the pre-established SSE connection (the GET /sse channel), while the tool call result came back as a POST response. This created a complex routing problem: the server had to correlate the POST response with the SSE session and inject progress events there. With Streamable HTTP, a single POST to /mcp opens a streaming response and both progress events and the final result flow on that same stream, so the routing problem dissolves entirely.
// progress-tool.ts — full progress-aware tool implementation
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';
export function registerAnalyzeTool(server: McpServer) {
server.tool(
'analyze_repository',
{
repo: z.string().describe('GitHub repository URL or path'),
deep: z.boolean().default(false).describe('Enable deep AST analysis'),
},
async ({ repo, deep }, { _meta, sendNotification }) => {
const progressToken = _meta?.progressToken;
// Helper to send progress if client requested it
const report = async (progress: number, message: string) => {
if (progressToken !== undefined) {
// Sent via the request's own sendNotification so it stays on this POST stream
await sendNotification({
method: 'notifications/progress',
params: { progressToken, progress, total: 100, message },
});
}
};
const phases = [
{ pct: 10, msg: 'Cloning repository...' },
{ pct: 30, msg: 'Parsing source files...' },
{ pct: 60, msg: 'Running static analysis...' },
{ pct: 85, msg: 'Resolving dependencies...' },
{ pct: 95, msg: 'Generating report...' },
];
for (const phase of phases) {
await report(phase.pct, phase.msg);
await doPhase(phase.msg, repo, deep); // your actual work here
}
await report(100, 'Complete!');
return {
content: [
{
type: 'text',
text: `Analysis of ${repo} complete.\n` +
`Found 42 files, 3 issues, 0 critical vulnerabilities.`,
},
],
};
}
);
}
// Client side: pass an onprogress callback. The SDK attaches a progress token
// to the request and routes matching notifications/progress events to it.
const result = await client.callTool(
{
name: 'analyze_repository',
arguments: { repo: 'https://github.com/myorg/myapp', deep: true },
},
undefined, // use the default result schema
{
onprogress: ({ progress, total, message }) => {
const pct = total ? Math.round((progress / total) * 100) : 0;
const filled = Math.min(20, Math.floor(pct / 5));
const bar = '█'.repeat(filled) + '░'.repeat(20 - filled);
process.stdout.write(`\r[${bar}] ${pct}% ${message ?? ''}`);
if (progress === total) {
process.stdout.write('\n');
}
},
}
);
The progressToken is opaque from the server's perspective: the server echoes it back verbatim in every notifications/progress event. The client uses it to correlate progress events with the specific tool call that triggered them, which matters when multiple tool calls are running concurrently. Always check if (progressToken !== undefined) before sending progress: some clients omit the token when they don't want progress updates, and a progress notification without a token cannot be correlated with any request.
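The correlation step can be sketched as a small token-to-callback registry. This is an illustration of the idea only; `ProgressRouter` is a hypothetical class, and the SDK client keeps an equivalent table internally when you pass a progress callback.

```typescript
// Hypothetical registry that routes progress events to the call that owns the token.
type ProgressToken = string | number;

interface ProgressParams {
  progressToken: ProgressToken;
  progress: number;
  total?: number;
  message?: string;
}

class ProgressRouter {
  private handlers = new Map<ProgressToken, (params: ProgressParams) => void>();
  private counter = 0;

  // Allocate a fresh token for one in-flight call and remember its callback.
  register(handler: (params: ProgressParams) => void): ProgressToken {
    const token = `progress-${++this.counter}`;
    this.handlers.set(token, handler);
    return token;
  }

  // Deliver an incoming notification; returns false if no call owns the token.
  dispatch(params: ProgressParams): boolean {
    const handler = this.handlers.get(params.progressToken);
    if (!handler) return false;
    handler(params);
    return true;
  }

  // Forget the token once the call has produced its final result.
  release(token: ProgressToken): void {
    this.handlers.delete(token);
  }
}
```

Releasing the token after the result arrives keeps late or duplicate progress events from reaching a callback whose call has already finished.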
For tools whose output is an open-ended stream rather than a single result, one option is to emit the data as notifications/message events followed by a final empty result, or consider exposing the live stream as a subscribable resource instead of a tool.

The standard pattern is Bearer token authentication validated on every request. On the initialize POST, the server binds the authenticated user's ID to the newly created session. On all subsequent requests, the middleware verifies not just that the token is valid, but that the token's subject claim matches the session's owner. This prevents session hijacking: even if an attacker learns a valid session ID, they cannot use it without a matching auth token.
// auth-middleware.ts
import { Request, Response, NextFunction } from 'express';
import { validateJWT, JWTClaims } from './jwt-validator.js';
// Extended session store includes owner binding
const sessions = new Map<string, {
transport: StreamableHTTPServerTransport;
ownerId: string;
createdAt: number;
lastActivityAt: number;
}>();
export async function mcpAuthMiddleware(
req: Request,
res: Response,
next: NextFunction
): Promise<void> {
// Always allow CORS preflight through
if (req.method === 'OPTIONS') { next(); return; }
const authHeader = req.headers.authorization;
if (!authHeader?.startsWith('Bearer ')) {
res.status(401).json({
error: 'unauthorized',
message: 'Missing or malformed Authorization header',
});
return;
}
let claims: JWTClaims;
try {
claims = await validateJWT(authHeader.slice(7));
} catch {
res.status(401).json({ error: 'unauthorized', message: 'Invalid token' });
return;
}
// For session-scoped requests, verify token matches session owner
const sessionId = req.headers['mcp-session-id'] as string | undefined;
if (sessionId) {
const session = sessions.get(sessionId);
if (!session) {
res.status(404).json({ error: 'session_not_found' });
return;
}
if (session.ownerId !== claims.sub) {
res.status(403).json({
error: 'forbidden',
message: 'Session belongs to a different user',
});
return;
}
// Update activity timestamp
session.lastActivityAt = Date.now();
}
// Attach claims to request for downstream handlers
(req as any).user = claims;
next();
}
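// CORS configuration for browser clients.
// Reflect only origins you trust; echoing any Origin back lets arbitrary sites call the server.
const ALLOWED_ORIGINS = new Set(['https://app.example.com']); // example value, replace with your own
app.use('/mcp', (req, res, next) => {
const origin = req.headers.origin;
if (origin && ALLOWED_ORIGINS.has(origin)) {
res.setHeader('Access-Control-Allow-Origin', origin);
res.setHeader('Access-Control-Allow-Methods', 'GET, POST, DELETE, OPTIONS');
res.setHeader(
'Access-Control-Allow-Headers',
'Authorization, Content-Type, Mcp-Session-Id, Last-Event-ID'
);
// Browsers can only read Mcp-Session-Id from the response if it is exposed
res.setHeader('Access-Control-Expose-Headers', 'Mcp-Session-Id');
res.setHeader('Access-Control-Max-Age', '86400');
res.setHeader('Vary', 'Origin');
}
if (req.method === 'OPTIONS') { res.status(204).end(); return; }
next();
});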
// Session idle timeout cleanup (run every 5 minutes)
setInterval(() => {
const IDLE_TTL = 30 * 60 * 1000; // 30 minutes
const MAX_TTL = 24 * 60 * 60 * 1000; // 24 hours
const now = Date.now();
for (const [id, session] of sessions.entries()) {
const idle = now - session.lastActivityAt > IDLE_TTL;
const expired = now - session.createdAt > MAX_TTL;
if (idle || expired) {
session.transport.close().catch(() => {});
sessions.delete(id);
console.log(`Session ${id} expired (idle=${idle}, maxTTL=${expired})`);
}
}
}, 5 * 60 * 1000);
// Apply middleware before all MCP handlers
app.use('/mcp', mcpAuthMiddleware);
app.post('/mcp', mcpPostHandler);
app.get('/mcp', mcpGetHandler);
app.delete('/mcp', mcpDeleteHandler);
Rate-limit requests per authenticated user or session rather than per IP address; express-rate-limit with a custom key generator handles this cleanly.

| Feature | stdio | HTTP + SSE | WebSocket | Streamable HTTP |
|---|---|---|---|---|
| Endpoints | N/A | 2 (GET /sse + POST /message) | 1 (WS upgrade) | 1 (/mcp, with POST, GET, and DELETE) |
| Resumability | ❌ | ❌ | ❌ | ✅ via event IDs |
| Proxy friendly | ❌ | ⚠️ needs buffering off | ⚠️ needs WS upgrade | ✅ for JSON responses; ⚠️ streams need buffering off |
| Browser support | ❌ | ✅ | ✅ | ✅ |
| Batching | ❌ | ❌ | ✅ | ✅ |
| Progress streaming | N/A | ⚠️ complex routing | ✅ | ✅ on the same response stream |
| Serverless | ❌ | ⚠️ | ❌ | ✅ non-streaming calls |
| MCP spec version | 2024-11-05 | 2024-11-05 | not in the spec | 2025-03-26 |
The migration from HTTP+SSE to Streamable HTTP is surgical — the surface area that changes is small, and the MCP SDK makes it mechanical. If your server and client both use the official SDK, the migration can be completed in under an hour for a typical server.
1. Swap the transport classes. On the server, replace SSEServerTransport → StreamableHTTPServerTransport. On the client, swap SSEClientTransport (or custom fetch logic) → StreamableHTTPClientTransport. The constructor signatures differ; see Section 3 and 4 for exact options.
2. Consolidate the routes. Remove the old GET /sse route and the old POST /message route. Replace them with a single POST /mcp handler with the session-routing logic from Section 3.
3. Update the proxy config. Remove proxy_buffering off from the message endpoint, since buffering only needs to be disabled for responses that are actually streams. Keep proxy_buffering off on GET /mcp and conditionally on POST /mcp when the response is SSE. See Section 10 for full nginx config.
4. Add an event store for resumability. The ResumableEventStore from Section 5 is a drop-in starting point.
5. Run both transports side by side if needed. You can serve both endpoint sets (/sse and /mcp) during a migration window, allowing old and new clients to coexist. Deprecate and remove the old SSE endpoints once all clients have migrated.

# nginx-streamable-http.conf
upstream mcp_backend {
server 127.0.0.1:3000;
# With an in-memory session store, every request for a session must reach the
# Node process that holds it, so route by the Mcp-Session-Id request header.
# With Redis-backed sessions, sticky routing is unnecessary.
# hash $http_mcp_session_id consistent;
}
server {
listen 443 ssl http2;
server_name mcp.example.com;
ssl_certificate /etc/ssl/certs/mcp.crt;
ssl_certificate_key /etc/ssl/private/mcp.key;
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
# Increase client body size for batched requests
client_max_body_size 1m;
location /mcp {
proxy_pass http://mcp_backend;
# Required: disable buffering so SSE events are flushed as they are produced.
# This also applies to plain JSON responses under /mcp, which is harmless.
# To keep buffering on for JSON, leave proxy_buffering at its default and have
# the app send "X-Accel-Buffering: no" on SSE responses instead.
proxy_buffering off;
proxy_cache off;
# Use HTTP/1.1 without "Connection: close" towards the upstream for long-lived streams
proxy_http_version 1.1;
proxy_set_header Connection "";
# Extended timeout for long-running streaming tools
proxy_read_timeout 3600s;
proxy_send_timeout 3600s;
# Standard proxy headers
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Mcp-Session-Id and Last-Event-ID need no extra directives: nginx forwards
# request headers to the upstream and custom response headers to the client by default.
# CORS — required for browser-based MCP clients
add_header Access-Control-Allow-Origin $http_origin always;
add_header Access-Control-Allow-Methods "GET, POST, DELETE, OPTIONS" always;
add_header Access-Control-Allow-Headers "Authorization, Content-Type, Mcp-Session-Id, Last-Event-ID" always;
add_header Access-Control-Expose-Headers "Mcp-Session-Id" always;
add_header Access-Control-Max-Age 86400 always;
add_header Vary "Origin" always;
if ($request_method = OPTIONS) {
return 204;
}
}
# Health check endpoint (no auth required)
location /health {
proxy_pass http://mcp_backend;
access_log off;
}
}
For AWS Application Load Balancer, Streamable HTTP works without the WebSocket upgrade requirement. Key settings: set the target group's deregistration delay to at least 60 seconds so in-flight streaming responses can complete before instance shutdown. Set the load balancer's idle timeout (a load balancer attribute, not a listener setting) to 3600 seconds to support long-running tool calls. Enable sticky sessions on the target group if you're using in-memory session storage: use duration-based stickiness with a 24-hour duration matching your maximum session TTL. Keep in mind that ALB stickiness is cookie-based, so it only helps clients that store and return cookies.
Cloudflare Workers can proxy Streamable HTTP requests, with one important caveat: Workers cap CPU time per request, on the order of 30 seconds depending on plan and configuration. Synchronous JSON responses work perfectly. Long SSE streams are handled by Workers' streaming response support, and time spent waiting on the upstream generally does not count as CPU time, so the limit mainly bites when the Worker itself does heavy processing for a long-running tool. Solutions: move long-running tools to Durable Objects, add progress checkpointing so clients can resume, or use Cloudflare's queue system to decouple the request from the actual processing.
Never let a CDN cache /mcp responses. POST responses are rarely cached in practice, but CDNs sometimes cache GET responses aggressively. Add Cache-Control: no-store to all MCP responses. If using Cloudflare, set the Cache Rule to "Bypass" for the /mcp path. A cached response appears to the client as an instant success while actually serving stale data from a different session.

For testing, start a real server in a beforeAll hook and tear it down after the suite. This gives you true integration coverage of the transport layer.

The test suite below uses Vitest with a real HTTP server on a high port. The tests share one client connection created in beforeAll, and one test opens a second connection. The suite exercises the transport stack end to end: HTTP routing, session ID negotiation, SSE framing, progress notification delivery, and concurrent request handling; reconnection with Last-Event-ID is left as a placeholder. Write these tests before building your server: they serve as a living specification of the transport contract.
// streamable-http.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import http from 'http';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';
// startTestServer creates a real Express+MCP server on the given port
// Returns the http.Server instance for cleanup
import { startTestServer } from './helpers/test-server.js';
describe('Streamable HTTP Transport', () => {
let server: http.Server;
let client: Client;
const PORT = 3099;
beforeAll(async () => {
server = await startTestServer(PORT);
const transport = new StreamableHTTPClientTransport(
new URL(`http://localhost:${PORT}/mcp`)
);
client = new Client({ name: 'test-client', version: '1.0.0' });
await client.connect(transport);
});
afterAll(async () => {
await client.close();
await new Promise<void>(r => server.close(() => r()));
});
it('completes initialize handshake and returns server info', async () => {
const info = client.getServerVersion();
expect(info?.name).toBe('streamable-demo');
expect(info?.version).toBe('1.0.0');
});
it('lists available tools', async () => {
const { tools } = await client.listTools();
const names = tools.map(t => t.name);
expect(names).toContain('long_running_task');
expect(names).toContain('analyze_repository');
});
it('streams progress notifications during tool call', async () => {
const progressEvents: Array<{ progress: number; total?: number }> = [];
// onprogress makes the SDK attach a progress token and route updates here
const result = await client.callTool(
{ name: 'long_running_task', arguments: { duration: 200 } },
undefined, // use the default result schema
{
onprogress: ({ progress, total }) => {
progressEvents.push({ progress, total });
},
}
);
expect(result.isError).toBeFalsy();
expect(progressEvents.length).toBeGreaterThanOrEqual(3);
// Progress must be monotonically increasing
for (let i = 1; i < progressEvents.length; i++) {
expect(progressEvents[i].progress).toBeGreaterThanOrEqual(
progressEvents[i - 1].progress
);
}
// Final event must reach 100
const last = progressEvents[progressEvents.length - 1];
expect(last.progress).toBe(last.total);
});

  it('connects with a simulated resume header', async () => {
    // Smoke test only: this sends a test-only X-Test-Last-Event-ID header
    // (not the standard Last-Event-ID header) on every request and checks
    // that the handshake still succeeds. It does not assert event replay.
    const transport2 = new StreamableHTTPClientTransport(
      new URL(`http://localhost:${PORT}/mcp`),
      {
        requestInit: {
          headers: { 'X-Test-Last-Event-ID': 'evt-42' },
        },
      }
    );
    const client2 = new Client({ name: 'resume-test', version: '1.0.0' });
    // connect() runs a fresh initialize handshake, so this is a new session
    await client2.connect(transport2);
    const info = client2.getServerVersion();
    expect(info?.name).toBe('streamable-demo');
    await client2.close();
  });

  it('handles concurrent tool calls', async () => {
    // Promise.all issues both calls concurrently on the same session.
    // Whether the SDK combines them into one JSON-RPC batch depends on the
    // SDK version, so the test only asserts that both calls complete.
    const [r1, r2] = await Promise.all([
      client.callTool({ name: 'long_running_task', arguments: { duration: 50 } }),
      client.callTool({ name: 'long_running_task', arguments: { duration: 50 } }),
    ]);
    expect(r1.isError).toBeFalsy();
    expect(r2.isError).toBeFalsy();
  });

  it('returns 404 for unknown session IDs', async () => {
    const response = await fetch(`http://localhost:${PORT}/mcp`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // The spec requires clients to accept both response content types
        'Accept': 'application/json, text/event-stream',
        'Mcp-Session-Id': 'nonexistent-session-id',
      },
      body: JSON.stringify({
        jsonrpc: '2.0', id: 1,
        method: 'tools/list', params: {},
      }),
    });
    // The spec calls for 404 Not Found when the session ID is unknown or
    // terminated; 400 Bad Request is for a missing Mcp-Session-Id header.
    expect(response.status).toBe(404);
  });
});
For testing resumability end-to-end, the most reliable approach is to force a transport-level disconnect by calling transport.close() mid-stream, then create a new transport with the same session ID and the Last-Event-ID of the last received event. The test then verifies that the new transport receives the events that were sent during the gap. This requires capturing the ID of the last received event on the client. How the SDK exposes it varies by version (recent TypeScript SDK releases report it through an onresumptiontoken request option), so check the version you target.
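To make the replay contract concrete, here is a minimal sketch of the server-side behavior such a test would assert against. It assumes a purely in-memory log; `InMemoryEventLog`, `append`, and `replayAfter` are hypothetical names chosen for illustration and are not part of the MCP SDK.

```typescript
// Hypothetical in-memory event log; illustrative only, not an SDK API.
interface StoredEvent {
  id: string;   // value sent in the SSE `id:` field
  data: string; // serialized JSON-RPC message
}

class InMemoryEventLog {
  private events: StoredEvent[] = [];
  private counter = 0;

  // Append a message and return the event ID assigned to it.
  append(data: string): string {
    this.counter += 1;
    const id = `evt-${this.counter}`;
    this.events.push({ id, data });
    return id;
  }

  // Return every event stored after `lastEventId`, in original order.
  // An unknown ID yields an empty list in this sketch; a real server has to
  // decide whether to reject the request or restart the stream instead.
  replayAfter(lastEventId: string): StoredEvent[] {
    const index = this.events.findIndex((e) => e.id === lastEventId);
    return index === -1 ? [] : this.events.slice(index + 1);
  }
}

// Simulate a client that disconnects after seeing the second event.
const log = new InMemoryEventLog();
const ids = ["progress 1", "progress 2", "progress 3", "result"].map((d) =>
  log.append(d)
);
const missed = log.replayAfter(ids[1]);
console.log(missed.map((e) => e.id)); // prints [ 'evt-3', 'evt-4' ]
```

A resumability test can assert the same three properties against a real server: the replayed events start immediately after the supplied ID, they arrive in order, and none is delivered twice.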
supertest is cleaner than spawning a full client: await request(app).post('/mcp').set('Mcp-Session-Id', id).send(rpc).expect(200). Reserve full SDK client tests for streaming scenarios where you actually need notification handlers and SSE parsing.

Use crypto.randomUUID() for session IDs, never sequential values. Sequential IDs (1, 2, 3…) are trivially enumerable, so one compromised session can be used to hijack adjacent sessions. crypto.randomUUID() generates 122 bits of randomness; treat the ID as a password.

The Map from Section 3 is fine for a single-node server. In Kubernetes with multiple replicas, keep session state in a shared store such as Redis so any replica can resume any session. Confirm how much transport state your SDK version lets you serialize before relying on this.

Give each session an ownerId field populated from the JWT sub claim on initialize. Validate this on every subsequent request, not just the first one.

Implement a DELETE /mcp handler for graceful session termination. Without it, clients cannot signal clean shutdown, and the only cleanup path is idle TTL expiry. Graceful cleanup speeds up resource recovery and gives clean audit logs (explicit close vs timeout).

Set proxy_buffering off for all /mcp locations that can stream. A buffering proxy silently accumulates SSE events and delivers them in a burst rather than streaming: the client appears to hang, then receives all events at once. Always set proxy_buffering off on the /mcp location block; the overhead for non-streaming JSON responses is negligible.

Handle SIGTERM: stop accepting new sessions immediately, wait for all open SSE streams to complete or time out (30s max), then exit. This prevents clients from seeing mid-stream disconnects during deploys.

Test Last-Event-ID replay before shipping to production. Resumability is only as good as your event store. Write a test that forces a disconnect at a known event ID, reconnects, and asserts that missed events are replayed in order with no gaps or duplicates. Run this test in CI; a broken event store silently degrades to "restart on disconnect" behavior without failing.
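Several of the session recommendations above (random IDs, an owner check on every request, idle TTL expiry, explicit termination) fit in one small data structure. The following is a sketch under stated assumptions: `SessionRegistry` and its methods are hypothetical names, the owner ID is assumed to come from an already-verified JWT sub claim, and time is passed in explicitly so the logic is easy to test.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical session registry; illustrative only, not an SDK API.
interface SessionRecord {
  ownerId: string;    // e.g. the verified JWT `sub` claim seen at initialize
  lastSeenMs: number; // refreshed on every authorized request
}

class SessionRegistry {
  private sessions = new Map<string, SessionRecord>();
  private idleTtlMs: number;

  constructor(idleTtlMs: number) {
    this.idleTtlMs = idleTtlMs;
  }

  // Create a session with an unguessable ID bound to its owner.
  create(ownerId: string, nowMs: number): string {
    const id = randomUUID();
    this.sessions.set(id, { ownerId, lastSeenMs: nowMs });
    return id;
  }

  // True only if the session exists, has not idled out, and belongs to the
  // caller. Intended to run on every request, not just the first one.
  authorize(id: string, ownerId: string, nowMs: number): boolean {
    const session = this.sessions.get(id);
    if (!session) return false;
    if (nowMs - session.lastSeenMs > this.idleTtlMs) {
      this.sessions.delete(id); // idle TTL expiry
      return false;
    }
    if (session.ownerId !== ownerId) return false;
    session.lastSeenMs = nowMs;
    return true;
  }

  // Explicit termination, the path a DELETE /mcp handler would call.
  terminate(id: string, ownerId: string): boolean {
    const session = this.sessions.get(id);
    if (!session || session.ownerId !== ownerId) return false;
    return this.sessions.delete(id);
  }
}

// Usage: a 30-minute idle TTL, with timestamps supplied by the caller.
const registry = new SessionRegistry(30 * 60 * 1000);
const sessionId = registry.create("user-123", Date.now());
console.log(registry.authorize(sessionId, "user-123", Date.now()));
console.log(registry.authorize(sessionId, "someone-else", Date.now()));
```

In a multi-replica deployment the Map would be replaced by a shared store, but the checks performed per request stay the same.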