Day 13 — Streamable HTTP Transport: Bidirectional Streaming & Resumability

Section 1

The Transport Evolution

MCP has gone through four distinct transport generations. Each iteration solved real pain points experienced in production deployments. Streamable HTTP is not merely an incremental improvement — it fundamentally rethinks how client and server communicate over HTTP.

The Model Context Protocol needs a transport layer to move JSON-RPC messages between client and server. The choice of transport determines latency, scalability, proxy compatibility, and resilience. The first generation used stdio (pipes between processes), which is simple and well suited to local tools but cannot be exposed over a network directly. The second generation added HTTP+SSE: clients posted messages to one endpoint while a long-lived Server-Sent Events connection carried responses back. This enabled networked MCP servers but required two separate URL paths, caused trouble with reverse proxies (many of which buffer responses by default and so delay SSE events), and offered no way to resume after a disconnect. WebSocket support arrived as an unofficial pattern. It solved the two-endpoint problem but relies on a stateful upgrade handshake and a long-lived connection, which some load balancers need explicit configuration for and which many serverless platforms do not support.

Streamable HTTP, introduced in MCP spec 2025-03-26, is the clean break the ecosystem needed. A single /mcp endpoint handles all traffic. The server decides response-by-response whether to reply with application/json (synchronous) or text/event-stream (streaming). SSE events can carry an id: field, and when the server attaches one and keeps a replay buffer, clients can resume from where they left off after a disconnect. Proxies work out of the box for synchronous calls; only streaming responses need response buffering disabled (proxy_buffering off in nginx). Serverless functions handle the common case of non-streaming calls without long-lived connections.

🔌

stdio

stdin/stdout

Bidirectional pipe. Simplest possible transport. Process-to-process only.

Best for: local CLI tools, Claude Desktop

📡

HTTP + SSE

POST + GET/sse

Two endpoints. POST sends, SSE streams back. Proxy buffering issues. No resumability.

Best for: legacy MCP 2024-11 servers

🔁

WebSocket

WS upgrade

Persistent full-duplex connection. Requires WS upgrade support in all proxies. Incompatible with serverless.

Best for: real-time apps with WS infra

🌊
Streamable HTTP
POST /mcp (+ GET + DELETE)
Single endpoint. JSON or SSE response. Event IDs for resumability. Serverless-friendly. MCP 2025-03.
Best for: all new MCP deployments

HTTP+SSE (legacy) vs Streamable HTTP (2025)

sequenceDiagram
    participant C as Client
    participant S as Server
    Note over C,S: HTTP+SSE (MCP 2024-11), TWO endpoints
    C->>S: GET /sse (open SSE stream, keep-alive)
    S-->>C: event: endpoint ← POST URL
    C->>S: POST /message { jsonrpc, method }
    S-->>C: data: { result } (via SSE stream)
    C->>S: POST /message { another call }
    S-->>C: data: { result } (via same SSE stream)
    Note over C,S: Streamable HTTP (MCP 2025-03), ONE endpoint
    C->>S: POST /mcp { initialize }
    S-->>C: 200 application/json { result } + header Mcp-Session-Id: abc
    C->>S: POST /mcp { tools/call } Mcp-Session-Id: abc
    S-->>C: 200 text/event-stream id:1 data:{progress}...
    C->>S: GET /mcp Mcp-Session-Id: abc (subscribe)
    S-->>C: text/event-stream id:2 data:{notification}

💡

Why "Streamable"? Because the transport is optionally streaming — not always streaming. A simple tool call with a synchronous response returns application/json and closes immediately. Only when the server needs to push progress events or the client subscribes to notifications does it upgrade to text/event-stream. This makes it compatible with serverless functions for the common case.

Section 2

Protocol Fundamentals

Streamable HTTP is built on three HTTP methods against a single URL. Understanding what each method does, what response types are possible, and how sessions are tracked is the foundation for everything else in this chapter.

The entire MCP protocol flows through /mcp. There is no separate SSE endpoint like in the old transport. The server makes a runtime decision for every POST request: does this response require streaming? If yes, it opens a text/event-stream response and sends events. If no, it returns a plain application/json object and closes. This flexibility means your infrastructure only needs to handle long-lived connections for operations that actually need them.

POST /mcp — All client-to-server messages

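Every MCP request (initialize, tools/list, tools/call, resources/read, etc.) uses POST. The body is a JSON-RPC 2.0 object or array, and the client sends an Accept header listing both application/json and text/event-stream. The server replies with Content-Type: application/json for synchronous responses, or opens Content-Type: text/event-stream for streaming. A POST that carries only notifications or responses is acknowledged with 202 Accepted and no body. The server returns the session ID in the Mcp-Session-Id response header on initialize, and the client echoes it back as a request header on all subsequent calls.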
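The POST rules can be sketched as a small classifier. This is an illustration, not SDK code: it assumes the 2025-03-26 behavior in which a body containing only notifications or responses is acknowledged with 202 Accepted, while a body containing at least one request is answered with JSON or SSE. The names `WireMessage`, `PostOutcome`, and `classifyPostBody` are hypothetical.

```typescript
// Hypothetical helper: decide how a server should answer a POST /mcp body.
type WireMessage = {
  jsonrpc: '2.0';
  id?: string | number;
  method?: string;
  result?: unknown;
  error?: unknown;
};

type PostOutcome = 'accepted-202' | 'json-or-sse';

// A request has both a method and an id; a notification has a method but no id;
// a response has an id but no method.
function isRequest(msg: WireMessage): boolean {
  return typeof msg.method === 'string' && msg.id !== undefined;
}

function classifyPostBody(body: WireMessage | WireMessage[]): PostOutcome {
  const messages = Array.isArray(body) ? body : [body];
  return messages.some(isRequest) ? 'json-or-sse' : 'accepted-202';
}
```

Only the 'json-or-sse' branch ever needs a long-lived response, which is why notification-only traffic is cheap for the server to accept.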

GET /mcp — Server-initiated notifications (subscriptions)

When a client wants to receive asynchronous server-push events — resource change notifications, log streams, heartbeats — it opens a GET /mcp request with Accept: text/event-stream and its session ID. The server responds with an SSE stream and pushes notifications/resources/updated, notifications/tools/list_changed, or custom events as they occur. This is the "subscription channel" for a session.

DELETE /mcp — Graceful session termination

Clients signal clean shutdown by sending DELETE /mcp with their session ID. The server closes open SSE connections, cleans up in-memory state, removes the session from the store, and returns 204 No Content. Without this, servers must rely on timeouts to detect abandoned sessions — always implement DELETE.

Event IDs and Last-Event-ID — Resumability

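For a stream to be resumable, every SSE event on it needs an id: field. The spec makes event IDs optional (a server MAY attach them), so treat this as a requirement you impose on your own server. IDs must be unique within the session; a UUID or a monotonically increasing counter both work. When a connection drops, the client reconnects with an HTTP GET and sends the Last-Event-ID header with the last event ID it received. The server then replays the events that followed that ID on the interrupted stream, so nothing is lost as long as those events are still in the server's replay buffer. This is the single biggest differentiator over HTTP+SSE.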
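On the wire, an event with an ID is just a few text lines followed by a blank line. A minimal sketch of the framing, with hypothetical helper names (`formatSseEvent`, `parseSseEventId`); the SDK transport does this for you, so this is only to show what the bytes look like:

```typescript
// Hypothetical helper: serialize one JSON-RPC message as an SSE event with an id.
function formatSseEvent(id: string, message: unknown): string {
  // JSON.stringify without an indent argument never emits raw newlines,
  // so a single data: line is enough for any JSON payload.
  return `id: ${id}\nevent: message\ndata: ${JSON.stringify(message)}\n\n`;
}

// Hypothetical helper: read the id back out of one serialized event.
function parseSseEventId(frame: string): string | undefined {
  const line = frame.split('\n').find(l => l.startsWith('id:'));
  return line?.slice(3).trim();
}
```

The client remembers the most recent value returned by something like `parseSseEventId` and sends it back as Last-Event-ID when it reconnects.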

Full Streamable HTTP session lifecycle

sequenceDiagram
    participant C as Client
    participant S as Server
    C->>S: POST /mcp { initialize, clientInfo }
    S-->>C: 200 JSON { result } Mcp-Session-Id: sess-abc123
    C->>S: POST /mcp { tools/list } Mcp-Session-Id: sess-abc123
    S-->>C: 200 JSON { tools: [...] }
    C->>S: POST /mcp { tools/call, _meta.progressToken } Mcp-Session-Id: sess-abc123
    S-->>C: 200 text/event-stream
    S-->>C: id:evt-1 data:{ notifications/progress, progress:25 }
    S-->>C: id:evt-2 data:{ notifications/progress, progress:75 }
    S-->>C: id:evt-3 data:{ result: "Done!" }
    Note over C: Connection drops, client reconnects
    C->>S: GET /mcp Mcp-Session-Id: sess-abc123 Last-Event-ID: evt-2
    S-->>C: 200 text/event-stream (replays evt-3 onward)
    S-->>C: id:evt-3 data:{ result: "Done!" }
    C->>S: DELETE /mcp Mcp-Session-Id: sess-abc123
    S-->>C: 204 No Content

⚠️

Session IDs are not auth tokens. The Mcp-Session-Id header is a correlation handle, not a bearer token. Always require a separate Authorization: Bearer … header validated on every request. Treat session IDs as opaque, random, unguessable identifiers: use randomUUID(), never sequential integers.
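Before looking a session ID up, it is cheap to reject values that could never have been issued. A minimal sketch, assuming the spec rule that session IDs contain only visible ASCII characters (0x21 to 0x7E); the 128-character cap and the name `isWellFormedSessionId` are assumptions made for this example, not part of the protocol:

```typescript
// Visible ASCII only (0x21 to 0x7E): no spaces, control characters, or non-ASCII.
const VISIBLE_ASCII = /^[\x21-\x7E]+$/;

// Hypothetical guard for the Mcp-Session-Id request header.
// The length cap of 128 is an arbitrary choice for this sketch.
function isWellFormedSessionId(value: unknown): value is string {
  return typeof value === 'string' && value.length <= 128 && VISIBLE_ASCII.test(value);
}
```

A well-formed ID still proves nothing about who is calling; the Authorization check remains the actual access control.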

Section 3

Server Implementation

A complete TypeScript Express server implementing the Streamable HTTP transport. This covers session management, routing all three HTTP methods, registering tools with progress support, and the session lifecycle from creation to cleanup.

The StreamableHTTPServerTransport class from the MCP SDK handles the SSE plumbing, event ID generation, and the content-type decision for you. Your job is to handle the routing logic: route to an existing session's transport if a session ID header is present, or create a new transport and MCP server for fresh initialize requests. The session store is simply a Map<string, StreamableHTTPServerTransport> — for production you'll replace this with Redis (see Section 12).

// streamable-http-server.ts
import express from 'express';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { isInitializeRequest } from '@modelcontextprotocol/sdk/types.js';
import { randomUUID } from 'crypto';
import { z } from 'zod';

const app = express();
app.use(express.json());

// In-memory session store: sessionId → transport
// Production: replace with Redis-backed store
const sessions = new Map<string, StreamableHTTPServerTransport>();

// Helper: create and wire up a fresh McpServer for each session
function createMcpServer(): McpServer {
  const server = new McpServer({
    name: 'streamable-demo',
    version: '1.0.0',
  });

  // Tool with progress streaming
  server.tool(
    'long_running_task',
    { duration: z.number().describe('Total duration in ms') },
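    // The second argument is the SDK's per-request "extra" object. In recent SDK
    // versions it exposes _meta (request metadata) and sendNotification(), which
    // routes the notification onto the stream of the request being handled.
    async ({ duration }, { _meta, sendNotification }) => {
      const progressToken = _meta?.progressToken;

      for (let i = 0; i <= 100; i += 10) {
        if (progressToken !== undefined) {
          await sendNotification({
            method: 'notifications/progress',
            params: { progressToken, progress: i, total: 100 },
          });
        }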
        await new Promise(r => setTimeout(r, duration / 10));
      }

      return {
        content: [{ type: 'text', text: 'Task complete!' }],
      };
    }
  );

  return server;
}

// ── POST /mcp — all client-to-server messages ──────────────────────────────
app.post('/mcp', async (req, res) => {
  const sessionId = req.headers['mcp-session-id'] as string | undefined;

  let transport: StreamableHTTPServerTransport;

  if (sessionId && sessions.has(sessionId)) {
    // Existing session — reuse transport
    transport = sessions.get(sessionId)!;

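  } else if (!sessionId && isInitializeRequest(req.body)) {
    // New session: create transport + server pair
    const newSessionId = randomUUID();

    transport = new StreamableHTTPServerTransport({
      sessionIdGenerator: () => newSessionId,
      onsessioninitialized: (id) => {
        sessions.set(id, transport);
        console.log(`Session created: ${id}`);
      },
    });

    // Drop the session from the store whenever the transport closes
    transport.onclose = () => {
      const id = transport.sessionId;
      if (id) sessions.delete(id);
    };

    const server = createMcpServer();
    await server.connect(transport);

  } else {
    // Unknown session ID → 404 so the client knows to re-initialize;
    // missing session ID on a non-initialize request → 400.
    // The body is a JSON-RPC error object so MCP clients can parse it.
    const status = sessionId ? 404 : 400;
    res.status(status).json({
      jsonrpc: '2.0',
      error: {
        code: -32000,
        message: status === 404
          ? 'Session not found'
          : 'Bad Request: Mcp-Session-Id required unless initializing',
      },
      id: null,
    });
    return;
  }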

  // Delegate to the transport — it handles content-type negotiation,
  // SSE framing, event IDs, and JSON-RPC response formatting
  await transport.handleRequest(req, res, req.body);
});

// ── GET /mcp — server-initiated notifications (subscriptions) ─────────────
app.get('/mcp', async (req, res) => {
  const sessionId = req.headers['mcp-session-id'] as string | undefined;
  if (!sessionId) { res.status(400).json({ error: 'Mcp-Session-Id required' }); return; }

  const transport = sessions.get(sessionId);
  if (!transport) { res.status(404).json({ error: 'Session not found' }); return; }

  // Opens an SSE stream for server → client notifications
  await transport.handleRequest(req, res);
});

// ── DELETE /mcp — graceful session termination ────────────────────────────
app.delete('/mcp', async (req, res) => {
  const sessionId = req.headers['mcp-session-id'] as string | undefined;
  if (!sessionId) { res.status(400).json({ error: 'Mcp-Session-Id required' }); return; }

  const transport = sessions.get(sessionId);
  if (transport) {
    await transport.close();
    sessions.delete(sessionId);
    console.log(`Session closed: ${sessionId}`);
  }

  res.status(204).end();
});

app.listen(3000, () => console.log('MCP server listening on :3000'));

The key design decision is the routing logic inside POST /mcp. When a request arrives without a session ID and the body is an initialize message, a brand new StreamableHTTPServerTransport is instantiated and a fresh McpServer is connected to it. The SDK calls onsessioninitialized after the handshake completes, at which point you store the transport by session ID. All subsequent requests for that session route directly to the stored transport via transport.handleRequest().
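The routing decision can be isolated as a pure function, which makes it easy to unit-test separately from Express. This is a sketch with hypothetical names (`PostRoute`, `routePost`). It returns 404 for a session ID the server does not know, which the spec uses to tell clients to start a new session, and 400 when a non-initialize request arrives without any session ID:

```typescript
type PostRoute =
  | { kind: 'reuse'; sessionId: string }
  | { kind: 'create' }
  | { kind: 'reject'; status: 400 | 404 };

// Hypothetical helper: decide what POST /mcp should do with an incoming request.
function routePost(
  sessionId: string | undefined,
  knownSessions: ReadonlySet<string>,
  isInitialize: boolean,
): PostRoute {
  if (sessionId !== undefined) {
    // A session ID was supplied: it must refer to a live session.
    return knownSessions.has(sessionId)
      ? { kind: 'reuse', sessionId }
      : { kind: 'reject', status: 404 };
  }
  // No session ID: only an initialize request may create a session.
  return isInitialize ? { kind: 'create' } : { kind: 'reject', status: 400 };
}
```

In a handler, `knownSessions` could simply be a set built from the keys of the session map.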

🔒

One McpServer per session, or one shared server? The pattern above creates one McpServer instance per session. This is the safest approach — session-level state (like authenticated user context) stays isolated. You can share a single McpServer across sessions if your tools are truly stateless, but you must be careful not to leak state between sessions.

Section 4

Client Implementation

The MCP SDK ships a StreamableHTTPClientTransport that handles session ID negotiation, automatic reconnection with Last-Event-ID, and the GET subscription channel. From the application layer, using it looks nearly identical to any other transport.

The client transport constructor takes the server URL and an optional options object. The most important option is requestInit — an object passed to every fetch call — which is where you inject authentication headers. After calling client.connect(transport), the SDK sends the initialize handshake via POST, stores the returned session ID, and attaches it to all future requests automatically. You never handle session IDs manually in client code.

// streamable-http-client.ts
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

// Create transport pointing at the server URL
const transport = new StreamableHTTPClientTransport(
  new URL('http://localhost:3000/mcp'),
  {
    // Merged into every fetch() call — use for auth headers, custom headers, etc.
    requestInit: {
      headers: {
        'Authorization': `Bearer ${process.env.API_TOKEN}`,
      },
    },
  }
);

const client = new Client({
  name: 'my-client',
  version: '1.0.0',
});

// connect() sends initialize, stores session ID automatically
await client.connect(transport);

// List available tools
const { tools } = await client.listTools();
console.log('Tools:', tools.map(t => t.name));

// Register notification handlers BEFORE triggering the work that emits them.
// setNotificationHandler takes a Zod schema from the SDK's types module.
// (Import declarations are hoisted; move this one to the top of the file if you prefer.)
import { ResourceUpdatedNotificationSchema } from '@modelcontextprotocol/sdk/types.js';

client.setNotificationHandler(ResourceUpdatedNotificationSchema, (notification) => {
  console.log('Resource updated:', notification.params.uri);
});

// Call a tool that streams progress. Passing onprogress makes the SDK attach
// a progressToken and route matching notifications to the callback.
const result = await client.callTool(
  { name: 'long_running_task', arguments: { duration: 5000 } },
  undefined, // use the default result schema
  {
    onprogress: ({ progress, total }) => {
      process.stdout.write(`\rProgress: ${progress}/${total ?? '?'}`);
    },
  }
);
console.log('Result:', result.content);

// Subscribe to resource updates (sends resources/subscribe via POST);
// the notifications arrive on the session's GET /mcp SSE stream
await client.subscribeResource({ uri: 'metrics://cpu' });

// Graceful shutdown: end the server-side session (DELETE /mcp), then release
// local resources. terminateSession() exists on the transport in recent SDK
// versions; check the version you have installed.
await transport.terminateSession();
await client.close();

The requestInit option is passed directly to the Fetch API, so any valid RequestInit property works: custom headers, credentials mode, referrer policy, timeouts via AbortSignal. For mTLS client certificate authentication in Node.js, you'd pass a custom dispatcher via undici rather than requestInit.

🔑

Session ID Auto-Management

The client extracts Mcp-Session-Id from the initialize response and silently attaches it to every subsequent POST, GET, and DELETE. Zero configuration required in application code.

🔌

Reconnection with Resume

If the SSE stream drops (network blip, proxy timeout), the transport reconnects automatically and sends Last-Event-ID to resume from the last received event ID.

📬

Notification Routing

Notifications arriving on the GET SSE channel are dispatched to registered handlers via setNotificationHandler. Progress events from streaming tool calls arrive on the POST response stream.

🛑

Graceful Close

In recent SDK versions, transport.terminateSession() sends DELETE /mcp, and client.close() then closes the GET SSE connection and releases all local state. Call both on shutdown to avoid orphaned server sessions.
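Automatic reconnection is usually paced with exponential backoff so that a struggling server is not hammered by retries. The sketch below shows one plausible policy; the field names and the formula are assumptions for illustration and are not taken from the SDK's source:

```typescript
interface ReconnectPolicy {
  initialDelayMs: number; // delay before the first retry
  maxDelayMs: number;     // upper bound on any single delay
  growFactor: number;     // multiplier applied per attempt
  maxRetries: number;     // give up after this many attempts
}

// Hypothetical helper: delay before retry number `attempt` (0-based),
// or null when the client should stop retrying.
function nextReconnectDelay(attempt: number, policy: ReconnectPolicy): number | null {
  if (attempt >= policy.maxRetries) return null;
  const delay = policy.initialDelayMs * Math.pow(policy.growFactor, attempt);
  return Math.min(delay, policy.maxDelayMs);
}
```

Each retry would carry the Last-Event-ID header, so a successful reconnect resumes the stream instead of restarting it.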

Section 5

Resumability & Event IDs

Resumability is Streamable HTTP's most important differentiator. When a long-running tool call or subscription stream is interrupted by a network failure, the client reconnects and picks up exactly where it left off — no events lost, no need to restart the operation.

The mechanism is straightforward but requires discipline: every SSE event the server sends on a resumable stream needs an id: field. This can be a UUID, a UUID prefixed with a sequence number, or simply a monotonically increasing integer per session. When the client reconnects, it sends a Last-Event-ID HTTP header set to the last ID it received. The browser EventSource API does this automatically; a fetch-based client, such as the SDK transport, has to set the header itself. The server checks this header and replays all events that occurred after that ID.

The SDK handles event ID generation and the Last-Event-ID header on the client side automatically. What you need to provide on the server side is an event store — a per-session buffer of recent events that can be queried by "give me everything after event ID X".

// event-store.ts
interface StoredEvent {
  id: string;
  data: string;
  timestamp: number;
}

export class ResumableEventStore {
  // Map from sessionId → ordered list of events
  private events = new Map<string, StoredEvent[]>();
  private TTL = 5 * 60 * 1000; // Keep events for 5 minutes

  store(sessionId: string, eventId: string, data: string): void {
    const list = this.events.get(sessionId) ?? [];
    list.push({ id: eventId, data, timestamp: Date.now() });

    // Prune events older than TTL to bound memory usage
    const cutoff = Date.now() - this.TTL;
    this.events.set(
      sessionId,
      list.filter(e => e.timestamp > cutoff)
    );
  }

  // Returns the events after lastEventId, or null when that ID is unknown
  // (too old and pruned). Distinguishing null from [] matters: [] means
  // "nothing missed", null means "cannot resume" and warrants an error response.
  getEventsAfter(sessionId: string, lastEventId: string): StoredEvent[] | null {
    const list = this.events.get(sessionId) ?? [];
    const idx = list.findIndex(e => e.id === lastEventId);
    if (idx === -1) return null;
    return list.slice(idx + 1);
  }

  clearSession(sessionId: string): void {
    this.events.delete(sessionId);
  }
}

// Integration into the transport layer:
// StreamableHTTPServerTransport accepts an eventStore option. The SDK defines
// its own EventStore interface for that option (storeEvent() and
// replayEventsAfter() in recent SDK versions), so a store shaped like the class
// above needs a thin adapter exposing those methods.
const store = new ResumableEventStore();

transport = new StreamableHTTPServerTransport({
  sessionIdGenerator: () => randomUUID(),
  onsessioninitialized: (id) => sessions.set(id, transport),
  // adaptToSdkEventStore is a hypothetical adapter you write, not an SDK function
  eventStore: adaptToSdkEventStore(store),
});

The event store TTL policy is a tradeoff between memory and resilience. A 5-minute TTL means clients can survive a 5-minute network outage and resume seamlessly. Beyond that, events are pruned and the client must restart the operation. For truly critical operations you can persist events to Redis with a longer TTL, accepting higher storage costs. The ResumableEventStore interface is designed so you can swap in any backend.
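For comparison, here is a sketch of an in-memory store written directly against the shape the SDK expects for its eventStore option. The interface is redeclared locally as `EventStoreLike` so the example is self-contained; it is modeled on the SDK's EventStore as I understand it (storeEvent plus replayEventsAfter), so verify it against the SDK version you use. Throwing on an unknown ID and the counter-based ID format are choices made for this sketch:

```typescript
// Local stand-ins for SDK types (assumption: mirrors the SDK's EventStore shape).
type StreamId = string;
type EventId = string;
type StoredMessage = Record<string, unknown>;

interface EventStoreLike {
  storeEvent(streamId: StreamId, message: StoredMessage): Promise<EventId>;
  replayEventsAfter(
    lastEventId: EventId,
    opts: { send: (eventId: EventId, message: StoredMessage) => Promise<void> },
  ): Promise<StreamId>;
}

class InMemoryEventStore implements EventStoreLike {
  private events: Array<{ id: EventId; streamId: StreamId; message: StoredMessage }> = [];
  private seq = 0;

  // Record a message and return the event ID that goes into the SSE id: field.
  async storeEvent(streamId: StreamId, message: StoredMessage): Promise<EventId> {
    const id = `${streamId}_${++this.seq}`;
    this.events.push({ id, streamId, message });
    return id;
  }

  // Re-send every later event that belongs to the same stream as lastEventId.
  async replayEventsAfter(
    lastEventId: EventId,
    opts: { send: (eventId: EventId, message: StoredMessage) => Promise<void> },
  ): Promise<StreamId> {
    const idx = this.events.findIndex(e => e.id === lastEventId);
    const anchor = this.events[idx];
    if (!anchor) throw new Error(`Unknown event id: ${lastEventId}`);
    for (const e of this.events.slice(idx + 1)) {
      if (e.streamId === anchor.streamId) await opts.send(e.id, e.message);
    }
    return anchor.streamId;
  }
}
```

This version has no TTL pruning; adding one means deciding what a reconnecting client sees once its anchor event is gone, which is the tradeoff described above.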

Normal flow: event IDs are attached to every event

Server sends id: evt-42\ndata: {...}\n\n. Client receives it. Last-known ID = evt-42. Client processes the event normally.

Disconnect: network drops between evt-42 and evt-45

Events evt-43, evt-44, evt-45 are stored in the server's event store but never delivered to the client. The SSE stream closes abruptly.

Reconnect: client sends Last-Event-ID: evt-42

The SDK opens a new connection (per the spec, resumption uses GET). The browser's EventSource API adds Last-Event-ID: evt-42 automatically; the SDK's fetch-based transport sets the same header explicitly.

Server replays: evt-43, evt-44, evt-45 are resent

The server calls store.getEventsAfter(sessionId, 'evt-42') and writes the returned events to the new response stream before resuming live events. Client receives all missed events in order.

⚠️

Return 400 when Last-Event-ID is too old. If the requested event ID is no longer in the store because it was pruned (TTL expired), respond with HTTP 400 instead of silently starting a new stream from the current position. This tells the client that it cannot resume and must restart the operation from scratch, preventing silent data gaps. Make sure the store can distinguish "ID not found" from "no events after this ID"; an empty array alone is ambiguous.

Section 6

Request Batching

Streamable HTTP in the 2025-03-26 spec inherits JSON-RPC 2.0's batching capability: you can send an array of requests in a single POST body. The server can process them concurrently and returns results either as a JSON array or as individual SSE events on a single stream, which avoids the N+1 request problem. Note that the later 2025-06-18 spec revision removed batching, so check which protocol version your client and server negotiate before relying on it.

Batching is most valuable when a single AI operation needs data from several independent sources simultaneously. Instead of three sequential round-trips, each with full HTTP overhead plus potential SSE stream setup, a single POST delivers all three requests, the server fans them out internally, and results arrive in one response. Measured end-to-end, this can cut latency by roughly 40–60% for multi-source queries when round-trip time dominates, though the gain depends on your network and on how long each request takes server-side; measure before relying on the figure.

The server decides whether to respond with a JSON array or an SSE stream based on whether any of the batched requests requires streaming. If all requests are synchronous (no progress events, no streaming tools), the response is a plain application/json array. If at least one request opens a stream, the entire batch response becomes text/event-stream and each result arrives as a separate SSE event tagged with the corresponding JSON-RPC id.
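Whichever form the response takes, JSON-RPC 2.0 lets a server return batch responses in any order, so a client should match them to requests by id instead of by position. A minimal sketch with hypothetical names (`BatchResponse`, `indexById`):

```typescript
type BatchResponse = {
  id: number | string;
  result?: unknown;
  error?: { code: number; message: string };
};

// Hypothetical helper: index batch responses by their JSON-RPC id.
function indexById(responses: BatchResponse[]): Map<number | string, BatchResponse> {
  const byId = new Map<number | string, BatchResponse>();
  for (const r of responses) byId.set(r.id, r);
  return byId;
}

// Usage: look up each request's outcome regardless of arrival order.
const arrived: BatchResponse[] = [
  { id: 3, result: { config: 'ok' } },
  { id: 1, result: { user: 'ada' } },
  { id: 2, error: { code: -32000, message: 'orders unavailable' } },
];
const outcomes = indexById(arrived);
```

The same lookup works for the SSE case: add each event's parsed payload to the map as it arrives.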

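// batch-client.ts: low-level batch via raw fetch
// (assumes `transport` is an already-connected StreamableHTTPClientTransport)

const sessionId = transport.sessionId; // exposed by the SDK after connect()
if (!sessionId) throw new Error('Not connected: no session ID yet');
const token = process.env.API_TOKEN ?? '';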

const batch = [
  {
    jsonrpc: '2.0', id: 1,
    method: 'tools/call',
    params: { name: 'get_user', arguments: { id: '123' } }
  },
  {
    jsonrpc: '2.0', id: 2,
    method: 'tools/call',
    params: { name: 'get_orders', arguments: { userId: '123' } }
  },
  {
    jsonrpc: '2.0', id: 3,
    method: 'resources/read',
    params: { uri: 'config://app' }
  }
];

const response = await fetch('http://localhost:3000/mcp', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Mcp-Session-Id': sessionId,
    'Authorization': `Bearer ${token}`,
    'Accept': 'application/json, text/event-stream',
  },
  body: JSON.stringify(batch),
});

const contentType = response.headers.get('content-type') ?? '';

if (contentType.includes('text/event-stream')) {
  // At least one request is streaming — parse individual SSE events
  // Each event carries one JSON-RPC response with the original id field
  for await (const event of parseSSEStream(response.body!)) {
    const rpc = JSON.parse(event.data);
    console.log(`Response for request #${rpc.id}:`, rpc.result ?? rpc.error);
  }
} else {
  // All synchronous — plain JSON array response
  const results = await response.json() as Array<{ id: number; result?: any; error?: any }>;
  for (const r of results) {
    console.log(`Response for request #${r.id}:`, r.result ?? r.error);
  }
}

// Helper to async-iterate an SSE ReadableStream
async function* parseSSEStream(body: ReadableStream<Uint8Array>) {
  const reader = body.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffer += value;
    const parts = buffer.split('\n\n');
    buffer = parts.pop() ?? '';
    for (const part of parts) {
      const dataLine = part.split('\n').find(l => l.startsWith('data:'));
      if (dataLine) yield { data: dataLine.slice(5).trim() };
    }
  }
}

💡

Does the SDK batch automatically? Do not rely on it. As far as the current TypeScript SDK behaves, each client.callTool() or client.readResource() call is sent as its own POST, even when several are started concurrently with Promise.all(). The calls overlap on the wire, but they are not merged into one JSON-RPC array. If you want a true batch, build the array yourself as in the listing above, and verify the behavior against the SDK version you use.

⚠️

Batch size limits. Very large batches (50+ requests) can cause the server to hold a response open for a long time while it fans out internally. Implement a max-batch-size on the server (reject arrays longer than 20–25 items with 400) and consider whether a batch that large would be better served by parallel connections or a dedicated aggregation tool.
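A batch-size guard is a few lines and can run before any JSON-RPC dispatch. The sketch below uses hypothetical names (`BatchCheck`, `checkBatch`) and a default limit of 20 taken from the range suggested above; it also rejects an empty array, which JSON-RPC 2.0 treats as an invalid request:

```typescript
type BatchCheck =
  | { ok: true; size: number }
  | { ok: false; status: 400; reason: string };

// Hypothetical helper: validate the shape and size of a POST /mcp body.
function checkBatch(body: unknown, maxBatchSize = 20): BatchCheck {
  if (!Array.isArray(body)) return { ok: true, size: 1 }; // single message, not a batch
  if (body.length === 0) return { ok: false, status: 400, reason: 'empty batch' };
  if (body.length > maxBatchSize) {
    return {
      ok: false,
      status: 400,
      reason: `batch of ${body.length} exceeds limit of ${maxBatchSize}`,
    };
  }
  return { ok: true, size: body.length };
}
```

In an Express handler this would run on req.body right after JSON parsing, returning the 400 before any session work is done.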

Section 7

Progress Notifications & Streaming Tools

Progress notifications are how long-running tools keep the user informed. The client sends a progressToken with the tool call; the server sends back notifications/progress events in real time. Streamable HTTP makes this elegant: progress events flow on the same POST response SSE stream as the final result.

Under HTTP+SSE, every server-to-client message, progress notifications and the final result alike, had to flow on the pre-established SSE connection (the GET /sse channel), while the POST that carried the tool call returned only an acknowledgement. This created a routing problem: the server had to correlate each POST with the right SSE session and inject both progress events and the result there. With Streamable HTTP, a single POST to /mcp opens a streaming response and both progress events and the final result flow on that same stream, so the correlation problem disappears.

// progress-tool.ts — full progress-aware tool implementation
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

export function registerAnalyzeTool(server: McpServer) {
  server.tool(
    'analyze_repository',
    {
      repo: z.string().describe('GitHub repository URL or path'),
      deep: z.boolean().default(false).describe('Enable deep AST analysis'),
    },
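    // The second argument is the SDK's per-request "extra" object; in recent SDK
    // versions it exposes _meta and sendNotification()
    async ({ repo, deep }, { _meta, sendNotification }) => {
      const progressToken = _meta?.progressToken;

      // Helper to send progress if the client requested it
      const report = async (progress: number, message: string) => {
        if (progressToken !== undefined) {
          await sendNotification({
            method: 'notifications/progress',
            params: { progressToken, progress, total: 100, message },
          });
        }
      };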

      const phases = [
        { pct: 10, msg: 'Cloning repository...' },
        { pct: 30, msg: 'Parsing source files...' },
        { pct: 60, msg: 'Running static analysis...' },
        { pct: 85, msg: 'Resolving dependencies...' },
        { pct: 95, msg: 'Generating report...' },
      ];

      for (const phase of phases) {
        await report(phase.pct, phase.msg);
        await doPhase(phase.msg, repo, deep); // your actual work here
      }

      await report(100, 'Complete!');

      return {
        content: [
          {
            type: 'text',
            text: `Analysis of ${repo} complete.\n` +
                  `Found 42 files, 3 issues, 0 critical vulnerabilities.`,
          },
        ],
      };
    }
  );
}

// Client side: pass an onprogress callback with the call. The SDK then attaches
// a progressToken to the request and routes matching notifications/progress
// events to the callback, so no manual token or handler registration is needed.
const result = await client.callTool(
  {
    name: 'analyze_repository',
    arguments: { repo: 'https://github.com/myorg/myapp', deep: true },
  },
  undefined, // use the default result schema
  {
    onprogress: ({ progress, total, message }) => {
      // Clamp to 0..100 so the bar never gets a negative repeat count
      const raw = total ? Math.round((progress / total) * 100) : 0;
      const pct = Math.max(0, Math.min(100, raw));
      const filled = Math.floor(pct / 5);
      const bar = '█'.repeat(filled) + '░'.repeat(20 - filled);
      process.stdout.write(`\r[${bar}] ${pct}% ${message ?? ''}`);

      if (total !== undefined && progress >= total) {
        process.stdout.write('\n');
      }
    },
  }
);

The progressToken is opaque from the server's perspective: the server echoes it back verbatim in every notifications/progress event. The client uses it to correlate progress events with the specific tool call that triggered them, which matters when multiple tool calls are running concurrently. Always check if (progressToken !== undefined) before sending progress. Some clients omit the token when they don't want progress updates, and a progress notification without a token is invalid and cannot be matched to any call.
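The correlation the client performs can be pictured as a map from token to callback. The SDK keeps an equivalent table internally; the class below is a hypothetical stand-in (`ProgressRouter`) that only illustrates why the token must be echoed back unchanged:

```typescript
type ProgressToken = string | number;
type ProgressParams = {
  progressToken: ProgressToken;
  progress: number;
  total?: number;
  message?: string;
};

// Hypothetical router: one callback per in-flight tool call, keyed by token.
class ProgressRouter {
  private handlers = new Map<ProgressToken, (p: ProgressParams) => void>();

  // Returns an unregister function to call once the tool call completes.
  register(token: ProgressToken, handler: (p: ProgressParams) => void): () => void {
    this.handlers.set(token, handler);
    return () => { this.handlers.delete(token); };
  }

  // Returns false when no call is waiting on this token (stale or unknown).
  dispatch(params: ProgressParams): boolean {
    const handler = this.handlers.get(params.progressToken);
    if (!handler) return false;
    handler(params);
    return true;
  }
}
```

With two concurrent calls registered under different tokens, each callback only ever sees its own progress events.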

🌊

Streaming text output is different from progress notifications. Progress notifications report how far along an operation is. If you want to stream the actual output of a tool incrementally (like a live log tail), use a tool that sends multiple notifications/message events and a final empty result — or consider exposing the live stream as a subscribable resource instead of a tool.

Section 8

Authentication Patterns

Streamable HTTP authentication is cleaner than HTTP+SSE because there is only one URL to protect. Every request (POST, GET, and DELETE) passes through the same Express middleware chain. JWT validation, session ownership binding, and CORS all live in that one chain.

The standard pattern is Bearer token authentication validated on every request. On the initialize POST, the server binds the authenticated user's ID to the newly created session. On all subsequent requests, the middleware verifies not just that the token is valid, but that the token's subject claim matches the session's owner. This prevents session hijacking: even if an attacker learns a valid session ID, they cannot use it without a matching auth token.

// auth-middleware.ts
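import { Request, Response, NextFunction } from 'express';
import { StreamableHTTPServerTransport } from '@modelcontextprotocol/sdk/server/streamableHttp.js';
import { validateJWT, JWTClaims } from './jwt-validator.js';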

// Extended session store includes owner binding
const sessions = new Map<string, {
  transport: StreamableHTTPServerTransport;
  ownerId: string;
  createdAt: number;
  lastActivityAt: number;
}>();

export async function mcpAuthMiddleware(
  req: Request,
  res: Response,
  next: NextFunction
): Promise<void> {

  // Always allow CORS preflight through
  if (req.method === 'OPTIONS') { next(); return; }

  const authHeader = req.headers.authorization;
  if (!authHeader?.startsWith('Bearer ')) {
    res.status(401).json({
      error: 'unauthorized',
      message: 'Missing or malformed Authorization header',
    });
    return;
  }

  let claims: JWTClaims;
  try {
    claims = await validateJWT(authHeader.slice(7));
  } catch {
    res.status(401).json({ error: 'unauthorized', message: 'Invalid token' });
    return;
  }

  // For session-scoped requests, verify token matches session owner
  const sessionId = req.headers['mcp-session-id'] as string | undefined;
  if (sessionId) {
    const session = sessions.get(sessionId);
    if (!session) {
      res.status(404).json({ error: 'session_not_found' });
      return;
    }
    if (session.ownerId !== claims.sub) {
      res.status(403).json({
        error: 'forbidden',
        message: 'Session belongs to a different user',
      });
      return;
    }
    // Update activity timestamp
    session.lastActivityAt = Date.now();
  }

  // Attach claims to request for downstream handlers
  (req as any).user = claims;
  next();
}

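// CORS configuration for browser clients.
// ALLOWED_ORIGINS is a placeholder allowlist; reflecting arbitrary origins
// would let any website drive the server from a user's browser.
const ALLOWED_ORIGINS = new Set(['https://app.example.com']);

app.use('/mcp', (req, res, next) => {
  const origin = req.headers.origin;
  if (origin && ALLOWED_ORIGINS.has(origin)) {
    res.setHeader('Access-Control-Allow-Origin', origin);
    res.setHeader('Access-Control-Allow-Methods', 'GET, POST, DELETE, OPTIONS');
    res.setHeader(
      'Access-Control-Allow-Headers',
      'Authorization, Content-Type, Mcp-Session-Id, Last-Event-ID'
    );
    // Without this, browser code cannot read the session ID from the response
    res.setHeader('Access-Control-Expose-Headers', 'Mcp-Session-Id');
    res.setHeader('Access-Control-Max-Age', '86400');
    res.setHeader('Vary', 'Origin');
  }
  if (req.method === 'OPTIONS') { res.status(204).end(); return; }
  next();
});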

// Session idle timeout cleanup (run every 5 minutes)
setInterval(() => {
  const IDLE_TTL = 30 * 60 * 1000; // 30 minutes
  const MAX_TTL = 24 * 60 * 60 * 1000; // 24 hours
  const now = Date.now();

  for (const [id, session] of sessions.entries()) {
    const idle = now - session.lastActivityAt > IDLE_TTL;
    const expired = now - session.createdAt > MAX_TTL;
    if (idle || expired) {
      session.transport.close().catch(() => {});
      sessions.delete(id);
      console.log(`Session ${id} expired (idle=${idle}, maxTTL=${expired})`);
    }
  }
}, 5 * 60 * 1000);

// Apply middleware before all MCP handlers
app.use('/mcp', mcpAuthMiddleware);
app.post('/mcp', mcpPostHandler);
app.get('/mcp', mcpGetHandler);
app.delete('/mcp', mcpDeleteHandler);

🔐

Rate limiting per session, not per IP. For MCP servers, rate limiting by IP is the wrong granularity — a single corporate NAT might have many legitimate users. Implement rate limiting keyed by session ID (or by the authenticated user's sub claim). Allow a higher burst for initial tool discovery calls and a lower steady-state rate for long-running operations. express-rate-limit with a custom key generator handles this cleanly.
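The keying idea above can be sketched without any library. The following is a minimal token-bucket limiter, offered as an illustration rather than a recommended implementation: the class name KeyedRateLimiter, the helper rateLimitKey, and the numbers used are all assumptions made for this example and are not part of express-rate-limit or the MCP SDK.

```typescript
// Hypothetical token-bucket limiter keyed by session ID or user `sub`.
// All names and numbers here are illustrative assumptions.
interface Bucket {
  tokens: number;
  updatedAt: number; // ms timestamp of the last refill
}

class KeyedRateLimiter {
  private buckets = new Map<string, Bucket>();
  private readonly capacity: number;     // burst size
  private readonly refillPerSec: number; // steady-state rate

  constructor(capacity: number, refillPerSec: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
  }

  /** Returns true if the request identified by `key` may proceed at time `now`. */
  allow(key: string, now: number = Date.now()): boolean {
    const bucket = this.buckets.get(key) ?? { tokens: this.capacity, updatedAt: now };
    const elapsedSec = Math.max(0, now - bucket.updatedAt) / 1000;
    // Refill proportionally to elapsed time, capped at the burst size
    bucket.tokens = Math.min(this.capacity, bucket.tokens + elapsedSec * this.refillPerSec);
    bucket.updatedAt = now;
    const allowed = bucket.tokens >= 1;
    if (allowed) bucket.tokens -= 1;
    this.buckets.set(key, bucket);
    return allowed;
  }
}

// Prefer the session ID, fall back to the authenticated user; never key by IP.
function rateLimitKey(sessionId: string | undefined, userSub: string): string {
  return sessionId ? `session:${sessionId}` : `user:${userSub}`;
}
```

In an Express middleware, a call such as limiter.allow(rateLimitKey(sessionId, claims.sub)) would decide whether to continue or answer with 429. A larger capacity gives the discovery burst, while refillPerSec sets the steady-state rate.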

Section 9

Comparison Table & Migration Guide

Before migrating, understand exactly what you gain and what changes. The comparison table shows feature-by-feature differences across all four transports. The migration guide is a step-by-step path from HTTP+SSE to Streamable HTTP.

Feature	stdio	HTTP + SSE	WebSocket	Streamable HTTP ✨
Endpoints	N/A	2 (GET /sse + POST /message)	1 (WS upgrade)	1 (/mcp, methods POST + GET + DELETE)
Resumability	❌	❌	❌	✅ via event IDs
Proxy friendly	❌	⚠️ needs buffering off	⚠️ needs WS upgrade	✅ JSON works by default; ⚠️ SSE responses need buffering off
Browser support	❌	✅	✅	✅
Batching	❌	❌	✅	✅
Progress streaming	N/A	⚠️ complex routing	✅	✅ elegant
Serverless	❌	⚠️	❌	✅ non-streaming calls
MCP spec version	2024-11-05	2024-11-05	unofficial	2025-03-26

The migration from HTTP+SSE to Streamable HTTP is surgical — the surface area that changes is small, and the MCP SDK makes it mechanical. If your server and client both use the official SDK, the migration can be completed in under an hour for a typical server.

01 →

Replace the transport class. On the server, swap SSEServerTransport → StreamableHTTPServerTransport. On the client, swap SSEClientTransport (or custom fetch logic) → StreamableHTTPClientTransport. The constructor signatures differ; see Sections 3 and 4 for the exact options.

02 →

Merge two Express routes into one POST handler. Delete the old GET /sse route and the old POST /message route. Replace with a single POST /mcp handler with the session-routing logic from Section 3.

03 →

Add the GET /mcp subscription handler. This replaces the old persistent SSE connection. It's a new route, not a rename — clients now explicitly open a GET request when they want to subscribe, rather than the server pushing over a pre-established SSE connection.

04 →

Add the DELETE /mcp cleanup handler. This is new and has no HTTP+SSE equivalent. Clients will send it on clean shutdown; without it, sessions can only be cleaned up via idle TTL.

05 →

Update Nginx/proxy configuration. Remove the blanket proxy_buffering off from the message endpoint — buffering only needs to be disabled for responses that are actually streams. Keep proxy_buffering off on GET /mcp and conditionally on POST /mcp when the response is SSE. See Section 10 for full nginx config.

06 →

Implement the event store for resumability. This is optional but highly recommended for any streaming tools. Without it, disconnected clients must restart operations from scratch. The ResumableEventStore from Section 5 is a drop-in starting point.

✅

Backward compatibility note. The MCP SDK supports both HTTP+SSE and Streamable HTTP simultaneously. You can run both transports on different paths (/sse and /mcp) during a migration window, allowing old and new clients to coexist. Deprecate and remove the old SSE endpoints once all clients have migrated.
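To make step 06 concrete, here is a minimal in-memory sketch of what an event store has to do: assign an ID to every outgoing event and, on reconnect, return the events that followed a given ID on the same stream. It is not the ResumableEventStore from Section 5; the class name InMemoryEventLog, its method names, and the ID format are assumptions made for this illustration.

```typescript
// Hypothetical event log for replay after a disconnect.
// Names, ID format, and TTL default are illustrative assumptions.
interface StoredEvent {
  id: string;
  streamId: string;
  data: string;
  storedAt: number; // ms timestamp
}

class InMemoryEventLog {
  private events: StoredEvent[] = [];
  private seq = 0;
  private readonly ttlMs: number;

  constructor(ttlMs: number = 5 * 60 * 1000) {
    this.ttlMs = ttlMs;
  }

  /** Stores an event and returns the ID to send in the SSE `id:` field. */
  append(streamId: string, data: string, now: number = Date.now()): string {
    this.prune(now);
    this.seq += 1;
    const id = `${streamId}:${this.seq}`;
    this.events.push({ id, streamId, data, storedAt: now });
    return id;
  }

  /**
   * Returns events that followed `lastEventId` on the same stream, in order.
   * An unknown or expired ID yields an empty list, so the caller must treat
   * that case as "cannot resume" rather than "nothing was missed".
   */
  replayAfter(lastEventId: string, now: number = Date.now()): StoredEvent[] {
    this.prune(now);
    const idx = this.events.findIndex(e => e.id === lastEventId);
    const anchor = this.events[idx];
    if (anchor === undefined) return [];
    return this.events.slice(idx + 1).filter(e => e.streamId === anchor.streamId);
  }

  /** Drops events older than the TTL. */
  private prune(now: number): void {
    this.events = this.events.filter(e => now - e.storedAt <= this.ttlMs);
  }
}
```

A production store would likely live in Redis and distinguish "expired ID" from "no new events", but the contract is the same: ordered events per stream, looked up by the last ID the client saw.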

Section 10

Nginx & Deployment Configuration

Deploying Streamable HTTP in production requires correctly configured reverse proxies, TLS termination, and appropriate timeouts. Unlike HTTP+SSE, most of the configuration only applies to SSE-mode responses — synchronous JSON responses work with zero special proxy config.

# nginx-streamable-http.conf

upstream mcp_backend {
    server 127.0.0.1:3000;

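    # With in-memory session store: use sticky sessions.
    # Sessions are stored in the Node process, so route by the session header.
    # Note: the initialize request carries no session ID yet, so verify that
    # follow-up requests reach the instance that created the session.
    # For Redis-backed sessions: remove sticky routing.
    # hash $http_mcp_session_id consistent;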
}

server {
    listen 443 ssl http2;
    server_name mcp.example.com;

    ssl_certificate     /etc/ssl/certs/mcp.crt;
    ssl_certificate_key /etc/ssl/private/mcp.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_ciphers         HIGH:!aNULL:!MD5;

    # Increase client body size for batched requests
    client_max_body_size 1m;

    location /mcp {
        proxy_pass http://mcp_backend;

        # Required: disable buffering for SSE streaming responses.
        # nginx does not switch buffering off based on Content-Type, so this
        # directive applies to every /mcp response. To keep buffering for
        # plain JSON responses, remove it and have the backend send
        # "X-Accel-Buffering: no" on text/event-stream responses instead.
        proxy_buffering off;
        proxy_cache off;

        # Extended timeout for long-running streaming tools
        proxy_read_timeout 3600s;
        proxy_send_timeout 3600s;

        # Standard proxy headers
        proxy_set_header Host              $host;
        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

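        # Mcp-Session-Id and Last-Event-ID need no extra directives: nginx
        # forwards client request headers to the backend by default and
        # returns the Mcp-Session-Id response header to the client unchanged.
        # (proxy_pass_header only re-enables response headers nginx hides.)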

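        # CORS: required for browser-based MCP clients.
        # Set these headers either here or in the Node app, not in both places;
        # browsers reject a response with duplicate Access-Control-Allow-Origin.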
        add_header Access-Control-Allow-Origin  $http_origin always;
        add_header Access-Control-Allow-Methods "GET, POST, DELETE, OPTIONS" always;
        add_header Access-Control-Allow-Headers "Authorization, Content-Type, Mcp-Session-Id, Last-Event-ID" always;
        add_header Access-Control-Expose-Headers "Mcp-Session-Id" always;
        add_header Access-Control-Max-Age       86400 always;
        add_header Vary                         "Origin" always;

        if ($request_method = OPTIONS) {
            return 204;
        }
    }

    # Health check endpoint (no auth required)
    location /health {
        proxy_pass http://mcp_backend;
        access_log off;
    }
}

For AWS Application Load Balancer, Streamable HTTP works without the WebSocket upgrade requirement. Key settings: set the target group's deregistration delay to at least 60 seconds so in-flight streaming responses can complete before instance shutdown. Set the load balancer's idle timeout (a load balancer attribute, not a listener setting) to 3600 seconds to support long-running tool calls. Enable sticky sessions on the target group if you're using in-memory session storage: use duration-based stickiness with a 24-hour duration matching your maximum session TTL. ALB stickiness is cookie-based, so confirm that your MCP clients store and return cookies before relying on it.

Cloudflare Workers can proxy Streamable HTTP requests, but with one important caveat: Workers have a 30-second CPU time limit per request. Synchronous JSON responses work perfectly. For long SSE streams, Cloudflare's streaming response support (available in Workers) handles the SSE format, but you'll hit the 30-second wall for very long-running tools. Solutions: move long-running tools to Durable Objects, add progress checkpointing so clients can resume, or use Cloudflare's queue system to decouple the request from the actual processing.

⚠️

CDN caching and MCP don't mix. Never let a CDN cache /mcp responses. POST responses are inherently non-cacheable by spec, but CDNs sometimes cache GET responses aggressively. Add Cache-Control: no-store, no-cache to all MCP responses. If using Cloudflare, set the Cache Rule to "Bypass" for the /mcp path. Cached responses will appear as instant successes to clients while actually serving stale data from a different session.
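One way to apply the caching advice consistently is to build response headers in a single place. The sketch below is an assumption-laden illustration: mcpResponseHeaders is a hypothetical helper, and if you use the SDK transport it may already set some of these headers itself. The X-Accel-Buffering response header is the nginx mechanism for disabling proxy buffering on a single response.

```typescript
// Hypothetical helper: headers for an MCP response, depending on whether the
// server chose the streaming (SSE) or the synchronous (JSON) response mode.
function mcpResponseHeaders(streaming: boolean): Record<string, string> {
  const headers: Record<string, string> = {
    // Never let a CDN or shared cache store MCP responses
    "Cache-Control": "no-store, no-cache",
  };
  if (streaming) {
    headers["Content-Type"] = "text/event-stream";
    // Ask nginx not to buffer this particular response
    headers["X-Accel-Buffering"] = "no";
  } else {
    headers["Content-Type"] = "application/json";
  }
  return headers;
}
```

With Express, the result can be applied via res.set(mcpResponseHeaders(isStreaming)) before the first byte of the body is written.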

Section 11

Testing Streamable HTTP

Testing Streamable HTTP transports requires a real HTTP server — you can't use in-process transport pairs like you would for unit tests. The good news is that the SDK makes it easy to spin up a real server in a beforeAll hook and tear it down after the suite. This gives you true integration coverage of the transport layer.

The test suite below uses Vitest with a real HTTP server on a high port. Each test case gets a fresh client connection. This tests the full transport stack: HTTP routing, session ID negotiation, SSE framing, reconnection with Last-Event-ID, progress notification delivery, and concurrent request handling. Write these tests before building your server — they serve as a living specification of the transport contract.

// streamable-http.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import http from 'http';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';

// startTestServer creates a real Express+MCP server on the given port
// Returns the http.Server instance for cleanup
import { startTestServer } from './helpers/test-server.js';

describe('Streamable HTTP Transport', () => {
  let server: http.Server;
  let client: Client;
  const PORT = 3099;

  beforeAll(async () => {
    server = await startTestServer(PORT);

    const transport = new StreamableHTTPClientTransport(
      new URL(`http://localhost:${PORT}/mcp`)
    );
    client = new Client({ name: 'test-client', version: '1.0.0' });
    await client.connect(transport);
  });

  afterAll(async () => {
    await client.close();
    await new Promise<void>(r => server.close(() => r()));
  });

  it('completes initialize handshake and returns server info', async () => {
    const info = client.getServerVersion();
    expect(info?.name).toBe('streamable-demo');
    expect(info?.version).toBe('1.0.0');
  });

  it('lists available tools', async () => {
    const { tools } = await client.listTools();
    const names = tools.map(t => t.name);
    expect(names).toContain('long_running_task');
    expect(names).toContain('analyze_repository');
  });

  it('streams progress notifications during tool call', async () => {
    const progressEvents: Array<{ progress: number; total?: number }> = [];

    // Passing onprogress lets the SDK attach a progressToken to the request
    // and route the matching notifications/progress messages to this callback.
    const result = await client.callTool(
      { name: 'long_running_task', arguments: { duration: 200 } },
      undefined, // use the default result schema
      {
        onprogress: ({ progress, total }) => {
          progressEvents.push({ progress, total });
        },
      }
    );

    expect(result.isError).toBeFalsy();
    expect(progressEvents.length).toBeGreaterThanOrEqual(3);

    // Progress must never decrease from one event to the next
    for (let i = 1; i < progressEvents.length; i++) {
      expect(progressEvents[i].progress).toBeGreaterThanOrEqual(
        progressEvents[i - 1].progress
      );
    }

    // Final event must report completion (progress equals total)
    const last = progressEvents[progressEvents.length - 1];
    expect(last.progress).toBe(last.total);
  });

  it('accepts a fresh connection that carries a resumption hint', async () => {
    // Smoke test only: connect() runs a new initialize handshake, so this
    // creates a new session rather than resuming an existing one.
    // X-Test-Last-Event-ID is a custom test header, not the standard
    // Last-Event-ID header, so no replay is exercised here. A real replay
    // test needs a forced mid-stream disconnect.
    const transport2 = new StreamableHTTPClientTransport(
      new URL(`http://localhost:${PORT}/mcp`),
      {
        requestInit: {
          headers: { 'X-Test-Last-Event-ID': 'evt-42' },
        },
      }
    );
    const client2 = new Client({ name: 'resume-test', version: '1.0.0' });
    await client2.connect(transport2);

    // The server must complete the handshake despite the extra header
    const info = client2.getServerVersion();
    expect(info?.name).toBe('streamable-demo');

    await client2.close();
  });

  it('handles concurrent tool calls', async () => {
    // Promise.all issues both calls at once; responses are matched by JSON-RPC id
    const [r1, r2] = await Promise.all([
      client.callTool({ name: 'long_running_task', arguments: { duration: 50 } }),
      client.callTool({ name: 'long_running_task', arguments: { duration: 50 } }),
    ]);

    expect(r1.isError).toBeFalsy();
    expect(r2.isError).toBeFalsy();
  });

  it('returns 404 for unknown session IDs', async () => {
    const response = await fetch(`http://localhost:${PORT}/mcp`, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        // Streamable HTTP clients must accept both response modes
        'Accept': 'application/json, text/event-stream',
        'Mcp-Session-Id': 'nonexistent-session-id',
      },
      body: JSON.stringify({
        jsonrpc: '2.0', id: 1,
        method: 'tools/list', params: {},
      }),
    });

    // An unknown or expired session ID must yield 404 Not Found, which tells
    // the client to start over with a new initialize request
    expect(response.status).toBe(404);
  });
});

For testing resumability end-to-end, the most reliable approach is to force a transport-level disconnect by calling transport.close() mid-stream and then creating a new transport with the same session ID and a Last-Event-ID from the last received event. The test then verifies that the newly created transport receives the events that were sent during the gap. This requires capturing the ID of the last received event on the client. How the SDK exposes it depends on the version (recent TypeScript SDK releases accept resumptionToken and onresumptiontoken request options), so check the transport API of the version you are using.
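If you test resumption with raw fetch instead of the SDK client, you need to track the last event ID yourself. The following is a simplified sketch of that bookkeeping, assuming the input contains only complete SSE frames (a real parser must buffer partial frames across chunks). The function name parseSse and the SseEvent shape are illustrative assumptions.

```typescript
// Hypothetical, simplified SSE frame parser that tracks the last event ID.
interface SseEvent {
  id: string | undefined;
  event: string;
  data: string;
}

function parseSse(text: string): { events: SseEvent[]; lastEventId: string | undefined } {
  const events: SseEvent[] = [];
  let lastEventId: string | undefined;

  // Frames are separated by a blank line
  for (const block of text.split(/\r?\n\r?\n/)) {
    let id: string | undefined;
    let event = "message"; // default event type
    const data: string[] = [];

    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "id") id = value;
      else if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }

    if (id !== undefined) lastEventId = id;
    if (data.length === 0) continue; // frames without data are not dispatched
    events.push({ id, event, data: data.join("\n") });
  }
  return { events, lastEventId };
}
```

After a forced disconnect, the test would send lastEventId in the Last-Event-ID header of a new GET /mcp request (with the same Mcp-Session-Id) and assert that the replayed events start right after it.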

🧪

Use supertest for synchronous response testing. For the simple JSON-response path (non-streaming tool calls, listTools, listResources), supertest is cleaner than spawning a full client: await request(app).post('/mcp').set('Mcp-Session-Id', id).send(rpc).expect(200). Reserve full SDK client tests for streaming scenarios where you actually need notification handlers and SSE parsing.

Section 12

Production Checklist & Patterns

Before shipping a Streamable HTTP MCP server to production, verify every item on this checklist. Each item represents a real-world failure mode encountered in deployed MCP services. Skip any item and you risk data loss, security vulnerabilities, or cascading failures under load.

1
Use randomUUID() for session IDs — never sequential. Sequential IDs (1, 2, 3…) are trivially enumerable. A compromised session can be used to hijack adjacent sessions. crypto.randomUUID() generates 122 bits of randomness — treat it as a password.
2
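Store session state in Redis for multi-instance deployments. The in-memory Map from Section 3 is fine for a single-node server. In Kubernetes with multiple replicas, keep session metadata (owner, timestamps) and the event store in Redis so any replica can validate a session and replay missed events. A live transport holds open HTTP connections and generally cannot be serialized, so either route each session to the replica that owns it (sticky routing) or verify in your SDK version how a transport can be recreated for an existing session ID.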
3
Set session TTL + cleanup (max 24h, idle 30min). Without cleanup, session stores grow unboundedly. The auth middleware in Section 8 shows the cleanup interval pattern. Monitor the active-sessions gauge — a leak will appear as a continuously rising count.
4
Implement the event store with configurable TTL for resumability. Default to 5 minutes. For mission-critical streaming operations, increase to 30 minutes and back the store with Redis. Document the TTL prominently — clients need to know how long a reconnect window they have before they must restart.
5
Bind sessions to authenticated users to prevent session hijacking. Every session in your store should have an ownerId field populated from the JWT sub claim on initialize. Validate this on every subsequent request, not just the first one.
6
Add DELETE /mcp handler for graceful session termination. Without it, clients cannot signal clean shutdown — the only cleanup path is idle TTL expiry. Graceful cleanup speeds up resource recovery and gives clean audit logs (explicit close vs timeout).
7
Set proxy_buffering off for all /mcp locations that can stream. A buffering proxy silently accumulates SSE events and delivers them in a burst rather than streaming. The client will appear to hang, then receive all events at once. Always set proxy_buffering off on the /mcp location block; the overhead for non-streaming JSON responses is negligible.
8
Monitor active sessions (gauge) + session creation rate (counter). Active sessions should be bounded and stable in steady state. Session creation rate should correlate with request rate from AI agents. Spikes in either metric indicate a bug (session leak) or a DDoS (creation storm). Alert on both.
9
Implement session draining on deploy: finish in-flight, then close. During a rolling deploy, the old process needs to finish streaming responses before shutdown. Hook into SIGTERM: stop accepting new sessions immediately, wait for all open SSE streams to complete or timeout (30s max), then exit. This prevents clients from seeing mid-stream disconnects during deploys.
10
Test with Last-Event-ID replay before shipping to production. Resumability is only as good as your event store. Write a test that forces a disconnect at a known event ID, reconnects, and asserts that missed events are replayed in order with no gaps or duplicates. Run this test in CI — a broken event store silently degrades to "restart on disconnect" behavior without failing.
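Item 9 is the one most often left as a comment in the backlog, so here is a minimal sketch of the drain step itself. It assumes each open SSE stream is represented by a promise that settles when the stream ends; drainStreams, httpServer, and activeStreams are hypothetical names, not SDK APIs.

```typescript
// Hypothetical drain helper: wait for open streams to finish, up to a timeout.
async function drainStreams(
  openStreams: Iterable<Promise<void>>,
  timeoutMs: number,
): Promise<"drained" | "timed_out"> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const timeout = new Promise<"timed_out">(resolve => {
    timer = setTimeout(() => resolve("timed_out"), timeoutMs);
  });

  // allSettled: a stream that ends with an error still counts as finished
  const drained = Promise.allSettled(Array.from(openStreams)).then(
    () => "drained" as const,
  );

  try {
    return await Promise.race([drained, timeout]);
  } finally {
    // Clear the timer so it cannot keep the process alive after draining
    if (timer !== undefined) clearTimeout(timer);
  }
}

// Hypothetical wiring (httpServer and activeStreams are assumptions):
// process.on('SIGTERM', async () => {
//   httpServer.close();                                   // stop accepting new connections
//   await drainStreams(activeStreams.values(), 30_000);   // 30s max, as in item 9
//   process.exit(0);
// });
```

The timeout bounds shutdown time: streams that outlive it are cut off, and their clients fall back to Last-Event-ID resumption against another replica.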

🔮

Day 14 previews MCP Registry & Discovery — how Claude Desktop and other MCP hosts find, verify, and install MCP servers automatically. You'll learn about the registry format, capability negotiation during discovery, verified server listings, and how to publish your own servers to the ecosystem.

Quiz · Day 13

Streamable HTTP Transport Check

5 questions on Streamable HTTP fundamentals, resumability, response types, session management, and spec history. Score 5/5 to complete the section.

Q1. What HTTP method does a Streamable HTTP client use to send ALL MCP requests, including initialize, tools/call, and resources/read?

A. GET

B. POST

C. PUT

D. PATCH

Q2. How does Streamable HTTP achieve resumability after a network disconnect mid-stream?

A. TCP keep-alive packets maintain the connection through brief network interruptions

B. WebSocket ping/pong frames detect disconnects and trigger automatic reconnection

C. The client sends a Last-Event-ID header on reconnect, and the server replays all missed events from that ID onward

D. The client falls back to polling GET /mcp/status to retrieve missed events

Q3. When a Streamable HTTP server needs to stream progress updates for a long-running tool, which response Content-Type does it use?

A. application/json

B. application/octet-stream

C. text/event-stream

D. multipart/form-data

Q4. What HTTP header identifies a Streamable HTTP session and must be included in all requests after the initial initialize call?

A. X-Session-Token

B. Authorization

C. Mcp-Session-Id

D. X-Correlation-Id

Q5. Which MCP specification version introduced the Streamable HTTP transport?

A. 2024-03

B. 2024-11

C. 2025-01

D. 2025-03
