Level 2 · Day 15 of 90

Advanced Tool Patterns

Annotations, cursor-based pagination, content types (text, image, resource), middleware composition, progress tokens, idempotency keys, and versioning strategies for production-grade MCP tools.


The Four Hint Fields

MCP defines four boolean annotation fields that travel alongside a tool's schema. They are hints, not enforcement mechanisms — the server declares them, and the host uses them to decide how to present or gate the tool.

Before annotations existed, every tool looked equally dangerous to a host. A tool that fetches read-only data from a public API was treated the same as one that permanently deletes database records. Hosts had no signal to differentiate them, so they either blocked everything behind confirmation dialogs or trusted everything blindly. Annotations solve this by letting server authors express the tool's intent in a machine-readable way.

The four fields live inside an annotations object that is passed as part of the tool's registration options. All four are optional; unset fields fall back to their specified defaults. This means the most important ones — destructiveHint and openWorldHint — default to true, erring on the side of caution.

readOnlyHint
default: false
When true, declares that this tool makes no side effects — it only reads data. The host can safely auto-approve such calls without user confirmation. Set this for any tool that is purely observational: fetching data, reading files, querying APIs in read-only mode.
destructiveHint
default: true
When true, warns the host that this tool may delete or irreversibly modify data. Claude Desktop surfaces a confirmation dialog. Tools that delete files, drop tables, revoke credentials, or send non-retractable messages should set this to true.
idempotentHint
default: false
When true, signals that calling the tool twice with the same arguments has the same observable effect as calling it once. Pure computation tools and idempotent REST operations (like PUT or DELETE on a resource by ID) qualify. Hosts can safely retry calls on transient failures.
openWorldHint
default: true
When false, declares that the tool's domain of interaction is closed: it does not reach external entities such as the open web or third-party APIs. The MCP specification contrasts a web search tool (open world) with a memory tool (closed world). Set this to false for Fibonacci calculators, text formatters, JSON parsers, and similar self-contained utilities.

Here is a concrete example showing both extremes — a destructive GitHub branch deletion tool with all the dangerous flags, and a pure Fibonacci calculator that opts into all the safe flags:

TypeScript — annotation examples
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({ name: 'demo', version: '1.0.0' });

// ── Dangerous tool: destructive, external, non-idempotent ──────
// registerTool takes (name, config, handler); annotations live in the config object.
server.registerTool(
  'github_delete_branch',
  {
    description: 'Permanently delete a branch from a GitHub repository.',
    inputSchema: { repo: z.string(), branch: z.string() },
    annotations: {
      destructiveHint: true,   // host will show a confirmation dialog
      idempotentHint: false,   // deleting twice could error on second call
      readOnlyHint: false,     // clearly modifies state
      openWorldHint: true      // makes a network call to GitHub
    }
  },
  async ({ repo, branch }) => { /* ... */ }
);

// ── Safe tool: read-only, pure, idempotent ────────────────────
server.registerTool(
  'calculate_fibonacci',
  {
    description: 'Calculate the nth Fibonacci number (pure computation).',
    inputSchema: { n: z.number().int().min(0).max(90) },
    annotations: {
      readOnlyHint: true,      // no side effects whatsoever
      idempotentHint: true,    // fib(10) always returns 55
      openWorldHint: false,    // no network, no I/O, just math
      destructiveHint: false   // cannot destroy anything
    }
  },
  async ({ n }) => {
    // Iterate with BigInt: naive recursion takes exponential time for n near 90,
    // and fib(79) and above exceed Number.MAX_SAFE_INTEGER.
    let a = 0n, b = 1n;
    for (let i = 0; i < n; i++) [a, b] = [b, a + b];
    return { content: [{ type: 'text', text: a.toString() }] };
  }
);

Claude Desktop's behavior with these annotations is immediate and visible. Tools annotated with destructiveHint: true show an amber warning dialog asking the user to confirm before execution. Tools with readOnlyHint: true may be auto-approved depending on the host's trust settings. Understanding this UX helps you write annotations that match the actual risk profile of your tools.

| Annotation | Description | Default | Set true when… | Example tools |
|---|---|---|---|---|
| readOnlyHint | No side effects; safe to call anytime | false | Tool only fetches / reads data | github_get_repo, db_query |
| destructiveHint | Modifies or deletes data irreversibly | true | Tool deletes, drops, revokes, or sends | github_delete_branch, db_drop_table |
| idempotentHint | Same args → same result; safe to retry | false | Tool is a pure function or idempotent REST op | calculate_fibonacci, format_json |
| openWorldHint | Interacts with the external world | true | Set false for pure computation only | calculate_fibonacci (false), get_weather (true) |
💡
Annotations are promises, not proofs. The SDK does not enforce them — they are purely informational hints. If you declare readOnlyHint: true but your handler writes to a database, the host won't catch it. Treat annotations as documentation that travels with the schema, and make sure they match your implementation.
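
To make the defaults concrete, here is a minimal host-side sketch of how a client might fill in unset hints and decide whether to ask for confirmation. It is not part of any SDK: `resolveHints` and `needsConfirmation` are hypothetical names, and the gating policy shown is only one plausible choice.

```typescript
// Hypothetical host-side helpers; not part of the MCP SDK.
interface ToolHints {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Defaults as listed above: the cautious value applies when a hint is unset.
const HINT_DEFAULTS: Required<ToolHints> = {
  readOnlyHint: false,
  destructiveHint: true,
  idempotentHint: false,
  openWorldHint: true,
};

// Fill in every unset (or explicitly undefined) hint with its default.
function resolveHints(hints: ToolHints = {}): Required<ToolHints> {
  return {
    readOnlyHint: hints.readOnlyHint ?? HINT_DEFAULTS.readOnlyHint,
    destructiveHint: hints.destructiveHint ?? HINT_DEFAULTS.destructiveHint,
    idempotentHint: hints.idempotentHint ?? HINT_DEFAULTS.idempotentHint,
    openWorldHint: hints.openWorldHint ?? HINT_DEFAULTS.openWorldHint,
  };
}

// Assumed policy: confirm any call that is not read-only and may be destructive.
function needsConfirmation(hints?: ToolHints): boolean {
  const resolved = resolveHints(hints);
  return !resolved.readOnlyHint && resolved.destructiveHint;
}
```

Under this policy a tool with no annotations at all is treated as destructive, which is why annotating genuinely safe tools is worth the effort.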

Beyond Basic z.string()

Zod's surface area goes far deeper than simple string and number validators. For tools that accept complex or conditional input, Zod provides cross-field validation, discriminated unions, and input transformation that runs before your handler sees the data.

The most powerful feature for multi-mode tools is z.discriminatedUnion(). Rather than accepting a grab-bag of optional parameters and writing conditional logic to validate them manually, you declare each mode as its own schema variant. Zod validates exactly the fields declared for the chosen mode. By default, fields that belong to other modes are stripped from the parsed result; add .strict() to a variant if you want them rejected instead. The resulting TypeScript types are also fully narrowed inside switch cases, so no type assertions are needed.

The .refine() method attaches arbitrary validation logic to any schema node. It receives the parsed value and returns true or false. When false, Zod emits the error message you provide. This is perfect for semantic constraints that cannot be expressed purely in types — like validating that a string is a compilable regex, or that a URL uses HTTPS, or that a file extension matches an expected list.

For cross-field constraints — where validity depends on the relationship between multiple fields — .superRefine() gives you access to the full object and lets you attach errors to specific paths. This produces error messages that point to the exact field that is wrong, not just the top-level object.

TypeScript — discriminated union with refine and superRefine
import { z } from 'zod';

// ── Discriminated union: search type determines valid params ──
const SearchParams = z.discriminatedUnion('type', [
  z.object({
    type: z.literal('text'),
    query: z.string().min(1),
    caseSensitive: z.boolean().default(false)
  }),
  z.object({
    type: z.literal('regex'),
    pattern: z.string().refine(p => {
      try { new RegExp(p); return true; }
      catch { return false; }
    }, 'Invalid regex pattern'),
    flags: z.string().regex(/^[gimsuy]*$/).optional()
  }),
  z.object({
    type: z.literal('semantic'),
    query: z.string(),
    threshold: z.number().min(0).max(1).default(0.7),
    limit: z.number().int().min(1).max(100).default(10)
  })
]);

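// server.tool() expects a raw shape (a plain object of Zod types), not a
// standalone schema, so the union is nested under a single `search` field.
server.tool('db_search', { search: SearchParams }, async ({ search }) => {
  switch (search.type) {
    case 'text':     return textSearch(search.query, search.caseSensitive);
    case 'regex':    return regexSearch(search.pattern, search.flags);
    case 'semantic': return semanticSearch(search.query, search.threshold, search.limit);
  }
});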

// ── Cross-field validation with superRefine ──────────────────
const DateRangeParams = z.object({
  startDate: z.string().datetime(),
  endDate: z.string().datetime(),
  limit: z.number().int().min(1).max(1000).default(100)
}).superRefine((data, ctx) => {
  if (new Date(data.endDate) <= new Date(data.startDate)) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'endDate must be after startDate',
      path: ['endDate']
    });
  }
});

The .transform() method is your escape hatch for input normalization. It runs after validation passes and before your handler receives the value. Use it to coerce string dates into Date objects, trim and lowercase strings, parse JSON embedded in a string field, or convert a comma-separated list into an actual array. Transforms keep your handler code clean — it deals with the right types from the start, not raw strings.

Nested objects and arrays work exactly as you would expect: z.object({...}) for objects and z.array(z.string()) for lists. Recursive schemas — like a tree of nested filters — use z.lazy() to break the circular reference. For flexible types where a field could be a string or a number, z.union([z.string(), z.number()]) creates a type-safe union that Zod validates left-to-right.

ℹ️
Zod errors become MCP errors automatically. When input fails schema validation, the TypeScript SDK rejects the call before your handler runs and reports the human-readable validation message to the client. Depending on the SDK version, this surfaces either as a JSON-RPC invalid-params error or as a tool call result with isError: true. Either way, you do not need to add try/catch around your Zod schemas; just let the SDK handle it.

Paginating Large Result Sets Safely

When a tool might return dozens, hundreds, or thousands of items, returning everything in a single response is both expensive and impractical. Cursor-based pagination lets the AI agent work through results incrementally, one page at a time.

There are two common pagination approaches: offset-based and cursor-based. Offset-based pagination uses numeric page numbers or offsets — give me items 21–40. It is simple but has a subtle failure mode with live data: if a record is inserted or deleted between two page fetches, items shift position. Page 2 might repeat an item from page 1, or skip one entirely. For static datasets this rarely matters; for live APIs it is a real problem.

Cursor-based pagination avoids this by using a stable pointer into the dataset — typically the ID or sort key of the last item returned. The next page starts from that marker, regardless of what has been inserted or deleted elsewhere. The cursor is opaque to the caller: it is encoded as a base64 string that the caller passes back verbatim. The server decodes it to recover whatever state it needs to continue.
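
The difference shows up in a few lines. The sketch below uses an in-memory array standing in for a live table and inserts a row between two page fetches. The `Row` type and both helper names are illustrative, not taken from any library.

```typescript
interface Row { id: number; name: string }

// Offset pagination: position-based, so an insert shifts every later page.
function pageByOffset(rows: Row[], offset: number, limit: number): Row[] {
  return rows.slice(offset, offset + limit);
}

// Cursor (keyset) pagination: continue after the last id already returned.
function pageAfterId(rows: Row[], afterId: number | null, limit: number): Row[] {
  const sorted = [...rows].sort((a, b) => a.id - b.id);
  const remaining = afterId === null ? sorted : sorted.filter(r => r.id > afterId);
  return remaining.slice(0, limit);
}

const table: Row[] = [
  { id: 10, name: 'a' },
  { id: 20, name: 'b' },
  { id: 30, name: 'c' },
  { id: 40, name: 'd' },
];

// Page 1 returns ids 10 and 20 either way.
const offsetPage1 = pageByOffset(table, 0, 2);
const keysetPage1 = pageAfterId(table, null, 2);

// A new row lands near the front of the table before page 2 is requested.
table.splice(1, 0, { id: 15, name: 'new' });

const offsetPage2 = pageByOffset(table, 2, 2); // ids 20, 30: id 20 is repeated
const keysetPage2 = pageAfterId(table, 20, 2); // ids 30, 40: no repeat
```

The offset version re-serves id 20 because the insert pushed it from position 1 to position 2, while the keyset version only asks for rows after the last id it has seen.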

A critical design principle: the cursor must contain everything needed to reconstruct the query. If the tool accepts a language filter, the cursor must embed that filter. Otherwise, a caller who passes a cursor from a filtered request alongside different filter args would get inconsistent pages. Encoding the full query context into the cursor makes pagination self-contained and stateless on the server side.

TypeScript — cursor-based pagination implementation
interface PageCursor {
  offset: number;
  filter: string;
  sortKey: string;
}

function encodeCursor(cursor: PageCursor): string {
  return Buffer.from(JSON.stringify(cursor)).toString('base64url');
}
function decodeCursor(encoded: string): PageCursor {
  return JSON.parse(Buffer.from(encoded, 'base64url').toString());
}

server.tool(
  'github_list_repositories',
  {
    org: z.string(),
    language: z.string().optional(),
    cursor: z.string().optional().describe('Pagination cursor from previous call'),
    pageSize: z.number().int().min(1).max(100).default(20)
  },
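  async ({ org, language, cursor, pageSize }) => {
    let page: PageCursor;
    try {
      page = cursor
        ? decodeCursor(cursor)
        : { offset: 0, filter: language ?? '', sortKey: 'updated' };
    } catch {
      // A malformed cursor is reported to the model instead of crashing the handler
      return { content: [{ type: 'text', text: 'Invalid pagination cursor' }], isError: true };
    }

    // The filter embedded in the cursor wins over the `language` argument,
    // so every page of one listing runs the same query.
    const filter = page.filter || undefined;
    const items = await fetchRepos(org, { offset: page.offset, limit: pageSize, language: filter });
    const total = await countRepos(org, { language: filter });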

    const nextCursor = page.offset + pageSize < total
      ? encodeCursor({ offset: page.offset + pageSize, filter: page.filter, sortKey: page.sortKey })
      : null;

    const lines = [
      `Repositories (${page.offset + 1}–${Math.min(page.offset + items.length, total)} of ${total}):`,
      ...items.map(r => `- ${r.name}: ${r.description ?? 'no description'} (⭐${r.stars})`),
      nextCursor ? `\n[Next page cursor: ${nextCursor}]` : '\n[End of results]'
    ];

    return { content: [{ type: 'text', text: lines.join('\n') }] };
  }
);

Some external APIs — GitHub, Slack, Linear — provide their own cursor or token strings in their responses. You do not need to decode or interpret these; just wrap them opaquely. Store the API's own pagination token inside your cursor struct and forward it on the next call. This way your pagination layer is a thin adapter, not a re-implementation of the upstream API's pagination logic.

The AI agent's workflow with pagination is straightforward: it calls the tool without a cursor to get the first page, reads the result, checks for a next-page cursor in the response text, and calls again with that cursor if it needs more results. Writing the cursor clearly into the response — like [Next page cursor: abc123] — makes it easy for the model to extract and reuse it.

⚠️
Always validate decoded cursors. A cursor is user-supplied input, and a malicious or malformed cursor could crash your handler or expose unintended data. Wrap decodeCursor() in a try/catch and return an isError: true response if decoding fails. Never trust cursor content without validation.
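
A minimal sketch of that validation, written without Zod so it stays dependency-free. `safeDecodeCursor` and `DecodeResult` are hypothetical names, and the field checks mirror the PageCursor shape used in the listing above.

```typescript
interface PageCursor {
  offset: number;
  filter: string;
  sortKey: string;
}

type DecodeResult =
  | { ok: true; cursor: PageCursor }
  | { ok: false; reason: string };

// Decode a base64url cursor and verify its shape before trusting any field.
function safeDecodeCursor(encoded: string): DecodeResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(Buffer.from(encoded, 'base64url').toString('utf8'));
  } catch {
    return { ok: false, reason: 'cursor is not base64url-encoded JSON' };
  }
  if (typeof parsed !== 'object' || parsed === null) {
    return { ok: false, reason: 'cursor payload is not an object' };
  }
  const { offset, filter, sortKey } = parsed as Record<string, unknown>;
  if (typeof offset !== 'number' || !Number.isInteger(offset) || offset < 0) {
    return { ok: false, reason: 'offset must be a non-negative integer' };
  }
  if (typeof filter !== 'string' || typeof sortKey !== 'string') {
    return { ok: false, reason: 'filter and sortKey must be strings' };
  }
  // Rebuild the object so unexpected extra fields are dropped.
  return { ok: true, cursor: { offset, filter, sortKey } };
}
```

In a handler, an `ok: false` result would map to a response with isError: true and the reason as its text, so the model can restart the listing without a cursor.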

Text, Image, and Embedded Resources

This section covers three item types that a tool result's content array can hold: text for markdown and plain text, image for base64-encoded binary images, and resource for embedded resources. (Newer revisions of the MCP specification also define audio and resource_link items.) Choosing the right type matters because hosts and models render them differently.

The text type is the workhorse. It accepts any string — plain prose, Markdown tables, code blocks, JSON. Claude renders Markdown natively, so returning a well-structured Markdown string with headers and tables is far more readable than a JSON blob. Use text for everything that is fundamentally textual: summaries, lists, structured reports, code excerpts, error messages.

The image type sends binary image data as a base64 string with an accompanying MIME type (image/png, image/jpeg, image/webp). This is how screenshot tools work — they launch a headless browser, capture the page, and return the raw PNG bytes encoded as base64. Claude receives the image as a vision input and can describe, analyze, or extract text from it. The key constraint is size: very large images should be compressed or resized before encoding.

The resource type embeds a resource inline, identified by its URI and MIME type. It is particularly useful when a tool produces output that is naturally a resource: a generated file, a config blob, a diff. Instead of serializing the content as a text blob, you embed it as a first-class resource that the host can route to appropriate handlers. An embedded resource carries either a text field (for textual MIME types) or a blob field (base64-encoded, for binary MIME types).

TypeScript — all three content types
// 1. Text content — markdown supported
return {
  content: [{
    type: 'text',
    text: '## Search Results\n\n| Repo | Stars |\n|---|---|\n| my-repo | 1234 |'
  }]
};

// 2. Image content: screenshot tool using Playwright
import { chromium } from 'playwright';
import { promises as fs } from 'node:fs'; // used by get_config_file below

server.tool('capture_screenshot', { url: z.string().url() }, async ({ url }) => {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle' });
    const screenshot = await page.screenshot({ type: 'png' });
    return {
      content: [{
        type: 'image',
        data: screenshot.toString('base64'),
        mimeType: 'image/png'
      }]
    };
  } finally {
    // Close the browser even if navigation or capture throws
    await browser.close();
  }
});

// 3. Embedded resource — return a file inline
server.tool('get_config_file', { path: z.string() }, async ({ path }) => {
  const content = await fs.readFile(path, 'utf-8');
  return {
    content: [{
      type: 'resource',
      resource: {
        uri: `file://${path}`,
        mimeType: path.endsWith('.json') ? 'application/json' : 'text/plain',
        text: content
      }
    }]
  };
});

The isError flag is a top-level field on the tool result object (not inside a content item). Setting isError: true signals that the tool call failed, even though the response is technically well-formed. This is the correct pattern for expected errors — API rate limits, resource not found, invalid input that slipped past Zod. The model sees the error content and can decide how to proceed, rather than treating the failure as an unrecoverable exception.
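
As a small illustration of where the flag sits, the sketch below builds success and failure results with two helper functions. The local `ToolResult` type only mirrors the result shape described above, and `ok`, `fail`, and `findUser` are hypothetical names with an in-memory map standing in for a real upstream lookup.

```typescript
// Local types mirroring the tool result shape; a real server would use the SDK's types.
interface TextContent { type: 'text'; text: string }
interface ToolResult { content: TextContent[]; isError?: boolean }

// Successful result: isError is simply omitted.
function ok(text: string): ToolResult {
  return { content: [{ type: 'text', text }] };
}

// Expected failure: same content shape, with isError set at the top level.
function fail(message: string): ToolResult {
  return { content: [{ type: 'text', text: message }], isError: true };
}

// Hypothetical lookup standing in for an upstream API call.
const users = new Map<string, string>([['u1', 'Ada']]);

function findUser(id: string): ToolResult {
  const name = users.get(id);
  return name !== undefined
    ? ok(`User ${id}: ${name}`)
    : fail(`User ${id} not found`);
}
```

Keeping the two constructors separate makes it hard to forget the flag on a failure path, and the error text still reaches the model as ordinary content.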

Type: text
  • Markdown, plain text, code
  • Structured tables and lists
  • Error messages
  • JSON/YAML as strings

Type: image
  • Screenshots (PNG/JPEG)
  • Generated charts/graphs
  • OCR source images
  • Thumbnails and previews

Type: resource
  • Config files (JSON, YAML)
  • Diffs and patches
  • Generated file content
  • Binary blobs with URI identity

Rich Multi-Part Tool Responses

A single tool call can return an array containing multiple content items — any mix of text, images, and embedded resources. This enables tools that produce genuinely rich, structured output rather than cramming everything into one text blob.

The motivating example is a PR analysis tool. A comprehensive PR review needs at least two parts: a summary of metadata (author, status, review count, changed files) and the actual diff for Claude to analyze. These are fundamentally different kinds of content. The summary is natural text. The diff is a structured artifact — it has its own syntax, its own identity as a resource, and potentially its own rendering in diff viewers. Squashing them together into one Markdown string loses the structure.

When the model receives a multi-item response, it processes each content item according to its type. Text items are read as conversation context. Image items enter the vision pipeline. Resource items can be routed to appropriate handlers or displayed inline. The model can reference and reason about all of them together, treating the response as a coherent multi-modal package.

TypeScript — multi-item PR analysis tool
server.tool(
  'analyze_pr',
  { owner: z.string(), repo: z.string(), pr: z.number() },
  async ({ owner, repo, pr }) => {
    const [pullRequest, reviews, diff] = await Promise.all([
      octokit.pulls.get({ owner, repo, pull_number: pr }),
      octokit.pulls.listReviews({ owner, repo, pull_number: pr }),
      octokit.pulls.get({ owner, repo, pull_number: pr, mediaType: { format: 'diff' } })
    ]);

    return {
      content: [
        {
          type: 'text',
          text: `## PR #${pr}: ${pullRequest.data.title}\n\n` +
            `**Author:** ${pullRequest.data.user?.login}\n` +
            `**Status:** ${pullRequest.data.state}\n` +
            `**Reviews:** ${reviews.data.length} ` +
            `(${reviews.data.filter(r => r.state === 'APPROVED').length} approved)\n` +
            `**Changed files:** ${pullRequest.data.changed_files}\n` +
            `**+${pullRequest.data.additions} / -${pullRequest.data.deletions}**`
        },
        {
          type: 'resource',
          resource: {
            uri: `github://pr/${owner}/${repo}/${pr}/diff`,
            mimeType: 'text/x-diff',
            text: String(diff.data)
          }
        }
      ]
    };
  }
);

There is no hard limit on the number of content items in a response. A code generation tool might return a text explanation, three separate file resources (one per generated file), and an image of the resulting architecture diagram, all in a single call. The practical limits are the model's context window and the user's patience while waiting for all the data to arrive.

| Content type | Best for | SDK field names | Example |
|---|---|---|---|
| text | Summaries, tables, code, errors | type, text | PR summary, search results, error message |
| image | Visual data, screenshots, charts | type, data, mimeType | Browser screenshot, matplotlib graph |
| resource | Structured files, diffs, blobs | type, resource{uri, mimeType, text\|blob} | Git diff, JSON config, generated source file |
💡
Order matters for readability. Put the most important content item first. Models tend to weight earlier context more heavily when reasoning. If your tool returns a text summary plus a large diff, lead with the summary. The model can always refer to the diff afterward, but the summary frames how it should interpret what follows.

Composing Cross-Cutting Concerns

Production servers need logging, caching, rate limiting, timeout handling, and error tracking — but duplicating that logic in every tool handler is a maintenance nightmare. The middleware pattern wraps handlers with reusable behavior without modifying the underlying logic.

The idea is borrowed from HTTP middleware (Express, Koa, Hono): a middleware is a function that takes a handler and returns a new handler with the same signature. Because each middleware is just a higher-order function, you can compose them freely with simple function calls. The outermost wrapper runs first and can observe inputs and outputs, and any wrapper in the chain can short-circuit, for example by returning a cached result before the real handler is even invoked.

Logging middleware is the most universally useful. Every production tool should log its name, its arguments, and how long it took to complete — whether it succeeded or failed. Writing this once as a composable wrapper means you can add it to any handler with a single function call, and remove it just as easily during testing.

Caching middleware is appropriate for read-only tools that hit expensive upstream APIs. A weather tool that fetches from an external service on every call will quickly exhaust your API quota. Wrapping it in a 60-second cache reduces API calls dramatically while keeping results fresh. The cache key is the JSON-serialized arguments, so different inputs get different cache entries.

1. withLogging: Records tool name, arguments, duration, and outcome to stderr
2. withTimeout: Aborts if the handler takes longer than the specified limit
3. withCache: Returns cached result if available and not expired; populates on miss
4. actualHandler: Your real tool implementation, only reached on a cache miss

TypeScript — composable middleware
// middleware.ts
type ToolHandler<T> = (args: T) => Promise<{ content: any[] }>;

// Logging middleware
function withLogging<T>(name: string, handler: ToolHandler<T>): ToolHandler<T> {
  return async (args) => {
    const start = Date.now();
    console.error(`[${name}] called with: ${JSON.stringify(args)}`);
    try {
      const result = await handler(args);
      console.error(`[${name}] completed in ${Date.now() - start}ms`);
      return result;
    } catch (error) {
      console.error(`[${name}] failed after ${Date.now() - start}ms:`, error);
      throw error;
    }
  };
}

// Caching middleware
function withCache<T>(handler: ToolHandler<T>, ttlMs = 30_000): ToolHandler<T> {
  const cache = new Map<string, { result: any; expires: number }>();
  return async (args) => {
    const key = JSON.stringify(args);
    const cached = cache.get(key);
    if (cached && cached.expires > Date.now()) return cached.result;
    const result = await handler(args);
    cache.set(key, { result, expires: Date.now() + ttlMs });
    return result;
  };
}

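// Timeout middleware
function withTimeout<T>(handler: ToolHandler<T>, timeoutMs = 10_000): ToolHandler<T> {
  return async (args) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    // Rejects once the deadline passes. This stops waiting for the handler;
    // it does not cancel work the handler has already started.
    const deadline = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`Tool timed out after ${timeoutMs}ms`)),
        timeoutMs
      );
    });
    try {
      return await Promise.race([handler(args), deadline]);
    } finally {
      clearTimeout(timer);
    }
  };
}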

// Compose: logging → timeout → cache → actual handler
const getWeatherHandler = withLogging('get_weather',
  withTimeout(
    withCache(
      // Annotate the argument type so the generic wrappers can infer T
      async ({ location }: { location: string }) => fetchWeather(location),
      60_000
    )
  )
);

server.tool('get_weather', { location: z.string() }, getWeatherHandler);

Beyond these three, the pattern extends naturally to rate limiting (track call counts per time window, return an error if exceeded), authentication (check a token or permission before forwarding), retry logic (automatically retry on transient network errors), and metrics collection (report latency and error counts to your observability stack). Each concern is isolated in its own composable function.
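
As one example of extending the pattern, here is a minimal sketch of a fixed-window rate limiter written in the same wrapper style. `withRateLimit` is a hypothetical name, the `ToolResult` type is redeclared locally (with an optional isError) so the block stands alone, and the injectable `now` clock is an assumption added to make the behavior easy to test.

```typescript
interface ToolResult {
  content: { type: 'text'; text: string }[];
  isError?: boolean;
}
type ToolHandler<T> = (args: T) => Promise<ToolResult>;

// Fixed-window rate limiter: at most `maxCalls` per `windowMs` across all callers.
function withRateLimit<T>(
  handler: ToolHandler<T>,
  maxCalls: number,
  windowMs: number,
  now: () => number = Date.now // injectable clock, handy for tests
): ToolHandler<T> {
  let windowStart = now();
  let calls = 0;
  return async (args) => {
    const t = now();
    // Start a fresh window once the current one has elapsed.
    if (t - windowStart >= windowMs) {
      windowStart = t;
      calls = 0;
    }
    // Over the limit: report an expected error without invoking the handler.
    if (calls >= maxCalls) {
      return {
        content: [{ type: 'text', text: `Rate limit exceeded: ${maxCalls} calls per ${windowMs}ms` }],
        isError: true,
      };
    }
    calls++;
    return handler(args);
  };
}
```

A fixed window is the simplest variant; a sliding window or token bucket smooths out bursts at the window boundary at the cost of a little more state.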

💡
Compose outer to inner, but think inner to outer. In withLogging(withTimeout(withCache(handler))), logging is outermost so it captures total wall time including cache lookups. Timeout wraps the cache so a very slow cache backend doesn't hang forever. Cache wraps the actual handler so it only runs on misses. Decide which concern should be the last line of defense before you settle on the composition order.

Keeping Users Informed During Long Operations

When a tool takes more than a few seconds to complete — processing files, running analyses, invoking slow APIs — the user sees nothing until the result arrives. Progress tokens let the server send incremental status updates back to the client while the tool is still running.

The mechanism is a lightweight pub-sub pattern embedded in the MCP protocol. The client includes an optional _meta.progressToken field in the tool call request. This token is an opaque identifier — typically a string or integer chosen by the client. The server, as it works through its task, sends notifications/progress events that reference the same token. The client matches these events to the in-flight call and can update a progress bar, a spinner, or a status message accordingly.

The progress notification payload includes a progress field (current step), an optional total (maximum steps, for percentage calculation), and an optional message string describing the current operation. Not all clients support progress notifications, so always check whether the token was provided before attempting to send updates — a missing token means the client does not want progress events.

Multiple concurrent tool calls can each have their own progress token. The client uses the token to route incoming progress notifications to the right call. If you have two parallel file-processing tools running simultaneously, each has a unique token, and their progress events are independent. The server simply sends each notification with its respective token.

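TypeScript — bulk file processing with progress notifications
import path from 'node:path';
import { glob } from 'glob'; // assumes the promise-based `glob` package (v9+)

server.tool(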
  'bulk_process_files',
  { directory: z.string(), pattern: z.string().default('**/*.ts') },
  async ({ directory, pattern }, extra) => {
    const files = await glob(pattern, { cwd: directory });
    // The client-supplied token arrives in the request's _meta field
    const progressToken = extra._meta?.progressToken;
    const results: string[] = [];

    for (let i = 0; i < files.length; i++) {
      const file = files[i];

      // Send progress notification only if the client wants it
      if (progressToken !== undefined) {
        await extra.sendNotification({
          method: 'notifications/progress',
          params: {
            progressToken,
            progress: i + 1,
            total: files.length,
            message: `Processing ${file} (${i + 1}/${files.length})`
          }
        });
      }

      const analysis = await analyzeFile(path.join(directory, file));
      results.push(
        `${file}: ${analysis.linesOfCode} lines, complexity ${analysis.cyclomaticComplexity}`
      );
    }

    return {
      content: [{
        type: 'text',
        text: `Processed ${files.length} files:\n\n${results.join('\n')}`
      }]
    };
  }
);

From the client side, progress tokens are created by the caller and injected into the request's _meta field. In practice with Claude Desktop and similar hosts, the host manages token generation and display automatically. When building your own MCP client, you allocate tokens from a counter or UUID generator, subscribe to progress notifications filtered by that token, and update your UI accordingly. When the tool call completes (whether success or error), the token's lifecycle ends.
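
For readers building their own client, the routing step might look like the sketch below. It is a minimal illustration rather than SDK code: `ProgressRouter` and its method names are hypothetical, and the payload type only mirrors the notification fields described earlier.

```typescript
type ProgressToken = string | number;

interface ProgressParams {
  progressToken: ProgressToken;
  progress: number;
  total?: number;
  message?: string;
}

type ProgressListener = (params: ProgressParams) => void;

class ProgressRouter {
  private listeners = new Map<ProgressToken, ProgressListener>();
  private nextToken = 0;

  // Allocate a token for a new tool call. The first token is 0, a valid value.
  register(listener: ProgressListener): ProgressToken {
    const token = this.nextToken++;
    this.listeners.set(token, listener);
    return token;
  }

  // Route an incoming notifications/progress payload to the matching call.
  // Returns false when no in-flight call owns the token.
  dispatch(params: ProgressParams): boolean {
    const listener = this.listeners.get(params.progressToken);
    if (listener === undefined) return false;
    listener(params);
    return true;
  }

  // End the token's lifecycle once the tool call settles (success or error).
  release(token: ProgressToken): void {
    this.listeners.delete(token);
  }
}
```

The client would place the token returned by `register` in the request's _meta.progressToken field and call `release` when the call completes, so late notifications for finished calls are ignored.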

ℹ️
Progress tokens are a capability, not a requirement. The host may not provide a token even for long-running tools. Write your handlers to work correctly whether or not a token is present. The pattern if (progressToken !== undefined) { await sendProgress(...); } is the correct guard — not a null check, because an integer 0 is a valid token.

Safe Retries for State-Mutating Operations

Network failures happen at the worst possible moments — after a request has been sent but before the response arrives. For read-only tools this is harmless. For tools that send messages, create records, or charge customers, a naive retry causes duplicate side effects. Idempotency keys solve this.

The pattern is simple: the caller includes an optional unique key with each state-mutating request. The server stores the key alongside the result of the first execution. On subsequent calls with the same key, the server returns the stored result immediately without executing the operation again. Unless the tool chooses to flag replays, the caller cannot tell from the response whether it got a fresh execution or a cached replay; both carry the same result.

The key lifetime must be long enough to cover realistic retry windows. Payment systems typically keep keys for 24–72 hours. For MCP tools, 24 hours is a reasonable default — it covers any plausible retry scenario while keeping the in-memory store manageable. In production, this store should be backed by Redis or a database, not an in-process Map, to survive server restarts.

Optionally, return a wasNew flag in the response text so callers know whether their call caused a side effect. This is useful for debugging and auditing. A monitoring dashboard can track what fraction of calls are replays versus first executions.

TypeScript — idempotency store and tool
// idempotency-store.ts
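class IdempotencyStore {
  // Stores the in-flight or settled promise, so concurrent retries with the
  // same key share one execution instead of racing each other.
  private store = new Map<string, { promise: Promise<any>; createdAt: number }>();
  private TTL = 24 * 60 * 60 * 1000; // 24 hours

  async getOrExecute(
    key: string,
    fn: () => Promise<any>
  ): Promise<{ result: any; wasNew: boolean }> {
    const existing = this.store.get(key);
    if (existing && existing.createdAt + this.TTL > Date.now()) {
      return { result: await existing.promise, wasNew: false };
    }
    const promise = fn();
    this.store.set(key, { promise, createdAt: Date.now() });
    try {
      return { result: await promise, wasNew: true };
    } catch (error) {
      // Failed attempts are not cached, so a later retry can run again
      this.store.delete(key);
      throw error;
    }
  }
}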

const idempotency = new IdempotencyStore();

server.registerTool(
  'slack_send_message',
  {
    inputSchema: {
      channel: z.string(),
      text: z.string(),
      idempotencyKey: z.string().optional()
        .describe('Optional key to prevent duplicate messages on retry')
    },
    // Annotations belong in the config object, ahead of the handler
    annotations: { idempotentHint: false }
  },
  async ({ channel, text, idempotencyKey }) => {
    const key = idempotencyKey ?? crypto.randomUUID();
    const { result, wasNew } = await idempotency.getOrExecute(key, () =>
      slackClient.chat.postMessage({ channel, text })
    );
    return {
      content: [{
        type: 'text',
        text: wasNew
          ? `Message sent (ts: ${result.ts})`
          : `Duplicate request: original message (ts: ${result.ts})`
      }]
    };
  }
);

Notice that idempotentHint is set to false in the annotations, even though idempotency keys are supported. The annotation describes the tool's default behavior without a key — sending the same message twice without a key would create two Slack messages, which is inherently non-idempotent. The key is an opt-in mechanism, not the default mode.

⚠️
Generate fresh keys for genuinely new operations. If the caller omits an idempotency key (idempotencyKey: undefined), generate a fresh crypto.randomUUID() as the key. This ensures that uncached calls are always executed, while still giving the idempotency store a well-defined key for each operation. Never reuse keys across different intended operations.

Building Complex Tools from Simple Functions

Large, monolithic tool handlers are difficult to test, hard to reason about, and fragile when any one dependency fails. Composing tools from small, single-purpose data-fetching functions produces more testable, more resilient, and more maintainable code.

The key insight is to separate the data-fetching layer from the orchestration layer. Small functions that each fetch a single piece of data, such as getRepo(), getOpenIssues(), and getRecentCommits(), and that take their API client as a parameter are trivially testable in isolation. You can unit-test each one with a mocked HTTP client without spinning up an MCP server. The orchestrating tool handler then calls these functions and formats the results, keeping its own complexity low.

Promise.allSettled() is the composing function of choice here. Unlike Promise.all(), which rejects the entire batch if any promise fails, Promise.allSettled() waits for every promise and reports individual outcomes. This means a composed tool can still return partial results even if one sub-call fails — maybe the commit history API is down, but the repo metadata and issues are fine. The user gets useful information instead of a complete failure.

TypeScript — composed GitHub repo summary tool
// lib/github.ts — pure data-fetching functions (each independently testable)
import type { Octokit } from '@octokit/rest'; // assumes the @octokit/rest client
export async function getRepo(octokit: Octokit, owner: string, repo: string) { /* ... */ }
export async function getOpenIssues(octokit: Octokit, owner: string, repo: string, limit = 10) { /* ... */ }
export async function getRecentCommits(octokit: Octokit, owner: string, repo: string, days = 7) { /* ... */ }
export async function getContributors(octokit: Octokit, owner: string, repo: string) { /* ... */ }

// tools.ts — orchestrating tool handler
server.tool(
  'github_repo_summary',
  { owner: z.string(), repo: z.string() },
  async ({ owner, repo }) => {
    // Run all sub-calls concurrently; partial failures are handled gracefully
    const [repoData, issues, commits, contributors] = await Promise.allSettled([
      getRepo(octokit, owner, repo),
      getOpenIssues(octokit, owner, repo),
      getRecentCommits(octokit, owner, repo),
      getContributors(octokit, owner, repo)
    ]);

    const lines: string[] = [];

    if (repoData.status === 'fulfilled') {
      const r = repoData.value;
      lines.push(
        `# ${r.full_name}`,
        `${r.description ?? 'No description'}`,
        `⭐${r.stargazers_count} · 🍴${r.forks_count} · ${r.language ?? 'unknown'}`
      );
    }
    if (issues.status === 'fulfilled') {
      lines.push(`\n## Open Issues (${issues.value.length})`);
      issues.value.forEach(i => lines.push(`- #${i.number}: ${i.title}`));
    }
    if (commits.status === 'fulfilled') {
      lines.push(`\n## Recent Commits (last 7 days: ${commits.value.length})`);
    }
    if (contributors.status === 'fulfilled') {
      lines.push(
        `\n## Top Contributors: ${contributors.value.slice(0,3).map(c => c.login).join(', ')}`
      );
    }

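    // If every sub-call failed there is nothing to show; report an error instead of empty text
    if (lines.length === 0) {
      return {
        content: [{ type: 'text', text: `Could not fetch any data for ${owner}/${repo}` }],
        isError: true
      };
    }

    return { content: [{ type: 'text', text: lines.join('\n') }] };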
  }
);

The test story for composed tools is excellent. Each pure fetcher function can be tested with a mocked octokit object that returns fixture data. The orchestrating handler can be tested end-to-end by mocking the fetchers themselves. Because each layer has a clean boundary, you can replace a real API call with a mock at exactly the right level of abstraction.
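To make the mocking boundary concrete, here is a minimal sketch of testing one fetcher against a hand-rolled mock. The IssueLister interface, the body of getOpenIssues, and the fixture data are all assumptions made for illustration; the real Octokit client exposes a much larger surface, and the lesson's actual fetcher bodies are elided above.

```typescript
// Minimal slice of the client that this fetcher needs (an assumption for
// the sketch; the real Octokit type is far larger).
interface IssueLister {
  listForRepo(params: { owner: string; repo: string; state: 'open'; per_page: number }):
    Promise<{ data: Array<{ number: number; title: string; pull_request?: unknown }> }>;
}

// Hypothetical fetcher body: lists open issues and drops pull requests,
// which GitHub's issues endpoint returns alongside real issues.
async function getOpenIssues(client: IssueLister, owner: string, repo: string, limit = 10) {
  const { data } = await client.listForRepo({ owner, repo, state: 'open', per_page: limit });
  return data
    .filter(item => item.pull_request === undefined)
    .map(({ number, title }) => ({ number, title }));
}

// Hand-rolled mock: records every call and returns fixture data.
function makeMockClient() {
  const calls: Array<{ owner: string; repo: string; per_page: number }> = [];
  const client: IssueLister = {
    async listForRepo(params) {
      calls.push({ owner: params.owner, repo: params.repo, per_page: params.per_page });
      return {
        data: [
          { number: 1, title: 'Crash on startup' },
          { number: 2, title: 'Add dark mode', pull_request: {} }, // a PR, should be filtered out
        ],
      };
    },
  };
  return { client, calls };
}
```

Because the fetcher depends only on the narrow interface, the test needs no HTTP mocking library and no running MCP server: build the mock, call the fetcher, and assert on both the returned items and the recorded call arguments.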

Concurrency is another win. The sub-calls are independent, so all four start immediately and Promise.allSettled() simply waits for every one to finish. Four API calls made sequentially might take 2–4 seconds; run concurrently, the total drops to roughly the latency of the slowest single call, often under a second. The more data sources a tool composes, the larger the saving.
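The timing difference is easy to observe with simulated calls. The sketch below uses setTimeout to stand in for network latency; fakeCall, timeIt, and the delay values are illustrative assumptions, not measurements of any real API.

```typescript
// Simulated API call: resolves with `label` after `ms` milliseconds.
function fakeCall(label: string, ms: number): Promise<string> {
  return new Promise(resolve => setTimeout(() => resolve(label), ms));
}

// Measures the wall-clock time of an async function.
async function timeIt<T>(fn: () => Promise<T>): Promise<{ value: T; elapsedMs: number }> {
  const start = Date.now();
  const value = await fn();
  return { value, elapsedMs: Date.now() - start };
}

// Sequential: each call waits for the previous one, so latencies add up.
async function runSequentially(delays: number[]): Promise<string[]> {
  const out: string[] = [];
  for (let i = 0; i < delays.length; i++) {
    out.push(await fakeCall(`call-${i}`, delays[i]));
  }
  return out;
}

// Concurrent: all calls start at once; total time tracks the slowest call.
async function runConcurrently(delays: number[]): Promise<string[]> {
  const settled = await Promise.allSettled(
    delays.map((ms, i) => fakeCall(`call-${i}`, ms))
  );
  return settled.map(s => (s.status === 'fulfilled' ? s.value : 'failed'));
}
```

With hypothetical delays of 80, 120, 60, and 100 ms, the sequential version takes about the sum (roughly 360 ms) while the concurrent version takes about the maximum (roughly 120 ms). Both return the results in the same order, because Promise.allSettled() preserves input order regardless of which promise settles first.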

Evolving Tools Without Breaking Callers

AI models and host applications may cache tool schemas between sessions. When you rename a tool, change a required parameter, or alter a return shape, you risk breaking callers that were working the day before. Safe tool evolution requires a clear versioning strategy.

Not all changes are equal. Adding an optional parameter is safe — existing callers do not pass it, and the schema still validates their calls. Adding a new tool entirely is safe — nothing breaks because nothing was relying on the new tool before it existed. These are backwards-compatible changes you can ship freely.

Renaming a tool is a breaking change. Any model or application that learned the old name will fail to find it. The right migration path is to keep the old name as a deprecated wrapper that delegates to the new implementation, mark it clearly in its description, log a warning to stderr when it is called, and remove it only after a full major version cycle. This gives users time to update their configurations and prompts.

Removing a required parameter and changing a parameter's type are both breaking changes. If you remove a parameter, callers that still pass it will either fail validation or silently pass unexpected data. If you change a type (say, from z.string() to z.number() for a port field), existing callers passing strings will fail. These changes require a major version bump and an explicit migration period.

| Change type | Safe? | Strategy |
| --- | --- | --- |
| Add optional parameter | ✅ Safe | Ship immediately; existing calls still validate |
| Add a new tool | ✅ Safe | Ship immediately; no existing caller depends on it |
| Improve tool description | ✅ Safe | Ship immediately; better descriptions help the model |
| Rename a tool | ⚠️ Breaking | Keep old name as deprecated wrapper for one major version |
| Remove a required parameter | ❌ Breaking | Make it optional first, deprecate, then remove in next major |
| Change a parameter type | ❌ Breaking | Add new param with new type, deprecate old, remove in next major |
TypeScript — deprecated wrapper forwarding to new tool
// Deprecated v1 tool — kept for one major version cycle
server.tool(
  'search',  // old name without service prefix
  { q: z.string() },
  async ({ q }) => {
    console.error('[DEPRECATED] "search" called — update to "github_search_repos"');
    return githubSearchRepos({ query: q, limit: 10 });  // delegate to new implementation
  },
  {
    description: '[DEPRECATED] Use github_search_repos instead. Will be removed in v2.0.',
    annotations: { readOnlyHint: true }
  }
);

// Current v2 tool — properly namespaced
server.tool(
  'github_search_repos',
  { query: z.string(), limit: z.number().int().min(1).max(30).default(10) },
  async (args) => githubSearchRepos(args),
  { description: 'Search GitHub repositories by keyword, language, stars, or owner.' }
);

The deprecation message in the tool description serves a dual purpose. It informs human operators who read the tool list, and it also appears in the model's context when it is deciding which tool to call. A well-written deprecation message — "[DEPRECATED] Use github_search_repos instead" — will cause the model to prefer the new tool in future calls even without any other configuration change.
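The same deprecate-then-remove approach applies to parameter type changes. The sketch below shows one way to handle the port example from earlier in this section: accept the old string parameter and a new numeric one side by side, and normalize both to a single value. The names ConnectArgs, portNumber, resolvePort, and validatePort are hypothetical, and the logic is written in plain TypeScript rather than Zod so that it stands alone.

```typescript
// Hypothetical argument shape during the migration window:
// `port` (string) is the deprecated parameter, `portNumber` is its replacement.
interface ConnectArgs {
  host: string;
  port?: string;       // deprecated: remove in the next major version
  portNumber?: number; // new, correctly typed parameter
}

// Rejects anything that is not an integer in the valid TCP port range.
function validatePort(n: number): number {
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    throw new Error(`Invalid port: ${n}`);
  }
  return n;
}

// Resolves both parameters to one validated number. The new parameter wins
// when both are present; the old one still works but emits a warning.
function resolvePort(
  args: ConnectArgs,
  warn: (msg: string) => void = msg => console.error(msg)
): number {
  if (args.portNumber !== undefined) {
    return validatePort(args.portNumber);
  }
  if (args.port !== undefined) {
    warn('[DEPRECATED] "port" (string) passed; use "portNumber" (number) instead');
    return validatePort(Number(args.port));
  }
  throw new Error('Either portNumber or port is required');
}
```

Both parameters are optional in the schema during the migration window, so old and new callers validate. In the next major version the port field and its branch in resolvePort can be deleted, and portNumber can become required.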

Verifying Annotations, Pagination, and Idempotency

The advanced patterns covered in this lesson each require specialized test strategies. Pagination must be tested as a round-trip — encode cursor, decode cursor, verify no overlap between pages. Idempotency keys must be tested for both the first-call and the replay cases. Content types need to be verified for correct structure.

In-process testing is the fastest approach for tool logic. The TypeScript SDK provides a client/server pair that communicates in memory, with no network overhead. You spin up your server, connect a client, and call tools exactly as a real host would. The full MCP protocol is exercised — schema validation, error handling, content type serialization — without the latency of a real network or the complexity of a subprocess.

For the pagination test specifically, the key assertion is that there is no overlap between pages. Extract the cursor from page 1's response text using a regex, use it to fetch page 2, and verify that the items in page 2 do not appear in page 1. Also verify that the last page has no next-cursor in its response — the [End of results] marker.
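The round-trip property can also be checked below the tool layer. The following is a minimal sketch, assuming an offset-based cursor encoded as base64url JSON; the payload shape and the helper names (encodeCursor, decodeCursor, listPage, collectAll) are illustrative and may not match the cursor format used by the pagination tool earlier in this lesson.

```typescript
// Hypothetical cursor payload: the offset of the next item to return.
interface CursorPayload { offset: number }

// Encode as base64url so the cursor is opaque and safe to embed in text.
function encodeCursor(payload: CursorPayload): string {
  return Buffer.from(JSON.stringify(payload), 'utf8').toString('base64url');
}

// Decode and validate; malformed or tampered cursors are rejected.
function decodeCursor(cursor: string): CursorPayload {
  const parsed = JSON.parse(Buffer.from(cursor, 'base64url').toString('utf8'));
  if (!Number.isInteger(parsed?.offset) || parsed.offset < 0) {
    throw new Error('Invalid cursor');
  }
  return { offset: parsed.offset };
}

// Stand-in for a paginated list tool: returns one page plus the next cursor,
// or no cursor at all on the last page.
function listPage(items: string[], pageSize: number, cursor?: string) {
  const offset = cursor ? decodeCursor(cursor).offset : 0;
  const page = items.slice(offset, offset + pageSize);
  const nextOffset = offset + pageSize;
  const nextCursor = nextOffset < items.length ? encodeCursor({ offset: nextOffset }) : undefined;
  return { page, nextCursor };
}

// Walk every page and collect items, failing loudly on any repeat.
function collectAll(items: string[], pageSize: number): string[] {
  const seen = new Set<string>();
  let cursor: string | undefined;
  do {
    const { page, nextCursor } = listPage(items, pageSize, cursor);
    for (const item of page) {
      if (seen.has(item)) throw new Error(`Overlap detected: ${item}`);
      seen.add(item);
    }
    cursor = nextCursor;
  } while (cursor !== undefined);
  return Array.from(seen);
}
```

A test built on collectAll asserts three things at once: every item is returned, no item is returned twice, and the walk terminates because the last page carries no cursor.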

Idempotency tests need to call the tool twice with the same key and verify that the second call returns the Duplicate request marker, not a fresh execution result. This proves the store is working. A third test should verify that two calls with different keys each produce fresh results — the store is not accidentally caching everything.
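These cases can be exercised directly against the store, without going through a tool call. The sketch below uses a stand-in store with an injectable clock so that TTL expiry can be tested without waiting; the class name, the constructor parameters, and the default TTL are assumptions for illustration and are not the IdempotencyStore defined earlier.

```typescript
// Stand-in store for this sketch. The injectable `now` function lets a test
// advance time manually instead of sleeping.
class TestableIdempotencyStore<T> {
  private store = new Map<string, { result: T; createdAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number = 60000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async getOrExecute(key: string, fn: () => Promise<T>): Promise<{ result: T; wasNew: boolean }> {
    const existing = this.store.get(key);
    if (existing && existing.createdAt + this.ttlMs > this.now()) {
      return { result: existing.result, wasNew: false };
    }
    const result = await fn();
    this.store.set(key, { result, createdAt: this.now() });
    return { result, wasNew: true };
  }
}

// Runs the first-call, replay, and different-key cases plus TTL expiry,
// and returns what was observed so a test can assert on it.
async function exerciseStore() {
  let clock = 0;
  let executions = 0;
  const store = new TestableIdempotencyStore<number>(1000, () => clock);
  const send = async () => ++executions; // counts real executions

  const first = await store.getOrExecute('key-a', send);   // fresh execution
  const replay = await store.getOrExecute('key-a', send);  // served from the store
  const other = await store.getOrExecute('key-b', send);   // different key: fresh
  clock += 1001;                                           // advance past the TTL
  const expired = await store.getOrExecute('key-a', send); // entry expired: fresh again
  return { first, replay, other, expired, executions };
}
```

The execution counter is the important assertion: three real executions across four calls shows that the replay was served from the store and that different keys and expired entries were not.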

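TypeScript — advanced-tools.test.ts (Vitest)
import { describe, it, expect, vi, beforeEach } from 'vitest';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
// createServer and connectInProcess are local test helpers (assumed to live in test-utils)
import { createServer, connectInProcess } from './test-utils.js';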

describe('Advanced tool patterns', () => {
  let client: Client;
  let server: McpServer;

  beforeEach(async () => {
    server = createServer();
    client = await connectInProcess(server);
  });

  it('returns correct content type for screenshot tool', async () => {
    const result = await client.callTool({
      name: 'capture_screenshot',
      arguments: { url: 'https://example.com' }
    });
    expect(result.content).toHaveLength(1);
    expect(result.content[0].type).toBe('image');
    expect(result.content[0].mimeType).toBe('image/png');
    expect(result.content[0].data).toMatch(/^[A-Za-z0-9+/]+=*$/); // valid base64
  });

  it('paginates correctly with cursor', async () => {
    const page1 = await client.callTool({
      name: 'github_list_repositories',
      arguments: { org: 'test-org', pageSize: 5 }
    });
    const page1Text = page1.content[0].text as string;
    const cursorMatch = page1Text.match(/Next page cursor: ([\w-]+)/);
    expect(cursorMatch).toBeTruthy();

    const page2 = await client.callTool({
      name: 'github_list_repositories',
      arguments: { org: 'test-org', cursor: cursorMatch![1], pageSize: 5 }
    });
    const page2Text = page2.content[0].text as string;
    // Ensure no items from page 1 appear on page 2
    expect(page2Text).not.toContain(page1Text.split('\n')[1]);
  });

  it('deduplicates with idempotency key', async () => {
    const key = crypto.randomUUID();
    const r1 = await client.callTool({
      name: 'slack_send_message',
      arguments: { channel: '#test', text: 'hello', idempotencyKey: key }
    });
    const r2 = await client.callTool({
      name: 'slack_send_message',
      arguments: { channel: '#test', text: 'hello', idempotencyKey: key }
    });
    expect(r1.content[0].text).toContain('Message sent');
    expect(r2.content[0].text).toContain('Duplicate request');
  });

  it('validates discriminated union correctly', async () => {
    const result = await client.callTool({
      name: 'db_search',
      arguments: { type: 'regex', pattern: '[invalid' }
    });
    expect(result.isError).toBe(true);
    expect(result.content[0].text).toContain('Invalid regex');
  });
});

For middleware testing, test each middleware function independently before testing the composed stack. A withCache test should verify that a second call with the same arguments does not invoke the underlying handler. A withTimeout test should mock a slow handler and verify that the timeout fires. Composing already-tested middleware gives you confidence that the composition is correct without rerunning the underlying logic tests.
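As a rough illustration of testing middleware in isolation, the sketch below defines simplified withCache and withTimeout wrappers and a call-counting stub handler. The Handler type and both wrapper signatures are assumptions made for this example; the middleware defined earlier in this lesson may take different arguments.

```typescript
type Handler<A, R> = (args: A) => Promise<R>;

// Hypothetical cache middleware: memoizes results by serialized arguments.
function withCache<A, R>(handler: Handler<A, R>): Handler<A, R> {
  const cache = new Map<string, R>();
  return async args => {
    const key = JSON.stringify(args);
    if (cache.has(key)) return cache.get(key) as R;
    const result = await handler(args);
    cache.set(key, result);
    return result;
  };
}

// Hypothetical timeout middleware: rejects if the handler takes longer than `ms`.
function withTimeout<A, R>(handler: Handler<A, R>, ms: number): Handler<A, R> {
  return args => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
    });
    // Whichever settles first wins; the timer is always cleaned up.
    return Promise.race([handler(args), timeout]).finally(() => clearTimeout(timer));
  };
}

// Call-counting stub standing in for a real tool handler.
function makeCountingHandler(delayMs: number) {
  let calls = 0;
  const handler: Handler<{ q: string }, string> = async ({ q }) => {
    calls++;
    await new Promise(resolve => setTimeout(resolve, delayMs));
    return `result:${q}`;
  };
  return { handler, callCount: () => calls };
}
```

A withCache test calls the wrapped handler twice with the same arguments and asserts that callCount() is still 1. A withTimeout test wraps a stub whose delay exceeds the limit and asserts that the call rejects with the timeout error.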

Advanced Tool Patterns Summary

This lesson covered the full advanced toolkit for MCP tool authors. Before shipping any production tool, run through the checklist below. Each item corresponds to a pattern covered in detail above.

Annotate all tools with readOnlyHint/destructiveHint: Every tool should declare its side-effect profile. Missing annotations default to the conservative (destructive, open-world) interpretation.
Use discriminated unions for multi-mode tools: If a tool has multiple operating modes (text vs regex vs semantic search), use z.discriminatedUnion() rather than a flat schema with many optionals.
Implement cursor pagination for any list > 20 items: Never return unbounded result sets. Cursors keep memory usage predictable and let agents work through large datasets incrementally.
Use image content type for visual data: Screenshots, charts, and diagrams belong in type: 'image' fields, not as ASCII art in text fields. The model's vision capability works far better with real images.
Apply logging middleware to every tool: Every production tool should log its name, arguments, duration, and outcome. This is non-negotiable for debugging failures in production.
Support progressToken for operations > 2 seconds: Any tool that might take more than a couple of seconds to complete should check for a _meta.progressToken and send incremental notifications.
Use idempotency keys for state-mutating tools: Any tool that sends messages, creates records, or triggers payments should accept an optional idempotencyKey parameter and deduplicate against it.
Compose from pure functions for testability: Split data-fetching logic into small pure functions. Test them independently. The orchestrating tool handler should be thin, just calling fetchers and formatting results.
Keep deprecated tools with forwarding wrappers through one major version: When renaming or replacing a tool, keep the old name active for a full major version cycle, delegating to the new implementation and logging a deprecation warning.
Test pagination cursor round-trips explicitly: Write a specific test that fetches page 1, extracts the cursor, fetches page 2, and asserts zero overlap between the two result sets.

Quick Reference: Content Types

| Content type | Use case | SDK field names | Example |
| --- | --- | --- | --- |
| text | Markdown summaries, tables, code, errors, plain prose | type: 'text', text: string | PR summary, search results table, error message |
| image | Screenshots, charts, diagrams, OCR source material | type: 'image', data: string (base64), mimeType: string | Playwright screenshot, matplotlib graph, camera capture |
| resource | Config files, diffs, generated source, binary blobs with identity | type: 'resource', resource: { uri, mimeType, text or blob } | Git diff, JSON config, TypeScript source file |
🔮
Day 16 preview: Resources Deep Dive — URI templates, subscriptions, live change notifications, and embedding resources in tool results. You have seen resources used as content items today — tomorrow we go deep on the full Resources API: how to define URI templates, how subscriptions and live updates work, and how hosts cache and invalidate resource content.
Section Quiz
5 questions · instant feedback · Level 2 checkpoint
1. Which annotation tells the host this tool makes no side effects?
2. What encoding format should pagination cursors use to stay opaque to callers?
3. Which content type would you use to return a PNG screenshot from a tool?
4. What Zod method validates cross-field constraints (e.g., endDate must be after startDate)?
5. When should you use Promise.allSettled() instead of Promise.all() in a composed tool?