Annotations, cursor-based pagination, content types (text, image, resource), middleware composition, progress tokens, idempotency keys, and versioning strategies for production-grade MCP tools.
MCP defines four boolean annotation fields that travel alongside a tool's schema. They are hints, not enforcement mechanisms — the server declares them, and the host uses them to decide how to present or gate the tool.
Before annotations existed, every tool looked equally dangerous to a host. A tool that fetches read-only data from a public API was treated the same as one that permanently deletes database records. Hosts had no signal to differentiate them, so they either blocked everything behind confirmation dialogs or trusted everything blindly. Annotations solve this by letting server authors express the tool's intent in a machine-readable way.
The four fields live inside an annotations object that is passed as part of the tool's registration options. All four are optional; unset fields fall back to their specified defaults. This means the most important ones — destructiveHint and openWorldHint — default to true, erring on the side of caution.
readOnlyHint: when true, declares that the tool has no side effects and only reads data. The host can safely auto-approve such calls without user confirmation. Set this for any tool that is purely observational: fetching data, reading files, querying APIs in read-only mode.
destructiveHint: when true, warns the host that the tool may delete or irreversibly modify data. Claude Desktop surfaces a confirmation dialog. Tools that delete files, drop tables, revoke credentials, or send non-retractable messages should set this to true.
idempotentHint: when true, signals that calling the tool twice with the same arguments has the same observable effect as calling it once. Pure computation tools and idempotent REST operations (like PUT or DELETE on a resource by ID) qualify. Hosts can safely retry calls on transient failures.
openWorldHint: when false, declares that the tool is pure computation with no external interaction: no network calls, no file I/O, no randomness. Set this to false for Fibonacci calculators, text formatters, JSON parsers, and similar deterministic utilities.
Here is a concrete example showing both extremes: a destructive GitHub branch deletion tool with all the dangerous flags, and a pure Fibonacci calculator that opts into all the safe flags:
TypeScript — annotation examples
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({ name: 'demo', version: '1.0.0' });

// ── Dangerous tool: destructive, external, non-idempotent ──────
// Uses registerTool(name, config, handler): description and annotations
// live in the config object, and the handler is the last argument.
server.registerTool(
  'github_delete_branch',
  {
    description: 'Permanently delete a branch from a GitHub repository.',
    inputSchema: { repo: z.string(), branch: z.string() },
    annotations: {
      destructiveHint: true,  // host will show a confirmation dialog
      idempotentHint: false,  // deleting twice could error on second call
      readOnlyHint: false,    // clearly modifies state
      openWorldHint: true     // makes a network call to GitHub
    }
  },
  async ({ repo, branch }) => { /* ... */ }
);

// ── Safe tool: read-only, pure, idempotent ────────────────────
server.registerTool(
  'calculate_fibonacci',
  {
    description: 'Calculate the nth Fibonacci number (pure computation).',
    // fib(78) is the largest Fibonacci number that is still a safe JS integer
    inputSchema: { n: z.number().int().min(0).max(78) },
    annotations: {
      readOnlyHint: true,     // no side effects whatsoever
      idempotentHint: true,   // fib(10) always returns 55
      openWorldHint: false,   // no network, no I/O, pure math
      destructiveHint: false  // cannot destroy anything
    }
  },
  async ({ n }) => {
    // Iterative on purpose: naive recursion takes exponential time for large n
    let [a, b] = [0, 1];
    for (let i = 0; i < n; i++) [a, b] = [b, a + b];
    return { content: [{ type: 'text', text: String(a) }] };
  }
);
Claude Desktop's behavior with these annotations is immediate and visible. Tools annotated with destructiveHint: true show an amber warning dialog asking the user to confirm before execution. Tools with readOnlyHint: true may be auto-approved depending on the host's trust settings. Understanding this UX helps you write annotations that match the actual risk profile of your tools.
| Annotation | Description | Default | Set true when… | Example tools |
|---|---|---|---|---|
| readOnlyHint | No side effects; safe to call anytime | false | Tool only fetches / reads data | github_get_repo, db_query |
| destructiveHint | Modifies or deletes data irreversibly | true | Tool deletes, drops, revokes, or sends | github_delete_branch, db_drop_table |
| idempotentHint | Same args → same result; safe to retry | false | Tool is a pure function or idempotent REST op | calculate_fibonacci, format_json |
| openWorldHint | Interacts with the external world | true | Tool reaches the network, filesystem, or other external systems (set false for pure computation only) | calculate_fibonacci (false), get_weather (true) |
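To make the defaults in the table concrete, here is a minimal sketch of how a host might fill in unset annotations and pick a gating decision. The names `resolveAnnotations` and `gateFor`, and the gating policy itself, are illustrative assumptions and not part of any SDK or of Claude Desktop's actual logic.

```typescript
// Hypothetical host-side helper; the policy below is one possible choice.
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

// Defaults as listed in the table: the cautious values win when a field is unset.
const ANNOTATION_DEFAULTS: Required<ToolAnnotations> = {
  readOnlyHint: false,
  destructiveHint: true,
  idempotentHint: false,
  openWorldHint: true,
};

function resolveAnnotations(a: ToolAnnotations = {}): Required<ToolAnnotations> {
  return { ...ANNOTATION_DEFAULTS, ...a };
}

type Gate = 'auto-approve' | 'confirm';

// Assumed policy: only tools that declare themselves read-only skip confirmation.
function gateFor(a?: ToolAnnotations): Gate {
  return resolveAnnotations(a).readOnlyHint ? 'auto-approve' : 'confirm';
}
```

Under this policy an unannotated tool is gated exactly like a destructive one, which is the cautious behavior the defaults are meant to produce.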
Annotations are not enforced: if you declare readOnlyHint: true but your handler writes to a database, the host won't catch it. Treat annotations as documentation that travels with the schema, and make sure they match your implementation.
Zod goes far beyond simple string and number validators. For tools that accept complex or conditional input, Zod provides cross-field validation, discriminated unions, and input transformation that runs before your handler sees the data.
The most powerful feature for multi-mode tools is z.discriminatedUnion(). Rather than accepting a grab-bag of optional parameters and writing conditional logic to validate them manually, you declare each mode as its own schema variant. Zod validates exactly the right fields for the chosen mode and rejects any cross-contamination. The resulting TypeScript types are also fully narrowed inside switch cases — no type assertions needed.
The .refine() method attaches arbitrary validation logic to any schema node. It receives the parsed value and returns true or false. When false, Zod emits the error message you provide. This is perfect for semantic constraints that cannot be expressed purely in types — like validating that a string is a compilable regex, or that a URL uses HTTPS, or that a file extension matches an expected list.
For cross-field constraints — where validity depends on the relationship between multiple fields — .superRefine() gives you access to the full object and lets you attach errors to specific paths. This produces error messages that point to the exact field that is wrong, not just the top-level object.
TypeScript — discriminated union with refine and superRefine
import { z } from 'zod';

// ── Discriminated union: search type determines valid params ──
const SearchParams = z.discriminatedUnion('type', [
  z.object({
    type: z.literal('text'),
    query: z.string().min(1),
    caseSensitive: z.boolean().default(false)
  }),
  z.object({
    type: z.literal('regex'),
    pattern: z.string().refine(p => {
      try { new RegExp(p); return true; }
      catch { return false; }
    }, 'Invalid regex pattern'),
    flags: z.string().regex(/^[gimsuy]*$/).optional()
  }),
  z.object({
    type: z.literal('semantic'),
    query: z.string(),
    threshold: z.number().min(0).max(1).default(0.7),
    limit: z.number().int().min(1).max(100).default(10)
  })
]);

// server.tool() takes an object of named fields rather than a bare schema,
// so the union is nested under a `search` key.
server.tool('db_search', { search: SearchParams }, async ({ search }) => {
  switch (search.type) {
    case 'text': return textSearch(search.query, search.caseSensitive);
    case 'regex': return regexSearch(search.pattern, search.flags);
    case 'semantic': return semanticSearch(search.query, search.threshold, search.limit);
  }
});

// ── Cross-field validation with superRefine ──────────────────
const DateRangeParams = z.object({
  startDate: z.string().datetime(),
  endDate: z.string().datetime(),
  limit: z.number().int().min(1).max(1000).default(100)
}).superRefine((data, ctx) => {
  if (new Date(data.endDate) <= new Date(data.startDate)) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'endDate must be after startDate',
      path: ['endDate']
    });
  }
});
The .transform() method is your escape hatch for input normalization. It runs after validation passes and before your handler receives the value. Use it to coerce string dates into Date objects, trim and lowercase strings, parse JSON embedded in a string field, or convert a comma-separated list into an actual array. Transforms keep your handler code clean — it deals with the right types from the start, not raw strings.
Nested objects and arrays work exactly as you would expect: z.object({...}) for objects and z.array(z.string()) for lists. Recursive schemas — like a tree of nested filters — use z.lazy() to break the circular reference. For flexible types where a field could be a string or a number, z.union([z.string(), z.number()]) creates a type-safe union that Zod validates left-to-right.
When input fails schema validation, the SDK responds with isError: true, including the human-readable error message. You do not need to add try/catch around your Zod schemas; just let the SDK handle it.
When a tool might return dozens, hundreds, or thousands of items, returning everything in a single response is both expensive and impractical. Cursor-based pagination lets the AI agent work through results incrementally, one page at a time.
There are two common pagination approaches: offset-based and cursor-based. Offset-based pagination uses numeric page numbers or offsets — give me items 21–40. It is simple but has a subtle failure mode with live data: if a record is inserted or deleted between two page fetches, items shift position. Page 2 might repeat an item from page 1, or skip one entirely. For static datasets this rarely matters; for live APIs it is a real problem.
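The failure mode described above is easy to reproduce. The following sketch uses a hypothetical in-memory list in place of a live API and deletes one record between two page fetches.

```typescript
// Offset pagination over a list that changes between page fetches.
function pageByOffset<T>(items: T[], offset: number, limit: number): T[] {
  return items.slice(offset, offset + limit);
}

const live = ['a', 'b', 'c', 'd', 'e', 'f'];
const page1 = pageByOffset(live, 0, 3); // ['a', 'b', 'c']

// A record from page 1 is deleted before the caller asks for page 2.
live.splice(live.indexOf('b'), 1); // live is now ['a', 'c', 'd', 'e', 'f']

const page2 = pageByOffset(live, 3, 3); // ['e', 'f']

// 'd' shifted from index 3 to index 2, so neither page ever returned it.
const seen = [...page1, ...page2];
```

An insertion between fetches has the opposite effect: the last item of page 1 moves to index 3 and shows up again on page 2.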
Cursor-based pagination avoids this by using a stable pointer into the dataset — typically the ID or sort key of the last item returned. The next page starts from that marker, regardless of what has been inserted or deleted elsewhere. The cursor is opaque to the caller: it is encoded as a base64 string that the caller passes back verbatim. The server decodes it to recover whatever state it needs to continue.
A critical design principle: the cursor must contain everything needed to reconstruct the query. If the tool accepts a language filter, the cursor must embed that filter. Otherwise, a caller who passes a cursor from a filtered request alongside different filter args would get inconsistent pages. Encoding the full query context into the cursor makes pagination self-contained and stateless on the server side.
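As a rough illustration of both ideas (a stable pointer plus an embedded filter), the sketch below pages by "last ID seen" over an in-memory array. The `Repo` shape, the `listRepos` function, and the assumption that IDs are positive and ascending are all hypothetical.

```typescript
interface Repo { id: number; language: string }
interface KeysetCursor { afterId: number; language: string | null }

const encode = (c: KeysetCursor): string =>
  Buffer.from(JSON.stringify(c)).toString('base64url');
const decode = (s: string): KeysetCursor =>
  JSON.parse(Buffer.from(s, 'base64url').toString('utf8'));

function listRepos(
  data: Repo[],
  opts: { language?: string; cursor?: string; pageSize: number }
): { items: Repo[]; nextCursor: string | null } {
  // When a cursor is present it is the only source of truth for the filter;
  // the `language` argument is ignored on follow-up pages.
  const state: KeysetCursor = opts.cursor
    ? decode(opts.cursor)
    : { afterId: 0, language: opts.language ?? null };

  const matches = data
    .filter(r => state.language === null || r.language === state.language)
    .filter(r => r.id > state.afterId)
    .sort((a, b) => a.id - b.id);

  const items = matches.slice(0, opts.pageSize);
  const nextCursor = matches.length > opts.pageSize
    ? encode({ afterId: items[items.length - 1].id, language: state.language })
    : null;
  return { items, nextCursor };
}
```

Because the next page is defined as "IDs greater than the last one returned", deleting or inserting rows elsewhere cannot shift items across the page boundary.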
TypeScript — cursor-based pagination implementation
interface PageCursor {
  offset: number;
  filter: string;
  sortKey: string;
}

function encodeCursor(cursor: PageCursor): string {
  return Buffer.from(JSON.stringify(cursor)).toString('base64url');
}

function decodeCursor(encoded: string): PageCursor {
  return JSON.parse(Buffer.from(encoded, 'base64url').toString());
}

server.tool(
  'github_list_repositories',
  {
    org: z.string(),
    language: z.string().optional(),
    cursor: z.string().optional().describe('Pagination cursor from previous call'),
    pageSize: z.number().int().min(1).max(100).default(20)
  },
  async ({ org, language, cursor, pageSize }) => {
    let page: PageCursor = { offset: 0, filter: language ?? '', sortKey: 'updated' };
    if (cursor) {
      try {
        page = decodeCursor(cursor);
      } catch {
        // A malformed cursor is an expected error, not a crash
        return { content: [{ type: 'text', text: 'Invalid pagination cursor.' }], isError: true };
      }
    }

    // The cursor, not the current arguments, decides the filter on follow-up pages
    const activeLanguage = page.filter || undefined;
    const items = await fetchRepos(org, { offset: page.offset, limit: pageSize, language: activeLanguage });
    const total = await countRepos(org, { language: activeLanguage });

    const nextCursor = page.offset + pageSize < total
      ? encodeCursor({ offset: page.offset + pageSize, filter: page.filter, sortKey: page.sortKey })
      : null;

    const lines = [
      `Repositories (${page.offset + 1}–${Math.min(page.offset + items.length, total)} of ${total}):`,
      ...items.map(r => `- ${r.name}: ${r.description ?? 'no description'} (⭐${r.stars})`),
      nextCursor ? `\n[Next page cursor: ${nextCursor}]` : '\n[End of results]'
    ];
    return { content: [{ type: 'text', text: lines.join('\n') }] };
  }
);
Some external APIs — GitHub, Slack, Linear — provide their own cursor or token strings in their responses. You do not need to decode or interpret these; just wrap them opaquely. Store the API's own pagination token inside your cursor struct and forward it on the next call. This way your pagination layer is a thin adapter, not a re-implementation of the upstream API's pagination logic.
The AI agent's workflow with pagination is straightforward: it calls the tool without a cursor to get the first page, reads the result, checks for a next-page cursor in the response text, and calls again with that cursor if it needs more results. Writing the cursor clearly into the response — like [Next page cursor: abc123] — makes it easy for the model to extract and reuse it.
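The loop described above can be sketched from the caller's side. Here `ToolCall` is a hypothetical synchronous stand-in for a real (asynchronous) tool invocation, and the regex assumes the exact `[Next page cursor: …]` marker format used in this section.

```typescript
// Hypothetical stand-in for a tool call: takes an optional cursor, returns response text.
type ToolCall = (args: { cursor?: string }) => string;

function extractCursor(text: string): string | undefined {
  const m = text.match(/\[Next page cursor: ([^\]\s]+)\]/);
  return m ? m[1] : undefined;
}

// Fetch pages until the response carries no cursor, with a hard cap as a safety net.
function collectAllPages(call: ToolCall, maxPages = 10): string[] {
  const pages: string[] = [];
  let cursor: string | undefined;
  for (let i = 0; i < maxPages; i++) {
    const text = call(cursor === undefined ? {} : { cursor });
    pages.push(text);
    cursor = extractCursor(text);
    if (cursor === undefined) break;
  }
  return pages;
}
```

The `maxPages` cap is a design choice for this sketch: it keeps a buggy server that always returns a cursor from trapping the caller in an endless loop.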
Wrap decodeCursor() in a try/catch and return an isError: true response if decoding fails. Never trust cursor content without validation.
A tool result's content array can hold three distinct item types: text for markdown and plain text, image for base64-encoded binary images, and resource for embedded resource references. Choosing the right type matters because hosts and models render them differently.
The text type is the workhorse. It accepts any string — plain prose, Markdown tables, code blocks, JSON. Claude renders Markdown natively, so returning a well-structured Markdown string with headers and tables is far more readable than a JSON blob. Use text for everything that is fundamentally textual: summaries, lists, structured reports, code excerpts, error messages.
The image type sends binary image data as a base64 string with an accompanying MIME type (image/png, image/jpeg, image/webp). This is how screenshot tools work — they launch a headless browser, capture the page, and return the raw PNG bytes encoded as base64. Claude receives the image as a vision input and can describe, analyze, or extract text from it. The key constraint is size: very large images should be compressed or resized before encoding.
The resource type embeds an existing resource inline, identified by its URI and MIME type. It is particularly useful when a tool produces output that is naturally a resource: a generated file, a config blob, a diff. Instead of serializing the content as a text blob, you embed it as a first-class resource that the host can route to appropriate handlers. A resource carries either a text field (for text MIME types) or a blob field (base64-encoded, for binary MIME types).
TypeScript — all three content types
// Imports used by examples 2 and 3
import { promises as fs } from 'node:fs';
import { chromium } from 'playwright';

// 1. Text content: markdown supported (fragment from inside a tool handler)
return {
  content: [{
    type: 'text',
    text: '## Search Results\n\n| Repo | Stars |\n|---|---|\n| my-repo | 1234 |'
  }]
};

// 2. Image content: screenshot tool using Playwright
server.tool('capture_screenshot', { url: z.string().url() }, async ({ url }) => {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle' });
    const screenshot = await page.screenshot({ type: 'png' });
    return {
      content: [{
        type: 'image',
        data: screenshot.toString('base64'),
        mimeType: 'image/png'
      }]
    };
  } finally {
    // Close the browser even when navigation or capture throws
    await browser.close();
  }
});

// 3. Embedded resource: return a file inline
server.tool('get_config_file', { path: z.string() }, async ({ path }) => {
  const content = await fs.readFile(path, 'utf-8');
  return {
    content: [{
      type: 'resource',
      resource: {
        uri: `file://${path}`,
        mimeType: path.endsWith('.json') ? 'application/json' : 'text/plain',
        text: content
      }
    }]
  };
});
The isError flag is a top-level field on the tool result object (not inside a content item). Setting isError: true signals that the tool call failed, even though the response is technically well-formed. This is the correct pattern for expected errors — API rate limits, resource not found, invalid input that slipped past Zod. The model sees the error content and can decide how to proceed, rather than treating the failure as an unrecoverable exception.
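A small sketch of that pattern follows. The `ok` and `fail` helpers and the in-memory `users` map are hypothetical; the point is only that `isError` sits next to `content`, not inside a content item.

```typescript
interface TextContent { type: 'text'; text: string }
interface ToolResult { content: TextContent[]; isError?: boolean }

// Helpers that build well-formed results; isError is a sibling of content.
const ok = (text: string): ToolResult => ({ content: [{ type: 'text', text }] });
const fail = (text: string): ToolResult => ({ content: [{ type: 'text', text }], isError: true });

// Hypothetical data source standing in for a real lookup.
const users = new Map<string, string>([['u1', 'Ada']]);

function getUser(id: string): ToolResult {
  const name = users.get(id);
  // "Not found" is an expected outcome, so it is reported rather than thrown.
  return name === undefined
    ? fail(`User "${id}" not found. Check the ID and try again.`)
    : ok(`User ${id}: ${name}`);
}
```

Writing the error text as an actionable sentence gives the model something to reason about when it decides whether to retry with different arguments.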
A single tool call can return an array containing multiple content items — any mix of text, images, and embedded resources. This enables tools that produce genuinely rich, structured output rather than cramming everything into one text blob.
The motivating example is a PR analysis tool. A comprehensive PR review needs at least two parts: a summary of metadata (author, status, review count, changed files) and the actual diff for Claude to analyze. These are fundamentally different kinds of content. The summary is natural text. The diff is a structured artifact — it has its own syntax, its own identity as a resource, and potentially its own rendering in diff viewers. Squashing them together into one Markdown string loses the structure.
When the model receives a multi-item response, it processes each content item according to its type. Text items are read as conversation context. Image items enter the vision pipeline. Resource items can be routed to appropriate handlers or displayed inline. The model can reference and reason about all of them together, treating the response as a coherent multi-modal package.
TypeScript — multi-item PR analysis tool
server.tool(
  'analyze_pr',
  { owner: z.string(), repo: z.string(), pr: z.number() },
  async ({ owner, repo, pr }) => {
    // Fetch metadata, reviews, and the raw diff concurrently
    const [pullRequest, reviews, diff] = await Promise.all([
      octokit.pulls.get({ owner, repo, pull_number: pr }),
      octokit.pulls.listReviews({ owner, repo, pull_number: pr }),
      octokit.pulls.get({ owner, repo, pull_number: pr, mediaType: { format: 'diff' } })
    ]);
    return {
      content: [
        {
          type: 'text',
          text: `## PR #${pr}: ${pullRequest.data.title}\n\n` +
            `**Author:** ${pullRequest.data.user?.login}\n` +
            `**Status:** ${pullRequest.data.state}\n` +
            `**Reviews:** ${reviews.data.length} ` +
            `(${reviews.data.filter(r => r.state === 'APPROVED').length} approved)\n` +
            `**Changed files:** ${pullRequest.data.changed_files}\n` +
            `**Lines:** +${pullRequest.data.additions} / -${pullRequest.data.deletions}`
        },
        {
          type: 'resource',
          resource: {
            uri: `github://pr/${owner}/${repo}/${pr}/diff`,
            mimeType: 'text/x-diff',
            text: String(diff.data)
          }
        }
      ]
    };
  }
);
There is no hard limit on the number of content items in a response. A code generation tool might return a text explanation, three separate file resources (one per generated file), and an image of the resulting architecture diagram, all in a single call. The practical limits are the model's context window and the user's patience while waiting for all the data to arrive.
| Content type | Best for | SDK field names | Example |
|---|---|---|---|
| text | Summaries, tables, code, errors | type, text | PR summary, search results, error message |
| image | Visual data, screenshots, charts | type, data, mimeType | Browser screenshot, matplotlib graph |
| resource | Structured files, diffs, blobs | type, resource{uri, mimeType, text or blob} | Git diff, JSON config, generated source file |
Production servers need logging, caching, rate limiting, timeout handling, and error tracking — but duplicating that logic in every tool handler is a maintenance nightmare. The middleware pattern wraps handlers with reusable behavior without modifying the underlying logic.
The idea is borrowed from HTTP middleware (Express, Koa, Hono): a middleware is a function that takes a handler and returns a new handler with the same signature. Because each middleware is a pure function returning another function, you can compose them freely with simple function calls. The outermost wrapper runs first, can observe inputs and outputs, and can short-circuit with a cached result before the real handler is even invoked.
Logging middleware is the most universally useful. Every production tool should log its name, its arguments, and how long it took to complete — whether it succeeded or failed. Writing this once as a composable wrapper means you can add it to any handler with a single function call, and remove it just as easily during testing.
Caching middleware is appropriate for read-only tools that hit expensive upstream APIs. A weather tool that fetches from an external service on every call will quickly exhaust your API quota. Wrapping it in a 60-second cache reduces API calls dramatically while keeping results fresh. The cache key is the JSON-serialized arguments, so different inputs get different cache entries.
TypeScript — composable middleware
// middleware.ts
type ToolHandler<T> = (args: T) => Promise<{ content: any[] }>;

// Logging middleware
function withLogging<T>(name: string, handler: ToolHandler<T>): ToolHandler<T> {
  return async (args) => {
    const start = Date.now();
    console.error(`[${name}] called with: ${JSON.stringify(args)}`);
    try {
      const result = await handler(args);
      console.error(`[${name}] completed in ${Date.now() - start}ms`);
      return result;
    } catch (error) {
      console.error(`[${name}] failed after ${Date.now() - start}ms:`, error);
      throw error;
    }
  };
}

// Caching middleware
function withCache<T>(handler: ToolHandler<T>, ttlMs = 30_000): ToolHandler<T> {
  const cache = new Map<string, { result: any; expires: number }>();
  return async (args) => {
    const key = JSON.stringify(args);
    const cached = cache.get(key);
    if (cached && cached.expires > Date.now()) return cached.result;
    const result = await handler(args);
    cache.set(key, { result, expires: Date.now() + ttlMs });
    return result;
  };
}

// Timeout middleware
function withTimeout<T>(handler: ToolHandler<T>, timeoutMs = 10_000): ToolHandler<T> {
  return async (args) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    // Rejects once the deadline passes; the wrapped handler is not cancelled,
    // its late result is simply ignored.
    const deadline = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs}ms`)), timeoutMs);
    });
    try {
      return await Promise.race([handler(args), deadline]);
    } finally {
      clearTimeout(timer);
    }
  };
}

// Compose: logging → timeout → cache → actual handler
const getWeatherHandler = withLogging('get_weather',
  withTimeout(
    withCache(
      async ({ location }: { location: string }) => fetchWeather(location),
      60_000
    )
  )
);
server.tool('get_weather', { location: z.string() }, getWeatherHandler);
server.tool('get_weather', { location: z.string() }, getWeatherHandler);
Beyond these three, the pattern extends naturally to rate limiting (track call counts per time window, return an error if exceeded), authentication (check a token or permission before forwarding), retry logic (automatically retry on transient network errors), and metrics collection (report latency and error counts to your observability stack). Each concern is isolated in its own composable function.
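Two of those extensions, rate limiting and retry, could look roughly like this. Both wrappers are sketches: the fixed-window strategy, the injectable `now` clock, and the absence of backoff in the retry loop are simplifying assumptions.

```typescript
type ToolResult = { content: any[]; isError?: boolean };
type ToolHandler<T> = (args: T) => Promise<ToolResult>;

// Fixed-window rate limiter: at most `max` calls per `windowMs`.
// `now` is injectable so the window can be driven by a fake clock in tests.
function withRateLimit<T>(
  handler: ToolHandler<T>,
  max: number,
  windowMs: number,
  now: () => number = Date.now
): ToolHandler<T> {
  let windowStart = now();
  let count = 0;
  return async (args) => {
    const t = now();
    if (t - windowStart >= windowMs) { windowStart = t; count = 0; }
    if (count >= max) {
      // Report the limit as an expected error instead of throwing
      return { content: [{ type: 'text', text: 'Rate limit exceeded; try again later.' }], isError: true };
    }
    count++;
    return handler(args);
  };
}

// Retry: re-invoke the handler up to `attempts` times when it throws.
// Only safe for handlers whose side effects are idempotent.
function withRetry<T>(handler: ToolHandler<T>, attempts = 3): ToolHandler<T> {
  return async (args) => {
    let lastError: unknown;
    for (let i = 0; i < attempts; i++) {
      try { return await handler(args); } catch (e) { lastError = e; }
    }
    throw lastError;
  };
}
```

A production retry wrapper would normally wait between attempts and retry only on errors known to be transient; both are left out here to keep the shape of the pattern visible.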
Composition order matters. In withLogging(withTimeout(withCache(handler))), logging is outermost so it captures total wall time including cache lookups. Timeout wraps the cache so a very slow cache backend doesn't hang forever. Cache wraps the actual handler so it only runs on misses. Decide which concern should be the last line of defense before you settle on the composition order.
When a tool takes more than a few seconds to complete (processing files, running analyses, invoking slow APIs), the user sees nothing until the result arrives. Progress tokens let the server send incremental status updates back to the client while the tool is still running.
The mechanism is a lightweight pub-sub pattern embedded in the MCP protocol. The client includes an optional _meta.progressToken field in the tool call request. This token is an opaque identifier — typically a string or integer chosen by the client. The server, as it works through its task, sends notifications/progress events that reference the same token. The client matches these events to the in-flight call and can update a progress bar, a spinner, or a status message accordingly.
The progress notification payload includes a progress field (current step), an optional total (maximum steps, for percentage calculation), and an optional message string describing the current operation. Not all clients support progress notifications, so always check whether the token was provided before attempting to send updates — a missing token means the client does not want progress events.
Multiple concurrent tool calls can each have their own progress token. The client uses the token to route incoming progress notifications to the right call. If you have two parallel file-processing tools running simultaneously, each has a unique token, and their progress events are independent. The server simply sends each notification with its respective token.
TypeScript — bulk file processing with progress notifications
server.tool(
  'bulk_process_files',
  { directory: z.string(), pattern: z.string().default('**/*.ts') },
  async ({ directory, pattern }, extra) => {
    const files = await glob(pattern, { cwd: directory });
    // Assumes an SDK version whose handler `extra` exposes `_meta` and `sendNotification`
    const progressToken = extra._meta?.progressToken;
    const results: string[] = [];

    for (let i = 0; i < files.length; i++) {
      const file = files[i];

      // Send progress notification only if the client wants it
      if (progressToken !== undefined) {
        await extra.sendNotification({
          method: 'notifications/progress',
          params: {
            progressToken,
            progress: i + 1,
            total: files.length,
            message: `Processing ${file} (${i + 1}/${files.length})`
          }
        });
      }

      const analysis = await analyzeFile(path.join(directory, file));
      results.push(
        `${file}: ${analysis.linesOfCode} lines, complexity ${analysis.cyclomaticComplexity}`
      );
    }

    return {
      content: [{
        type: 'text',
        text: `Processed ${files.length} files:\n\n${results.join('\n')}`
      }]
    };
  }
);
From the client side, progress tokens are created by the caller and injected into the request's _meta field. In practice with Claude Desktop and similar hosts, the host manages token generation and display automatically. When building your own MCP client, you allocate tokens from a counter or UUID generator, subscribe to progress notifications filtered by that token, and update your UI accordingly. When the tool call completes (whether success or error), the token's lifecycle ends.
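For a custom client, the token bookkeeping described above might be sketched as follows. `ProgressRouter` is a hypothetical class, not an SDK type; it allocates tokens from a counter and routes each incoming progress payload to the listener registered for that token.

```typescript
type ProgressToken = string | number;

interface ProgressParams {
  progressToken: ProgressToken;
  progress: number;
  total?: number;
  message?: string;
}

class ProgressRouter {
  private nextToken = 0;
  private listeners = new Map<ProgressToken, (p: ProgressParams) => void>();

  // Allocate a token for a new call. The first token is 0, which is valid.
  register(onProgress: (p: ProgressParams) => void): ProgressToken {
    const token = this.nextToken++;
    this.listeners.set(token, onProgress);
    return token;
  }

  // Feed every incoming notifications/progress payload through here.
  // Returns false when no in-flight call owns the token.
  dispatch(p: ProgressParams): boolean {
    const listener = this.listeners.get(p.progressToken);
    if (!listener) return false;
    listener(p);
    return true;
  }

  // Call when the tool call completes, whether it succeeded or failed.
  release(token: ProgressToken): void {
    this.listeners.delete(token);
  }
}
```

Releasing the token on completion ends its lifecycle, so a late notification for a finished call is dropped instead of updating stale UI.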
Guard with if (progressToken !== undefined) before sending a progress notification. A truthiness check such as if (progressToken) is wrong, because the integer 0 is a valid token.
Network failures happen at the worst possible moments: after a request has been sent but before the response arrives. For read-only tools this is harmless. For tools that send messages, create records, or charge customers, a naive retry causes duplicate side effects. Idempotency keys solve this.
The pattern is simple: the caller includes an optional unique key with each state-mutating request. The server stores the key alongside the result of the first execution. On subsequent calls with the same key, the server returns the stored result immediately without executing the operation again. The caller cannot tell from the response whether it got a fresh execution or a cached replay — both look identical.
The key lifetime must be long enough to cover realistic retry windows. Payment systems typically keep keys for 24–72 hours. For MCP tools, 24 hours is a reasonable default — it covers any plausible retry scenario while keeping the in-memory store manageable. In production, this store should be backed by Redis or a database, not an in-process Map, to survive server restarts.
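One way to enforce a key lifetime in an in-process store is to expire entries on read and sweep the rest periodically. The `ExpiringKeyStore` class below is a hypothetical sketch with an injectable clock; a Redis-backed store would typically rely on the server's own key expiry instead.

```typescript
interface Entry<R> { result: R; createdAt: number }

class ExpiringKeyStore<R> {
  private entries = new Map<string, Entry<R>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the stored result, or undefined if the key is unknown or expired.
  get(key: string): R | undefined {
    const e = this.entries.get(key);
    if (!e) return undefined;
    if (e.createdAt + this.ttlMs <= this.now()) {
      this.entries.delete(key);
      return undefined;
    }
    return e.result;
  }

  set(key: string, result: R): void {
    this.entries.set(key, { result, createdAt: this.now() });
  }

  // Remove every expired entry; returns how many were dropped.
  sweep(): number {
    let removed = 0;
    for (const [key, e] of this.entries) {
      if (e.createdAt + this.ttlMs <= this.now()) {
        this.entries.delete(key);
        removed++;
      }
    }
    return removed;
  }

  get size(): number {
    return this.entries.size;
  }
}
```

Calling `sweep()` from a timer keeps memory bounded even for keys that are written once and never read again.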
Optionally, return a wasNew flag in the response text so callers know whether their call caused a side effect. This is useful for debugging and auditing. A monitoring dashboard can track what fraction of calls are replays versus first executions.
TypeScript — idempotency store and tool
// idempotency-store.ts
import * as crypto from 'node:crypto';

class IdempotencyStore {
  private store = new Map<string, { result: any; createdAt: number }>();
  private TTL = 24 * 60 * 60 * 1000; // 24 hours

  async getOrExecute(
    key: string,
    fn: () => Promise<any>
  ): Promise<{ result: any; wasNew: boolean }> {
    // Replay the stored result while the key is still within its lifetime
    const existing = this.store.get(key);
    if (existing && existing.createdAt + this.TTL > Date.now()) {
      return { result: existing.result, wasNew: false };
    }
    const result = await fn();
    this.store.set(key, { result, createdAt: Date.now() });
    return { result, wasNew: true };
  }
}

const idempotency = new IdempotencyStore();

// registerTool(name, config, handler): annotations belong in the config object
server.registerTool(
  'slack_send_message',
  {
    inputSchema: {
      channel: z.string(),
      text: z.string(),
      idempotencyKey: z.string().optional()
        .describe('Optional key to prevent duplicate messages on retry')
    },
    annotations: { idempotentHint: false }
  },
  async ({ channel, text, idempotencyKey }) => {
    const key = idempotencyKey ?? crypto.randomUUID();
    const { result, wasNew } = await idempotency.getOrExecute(key, () =>
      slackClient.chat.postMessage({ channel, text })
    );
    return {
      content: [{
        type: 'text',
        text: wasNew
          ? `Message sent (ts: ${result.ts})`
          : `Duplicate request, original message (ts: ${result.ts})`
      }]
    };
  }
);
Notice that idempotentHint is set to false in the annotations, even though idempotency keys are supported. The annotation describes the tool's default behavior without a key — sending the same message twice without a key would create two Slack messages, which is inherently non-idempotent. The key is an opt-in mechanism, not the default mode.
When the caller omits the key (idempotencyKey: undefined), generate a fresh crypto.randomUUID() as the key. This ensures that calls without a key are always executed, while still giving the idempotency store a well-defined key for each operation. Never reuse keys across different intended operations.
Large, monolithic tool handlers are difficult to test, hard to reason about, and fragile when any one dependency fails. Composing tools from small, pure data-fetching functions produces more testable, more resilient, and more maintainable code.
The key insight is to separate the data-fetching layer from the orchestration layer. Pure functions that fetch a single piece of data — getRepo(), getOpenIssues(), getRecentCommits() — are trivially testable in isolation. You can unit-test each one with a mocked HTTP client without spinning up an MCP server. The orchestrating tool handler then calls these pure functions and formats the results, keeping its own complexity low.
Promise.allSettled() is the composing function of choice here. Unlike Promise.all(), which rejects the entire batch if any promise fails, Promise.allSettled() waits for every promise and reports individual outcomes. This means a composed tool can still return partial results even if one sub-call fails — maybe the commit history API is down, but the repo metadata and issues are fine. The user gets useful information instead of a complete failure.
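The difference is easiest to see side by side. In this sketch the three promises are hypothetical stand-ins for sub-calls, one of which fails.

```typescript
// Three simulated sub-calls; the second one fails.
const fetchers = (): Promise<string>[] => [
  Promise.resolve('repo metadata'),
  Promise.reject(new Error('commit API down')),
  Promise.resolve('open issues'),
];

// Promise.all: one rejection discards every result.
async function strict(): Promise<string[]> {
  return Promise.all(fetchers());
}

// Promise.allSettled: keep what succeeded and record what failed.
async function tolerant(): Promise<{ values: string[]; failures: string[] }> {
  const settled = await Promise.allSettled(fetchers());
  const values: string[] = [];
  const failures: string[] = [];
  for (const s of settled) {
    if (s.status === 'fulfilled') values.push(s.value);
    else failures.push(String(s.reason?.message ?? s.reason));
  }
  return { values, failures };
}
```

Collecting the failure reasons, rather than silently dropping them, lets the tool tell the model which parts of the answer are missing.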
TypeScript — composed GitHub repo summary tool
// lib/github.ts: pure data-fetching functions (each independently testable)
export async function getRepo(octokit: Octokit, owner: string, repo: string) { /* ... */ }
export async function getOpenIssues(octokit: Octokit, owner: string, repo: string, limit = 10) { /* ... */ }
export async function getRecentCommits(octokit: Octokit, owner: string, repo: string, days = 7) { /* ... */ }
export async function getContributors(octokit: Octokit, owner: string, repo: string) { /* ... */ }

// tools.ts: orchestrating tool handler
server.tool(
  'github_repo_summary',
  { owner: z.string(), repo: z.string() },
  async ({ owner, repo }) => {
    // Run all sub-calls concurrently; partial failures are handled gracefully
    const [repoData, issues, commits, contributors] = await Promise.allSettled([
      getRepo(octokit, owner, repo),
      getOpenIssues(octokit, owner, repo),
      getRecentCommits(octokit, owner, repo),
      getContributors(octokit, owner, repo)
    ]);

    const lines: string[] = [];
    if (repoData.status === 'fulfilled') {
      const r = repoData.value;
      lines.push(
        `# ${r.full_name}`,
        `${r.description ?? 'No description'}`,
        `⭐${r.stargazers_count} · 🍴${r.forks_count} · ${r.language ?? 'unknown'}`
      );
    }
    if (issues.status === 'fulfilled') {
      lines.push(`\n## Open Issues (${issues.value.length})`);
      issues.value.forEach(i => lines.push(`- #${i.number}: ${i.title}`));
    }
    if (commits.status === 'fulfilled') {
      lines.push(`\n## Recent Commits (last 7 days: ${commits.value.length})`);
    }
    if (contributors.status === 'fulfilled') {
      lines.push(
        `\n## Top Contributors: ${contributors.value.slice(0, 3).map(c => c.login).join(', ')}`
      );
    }
    return { content: [{ type: 'text', text: lines.join('\n') }] };
  }
);
The test story for composed tools is excellent. Each pure fetcher function can be tested with a mocked octokit object that returns fixture data. The orchestrating handler can be tested end-to-end by mocking the fetchers themselves. Because each layer has a clean boundary, you can replace a real API call with a mock at exactly the right level of abstraction.
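A fetcher written against a narrow client interface can be exercised with a hand-rolled stub, as in this sketch. `IssueClient` and `getOpenIssueTitles` are hypothetical simplifications; a real fetcher would take an Octokit instance as shown earlier.

```typescript
// Hypothetical minimal client interface: only what the fetcher needs.
interface Issue { number: number; title: string; state: string }
interface IssueClient {
  listIssues(owner: string, repo: string): Promise<Issue[]>;
}

// The fetcher receives its client as an argument, so tests can pass a stub.
async function getOpenIssueTitles(
  client: IssueClient,
  owner: string,
  repo: string,
  limit = 10
): Promise<string[]> {
  const issues = await client.listIssues(owner, repo);
  return issues
    .filter(i => i.state === 'open')
    .slice(0, limit)
    .map(i => `#${i.number}: ${i.title}`);
}

// Stub client returning fixture data instead of calling the network.
const stubClient: IssueClient = {
  async listIssues() {
    return [
      { number: 1, title: 'Crash on start', state: 'open' },
      { number: 2, title: 'Typo in README', state: 'closed' },
      { number: 3, title: 'Slow query', state: 'open' },
    ];
  },
};
```

Because the dependency is injected, no mocking framework or MCP server is needed to check the filtering and formatting logic.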
Concurrency is another win. Because the sub-calls are independent, they run in parallel with Promise.allSettled(). A tool that makes four API calls sequentially might take 2–4 seconds. Running them concurrently brings that down to the time of the slowest single call — often under a second. For tools that compose many data sources, the concurrency benefit compounds.
AI models and host applications may cache tool schemas between sessions. When you rename a tool, change a required parameter, or alter a return shape, you risk breaking callers that were working the day before. Safe tool evolution requires a clear versioning strategy.
Not all changes are equal. Adding an optional parameter is safe — existing callers do not pass it, and the schema still validates their calls. Adding a new tool entirely is safe — nothing breaks because nothing was relying on the new tool before it existed. These are backwards-compatible changes you can ship freely.
Renaming a tool is a breaking change. Any model or application that learned the old name will fail to find it. The right migration path is to keep the old name as a deprecated wrapper that delegates to the new implementation, mark it clearly in its description, log a warning to stderr when it is called, and remove it only after a full major version cycle. This gives users time to update their configurations and prompts.
Removing a required parameter or changing a parameter's type are both breaking. If you remove a parameter, callers that still pass it will either fail validation or silently pass unexpected data. If you change a type — say, from z.string() to z.number() for a port field — existing callers passing strings will fail. These changes require a major version bump and an explicit migration period.
| Change type | Safe? | Strategy |
|---|---|---|
| Add optional parameter | ✅ Safe | Ship immediately; existing calls still validate |
| Add a new tool | ✅ Safe | Ship immediately; no existing caller depends on it |
| Improve tool description | ✅ Safe | Ship immediately; better descriptions help the model |
| Rename a tool | ⚠️ Breaking | Keep old name as deprecated wrapper for one major version |
| Remove a required parameter | ❌ Breaking | Make it optional first, deprecate, then remove in next major |
| Change a parameter type | ❌ Breaking | Add new param with new type, deprecate old, remove in next major |
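The "change a parameter type" row can be sketched as a small normalization step that runs during the migration window. The field names `port` (old, string) and `portNumber` (new, number) and the helper names are hypothetical; the sketch is written in plain TypeScript rather than a Zod schema so it stays self-contained.

```typescript
// Hypothetical argument shape during the migration window:
// `port` (string) is the deprecated field, `portNumber` (number) replaces it.
interface ConnectArgs {
  host: string;
  port?: string;        // deprecated: to be removed in the next major version
  portNumber?: number;  // new, correctly typed field
}

function validatePort(n: number): number {
  if (!Number.isInteger(n) || n < 1 || n > 65535) {
    throw new Error(`Port out of range: ${n}`);
  }
  return n;
}

// Prefer the new field; fall back to the old one with a warning on stderr.
function resolvePort(args: ConnectArgs): number {
  if (args.portNumber !== undefined) return validatePort(args.portNumber);
  if (args.port !== undefined) {
    console.error('[DEPRECATED] "port" (string) passed; use "portNumber" (number)');
    if (!/^\d+$/.test(args.port)) throw new Error(`Invalid port: "${args.port}"`);
    return validatePort(Number(args.port));
  }
  throw new Error('Either "portNumber" or the deprecated "port" is required');
}

console.log(resolvePort({ host: 'db.internal', portNumber: 5432 }));
console.log(resolvePort({ host: 'db.internal', port: '8080' }));
```

Both fields are optional in the schema during the window, which keeps old callers validating while new callers adopt the typed field.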
TypeScript — deprecated wrapper forwarding to new tool
// Deprecated v1 tool — kept for one major version cycle
server.tool(
'search', // old name without service prefix
{ q: z.string() },
async ({ q }) => {
console.error('[DEPRECATED] "search" called — update to "github_search_repos"');
return githubSearchRepos({ query: q, limit: 10 }); // delegate to new implementation
},
{
description: '[DEPRECATED] Use github_search_repos instead. Will be removed in v2.0.',
annotations: { readOnlyHint: true }
}
);
// Current v2 tool — properly namespaced
server.tool(
'github_search_repos',
{ query: z.string(), limit: z.number().int().min(1).max(30).default(10) },
async (args) => githubSearchRepos(args),
{ description: 'Search GitHub repositories by keyword, language, stars, or owner.' }
);
The deprecation message in the tool description serves a dual purpose. It informs human operators who read the tool list, and it also appears in the model's context when it is deciding which tool to call. A well-written deprecation message — "[DEPRECATED] Use github_search_repos instead" — will cause the model to prefer the new tool in future calls even without any other configuration change.
The advanced patterns covered in this lesson each require specialized test strategies. Pagination must be tested as a round-trip — encode cursor, decode cursor, verify no overlap between pages. Idempotency keys must be tested for both the first-call and the replay cases. Content types need to be verified for correct structure.
In-process testing is the fastest approach for tool logic. The TypeScript SDK provides a client/server pair that communicates in memory, with no network overhead. You spin up your server, connect a client, and call tools exactly as a real host would. The full MCP protocol is exercised — schema validation, error handling, content type serialization — without the latency of a real network or the complexity of a subprocess.
For the pagination test specifically, the key assertion is that there is no overlap between pages. Extract the cursor from page 1's response text using a regex, use it to fetch page 2, and verify that none of page 1's items reappear in page 2. Also verify that the last page ends with the [End of results] marker rather than a next-page cursor.
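The round trip can be exercised without a server at all. The sketch below assumes a cursor that is base64url-encoded JSON holding an offset, and a response format with `- ` item lines followed by either a `Next page cursor:` line or `[End of results]`. All function names here are hypothetical, and the lesson's real implementation may encode its cursors differently.

```typescript
// Assumed cursor format: base64url-encoded JSON holding an offset.
interface Cursor { offset: number }

function encodeCursor(c: Cursor): string {
  return btoa(JSON.stringify(c)).replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

function decodeCursor(s: string): Cursor {
  const b64 = s.replace(/-/g, '+').replace(/_/g, '/');
  const padded = b64 + '='.repeat((4 - (b64.length % 4)) % 4); // restore stripped padding
  const parsed = JSON.parse(atob(padded));
  if (typeof parsed.offset !== 'number' || parsed.offset < 0) throw new Error('Invalid cursor');
  return { offset: parsed.offset };
}

// Render one page as text, mimicking a tool response.
function listPage(items: string[], pageSize: number, cursor?: string): string {
  if (pageSize < 1) throw new Error('pageSize must be at least 1');
  const offset = cursor ? decodeCursor(cursor).offset : 0;
  const next = offset + pageSize;
  const footer = next < items.length
    ? `Next page cursor: ${encodeCursor({ offset: next })}`
    : '[End of results]';
  return items.slice(offset, next).map(i => `- ${i}`).concat(footer).join('\n');
}

// Walk every page, failing if any item line is seen twice.
function collectAllPages(items: string[], pageSize: number): string[] {
  const seen = new Set<string>();
  let cursor: string | undefined;
  do {
    const text = listPage(items, pageSize, cursor);
    for (const line of text.split('\n').filter(l => l.startsWith('- '))) {
      if (seen.has(line)) throw new Error(`Overlap between pages: ${line}`);
      seen.add(line);
    }
    cursor = text.match(/Next page cursor: ([\w-]+)/)?.[1];
  } while (cursor !== undefined);
  return Array.from(seen);
}

const repos = Array.from({ length: 12 }, (_, i) => `repo-${i + 1}`);
console.log(collectAllPages(repos, 5).length);
```

Walking all pages rather than only the first two also catches off-by-one errors at the final page boundary.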
Idempotency tests need to call the tool twice with the same key and verify that the second call returns the Duplicate request marker, not a fresh execution result. This proves the store is working. A third test should verify that two calls with different keys each produce fresh results — the store is not accidentally caching everything.
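The three cases map onto a very small store. The sketch below is an assumed in-memory stand-in for the real tool (`createSender` and its counter are hypothetical names); a production server would more likely use a shared store with expiry. It reuses the "Message sent" and "Duplicate request" markers so the assertions mirror the ones used against the real tool.

```typescript
// Each sender owns its own store, so tests do not leak state into each other.
function createSender() {
  const store = new Map<string, string>(); // idempotencyKey -> first result
  let executions = 0;                      // counts real side effects

  return {
    send(channel: string, text: string, idempotencyKey: string): string {
      const previous = store.get(idempotencyKey);
      if (previous !== undefined) {
        // Replay: report the earlier outcome instead of sending again.
        return `Duplicate request: ${previous}`;
      }
      executions += 1; // stand-in for the real network call
      const result = `Message sent to ${channel} (${text.length} chars)`;
      store.set(idempotencyKey, result);
      return result;
    },
    executions: () => executions
  };
}

const sender = createSender();
console.log(sender.send('#test', 'hello', 'key-a')); // first call executes
console.log(sender.send('#test', 'hello', 'key-a')); // replay is deduplicated
console.log(sender.send('#test', 'hello', 'key-b')); // different key executes again
console.log(sender.executions());
```

Counting executions separately from inspecting the response text guards against a store that returns the right marker but still performs the side effect twice.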
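TypeScript — advanced-tools.test.ts (Vitest)
import { describe, it, expect, beforeEach } from 'vitest';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import type { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
// Assumed local helpers: createServer() builds the server under test and
// connectInProcess() links a client to it over an in-memory transport.
import { createServer } from './server.js';
import { connectInProcess } from './test-utils.js';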
describe('Advanced tool patterns', () => {
let client: Client;
let server: McpServer;
beforeEach(async () => {
server = createServer();
client = await connectInProcess(server);
});
it('returns correct content type for screenshot tool', async () => {
const result = await client.callTool({
name: 'capture_screenshot',
arguments: { url: 'https://example.com' }
});
expect(result.content).toHaveLength(1);
expect(result.content[0].type).toBe('image');
expect(result.content[0].mimeType).toBe('image/png');
expect(result.content[0].data).toMatch(/^[A-Za-z0-9+/]+=*$/); // valid base64
});
it('paginates correctly with cursor', async () => {
const page1 = await client.callTool({
name: 'github_list_repositories',
arguments: { org: 'test-org', pageSize: 5 }
});
const page1Text = page1.content[0].text as string;
const cursorMatch = page1Text.match(/Next page cursor: ([\w-]+)/);
expect(cursorMatch).toBeTruthy();
const page2 = await client.callTool({
name: 'github_list_repositories',
arguments: { org: 'test-org', cursor: cursorMatch![1], pageSize: 5 }
});
const page2Text = page2.content[0].text as string;
// Ensure no items from page 1 appear on page 2
expect(page2Text).not.toContain(page1Text.split('\n')[1]);
});
it('deduplicates with idempotency key', async () => {
const key = crypto.randomUUID();
const r1 = await client.callTool({
name: 'slack_send_message',
arguments: { channel: '#test', text: 'hello', idempotencyKey: key }
});
const r2 = await client.callTool({
name: 'slack_send_message',
arguments: { channel: '#test', text: 'hello', idempotencyKey: key }
});
expect(r1.content[0].text).toContain('Message sent');
expect(r2.content[0].text).toContain('Duplicate request');
});
it('validates discriminated union correctly', async () => {
const result = await client.callTool({
name: 'db_search',
arguments: { type: 'regex', pattern: '[invalid' }
});
expect(result.isError).toBe(true);
expect(result.content[0].text).toContain('Invalid regex');
});
});
For middleware testing, test each middleware function independently before testing the composed stack. A withCache test should verify that a second call with the same arguments does not invoke the underlying handler. A withTimeout test should mock a slow handler and verify that the timeout fires. Composing already-tested middleware gives you confidence that the composition is correct without rerunning the underlying logic tests.
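As a sketch of what those isolated checks can look like, the block below defines simplified versions of `withCache` and `withTimeout` together with a hand-rolled call counter. These implementations are assumptions written for illustration; the middleware defined earlier in the lesson may differ in signature and behavior.

```typescript
type Handler<A, R> = (args: A) => Promise<R>;

// Assumed shape of withCache: memoize by JSON-serialized arguments.
function withCache<A, R>(handler: Handler<A, R>): Handler<A, R> {
  const cache = new Map<string, R>();
  return async (args: A) => {
    const key = JSON.stringify(args);
    if (cache.has(key)) return cache.get(key) as R;
    const result = await handler(args);
    cache.set(key, result);
    return result;
  };
}

// Assumed shape of withTimeout: reject if the handler takes longer than `ms`.
function withTimeout<A, R>(handler: Handler<A, R>, ms: number): Handler<A, R> {
  return (args: A) =>
    new Promise<R>((resolve, reject) => {
      const timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
      handler(args).then(
        value => { clearTimeout(timer); resolve(value); },
        err => { clearTimeout(timer); reject(err); }
      );
    });
}

// Hand-rolled spy: counts how often the underlying handler is invoked.
function makeCountingHandler(delayMs: number) {
  let calls = 0;
  const handler: Handler<{ q: string }, string> = args => {
    calls += 1;
    return new Promise<string>(resolve => setTimeout(() => resolve(`result:${args.q}`), delayMs));
  };
  return { handler, calls: () => calls };
}

// Cache check: the second identical call must not reach the handler.
const fast = makeCountingHandler(5);
const cached = withCache(fast.handler);
cached({ q: 'a' })
  .then(() => cached({ q: 'a' }))
  .then(() => console.log('handler calls:', fast.calls()));

// Timeout check: a 100ms handler behind a 20ms limit must reject.
const slow = makeCountingHandler(100);
withTimeout(slow.handler, 20)({ q: 'x' }).catch(err => console.log(err.message));
```

This simplified cache does not deduplicate calls that are in flight at the same time, so the check issues its two calls one after the other.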
This lesson covered the full advanced toolkit for MCP tool authors. Before shipping any production tool, run through the checklist below. Each item corresponds to a pattern covered in detail above.
- Model mutually exclusive input shapes with z.discriminatedUnion() rather than a flat schema with many optionals.
- Return visual output as type: 'image' fields, not as ASCII art in text fields. The model's vision capability works far better with real images.
- For long-running tools, read _meta.progressToken and send incremental notifications.
- For tools with side effects, accept an idempotencyKey parameter and deduplicate against it.
| Content type | Use case | SDK field names | Example |
|---|---|---|---|
| text | Markdown summaries, tables, code, errors, plain prose | type: 'text', text: string | PR summary, search results table, error message |
| image | Screenshots, charts, diagrams, OCR source material | type: 'image', data: string (base64), mimeType: string | Playwright screenshot, matplotlib graph, camera capture |
| resource | Config files, diffs, generated source, binary blobs with identity | type: 'resource', resource: { uri, mimeType, text or blob } | Git diff, JSON config, TypeScript source file |