You've mastered Tools (model acts) and Resources (model reads). Today you complete the trilogy with Prompts — reusable conversation templates the user controls — and dive into Sampling, MCP's breakthrough capability that lets servers ask the AI to think, unlocking true agentic multi-step workflows.
Over the last five days you've built up two-thirds of the MCP primitive trinity. Today we complete the picture and understand how all three work together as a coherent system — each filling a gap the others can't.
The control model is the key to understanding each primitive's purpose. Tools are invoked autonomously by the AI mid-reasoning — the user may not even notice. Resources are attached by the application layer, often before the conversation starts. Prompts are explicitly chosen by the human from a visible menu — they're a UX affordance, a way for server developers to package expert workflows that users can discover and trigger intentionally.
An MCP Prompt is a named, parameterized conversation template. When a user selects a prompt from Claude Desktop's slash-command menu (or any MCP-compatible host), the client calls prompts/get with any argument values the user filled in. The server returns a structured list of messages — user and assistant role turns — that seed the conversation with exactly the right context for the task.
Think of a Prompt as a macro for conversations. Instead of the user laboriously typing "Please analyze the pull request at github.com/org/repo/pull/42 and list every security concern in the changed files, grouped by severity", they select your security-review prompt, type the PR URL, and the host does the rest.
Imagine a law firm where each case type has a standard intake form — "Personal Injury", "Contract Dispute", "IP Infringement". The receptionist doesn't type a unique description from scratch every time. They select the right form, fill in the client-specific fields (name, case number, dates), and the system pre-populates the rest. MCP Prompts work identically: the server defines the template and required fields; the user fills in the variables; the host injects the fully-expanded conversation into Claude's context.
sequenceDiagram
participant U as 👤 User
participant H as Host App
participant C as MCP Client
participant S as MCP Server
participant M as Claude Model
C->>S: prompts/list
S-->>C: [{name, description, arguments}, ...]
H->>U: Show prompt menu (slash commands)
U->>H: Select "security-review", enter PR URL
C->>S: prompts/get { name: "security-review", arguments: { pr_url: "..." } }
S-->>C: { messages: [{ role: "user", content: "..." }] }
H->>M: Inject messages into conversation context
M->>U: Responds with security analysis
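The exchange in the diagram can also be read as plain data. The sketch below is a minimal illustration, assuming JSON-RPC 2.0 framing and text-only content. The type names and the two helper functions are hypothetical, written out locally rather than imported from any SDK, and the message content is shown as a typed text object rather than the bare string abbreviated in the diagram.

```typescript
// Hypothetical local types that mirror the message shapes in the diagram.
// They are written out here for illustration, not imported from an SDK.
interface PromptsGetRequest {
  jsonrpc: "2.0";
  id: number;
  method: "prompts/get";
  params: { name: string; arguments?: Record<string, string> };
}

interface SeedMessage {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
}

interface PromptsGetResult {
  description?: string;
  messages: SeedMessage[];
}

// Client side: build the request sent after the user picks a prompt
// and fills in its arguments.
function buildPromptsGetRequest(
  id: number,
  promptName: string,
  args: Record<string, string>
): PromptsGetRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "prompts/get",
    params: { name: promptName, arguments: args },
  };
}

// Server side: expand the template into the messages that seed the conversation.
function expandSecurityReview(request: PromptsGetRequest): PromptsGetResult {
  const prUrl = request.params.arguments?.["pr_url"] ?? "";
  return {
    description: "Security review of a pull request",
    messages: [
      {
        role: "user",
        content: {
          type: "text",
          text: `Review the pull request at ${prUrl} and list every security concern, grouped by severity.`,
        },
      },
    ],
  };
}

const exampleRequest = buildPromptsGetRequest(1, "security-review", {
  pr_url: "https://github.com/org/repo/pull/42",
});
const exampleResult = expandSecurityReview(exampleRequest);
```

The host would then inject `exampleResult.messages` into the model's context, which is the last hop in the diagram.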
A system prompt is set by the host application and defines the AI's persona and rules. A tool description tells the model what a tool does. An MCP Prompt is user-selectable and injects a full conversation structure, potentially multiple turns of user/assistant messages, tailored to a specific workflow. They operate at different layers and serve different purposes. Don't confuse them.
Every MCP Prompt has four components. Understanding each one's role is the foundation of building prompts that work reliably across different host applications.
- Name: a short identifier such as security-review, write-unit-tests, or explain-code. The model never sees this directly; it's purely for user discovery.
- Description: optional human-readable text that the host can show next to the name.
- Arguments: each has a name, an optional description, and required: boolean. Arguments are plain strings with no type system of their own. Validation is your handler's job.
- Messages: a sequence of { role: "user" | "assistant", content: TextContent | ImageContent | EmbeddedResource } objects. This sequence seeds the conversation. You can include multiple turns (pre-computed assistant reasoning, staged user questions, context documents), anything that sets Claude up for success.
Prompt arguments are simpler than Tool input schemas: every value is an untyped string. The host renders them as text input fields in the prompt UI. Your handler receives a Record<string, string> and is responsible for validating values, handling missing optionals, and interpolating them into the message content.
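Because arguments arrive as bare strings, the handler has to do its own checking before interpolating anything. The following is a minimal sketch of that step, assuming a security-review prompt with a required pr_url and an optional focus argument. The types, the helper names, and the focus argument are all hypothetical and not part of any SDK.

```typescript
// Hypothetical shape of one declared prompt argument.
interface PromptArgumentSpec {
  argName: string;
  description?: string;
  required: boolean;
}

interface TextPromptMessage {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
}

// Trim every value, drop empty optionals, and fail loudly on missing required ones.
function normalizePromptArgs(
  specs: PromptArgumentSpec[],
  args: Record<string, string | undefined>
): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const spec of specs) {
    const value = (args[spec.argName] ?? "").trim();
    if (value === "") {
      if (spec.required) {
        throw new Error(`Missing required argument: ${spec.argName}`);
      }
      continue; // optional and absent: leave it out
    }
    clean[spec.argName] = value;
  }
  return clean;
}

const securityReviewArgs: PromptArgumentSpec[] = [
  { argName: "pr_url", description: "Pull request URL", required: true },
  { argName: "focus", description: "Optional area to emphasize", required: false },
];

// Validate first, then interpolate the values into the message text.
function buildSecurityReviewMessages(
  args: Record<string, string | undefined>
): TextPromptMessage[] {
  const clean = normalizePromptArgs(securityReviewArgs, args);
  const prUrl = clean["pr_url"] ?? "";
  if (!/^https?:\/\/\S+$/.test(prUrl)) {
    throw new Error(`pr_url must be an http(s) URL, got: ${prUrl}`);
  }
  const focus = clean["focus"];
  const focusLine = focus ? ` Pay particular attention to ${focus}.` : "";
  return [
    {
      role: "user",
      content: {
        type: "text",
        text: `List every security concern in the pull request at ${prUrl}, grouped by severity.${focusLine}`,
      },
    },
  ];
}
```

Keeping validation in one small function makes it easy to return a clear error to the host instead of sending a half-filled template to the model.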
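Registering a prompt uses server.prompt(). The API mirrors server.tool(): name, description, argument schema, and an async handler that returns an object containing the message array. In the official TypeScript SDK the argument schema is a plain object whose values are Zod string schemas (such as z.string() or z.string().optional()), because prompt argument values are always strings.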
The power of prompts is that your handler runs before Claude sees anything. Fetch the PR diff, read the database record, load the file — attach all the necessary context in the messages array. Claude receives a fully loaded conversation and can immediately deliver insight. If you only pass a URL and ask Claude to "fetch it" (without tool access), you've just written a worse system prompt.
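One way to picture "fetch first, then hand Claude a loaded conversation" is a handler that takes its data source as a parameter. This is a sketch under stated assumptions: DiffFetcher is a hypothetical stand-in for whatever actually retrieves the PR diff, and the 20,000-character cap is an arbitrary illustrative budget.

```typescript
interface LoadedPromptMessage {
  role: "user" | "assistant";
  content: { type: "text"; text: string };
}

// Hypothetical: whatever retrieves the diff text (an HTTP call, a git command, a cache).
type DiffFetcher = (prUrl: string) => Promise<string>;

// Arbitrary illustrative cap so a huge diff cannot flood the context.
const MAX_DIFF_CHARS = 20000;

// Runs before the model sees anything: fetch the data, then embed it in the messages.
async function buildLoadedReviewMessages(
  prUrl: string,
  fetchDiff: DiffFetcher
): Promise<LoadedPromptMessage[]> {
  const diff = await fetchDiff(prUrl);
  const truncated = diff.length > MAX_DIFF_CHARS;
  const body = truncated ? diff.slice(0, MAX_DIFF_CHARS) : diff;
  const note = truncated ? "\n\n[Diff truncated to fit the context budget.]" : "";
  return [
    {
      role: "user",
      content: { type: "text", text: `Here is the diff for ${prUrl}:\n\n${body}${note}` },
    },
    {
      role: "user",
      content: {
        type: "text",
        text: "List every security concern in the changed files, grouped by severity.",
      },
    },
  ];
}
```

Passing the fetcher in as an argument keeps the handler easy to test with a stub and makes the data dependency explicit.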
Sampling is MCP's most powerful — and most misunderstood — feature. While every other primitive is about data flow from server to model, sampling reverses the direction: it lets a server ask the AI model to generate a response.
In practical terms: your tool handler is executing, it encounters a step that requires language model intelligence — summarizing a document, classifying text, generating code — and instead of hardcoding logic, it fires a sampling/createMessage request. The MCP client forwards this to Claude, gets the response, and hands it back to your handler. Your server now has AI reasoning as a runtime capability.
Normal MCP flow: User → Claude → calls your tool → you return data → Claude answers user.
Sampling flow: User → Claude → calls your tool → your tool calls Claude back for a sub-task → Claude responds → you use that response in your tool's final answer → Claude answers user.
It's like a contractor (your server) calling an expert consultant (Claude) mid-job. The expert gives advice, the contractor incorporates it, then delivers the finished work back to the client. This is what makes agentic, multi-step workflows possible without building your own AI infrastructure.
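The contractor-and-consultant flow can be sketched as a tool handler with three steps: do the deterministic work, delegate the language task, and fold the answer into the result. In the sketch below, SampleFn is a hypothetical stand-in for the whole sampling round-trip through the client. It is not an SDK function, and the document-summary task is only an example.

```typescript
// Hypothetical stand-in for the sampling round-trip (server -> client -> model -> server).
interface SubTaskRequest {
  prompt: string;
  maxTokens: number;
}
type SampleFn = (request: SubTaskRequest) => Promise<string>;

interface DocumentReport {
  documentId: string;
  wordCount: number;
  summary: string;
}

async function summarizeDocument(
  documentId: string,
  documentText: string,
  sample: SampleFn
): Promise<DocumentReport> {
  // Step 1: deterministic work stays in ordinary code.
  const wordCount = documentText.split(/\s+/).filter((word) => word.length > 0).length;

  // Step 2: the language task is delegated to the model via sampling.
  const summary = await sample({
    prompt: `Summarize this document in one sentence:\n\n${documentText}`,
    maxTokens: 100,
  });

  // Step 3: the model's answer is folded into the tool's final result.
  return { documentId, wordCount, summary: summary.trim() };
}
```

The model that answers the user never sees step 2 directly. It only receives the finished DocumentReport as the tool result.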
MCP's spec puts the host application, not the server, in control of sampling. When your server sends sampling/createMessage, the request goes to the client, which is expected to show it to the user for approval before passing it to the model. The user can see what the server is asking the AI to do and can reject it. This "human-in-the-loop" design guards against runaway agentic loops and is a core safety mechanism. Never design systems that assume sampling calls are invisible or auto-approved.
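From the client's side, the approval step amounts to a gate in front of the model call. The sketch below is illustrative only: ApprovalFn stands in for whatever UI the host shows, ModelFn for the actual model call, and all names are hypothetical.

```typescript
interface PendingSamplingRequest {
  serverName: string;
  promptText: string;
  maxTokens: number;
}

// Hypothetical hooks: a UI prompt that resolves to the user's decision, and the model call.
type ApprovalFn = (request: PendingSamplingRequest) => Promise<boolean>;
type ModelFn = (request: PendingSamplingRequest) => Promise<string>;

type GateOutcome =
  | { approved: true; text: string }
  | { approved: false; reason: string };

// The model is only called after the user has seen and approved the request.
async function gateSamplingRequest(
  request: PendingSamplingRequest,
  askUser: ApprovalFn,
  callModel: ModelFn
): Promise<GateOutcome> {
  const userApproved = await askUser(request);
  if (!userApproved) {
    return { approved: false, reason: "User rejected the sampling request" };
  }
  const text = await callModel(request);
  return { approved: true, text };
}
```

A server written against this behavior should treat a rejection as a normal outcome and return a useful fallback instead of failing the whole tool call.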
Your tool handler calls server.server.createMessage() with a messages array and model preferences. This sends a sampling/createMessage request to the client.
The client returns the model's response to your handler as a CreateMessageResult.
The sampling/createMessage request accepts these parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| messages | SamplingMessage[] | Required | The conversation to send to the model. Each message has role ("user" \| "assistant") and content (text or image). |
| modelPreferences | ModelPreferences | Optional | Hints about model selection: costPriority, speedPriority, intelligencePriority (0–1 each). The client chooses the best-matching model; you can't force a specific model. |
| systemPrompt | string | Optional | System instructions for this specific AI call. Separate from the host's system prompt, so the sub-task gets its own framing. |
| includeContext | "none" \| "thisServer" \| "allServers" | Optional | Whether to include MCP context from the current session in the sampling call. "none" = clean context; "thisServer" = include this server's resources; "allServers" = everything. |
| temperature | number | Optional | Sampling temperature. Lower = more deterministic (classification). Higher = more creative (generation). The valid range depends on the model provider. Defaults to the model default. |
| maxTokens | number | Required | Maximum tokens for the model's response. Keep this tight to prevent accidental runaway generation in sub-tasks. |
| stopSequences | string[] | Optional | Strings that stop generation early. Useful for structured outputs: ["```", "###END###"]. |
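
To see how the parameters in the table fit together, here is a minimal sketch of a builder that assembles a request body for a short, deterministic sub-task. The interfaces are hypothetical local mirrors of the table rather than SDK types, and the defaults (temperature 0, includeContext "none") are illustrative choices, not recommendations from the spec.

```typescript
// Hypothetical local mirror of the parameters in the table above.
interface SamplingParamsSketch {
  messages: { role: "user" | "assistant"; content: { type: "text"; text: string } }[];
  maxTokens: number;
  includeContext: "none" | "thisServer" | "allServers";
  temperature: number;
  modelPreferences: {
    costPriority: number;
    speedPriority: number;
    intelligencePriority: number;
  };
}

interface SamplingOptionsSketch {
  maxTokens: number;
  cost: number;
  speed: number;
  intelligence: number;
  temperature?: number;
}

// Priorities are hints in the 0 to 1 range, so out-of-range values are clamped.
function clampPriority(value: number): number {
  return Math.min(1, Math.max(0, value));
}

function buildSamplingParams(
  userText: string,
  options: SamplingOptionsSketch
): SamplingParamsSketch {
  // maxTokens is required, so reject anything that is not a positive integer.
  if (!(options.maxTokens > 0) || Math.floor(options.maxTokens) !== options.maxTokens) {
    throw new Error("maxTokens must be a positive integer");
  }
  return {
    messages: [{ role: "user", content: { type: "text", text: userText } }],
    maxTokens: options.maxTokens,
    includeContext: "none", // start from a clean context for the sub-task
    temperature: options.temperature ?? 0, // low temperature for repeatable output
    modelPreferences: {
      costPriority: clampPriority(options.cost),
      speedPriority: clampPriority(options.speed),
      intelligencePriority: clampPriority(options.intelligence),
    },
  };
}
```

For a classification sub-task you might favor speed over intelligence, and for a generation sub-task the reverse. Either way the client makes the final model choice.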
Let's make sampling concrete with two real examples: a classification task (where sampling replaces brittle keyword matching), and an enrichment task (where sampling augments raw data with AI analysis).
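As a rough illustration of the classification case, the sketch below asks the model for a single label and then normalizes whatever comes back against a fixed set. The label set, the function names, and ClassifySampleFn (a stand-in for the sampling round-trip) are all hypothetical.

```typescript
// Hypothetical stand-in for the sampling round-trip.
type ClassifySampleFn = (systemPrompt: string, userText: string) => Promise<string>;

// Illustrative label set; anything outside it is treated as "unknown".
const MESSAGE_LABELS = ["bug", "feature-request", "question"] as const;
type MessageLabel = (typeof MESSAGE_LABELS)[number];

// Models may add whitespace, capitals, or punctuation, so normalize before comparing.
function normalizeLabel(raw: string): MessageLabel | "unknown" {
  const cleaned = raw.trim().toLowerCase().replace(/[^a-z-]/g, "");
  for (const label of MESSAGE_LABELS) {
    if (cleaned === label) {
      return label;
    }
  }
  return "unknown";
}

async function classifyMessage(
  text: string,
  sample: ClassifySampleFn
): Promise<MessageLabel | "unknown"> {
  const systemPrompt =
    `Classify the message as exactly one of: ${MESSAGE_LABELS.join(", ")}. ` +
    "Reply with the label only.";
  const raw = await sample(systemPrompt, text);
  return normalizeLabel(raw);
}
```

Returning "unknown" for anything unexpected keeps the rest of the tool deterministic even when the model ignores the instructions.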
1. Sampling in every tool: not every tool needs AI reasoning. Use sampling for classification, summarization, and generation; never for data retrieval or simple logic.
2. Large maxTokens in sub-tasks: sampling is billed to the user. Keep sub-task token budgets tight.
3. Infinite sampling loops: never chain sampling results into more sampling calls without a strict depth limit. Runaway loops burn tokens fast.
4. Parsing unvalidated JSON: always wrap JSON.parse(result.text) in try/catch. Models sometimes add prose before the JSON.
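The depth limit from point 3 can be enforced with a plain counter. This is a minimal sketch, assuming a hypothetical RefineSampleFn that returns both the new text and a flag asking for another pass. The limit of 3 is an arbitrary example value.

```typescript
// Hypothetical sampler: returns revised text plus a request for another pass.
type RefineSampleFn = (draft: string) => Promise<{ text: string; needsAnotherPass: boolean }>;

// Arbitrary illustrative budget for chained sampling calls.
const MAX_SAMPLING_DEPTH = 3;

interface RefineOutcome {
  text: string;
  passes: number;
  hitLimit: boolean;
}

// Chains sampling calls, but never more than maxDepth of them.
async function refineWithLimit(
  initialDraft: string,
  sample: RefineSampleFn,
  maxDepth: number = MAX_SAMPLING_DEPTH
): Promise<RefineOutcome> {
  let draft = initialDraft;
  let passes = 0;
  while (passes < maxDepth) {
    const result = await sample(draft);
    passes += 1;
    draft = result.text;
    if (!result.needsAnotherPass) {
      return { text: draft, passes, hitLimit: false };
    }
  }
  // The model still wanted more passes, but the budget is spent.
  return { text: draft, passes, hitLimit: true };
}
```

Reporting hitLimit lets the caller decide whether a partially refined result is acceptable or whether to surface the cutoff to the user.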
Here are four production-grade prompt and sampling patterns that show up across MCP servers in the wild. Each solves a specific, common problem.
Five questions covering Prompts, Sampling, and their correct use patterns.
... temperature and modelPreferences settings are most appropriate?
... "Sure! Here's the JSON:\n```json\n{\"status\": \"ok\"}\n```". What should your handler do?
Calling JSON.parse() directly on such a string throws a SyntaxError. Always extract the JSON block first using a regex or a utility function, then parse. Wrap everything in try/catch in case the model's output is malformed.
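The extract-then-parse advice can be sketched as two small functions. This is one possible approach, not a definitive utility: the function names are hypothetical, and the fallback of slicing from the first "{" to the last "}" is a simple heuristic that only handles JSON objects.

```typescript
// Three backticks, written as escapes so this example does not contain a literal fence.
const FENCE = "\u0060\u0060\u0060";

// Prefer the contents of a fenced block; otherwise fall back to the outermost braces.
function extractJsonText(modelOutput: string): string | null {
  const fencedBlock = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE).exec(modelOutput);
  if (fencedBlock && fencedBlock[1] !== undefined) {
    return fencedBlock[1].trim();
  }
  const start = modelOutput.indexOf("{");
  const end = modelOutput.lastIndexOf("}");
  if (start !== -1 && end > start) {
    return modelOutput.slice(start, end + 1);
  }
  return null;
}

type ParseOutcome = { ok: true; value: unknown } | { ok: false; error: string };

// Never let a malformed model reply throw out of the handler.
function parseModelJson(modelOutput: string): ParseOutcome {
  const candidate = extractJsonText(modelOutput);
  if (candidate === null) {
    return { ok: false, error: "No JSON found in model output" };
  }
  try {
    return { ok: true, value: JSON.parse(candidate) };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}
```

Returning a tagged result instead of throwing lets the handler choose a fallback, such as retrying once with a stricter system prompt or reporting the failure in the tool result.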