📚 Domain 2 · Task Statement 2.1

Design Effective Tool Interfaces with Clear Descriptions and Boundaries

⏳ 📊 Domain Weight: 18% 🎯 Difficulty: Core Concept 🔗 Scenarios: Customer Support & Multi-Agent Research

If Domain 1 is the skeleton of an agentic system, Domain 2 is its nervous system. Tools are how Claude perceives and acts upon the world. Without well-designed tool definitions and crystal-clear descriptions, even the most robust architecture will fail due to unreliable tool selection. This guide covers every dimension of Task 2.1: how Claude selects tools, the anatomy of a production-grade description, naming alignment, single responsibility, system prompt interference, and the danger of over-provisioning.

📋 Contents

The Big Picture: A Real-World Analogy
How Claude Selects Tools: The Description Dependency
Anatomy of a Production-Grade Tool Description
The Naming-Description Alignment Problem
Single Responsibility & Constrained Tools
System Prompt Interference & Keyword Bias
The Over-Provisioning Danger
Code Patterns: Minimal vs Production Tool Definitions
Anti-Patterns to Avoid
Exam Readiness & Key Takeaways

📖 The Big Picture: A Real-World Analogy

🍳 Analogy — The Hospital Triage System

Imagine a hospital emergency department with three check-in windows: "General Admissions," "Cardiology Emergencies," and "Trauma Bay." If all three windows simply had a sign reading "Patient Check-In" with no further description, incoming patients would queue at random windows, causing dangerous delays when a heart attack patient lands at General Admissions.

A well-designed triage system gives each window a precise sign: "Cardiology Emergencies — Chest pain, shortness of breath, suspected heart attack. NOT for broken bones or lacerations — those go to Trauma Bay."

Tool descriptions serve exactly this function. Claude reads them to make routing decisions. Vague descriptions create dangerous "wrong-window" tool calls in production systems.

This analogy maps directly to the exam scenario: a Customer Support Resolution Agent with tools get_customer, lookup_order, process_refund, and escalate_to_human. If get_customer and lookup_order both say "retrieves information," Claude may call either one for any query about customers or orders, with no reliable routing guarantee.

🔍 How Claude Selects Tools: The Description Dependency

Claude's tool selection is not based on function name alone, nor on code-level metadata. It is driven entirely by natural language understanding of the tool description field. When Claude receives a request, it performs this reasoning process:

Read the user's request in the context of the full conversation history.
Read all tool descriptions as natural language to understand each tool's purpose, accepted inputs, edge cases, and boundaries.
Determine which tool (or combination of tools) best matches the intent of this request.
Emit a tool_use block for the selected tool(s).

🎯 Exam Focus — The Root Cause Pattern

Whenever an exam question asks "Why is Claude calling get_customer when a user asks about their order status?" or "Why is Claude misrouting between two similar tools?", the correct answer is almost always: ambiguous or insufficiently differentiated tool descriptions that fail to establish a clear boundary between the two tools.

The fix is always to enhance descriptions — not to add more tools, not to modify the system prompt's persona, and not to add iteration caps.

The Description is the Contract

Think of the tool description as a legal contract between the tool author and the model. It must specify:

What the tool does (and only does)
What format of inputs it accepts
What it returns
When to use it instead of a similar tool
When not to use it, with an explicit pointer to the correct alternative

Minimal descriptions break this contract. A description of "Retrieves customer information" is unenforceable — Claude cannot reliably determine if "What is the status of my order?" falls within that scope or not.

🧬 Anatomy of a Production-Grade Tool Description

A production-grade tool description must contain four elements. Missing any one of them degrades reliability:

Element	Purpose	Example
1. Exact Purpose	Unambiguous explanation of what the tool accomplishes — in one tight sentence.	"Retrieves a customer's account profile and contact details by their unique customer ID."
2. Input Format	Precise format requirements with a concrete example identifier, not just a type declaration.	"customer_id must be in the format CUST-XXXXX (e.g., CUST-00123). Do not pass email addresses or order IDs to this tool."
3. Edge Cases & What It Returns	What is returned on success and what happens on partial data — prevents misinterpretation of empty results.	"Returns full account object. If ID is not found, returns {found: false} — this is a valid empty result, not an error."
4. Explicit Boundary Guidance	Negative constraint telling Claude when NOT to use this tool, and pointing to the correct tool instead.	"Do NOT use this tool to look up order status or refund details. For orders, use lookup_order. For refunds, use process_refund."

💡 The Boundary Rule is the Most Important Element

Of the four elements, Explicit Boundary Guidance is most commonly missing and most commonly the root cause of misrouting. Claude has no automatic understanding of what a tool does not do. You must state it explicitly. Cross-tool disambiguation ("use X for Y, use Z for W") is what converts ambiguous descriptions into reliable routing.

Figure 1 — Minimal vs Production-Grade Tool Description Comparison

🏷️ The Naming-Description Alignment Problem

Claude reads both the function name and the description when making tool selection decisions. The name is the first signal Claude encounters. If the name is generic (e.g., analyze_content), Claude must rely entirely on the description for disambiguation. But when the name is misleading relative to the description, it creates cognitive friction — and LLMs fail under cognitive friction.

The solution is Name-Description Alignment: ensure the tool name itself communicates the precise purpose, so Claude can often make a correct initial guess before even finishing reading the description.

Generic Name (Avoid)	Aligned Name (Use)	Why It Helps
analyze_content	extract_web_results	Signals the scope (web) and the action (extract) before the description is read.
analyze_document	summarize_uploaded_pdf	Differentiates from web-content tools and specifies the file type and action.
query_database	lookup_customer_by_email	Constrains the lookup pattern to a specific entity and key type.
process_item	process_refund_for_order	Makes the action and target entity explicit, preventing use for non-refund payment operations.

💡 Naming Convention: Verb + Noun + Qualifier

A reliable naming pattern is Verb + Noun + Qualifier: what action? on what entity? with what constraint? For example: lookup_order_by_id (lookup=verb, order=noun, by_id=qualifier). This pattern alone eliminates 80% of name-level ambiguity before the description is even read.

🎯 Exam Focus — Rename + Re-describe

When an exam question asks "what is the BEST fix for tool misrouting between analyze_content and analyze_document?", the correct answer combines both renaming AND re-describing — renaming to extract_web_results and updating the description to explicitly state it is for web-sourced content only. Renaming alone, or re-describing alone, is the incomplete answer that appears as a distractor.

🎯 Single Responsibility & Constrained Tools

A tool that attempts to serve multiple different use cases creates decision paralysis. Consider a single analyze_document tool that: extracts structured data, summarizes content, AND verifies claims against source material. Claude must infer from user context which job to perform, and inference errors compound across multi-step workflows.

The principle of Single Responsibility from software engineering applies directly: each tool should do exactly one thing, expressed precisely in its name and description. This enables:

Precise input schemas tailored to the single use case
Output contracts that are always consistent (no conditional response shapes)
Cleaner error messages when the tool is misused
Easy testing of tool behavior in isolation

💡 Pro Tip: The Split Pattern

Replace one generic analyze_document tool with three purpose-specific tools:

extract_data_points — returns structured key-value pairs
summarize_content — returns a concise prose summary
verify_claim_against_source — returns a boolean verdict with evidence

Claude can now reliably distinguish between these tools because they have non-overlapping purposes, names, and schemas.

The Constrained Tool Pattern

For cases where you cannot decompose a tool (e.g., a general-purpose API you don't control), use a constrained wrapper tool. Instead of exposing fetch_url directly to the agent, expose load_internal_document that internally calls fetch_url but validates the URL against your document whitelist. The wrapper tool has a restricted, precise description. Claude sees the constrained tool, not the raw capability.

Python — Constrained Wrapper Tool Pattern

# Instead of exposing fetch_url directly:
# description: "Fetches content from any URL"  <-- TOO BROAD

# Expose a constrained wrapper:
tools = [{
    "name": "load_internal_document",
    "description": (
        "Loads the full text of an internal company document by its document URL. "
        "ONLY accepts URLs from the internal document store (docs.example.com). "
        "Returns the full document text. "
        "Do NOT use this for external web pages or APIs — use web_search for those. "
        "Document URL format: https://docs.example.com/<doc-id>"
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "document_url": {
                "type": "string",
                "description": "The internal document URL (must start with https://docs.example.com/)"
            }
        },
        "required": ["document_url"]
    }
}]

🠩 System Prompt Interference & Keyword Bias

Claude reads the system prompt and the tool descriptions together as a unified context. This creates a subtle but dangerous risk: keywords in the system prompt can create unintended associations with specific tool names, biasing Claude toward calling certain tools even when they are not appropriate.

The Keyword Association Problem

Consider this system prompt fragment:

⚠️ Problematic System Prompt

"You are a customer service agent. When customers mention documents, always analyze them thoroughly before responding. Make sure you analyze every document the customer provides."

If a tool named analyze_document exists in the toolkit, the repeated word "analyze" in the system prompt artificially biases Claude toward invoking that tool — even when the user simply says "I have a document about your return policy" and just wants a text response, not a document analysis.

Three Types of System Prompt Interference

Type	Example	Fix
Keyword Echo	System prompt says "analyze" → biases toward `analyze_document`	Replace generic verbs in system prompt with specific phrases: "review the customer's uploaded file" instead of "analyze documents"
Entity Overlap	System prompt says "look up account details" → could match both `get_customer` and `lookup_order`	Add negative constraints to tool descriptions that explicitly call out the overlap
Always-Do Instructions	"Always search before answering" → causes web_search calls even for factual memory questions	Qualify always-do instructions: "Always search when the user asks about current events or real-time data"

💡 Audit Technique: The Keyword Scan

After writing your system prompt, list all verbs and nouns. Check each one against your tool names. Any word in the system prompt that appears in a tool name or description creates a potential unintended association. Either rename the tool to reduce overlap, or qualify the system prompt instruction to specify exactly when the behavior applies.

📦 The Over-Provisioning Danger

A counterintuitive but exam-critical fact: providing an agent with more tools generally makes it LESS reliable. This violates the intuition that "more capability = more performance." Here is why it fails:

Decision complexity: Selecting from 18 tools vs 4 tools is not 4.5x harder — it is exponentially harder because tool descriptions must now be differentiated across a much larger comparison space.
Description dilution: With 18 tools, the descriptions of similar tools often begin to overlap, increasing misrouting even when each description is individually well-written.
Specialization mismatch: A synthesis agent given web search tools will occasionally use them, even when its role is analysis. Tools outside an agent's specialization create behavioral noise.

⚠️ The 4-5 Tool Benchmark

The exam guide explicitly states that 4-5 focused tools per agent is the target for reliable tool selection. If an agent has significantly more than 5 tools, you must reassess: Does it need all those tools for its specific role? Can the agent be split into more specialized subagents, each with its own constrained tool set?

The Scoped Tool Access Pattern

Instead of giving all agents all tools, implement scoped tool access:

Agent Role	Allowed Tools	Cross-Role Tools (Limited)
Search Agent	`web_search`, `crawl_page`, `filter_results`	`verify_fact` (shared)
Analysis Agent	`extract_data_points`, `summarize_content`	`verify_fact` (shared)
Synthesis Agent	`merge_findings`, `generate_report`	`verify_fact` (shared)
CS Agent	`get_customer`, `lookup_order`, `process_refund`, `escalate_to_human`	None

The verify_fact cross-role tool is an example of a scoped cross-role tool: a shared utility needed frequently by multiple agents, exposed selectively rather than to all agents wholesale.

💻 Code Patterns: Minimal vs Production Tool Definitions

Below are side-by-side implementations of tool definitions — the minimal version that causes misrouting, and the production-grade version used in the customer support scenario.

Python — ❌ MINIMAL Tool Definitions (Anti-Pattern)

# ❌ ANTI-PATTERN: Minimal descriptions with no boundaries
tools_minimal = [
    {
        "name": "get_customer",
        "description": "Retrieves customer information",  # TOO VAGUE
        "input_schema": {
            "type": "object",
            "properties": {
                "identifier": {"type": "string"}  # No format spec
            },
            "required": ["identifier"]
        }
    },
    {
        "name": "lookup_order",
        "description": "Retrieves order details",  # Near-identical to get_customer
        "input_schema": {
            "type": "object",
            "properties": {
                "identifier": {"type": "string"}  # Same schema name = maximum ambiguity
            },
            "required": ["identifier"]
        }
    }
]
# Result: Claude cannot reliably distinguish these tools for queries like
# "What is the status of my account?" — will misroute unpredictably.

Python — ✓ PRODUCTION-GRADE Tool Definitions

# ✅ PRODUCTION PATTERN: Complete descriptions with boundary guidance
tools_production = [
    {
        "name": "get_customer",
        "description": (
            "Retrieves a customer's account profile from the CRM by their unique customer ID. "
            "Returns: name, email, phone, account status, and subscription tier. "
            "customer_id format: CUST-XXXXX (example: CUST-00123). "
            "Returns {found: false} if the customer ID does not exist — this is a VALID result, not an error. "
            "Use this tool ONLY for account-level queries (profile, status, contact info). "
            "Do NOT use for order status, refund history, or payment details — use lookup_order instead."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {  # Specific name, not generic "identifier"
                    "type": "string",
                    "description": "Customer ID in format CUST-XXXXX"
                }
            },
            "required": ["customer_id"]
        }
    },
    {
        "name": "lookup_order",
        "description": (
            "Retrieves a specific order record from the order management system by its order ID. "
            "Returns: line items, quantities, prices, shipment status, tracking number, and expected delivery date. "
            "order_id format: ORD-XXXXX (example: ORD-98765). "
            "Returns {found: false} for non-existent orders — this is a valid empty result, not an error. "
            "Use this tool ONLY for order-level queries (status, tracking, items, shipping). "
            "Do NOT use for customer account or profile queries — use get_customer for those."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "order_id": {  # Distinct from customer_id — no overlap
                    "type": "string",
                    "description": "Order ID in format ORD-XXXXX"
                }
            },
            "required": ["order_id"]
        }
    },
    {
        "name": "process_refund",
        "description": (
            "Initiates a monetary refund for a specific order. Requires BOTH a verified customer ID AND order ID. "
            "PREREQUISITE: get_customer must succeed before this tool can be used — you must have a verified CUST-XXXXX. "
            "refund_amount must be a positive float in USD (e.g., 29.99). Do not include currency symbols. "
            "Returns: refund confirmation ID and estimated processing time. "
            "Do NOT use if the customer account is in suspended status — escalate instead."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string", "description": "Verified CUST-XXXXX from get_customer"},
                "order_id": {"type": "string", "description": "Order ID from lookup_order (ORD-XXXXX)"},
                "refund_amount": {"type": "number", "description": "Amount in USD, e.g. 29.99"},
                "reason": {"type": "string", "description": "Brief reason for refund"}
            },
            "required": ["customer_id", "order_id", "refund_amount", "reason"]
        }
    }
]

⛔ Anti-Patterns to Avoid

⛔ Generic Descriptions

Writing descriptions like "retrieves information" or "processes the request." These provide no basis for disambiguation and are the root cause of tool misrouting.

⛔ Missing Boundary Guidance

Describing what a tool DOES without stating what it does NOT do. Without negative constraints, Claude will use any tool that seems plausible for a given context.

⛔ Generic Input Parameter Names

Using identifier or id as parameter names across multiple tools. Use specific names like customer_id and order_id to reinforce the boundary at the schema level.

⛔ Over-Provisioning (18+ Tools)

Giving a single agent access to every available tool "just in case." This degrades selection reliability dramatically. Scope each agent to 4-5 focused tools.

⛔ Multi-Purpose Tools

Tools that "analyze OR summarize OR verify" based on an internal mode parameter. Each purpose needs its own dedicated tool with its own name and schema.

⛔ Keyword-Sensitive System Prompts

System prompts that use verbs echoing tool names without qualifiers (e.g., "always analyze"). These create unintended tool bias that overrides well-written descriptions.

✓ Full Description with 4 Elements

Every production tool description includes: exact purpose, input format with example, edge case handling, and explicit boundary guidance with pointers to alternative tools.

✓ Name-Description Alignment

Tool names use Verb+Noun+Qualifier pattern: lookup_order_by_id, extract_web_results. The name communicates purpose before the description is read.

✓ Scoped Tool Sets Per Agent

Each agent receives only the 4-5 tools relevant to its defined role. Cross-role tools are shared selectively for high-frequency needs only.

✅ Exam Readiness & Key Takeaways

🎓 Exam Scenario Context — Customer Support & Multi-Agent Research

Customer Support: Agent has get_customer, lookup_order, process_refund, escalate_to_human. Questions test whether you can identify why Claude is misrouting (answer: overlapping descriptions) and prescribe the fix (answer: add boundary guidance and rename tools).

Multi-Agent Research: A synthesis agent is incorrectly calling web_search tools. Questions test whether you can identify why (answer: over-provisioning — synthesis agent has too many tools) and how to fix (answer: restrict synthesis agent to synthesis-only tools, add cross-role verify_fact as limited shared tool).

Tool descriptions are the PRIMARY mechanism for tool selection. Claude reads descriptions as natural language to route tool calls. Minimal descriptions are the #1 root cause of tool misrouting in production.

Four required elements in every production-grade description: (1) Exact purpose, (2) Input format with example (e.g., CUST-XXXXX), (3) Edge case handling including empty results, (4) Explicit boundary guidance with pointers to alternative tools.

Name-Description Alignment is mandatory. Rename generic tools (analyze_content → extract_web_results). The Verb+Noun+Qualifier naming pattern reduces ambiguity before Claude even reads the description. Renaming + re-describing is always better than either alone.

Single Responsibility Principle applies to tools. Split multi-purpose tools (analyze_document) into purpose-specific tools (extract_data_points, summarize_content, verify_claim_against_source). Use constrained wrapper tools when you can't modify underlying APIs.

System prompts and tool names interact. Keywords in the system prompt that echo tool names create unintended selection bias. Audit system prompts for verb/noun overlap with tool names. Qualify "always-do" instructions to specific contexts.

4-5 tools per agent is the benchmark. Over-provisioning (18+ tools) degrades selection reliability by increasing decision complexity. Implement scoped tool access: each agent gets only the tools relevant to its role. Shared cross-role tools should be limited and purposeful.

Parameter names reinforce boundaries at the schema level. Use specific names (customer_id, order_id) rather than generic ones (identifier) across similar tools. Schema-level differentiation works in parallel with description-level differentiation.

Previous Task ← Task 1.7: Session State & Forking

Next Task Task 2.2: Structured Error Responses →