Multi-agent orchestration is where Claude truly scales. Instead of one agent trying to do everything, you build a coordinator that delegates intelligently to specialised subagents. This task statement tests your ability to design those systems correctly, including how context flows, how scope is partitioned, and how the coordinator refines results iteratively.
Imagine a large consulting project. A project manager (coordinator) receives the client brief: "Analyse the impact of AI on the financial services industry."
The PM doesn't personally write every section. Instead, they break the work into specialisms and delegate: a market analyst researches trends and data, a regulatory specialist reviews compliance implications, a technology expert assesses AI tools being adopted, and a writer synthesises everything into the final report.
Crucially, each specialist works independently: the market analyst doesn't see the regulatory specialist's notes unless the PM explicitly shares them. All communication flows through the PM. The PM reviews drafts, spots gaps ("We haven't covered insurance sector impact"), and sends targeted follow-up requests before delivering the final report.
This is exactly how a Claude coordinator-subagent system works.
The architecture that has emerged as the standard pattern for multi-agent Claude systems is the hub-and-spoke model (coordinator-subagent pattern). In this architecture, a single coordinator agent sits at the centre and manages all orchestration, while specialist subagents sit at the periphery and execute specific assigned tasks.
Key architectural rules of hub-and-spoke:
If the Synthesis agent could call Web Search directly, three problems would appear: (1) inconsistent error handling, because each subagent would need its own retry logic; (2) weaker observability, because you can't easily log the full chain; (3) circular dependency risk, where Agent A calls B, which calls A again. Routing every call through the coordinator avoids all three.
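The routing rule can be illustrated with a minimal sketch. It assumes subagents are plain Python callables; the `Coordinator` class, `delegate` method, and the stub agents are hypothetical names for illustration, not part of any SDK. Retry logic and logging live in one place because every call passes through the hub.

```python
from typing import Callable, Dict, List

# Hypothetical subagent type: takes a prompt string, returns a result string.
Subagent = Callable[[str], str]


class Coordinator:
    """Hub: the only component allowed to invoke subagents."""

    def __init__(self, subagents: Dict[str, Subagent], max_retries: int = 2):
        self.subagents = subagents
        self.max_retries = max_retries
        self.log: List[str] = []  # single audit trail for the whole chain

    def delegate(self, agent_name: str, prompt: str) -> str:
        """Invoke one subagent with shared retry logic and logging."""
        last_error = None
        # One initial attempt plus max_retries retries.
        for attempt in range(1, self.max_retries + 2):
            try:
                result = self.subagents[agent_name](prompt)
                self.log.append(f"{agent_name} ok (attempt {attempt})")
                return result
            except RuntimeError as exc:
                last_error = exc
                self.log.append(f"{agent_name} failed (attempt {attempt}): {exc}")
        raise RuntimeError(f"{agent_name} failed after retries") from last_error


# Stub subagents, for illustration only.
def web_search(prompt: str) -> str:
    return f"search results for: {prompt}"


def synthesis(prompt: str) -> str:
    return f"synthesis of: {prompt}"


coordinator = Coordinator({"web_search": web_search, "synthesis": synthesis})
# Synthesis never calls web_search itself; the coordinator relays the findings.
findings = coordinator.delegate("web_search", "AI in music production")
report = coordinator.delegate("synthesis", findings)
```

Because the subagents hold no references to each other, a circular call chain cannot form, and `coordinator.log` records every hop.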
This is one of the most commonly misunderstood concepts and a frequent exam trap. Subagents do NOT automatically inherit the coordinator's conversation history. Every subagent invocation starts with an empty context, receiving only what the coordinator explicitly passes in the prompt.
When you send a package with delivery instructions, the courier doesn't know your entire life history, only what's written on the label you attached. If you forget to write "leave at side door," the courier won't know. Subagents are the same: they receive exactly the "label" (prompt) the coordinator writes, nothing more.
When invoking a subagent, the coordinator must explicitly pass everything the subagent needs to complete its task. Nothing flows automatically. This is a design responsibility: the coordinator author must think carefully about what each subagent needs.
    # BAD: Subagent will have no idea what articles to synthesise
    subagent_prompt = "Please synthesise the research findings."

    # GOOD: Everything the subagent needs is in the prompt
    subagent_prompt = f"""You are a synthesis specialist.

    TASK: Produce a 500-word synthesis of the following research findings.
    TOPIC: {original_user_query}
    QUALITY CRITERIA: Cite sources, flag conflicting data, highlight consensus.

    WEB SEARCH FINDINGS:
    {web_search_results}

    DOCUMENT ANALYSIS FINDINGS:
    {doc_analysis_summary}

    OUTPUT FORMAT: Structured report with sections: Overview, Key Trends, Conflicts, Gaps.
    """
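One way to guard against the "BAD" case is to make the coordinator refuse to delegate when required context is missing. The sketch below is an assumption about how that could look; `REQUIRED_CONTEXT` and `build_synthesis_prompt` are hypothetical helpers, and the field names simply mirror the variables used in the prompt above.

```python
from typing import Dict, List

# Hypothetical list of fields the synthesis subagent cannot work without.
REQUIRED_CONTEXT: List[str] = [
    "original_user_query",
    "web_search_results",
    "doc_analysis_summary",
]


def build_synthesis_prompt(context: Dict[str, str]) -> str:
    """Build the subagent prompt, failing loudly if any context is absent."""
    missing = [key for key in REQUIRED_CONTEXT if not context.get(key)]
    if missing:
        # Better to fail here than to send a prompt the subagent cannot act on.
        raise ValueError(f"missing subagent context: {missing}")
    return (
        "You are a synthesis specialist.\n"
        f"TOPIC: {context['original_user_query']}\n"
        f"WEB SEARCH FINDINGS:\n{context['web_search_results']}\n"
        f"DOCUMENT ANALYSIS FINDINGS:\n{context['doc_analysis_summary']}\n"
    )
```

Calling `build_synthesis_prompt({"original_user_query": "..."})` with the other fields absent raises `ValueError`, which surfaces the omission in the coordinator rather than in a confused subagent response.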
When passing findings between agents, use structured formats (JSON or markdown tables) to clearly separate content from metadata (source URLs, dates, confidence scores). Unstructured prose handoffs cause the synthesis agent to lose attribution data, making citations impossible.
Example: instead of passing "Article X says AI adoption is 40%", pass {"claim": "AI adoption is 40%", "source": "McKinsey 2024", "url": "..."}.
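A structured handoff of this kind might be sketched as follows. The `Finding` dataclass and `build_handoff` function are hypothetical, the 0.0 to 1.0 confidence scale is an assumption, and the confidence value shown is a placeholder rather than real data.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Finding:
    claim: str         # the content itself
    source: str        # metadata: who said it
    url: str           # metadata: where it can be verified
    date: str          # metadata: publication date as reported by the source
    confidence: float  # metadata: assumed scale of 0.0 to 1.0


def build_handoff(findings: List[Finding]) -> str:
    """Serialise findings so content and metadata stay separate."""
    return json.dumps([asdict(finding) for finding in findings], indent=2)


handoff = build_handoff([
    Finding(
        claim="AI adoption is 40%",
        source="McKinsey 2024",
        url="...",        # elided in the original example
        date="2024",
        confidence=0.8,   # placeholder value for illustration
    )
])
```

The synthesis agent receives `handoff` as part of its prompt and can cite each claim because the source fields travel with it.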
The coordinator isn't just a router: it plays four distinct roles throughout the pipeline. Each role requires active intelligence, not just message passing.
The coordinator analyses the user's query to identify what subtasks are required. For the research system example, "impact of AI on creative industries" must be broken into visual arts, music, writing, film, and gaming, NOT just "digital art, graphics, photography." Overly narrow decomposition is the primary root cause of incomplete reports and a key exam question type.
Based on the decomposed subtasks and the incoming query's complexity, the coordinator selects which subagents to invoke. Complex queries may require all agents. Simple factual queries may need only Web Search + Report Gen. The coordinator must not blindly route through the full pipeline every time.
The coordinator collects outputs from all subagents and assembles them into a coherent whole. This includes resolving conflicts, normalising formats, and preserving source attribution. It must know which subagent produced which finding.
Before finalising, the coordinator evaluates coverage. Are there gaps? Is any subtopic underrepresented? If yes, it re-delegates targeted queries to specific subagents and re-invokes synthesis. This iterative refinement is the difference between a good system and a great one.
One of the key design skills tested is knowing when to invoke the full agent pipeline vs. a targeted subset. Blindly running all subagents on every query is wasteful and slower.
| Scenario | Recommended Routing | Why |
|---|---|---|
| "What year was GPT-3 released?" | Web Search only → Report Gen | Simple factual query. Doc Analysis and Synthesis add latency with no value. |
| "Summarise the attached 50-page whitepaper" | Doc Analysis only → Report Gen | Input is already provided. No web search needed. |
| "Comprehensive analysis of AI in healthcare across 5 years" | All agents → Iterative refinement loop | Broad scope needs web search, document analysis, synthesis, and fact-checking. |
| "Verify if claim X in this document is accurate" | Doc Analysis + Web Search → Fact Checker | Specific verification task. Report Gen not needed until after fact check. |
| "Generate a weekly news summary" | Web Search → Report Gen (not the full pipeline every time) | Always running the full pipeline is overkill; most weeks only need Web Search + Report Gen. |
Encode routing rules in the coordinator's system prompt using explicit criteria: "If the query requires information not in the attached documents, invoke the Web Search agent. If the query is purely about attached documents, skip Web Search." This lets Claude make routing decisions itself, rather than hard-coding routing logic in your application code.
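As a rough sketch, the routing criteria can be kept as data and rendered into the coordinator's system prompt. `ROUTING_RULES` and `build_coordinator_system_prompt` are hypothetical names, and the opening sentence of the prompt is an assumption; the two rules are the ones quoted above.

```python
from typing import List, Tuple

# (condition, action) pairs taken from the routing guidance above.
ROUTING_RULES: List[Tuple[str, str]] = [
    ("the query requires information not in the attached documents",
     "invoke the Web Search agent"),
    ("the query is purely about attached documents",
     "skip Web Search"),
]


def build_coordinator_system_prompt(rules: List[Tuple[str, str]]) -> str:
    """Render explicit routing criteria into the coordinator's system prompt."""
    lines = [
        "You are the coordinator of a multi-agent research system.",
        "Choose which subagents to invoke using these rules:",
    ]
    for number, (condition, action) in enumerate(rules, start=1):
        lines.append(f"{number}. If {condition}, {action}.")
    return "\n".join(lines)


system_prompt = build_coordinator_system_prompt(ROUTING_RULES)
```

The application code only assembles the prompt; the routing decision itself is still made by the model reading these criteria.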
When multiple subagents research the same broad topic in parallel, there's a risk of duplication (both agents find the same articles) or collisions (both agents are assigned "AI in healthcare" in full). Effective partitioning assigns distinct, non-overlapping scope slices to each agent.
Agent A: "Research AI impact on creative industries"
Agent B: "Find information about AI and creative work"
Both agents search the same space. Results overlap by ~70%. Synthesis agent receives duplicate data and produces a bloated report.
Agent A: "AI in visual arts & graphic design (2022-2024)"
Agent B: "AI in music & audio production (2022-2024)"
Agent C: "AI in writing, filmmaking & gaming (2022-2024)"
Zero overlap. Each agent has a clear, distinct domain.
Agent A: "Search academic papers & research journals"
Agent B: "Search industry reports & news articles"
Agent C: "Search social media trends & user sentiment"
Different source layers give complementary perspectives without duplication.
Agent A: "Pre-2022 historical context"
Agent B: "2022-2023 adoption wave"
Agent C: "2024-present current state"
Temporal slicing ensures full chronological coverage.
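A coordinator could sanity-check a partition before delegating. The sketch below assumes each agent's scope has already been normalised into a set of slice labels; `find_scope_overlaps` is a hypothetical helper, and it only detects exact label matches, not semantically similar wording.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def find_scope_overlaps(
    assignments: Dict[str, Set[str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Return every pair of agents whose assigned slices intersect."""
    overlaps: Dict[Tuple[str, str], Set[str]] = {}
    for (agent_a, scope_a), (agent_b, scope_b) in combinations(assignments.items(), 2):
        shared = scope_a & scope_b
        if shared:
            overlaps[(agent_a, agent_b)] = shared
    return overlaps


# Topical partition from the example above: no slice is shared.
good_partition = {
    "Agent A": {"visual arts", "graphic design"},
    "Agent B": {"music", "audio production"},
    "Agent C": {"writing", "filmmaking", "gaming"},
}

# Both agents given the same broad scope.
bad_partition = {
    "Agent A": {"creative industries"},
    "Agent B": {"creative industries"},
}

good_overlaps = find_scope_overlaps(good_partition)
bad_overlaps = find_scope_overlaps(bad_partition)
```

An empty result for `good_partition` and a non-empty one for `bad_partition` is the signal to re-slice before any subagent is invoked.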
Sample Question: "Your multi-agent research system is asked about AI's impact on creative industries. The final report covers only visual arts, missing music, writing, and film. Subagents all completed successfully. What is the root cause?"
Answer: Coordinator task decomposition was too narrow. The coordinator only assigned visual arts subtopics. The subagents did exactly what they were told: the bug is in the coordinator's planning, not the subagents' execution. This is Task 1.2's signature question type.
A sophisticated coordinator doesn't just collect results once. It evaluates quality, identifies gaps, and loops back to gather more targeted information before finalising. This is the iterative refinement pattern.
The coordinator's evaluation prompt should specify concrete coverage criteria. Without explicit criteria, Claude will generally say "looks complete" even when it isn't.
    # Coordinator evaluates synthesis for gaps
    evaluation_prompt = """
    Review the synthesis draft below and check for coverage gaps.

    REQUIRED COVERAGE (all must be addressed):
    - Visual arts (painting, sculpture, digital art)
    - Music (composition, production, performance)
    - Writing (journalism, fiction, copywriting)
    - Film and video production
    - Gaming and interactive media

    SYNTHESIS DRAFT:
    {synthesis_draft}

    OUTPUT: JSON with keys:
    - covered_topics: list of topics adequately addressed
    - missing_topics: list of topics absent or insufficient
    - sufficient: boolean (true only if ALL required topics covered)
    - targeted_queries: specific search queries for any missing topics
    """
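The evaluator's JSON output still needs checking on the application side. The following is a minimal sketch, assuming the model returns a bare JSON object with the four keys listed in the prompt; `parse_evaluation` is a hypothetical helper, and the sample response is invented purely to exercise it.

```python
import json

REQUIRED_KEYS = {"covered_topics", "missing_topics", "sufficient", "targeted_queries"}


def parse_evaluation(raw: str) -> dict:
    """Parse the evaluator's JSON and cross-check its 'sufficient' flag."""
    data = json.loads(raw)
    absent = REQUIRED_KEYS - set(data)
    if absent:
        raise ValueError(f"evaluation missing keys: {sorted(absent)}")
    # Do not trust the boolean alone: it must agree with missing_topics.
    data["sufficient"] = bool(data["sufficient"]) and not data["missing_topics"]
    return data


# Invented sample response in which the flag contradicts the topic list.
sample_response = json.dumps({
    "covered_topics": ["Visual arts"],
    "missing_topics": ["Music"],
    "sufficient": True,
    "targeted_queries": ["AI in music production"],
})
evaluation = parse_evaluation(sample_response)
```

Here `evaluation["sufficient"]` ends up `False`, so the coordinator would re-delegate using `targeted_queries` instead of accepting an optimistic flag.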
Always cap your refinement loop (e.g., max 3 iterations) to prevent runaway costs. After the cap, produce the best report available and annotate which topics remain under-covered. This gives users useful output rather than a hung pipeline.
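A capped loop of this kind might look like the sketch below. The `refine` function and the two stub callables are hypothetical: `evaluate` stands in for the coordinator's gap check and `research_and_resynthesise` for re-delegation plus synthesis. The stub deliberately fills only one gap per round so the cap is reached.

```python
from typing import Callable, List, Tuple

REQUIRED_TOPICS = ["visual arts", "music", "writing", "film", "gaming"]


def refine(
    draft: str,
    evaluate: Callable[[str], List[str]],
    research_and_resynthesise: Callable[[str, List[str]], str],
    max_iterations: int = 3,
) -> Tuple[str, List[str]]:
    """Loop until coverage is complete or the iteration cap is hit."""
    missing = evaluate(draft)
    iterations = 0
    while missing and iterations < max_iterations:
        draft = research_and_resynthesise(draft, missing)
        missing = evaluate(draft)
        iterations += 1
    # Whatever is still missing is returned so the report can be annotated.
    return draft, missing


# Stub evaluation: a topic counts as covered if its name appears in the draft.
def evaluate(draft: str) -> List[str]:
    return [topic for topic in REQUIRED_TOPICS if topic not in draft]


# Stub refinement: covers only the first missing topic per round.
def add_one_topic(draft: str, missing: List[str]) -> str:
    return draft + " " + missing[0]


final_draft, still_missing = refine("visual arts", evaluate, add_one_topic)
```

With the cap at 3, the loop stops with one topic still uncovered and returns it in `still_missing`, which is what the coordinator would use to annotate the final report.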
Letting the Synthesis agent call Web Search directly bypasses the coordinator. You lose consistent error handling, observability, and protection against circular calls between agents.
Routing every query through all agents regardless of complexity wastes tokens and adds latency. A "what year was X founded?" query doesn't need Doc Analysis or Synthesis.
Assuming subagents know the original user query or prior agent results. They don't: everything must be explicitly included in the Task tool prompt.
Giving multiple agents the same broad scope (e.g., "research AI in creative industries" × 3). Leads to duplicate findings and an inflated, repetitive synthesis.
Include source URLs, dates, and confidence scores in every inter-agent handoff. Prose-only handoffs lose provenance and make citing sources impossible.
Tell subagents what outcome is needed, not step-by-step instructions. This preserves their adaptability and lets them handle unexpected input formats.
Emit multiple Task tool calls in a single coordinator response to run subagents in parallel. Sequential calls for independent tasks multiply latency unnecessarily.
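The latency argument can be seen in an application-level analogy. This sketch does not use the Task tool; it assumes subagent calls are wrapped as coroutines, with `run_subagent` as a hypothetical stand-in that sleeps instead of calling a model, and it uses `asyncio.gather` to run the independent calls concurrently.

```python
import asyncio
from typing import List


async def run_subagent(name: str, delay: float) -> str:
    """Hypothetical stand-in for one independent subagent call."""
    await asyncio.sleep(delay)  # simulates the time a real call would take
    return f"{name} done"


async def run_parallel() -> List[str]:
    # Independent slices start together, so total time is roughly the
    # slowest call rather than the sum of all calls.
    return await asyncio.gather(
        run_subagent("visual_arts", 0.1),
        run_subagent("music", 0.1),
        run_subagent("writing", 0.1),
    )


results = asyncio.run(run_parallel())
```

`asyncio.gather` returns results in the order the coroutines were passed, so the coordinator can still attribute each result to the subagent that produced it.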
Since all communication passes through the coordinator, instrument every delegation and result for observability. This is your audit trail for debugging production issues.
The primary exam scenario for this task is: "You are building a multi-agent research system using the Claude Agent SDK. A coordinator agent delegates to specialized subagents: one searches the web, one analyzes documents, one synthesizes findings, and one generates reports. The system researches topics and produces comprehensive, cited reports."
Questions will present a broken multi-agent system and ask you to identify the root cause. The root cause almost always traces to one of: (1) overly narrow task decomposition by the coordinator leading to incomplete coverage of broad topics, (2) missing explicit context passing to subagents, (3) subagents with wrong tool scope, (4) missing iterative refinement, or (5) wrong routing pattern (full pipeline for simple queries).