Complete concept reference for all 7 task statements. Every key concept explained with a real-world use case and the exam signal that identifies which pattern to apply. Use this as your final review before exam day.
The agentic loop is the core execution engine for every autonomous Claude task. Claude receives a request, decides on a tool to call, you execute that tool, you append the result, and you call Claude again β repeating until stop_reason == "end_turn".
Three values: end_turn (done β return response), tool_use (continue β execute tools), max_tokens (truncated β handle separately, NOT as end_turn). Never terminate based on Claude's text content.
After each tool execution, append TWO things: (1) the assistant response as role=assistant, and (2) the tool results as role=user with tool_result type. Missing either breaks the loop.
Claude can emit multiple tool_use blocks in a single response. Your loop must execute ALL of them and return ALL results in one tool_result message. Missing any results leaves Claude reasoning incomplete.
When stop_reason == "max_tokens", the response is truncated mid-generation. Do NOT treat as a completed response. Either increase max_tokens or request Claude to continue. Set production max_tokens to 8096.
A customer asks: "What's the status of my three orders?" Claude calls get_customer β sees 3 active orders β calls lookup_order three times in parallel (one per order) β receives all results β calls end_turn with a unified status response. The loop runs for 3 iterations, each appending results correctly to history.
A question describes an agent that terminates after Claude says "I've completed the task" β this is an anti-pattern. Correct termination is stop_reason == "end_turn" only. Also: if an agent silently produces truncated output, the answer is max_tokens was too low, not a logic error.
Hub-and-spoke architecture: one coordinator agent receives the user request, decomposes it into subtasks, delegates each to specialized subagents, collects results, synthesizes, and returns the final answer. The coordinator never performs work itself β it orchestrates.
Subagents do NOT inherit the coordinator's conversation history. Each subagent starts with an empty context. The coordinator must pass everything a subagent needs β explicitly, in the Task prompt.
The primary failure mode: coordinator decomposes tasks too narrowly, so subagents lack cross-cutting context. Each subagent is correct individually but the synthesis is inconsistent or contradictory.
Subagent errors propagate to the coordinator as structured tool_result errors. The coordinator decides: retry the subagent, escalate to human, or abort with explanation. Never silently swallow subagent errors.
Subagent results must include both content AND metadata (source URLs, document names, publication dates, confidence scores) in structured JSON. Prose-only handoffs lose attribution permanently.
A research coordinator receives: "Write a report on renewable energy adoption." It spawns: (1) WebResearcher agent β gathers sources with URLs and dates, (2) DataAnalyst agent β extracts statistics, (3) PolicyAnalyst agent β identifies regulatory context. Each returns structured JSON. Coordinator synthesizes the cited, attributed report.
If a system produces inconsistent output despite each subagent working correctly β the cause is overly narrow coordinator decomposition. The fix is enlarging each subagent's task scope, not adding more subagents.
The mechanics of spawning subagents: the Task tool, AgentDefinition configuration, explicit context passing, and parallel vs sequential spawning. These are the plumbing details that make multi-agent systems actually work.
"Task" must be in the coordinator's allowedTools. Without it, Claude cannot spawn subagents even if told to. This is the #1 tested configuration error in Task 1.3.
description (Claude uses this to select which agent type to spawn), systemPrompt (defines the agent's role, output format, and quality criteria), allowedTools (security boundary β tightly scoped per role).
Emit multiple Task tool calls in a SINGLE coordinator response. The SDK executes them concurrently. Sequential spawning (one Task call per turn) multiplies latency by the number of subagents. For n independent subtasks, parallel is O(1); sequential is O(n).
fork_session creates independent branches from a shared baseline. Both branches inherit the baseline context but subsequent actions don't pollute each other. Use for A/B comparison of refactoring approaches.
A coordinator agent receives a codebase analysis request. It defines three AgentDefinition types: FileReader (allowedTools: Read, Grep), Analyzer (allowedTools: none β reasons on passed content), Reporter (allowedTools: Write). It spawns FileReader agents in parallel for each directory, then passes all results to the Analyzer, then triggers Reporter. Total time = slowest FileReader, not sum of all.
If subagents produce generic output: Task prompt didn't pass enough context. If coordinator can't spawn: "Task" missing from allowedTools. If parallel calls run sequentially: multiple Task calls were sent in separate turns, not one response.
Programmatic gates enforce that workflow stages execute in the correct order and that prerequisites are complete before downstream steps run. Prompt instructions alone cannot provide this guarantee.
Gates query your database for actual state. Prompt instructions tell Claude to check state. Gates are deterministic (100% compliance). Prompt instructions are probabilistic (non-zero failure rate). Use gates for compliance-critical steps.
The canonical exam example: a pre-tool hook intercepts any process_refund call, checks customer_verified == True in the state store, and blocks if not. This guarantees identity verification before financial operations β impossible with prompt instructions alone.
HiTL suspends (not terminates) the session. Full workflow state is persisted to durable storage so execution resumes exactly at the suspension point after human approval. The session ID and stage position are stored.
When escalating to human agents, compile: customer_id, root cause analysis, refund amount, and recommended action. Human agents lack conversation transcript access β the handoff must be self-contained.
Restoring local state insufficient for stages that created external records. Need a compensating action: delete the created account, send a cancellation email, reverse the database write. Both snapshot AND compensating transaction required.
A customer has 3 billing disputes. Agent decomposes into 3 parallel invoice investigations (each with customer_id + invoice_id context). Results synthesized into a unified resolution. If any refund exceeds $500 β enforcement gate triggers β escalates to human with structured handoff containing all 3 dispute findings, total refund amount, and recommended resolution.
If a compliance step was bypassed despite instructions: gates are in prompts, not code. If a workflow ran out of order: entry conditions checked Claude's output text, not the database. If rollback failed: no compensating transaction for external records.
SDK hooks intercept tool execution at two points: PreToolCall (before execution β can block) and PostToolUse (after execution β can transform). Together they provide deterministic data normalization and policy enforcement that prompt instructions cannot guarantee.
Fires AFTER tool returns. Cannot block. Transforms heterogeneous output before Claude sees it. Three tested formats: Unix timestamps β ISO 8601, numeric status codes β strings, mixed date formats β ISO 8601.
Fires BEFORE tool executes. Can block by returning error as tool_result. Use for: refund threshold ($500), bulk operation limits, irreversible action confirmation. Claude reads the error and takes the instructed alternative action.
Hook enforcement: 100% compliance β fires in your code regardless of Claude's reasoning. Prompt instruction enforcement: non-zero failure rate β persuasive user context can cause deviation. Always use hooks for regulatory/financial rules.
When blocking, error message must specify: violation reason, threshold breached, exact alternative tool to call, and required handoff fields. Generic "Policy violation" messages cause Claude to hallucinate alternatives.
Agent connects to: legacy order system (returns Unix timestamps, numeric status codes), modern CRM (ISO 8601, string statuses). PostToolUse hook normalizes ALL tool outputs to consistent format before Claude reasons on them. When Claude calls process_refund(amount=750) β PreToolCall hook fires β 750 > 500 β blocks β Claude receives "use escalate_to_human with reason='large_refund', include customer_id, amount, root_cause, recommended_action."
"Most reliable way to prevent refunds above $500" β PreToolCall hook, NOT system prompt. "Claude sometimes passes the policy check despite the instruction" β replace prompt instruction with hook. "Data from different tools is inconsistent" β add PostToolUse normalization hook.
Two fundamental decomposition patterns: Prompt Chaining (fixed sequential pipeline when steps are known upfront) and Dynamic Adaptive Decomposition (workflow shape emerges from what is discovered). Choosing correctly is the primary exam test for Task 1.6.
Use when: workflow structure is fully known before seeing any data. Stages defined upfront. Input of stage N is output of stage N-1. Exam example: code review (security β performance β correctness β integration pass). Same stages regardless of code content.
Use when: structure only emerges as you investigate. Subtasks generated from intermediate findings. Exam example: "add tests to legacy codebase" β must map structure, identify high-impact areas, then create a prioritized plan that adapts as dependencies discovered.
Sending 10 files in one 60k-token prompt causes attention dilution: model processes start/end reliably, misses middle. Fix: each file gets dedicated API call ("local issues only, do NOT cross-file analyze"), then a separate cross-file integration pass over all per-file results.
Per-file passes only catch local bugs. The integration pass synthesizes cross-file vulnerabilities: data flow security issues, auth bypass paths, circular dependencies, contradictory findings between files. Cannot be skipped for production code review.
PR with 8 changed files. Step 1: run 8 per-file analysis passes IN PARALLEL (each focused, one API call per file). Step 2: integration pass receives all 8 results, identifies cross-file data flow issues auth.pyβapi.py missed individually. Total latency = slowest single file + integration pass time.
Task: "Add comprehensive tests to a 140-file legacy codebase." Phase 1: map structure (discovers 12 modules, 3 have zero tests). Phase 2: rank by impact (payment.py critical path, auth.py security-sensitive). Phase 3: generate tests β discovers payment.py depends on untested legacy_db.py β adds to plan. Impossible with fixed pipeline.
"Can you draw the full workflow diagram before seeing any data?" YES β Prompt Chaining. NO (shape emerges from data) β Dynamic Adaptive. "Middle files in a code review were poorly analyzed" β attention dilution β add per-file passes. "Test plan missed a critical dependency" β fixed pipeline used where adaptive was needed.
Three session management strategies for long-running tasks that span multiple work sessions: named resumption, fork branching, and fresh start with structured summary. The decision turns on whether prior context is reliable and whether divergent exploration is needed.
Resumes with full prior conversation history β every message, tool call, tool result. Ideal when: investigation paused mid-task, no or minimal code changes since session, prior tool results still accurate. Session names should be descriptive and readable by teammates.
Creates independent copies of current session state. Both branches share the baseline but neither affects the other. Use when: shared expensive baseline exists, need to explore divergent approaches (two testing strategies, two refactoring paths). Fork AFTER the baseline is complete, not before.
Stale = underlying data changed after tool was called. Agent reasons from stale tool results as if current β produces confidently wrong analysis. When significant refactoring occurred: start fresh session + inject structured summary of what's still valid, what's invalidated.
Middle path: resume session AND explicitly name which files changed + nature of change. Agent re-reads only changed files, preserves valid prior analysis of unchanged files. More efficient than full re-exploration when only 2β5 of 20 files changed.
Developer Productivity agent completes a codebase analysis (expensive: 30 min, 50 tool calls). Two refactoring approaches identified. Instead of re-running analysis for each: fork_session() creates Branch A (Module approach) and Branch B (Microservice approach). Both branches explore from identical baseline independently. Developer reviews both findings to choose approach.
Investigation session paused Friday. Monday: 3 of 20 analyzed files were modified over weekend. Decision: --resume <session-name> + "Since last session, auth/processor.py was refactored (JWTβOAuth), config.py added retry keys. All other files unchanged. Please re-read these two files then continue your assessment." Agent does targeted re-analysis of 2 files only.
"Resume or fresh?" β If prior tool results are mostly valid: resume. If significant code changes occurred: fresh + structured summary. "Agent reasoning is inconsistent with actual code" β stale tool results in resumed session. "Two approaches contaminated each other" β should have used fork_session instead of sequential exploration in one session.
| Situation / Symptom | Root Cause | Correct Answer |
|---|---|---|
| Agent terminates after Claude says "task complete" | Wrong termination signal | Terminate only on stop_reason == "end_turn" |
| Agent silently produces truncated output | max_tokens too low | Increase to 8096+; handle max_tokens stop_reason separately |
| Subagents produce generic, context-free output | Insufficient context in Task prompt | Pass all required context explicitly in Task tool prompt |
| Coordinator can't spawn subagents despite instructions | "Task" missing from allowedTools | Add "Task" to coordinator's allowedTools |
| Parallel spawning runs sequentially | Task calls in separate turns | Emit all Task calls in one coordinator response |
| Compliance rule bypassed despite prompt instruction | Prompt = probabilistic | Replace with PreToolCall SDK hook |
| Tool data inconsistent across backends | Heterogeneous formats | Add PostToolUse normalization hook |
| Middle files in code review poorly analyzed | Attention dilution | Per-file local passes + separate integration pass |
| Test plan missed a discovered dependency | Fixed pipeline for open-ended task | Dynamic adaptive decomposition with adapting plan |
| Agent reasons incorrectly after resume | Stale tool results | Fresh session + structured summary of valid/invalidated findings |
| Two strategy explorations contaminated each other | Sequential exploration in one session | fork_session() to create independent branches |
| Compliance gate ran after downstream tool | Entry conditions checked Claude's text | Gate must query database state, not parse Claude's assertion |
stop_reason == "end_turn"max_tokens β end_turn β handle separatelyallowedTools is mandatory for spawningprocess_refund until get_customer completes--resume <name>Watch for implicit loop failure: When Claude returns text that looks like a solution but didn't actually call the final tool (like close_ticket or finalize_invoice). To score 950+, your architecture must enforce that certain user intents must terminate with specific tool signals, not just text runs. This is the difference between a prototype and a production-grade agent.
For the exam, remember: --resume is for continuation (linear time), while fork_session is for divergent exploration (branching time). Using --resume to compare two different implementation strategies is an architectural error because the first exploration contaminates the second. Always fork for comparisons.