β˜… Domain 1 Summary

Agentic Architecture & Orchestration

Complete concept reference for all 7 task statements. Every key concept explained with a real-world use case and the exam signal that identifies which pattern to apply. Use this as your final review before exam day.

27% of Exam Weight 7 Task Statements 3 Primary Scenarios Highest Domain Weight
27%
Exam Weight
7
Task Statements
3
Exam Scenarios
S1
Customer Support
S3
Multi-Agent Research
S4
Dev Productivity

πŸ”„ Task 1.1 β€” Agentic Loops for Autonomous Task Execution

1.1 Design and Implement Agentic Loops Full Page β†’

The agentic loop is the core execution engine for every autonomous Claude task. Claude receives a request, decides on a tool to call, you execute that tool, you append the result, and you call Claude again β€” repeating until stop_reason == "end_turn".

stop_reason β€” The Only Termination Signal

Three values: end_turn (done β€” return response), tool_use (continue β€” execute tools), max_tokens (truncated β€” handle separately, NOT as end_turn). Never terminate based on Claude's text content.

History Appending (The Most Common Bug)

After each tool execution, append TWO things: (1) the assistant response as role=assistant, and (2) the tool results as role=user with tool_result type. Missing either breaks the loop.

Parallel Tool Calls

Claude can emit multiple tool_use blocks in a single response. Your loop must execute ALL of them and return ALL results in one tool_result message. Missing any results leaves Claude reasoning incomplete.

max_tokens Handling

When stop_reason == "max_tokens", the response is truncated mid-generation. Do NOT treat as a completed response. Either increase max_tokens or request Claude to continue. Set production max_tokens to 8096.

🌐 Real-World Use Case β€” Customer Support Agent

A customer asks: "What's the status of my three orders?" Claude calls get_customer β†’ sees 3 active orders β†’ calls lookup_order three times in parallel (one per order) β†’ receives all results β†’ calls end_turn with a unified status response. The loop runs for 3 iterations, each appending results correctly to history.

⚠️ Exam Signal

A question describes an agent that terminates after Claude says "I've completed the task" β€” this is an anti-pattern. Correct termination is stop_reason == "end_turn" only. Also: if an agent silently produces truncated output, the answer is max_tokens was too low, not a logic error.

πŸ•ΈοΈ Task 1.2 β€” Multi-Agent Orchestration

1.2 Orchestrate Multi-Agent Systems with Coordinator-Subagent Patterns Full Page β†’

Hub-and-spoke architecture: one coordinator agent receives the user request, decomposes it into subtasks, delegates each to specialized subagents, collects results, synthesizes, and returns the final answer. The coordinator never performs work itself β€” it orchestrates.

Context Isolation

Subagents do NOT inherit the coordinator's conversation history. Each subagent starts with an empty context. The coordinator must pass everything a subagent needs β€” explicitly, in the Task prompt.

Coordinator Decomposition Risk

The primary failure mode: coordinator decomposes tasks too narrowly, so subagents lack cross-cutting context. Each subagent is correct individually but the synthesis is inconsistent or contradictory.

Error Routing

Subagent errors propagate to the coordinator as structured tool_result errors. The coordinator decides: retry the subagent, escalate to human, or abort with explanation. Never silently swallow subagent errors.

Structured Metadata Handoffs

Subagent results must include both content AND metadata (source URLs, document names, publication dates, confidence scores) in structured JSON. Prose-only handoffs lose attribution permanently.

🌐 Real-World Use Case β€” Multi-Agent Research System

A research coordinator receives: "Write a report on renewable energy adoption." It spawns: (1) WebResearcher agent β†’ gathers sources with URLs and dates, (2) DataAnalyst agent β†’ extracts statistics, (3) PolicyAnalyst agent β†’ identifies regulatory context. Each returns structured JSON. Coordinator synthesizes the cited, attributed report.

⚠️ Exam Signal

If a system produces inconsistent output despite each subagent working correctly β€” the cause is overly narrow coordinator decomposition. The fix is enlarging each subagent's task scope, not adding more subagents.

πŸ€– Task 1.3 β€” Subagent Invocation & Context Passing

1.3 Configure Subagent Invocation, Context Passing, and Spawning Full Page β†’

The mechanics of spawning subagents: the Task tool, AgentDefinition configuration, explicit context passing, and parallel vs sequential spawning. These are the plumbing details that make multi-agent systems actually work.

Task Tool β€” The Only Spawning Mechanism

"Task" must be in the coordinator's allowedTools. Without it, Claude cannot spawn subagents even if told to. This is the #1 tested configuration error in Task 1.3.

AgentDefinition β€” 3 Required Parts

description (Claude uses this to select which agent type to spawn), systemPrompt (defines the agent's role, output format, and quality criteria), allowedTools (security boundary β€” tightly scoped per role).

Parallel Spawning

Emit multiple Task tool calls in a SINGLE coordinator response. The SDK executes them concurrently. Sequential spawning (one Task call per turn) multiplies latency by the number of subagents. For n independent subtasks, parallel is O(1); sequential is O(n).

fork_session for Divergent Exploration

fork_session creates independent branches from a shared baseline. Both branches inherit the baseline context but subsequent actions don't pollute each other. Use for A/B comparison of refactoring approaches.

🌐 Real-World Use Case β€” Developer Productivity Tool

A coordinator agent receives a codebase analysis request. It defines three AgentDefinition types: FileReader (allowedTools: Read, Grep), Analyzer (allowedTools: none β€” reasons on passed content), Reporter (allowedTools: Write). It spawns FileReader agents in parallel for each directory, then passes all results to the Analyzer, then triggers Reporter. Total time = slowest FileReader, not sum of all.

⚠️ Exam Signal

If subagents produce generic output: Task prompt didn't pass enough context. If coordinator can't spawn: "Task" missing from allowedTools. If parallel calls run sequentially: multiple Task calls were sent in separate turns, not one response.

πŸ”’ Task 1.4 β€” Multi-Step Workflows & Enforcement

1.4 Implement Multi-Step Workflows with Enforcement & Handoff Patterns Full Page β†’

Programmatic gates enforce that workflow stages execute in the correct order and that prerequisites are complete before downstream steps run. Prompt instructions alone cannot provide this guarantee.

Programmatic Gates vs Prompt Instructions

Gates query your database for actual state. Prompt instructions tell Claude to check state. Gates are deterministic (100% compliance). Prompt instructions are probabilistic (non-zero failure rate). Use gates for compliance-critical steps.

Blocking process_refund Until get_customer Completes

The canonical exam example: a pre-tool hook intercepts any process_refund call, checks customer_verified == True in the state store, and blocks if not. This guarantees identity verification before financial operations β€” impossible with prompt instructions alone.

Human-in-the-Loop Checkpoints

HiTL suspends (not terminates) the session. Full workflow state is persisted to durable storage so execution resumes exactly at the suspension point after human approval. The session ID and stage position are stored.

Structured Escalation Handoffs

When escalating to human agents, compile: customer_id, root cause analysis, refund amount, and recommended action. Human agents lack conversation transcript access β€” the handoff must be self-contained.

Rollback = Snapshot + Compensating Transaction

Restoring local state insufficient for stages that created external records. Need a compensating action: delete the created account, send a cancellation email, reverse the database write. Both snapshot AND compensating transaction required.

🌐 Real-World Use Case β€” Customer Support: Multi-Concern Billing Dispute

A customer has 3 billing disputes. Agent decomposes into 3 parallel invoice investigations (each with customer_id + invoice_id context). Results synthesized into a unified resolution. If any refund exceeds $500 β†’ enforcement gate triggers β†’ escalates to human with structured handoff containing all 3 dispute findings, total refund amount, and recommended resolution.

⚠️ Exam Signal

If a compliance step was bypassed despite instructions: gates are in prompts, not code. If a workflow ran out of order: entry conditions checked Claude's output text, not the database. If rollback failed: no compensating transaction for external records.

πŸͺ Task 1.5 β€” Agent SDK Hooks

1.5 Apply Agent SDK Hooks for Tool Call Interception & Data Normalization Full Page β†’

SDK hooks intercept tool execution at two points: PreToolCall (before execution β€” can block) and PostToolUse (after execution β€” can transform). Together they provide deterministic data normalization and policy enforcement that prompt instructions cannot guarantee.

PostToolUse β€” Data Normalization

Fires AFTER tool returns. Cannot block. Transforms heterogeneous output before Claude sees it. Three tested formats: Unix timestamps β†’ ISO 8601, numeric status codes β†’ strings, mixed date formats β†’ ISO 8601.

PreToolCall β€” Compliance Enforcement

Fires BEFORE tool executes. Can block by returning error as tool_result. Use for: refund threshold ($500), bulk operation limits, irreversible action confirmation. Claude reads the error and takes the instructed alternative action.

Deterministic vs Probabilistic Compliance

Hook enforcement: 100% compliance β€” fires in your code regardless of Claude's reasoning. Prompt instruction enforcement: non-zero failure rate β€” persuasive user context can cause deviation. Always use hooks for regulatory/financial rules.

Actionable Block Error Messages

When blocking, error message must specify: violation reason, threshold breached, exact alternative tool to call, and required handoff fields. Generic "Policy violation" messages cause Claude to hallucinate alternatives.

🌐 Real-World Use Case β€” Multi-Backend Customer Support

Agent connects to: legacy order system (returns Unix timestamps, numeric status codes), modern CRM (ISO 8601, string statuses). PostToolUse hook normalizes ALL tool outputs to consistent format before Claude reasons on them. When Claude calls process_refund(amount=750) β†’ PreToolCall hook fires β†’ 750 > 500 β†’ blocks β†’ Claude receives "use escalate_to_human with reason='large_refund', include customer_id, amount, root_cause, recommended_action."

⚠️ Exam Signal

"Most reliable way to prevent refunds above $500" β†’ PreToolCall hook, NOT system prompt. "Claude sometimes passes the policy check despite the instruction" β†’ replace prompt instruction with hook. "Data from different tools is inconsistent" β†’ add PostToolUse normalization hook.

πŸ”€ Task 1.6 β€” Task Decomposition Strategies

1.6 Design Task Decomposition Strategies for Complex Workflows Full Page β†’

Two fundamental decomposition patterns: Prompt Chaining (fixed sequential pipeline when steps are known upfront) and Dynamic Adaptive Decomposition (workflow shape emerges from what is discovered). Choosing correctly is the primary exam test for Task 1.6.

Prompt Chaining β€” Fixed Sequential Pipeline

Use when: workflow structure is fully known before seeing any data. Stages defined upfront. Input of stage N is output of stage N-1. Exam example: code review (security β†’ performance β†’ correctness β†’ integration pass). Same stages regardless of code content.

Dynamic Adaptive Decomposition

Use when: structure only emerges as you investigate. Subtasks generated from intermediate findings. Exam example: "add tests to legacy codebase" β€” must map structure, identify high-impact areas, then create a prioritized plan that adapts as dependencies discovered.

Attention Dilution β€” Why Per-File Passes

Sending 10 files in one 60k-token prompt causes attention dilution: model processes start/end reliably, misses middle. Fix: each file gets dedicated API call ("local issues only, do NOT cross-file analyze"), then a separate cross-file integration pass over all per-file results.

Integration Pass β€” Not Optional

Per-file passes only catch local bugs. The integration pass synthesizes cross-file vulnerabilities: data flow security issues, auth bypass paths, circular dependencies, contradictory findings between files. Cannot be skipped for production code review.

🌐 Real-World Use Case A β€” Code Review (Prompt Chaining)

PR with 8 changed files. Step 1: run 8 per-file analysis passes IN PARALLEL (each focused, one API call per file). Step 2: integration pass receives all 8 results, identifies cross-file data flow issues auth.py→api.py missed individually. Total latency = slowest single file + integration pass time.

🌐 Real-World Use Case B β€” Legacy Test Generation (Dynamic Adaptive)

Task: "Add comprehensive tests to a 140-file legacy codebase." Phase 1: map structure (discovers 12 modules, 3 have zero tests). Phase 2: rank by impact (payment.py critical path, auth.py security-sensitive). Phase 3: generate tests β†’ discovers payment.py depends on untested legacy_db.py β†’ adds to plan. Impossible with fixed pipeline.

⚠️ Exam Signal

"Can you draw the full workflow diagram before seeing any data?" YES β†’ Prompt Chaining. NO (shape emerges from data) β†’ Dynamic Adaptive. "Middle files in a code review were poorly analyzed" β†’ attention dilution β†’ add per-file passes. "Test plan missed a critical dependency" β†’ fixed pipeline used where adaptive was needed.

πŸ’Ύ Task 1.7 β€” Session State, Resumption & Forking

1.7 Manage Session State, Resumption & Forking Full Page β†’

Three session management strategies for long-running tasks that span multiple work sessions: named resumption, fork branching, and fresh start with structured summary. The decision turns on whether prior context is reliable and whether divergent exploration is needed.

Named Session Resumption (--resume)

Resumes with full prior conversation history β€” every message, tool call, tool result. Ideal when: investigation paused mid-task, no or minimal code changes since session, prior tool results still accurate. Session names should be descriptive and readable by teammates.

fork_session β€” Independent Branches

Creates independent copies of current session state. Both branches share the baseline but neither affects the other. Use when: shared expensive baseline exists, need to explore divergent approaches (two testing strategies, two refactoring paths). Fork AFTER the baseline is complete, not before.

Stale Tool Results β€” When Fresh Wins

Stale = underlying data changed after tool was called. Agent reasons from stale tool results as if current β†’ produces confidently wrong analysis. When significant refactoring occurred: start fresh session + inject structured summary of what's still valid, what's invalidated.

Targeted Re-Analysis (Resume + Inform)

Middle path: resume session AND explicitly name which files changed + nature of change. Agent re-reads only changed files, preserves valid prior analysis of unchanged files. More efficient than full re-exploration when only 2–5 of 20 files changed.

🌐 Real-World Use Case A β€” Comparing Refactoring Approaches

Developer Productivity agent completes a codebase analysis (expensive: 30 min, 50 tool calls). Two refactoring approaches identified. Instead of re-running analysis for each: fork_session() creates Branch A (Module approach) and Branch B (Microservice approach). Both branches explore from identical baseline independently. Developer reviews both findings to choose approach.

🌐 Real-World Use Case B β€” Post-Sprint Resumption

Investigation session paused Friday. Monday: 3 of 20 analyzed files were modified over weekend. Decision: --resume <session-name> + "Since last session, auth/processor.py was refactored (JWT→OAuth), config.py added retry keys. All other files unchanged. Please re-read these two files then continue your assessment." Agent does targeted re-analysis of 2 files only.

⚠️ Exam Signal

"Resume or fresh?" β†’ If prior tool results are mostly valid: resume. If significant code changes occurred: fresh + structured summary. "Agent reasoning is inconsistent with actual code" β†’ stale tool results in resumed session. "Two approaches contaminated each other" β†’ should have used fork_session instead of sequential exploration in one session.

⚑ Quick Decision Matrix β€” Every Key Choice in Domain 1
Situation / SymptomRoot CauseCorrect Answer
Agent terminates after Claude says "task complete"Wrong termination signalTerminate only on stop_reason == "end_turn"
Agent silently produces truncated outputmax_tokens too lowIncrease to 8096+; handle max_tokens stop_reason separately
Subagents produce generic, context-free outputInsufficient context in Task promptPass all required context explicitly in Task tool prompt
Coordinator can't spawn subagents despite instructions"Task" missing from allowedToolsAdd "Task" to coordinator's allowedTools
Parallel spawning runs sequentiallyTask calls in separate turnsEmit all Task calls in one coordinator response
Compliance rule bypassed despite prompt instructionPrompt = probabilisticReplace with PreToolCall SDK hook
Tool data inconsistent across backendsHeterogeneous formatsAdd PostToolUse normalization hook
Middle files in code review poorly analyzedAttention dilutionPer-file local passes + separate integration pass
Test plan missed a discovered dependencyFixed pipeline for open-ended taskDynamic adaptive decomposition with adapting plan
Agent reasons incorrectly after resumeStale tool resultsFresh session + structured summary of valid/invalidated findings
Two strategy explorations contaminated each otherSequential exploration in one sessionfork_session() to create independent branches
Compliance gate ran after downstream toolEntry conditions checked Claude's textGate must query database state, not parse Claude's assertion
⚑ Exam Day Cheat Sheet β€” Domain 1

Hook Selection

PostToolUse β†’ fires after, transforms, cannot block
PreToolCall β†’ fires before, can block, policy enforcement
Hooks = deterministic (100%). Prompts = probabilistic (<100%)
Block error must include: reason + alternative action + handoff fields

Agentic Loop

Terminate ONLY on stop_reason == "end_turn"
max_tokens β‰  end_turn β€” handle separately
Append: assistant response + tool_result (both, every time)
All parallel tool calls β†’ ALL results in one tool_result message

Multi-Agent

Subagents have ZERO inherited context β€” pass everything explicitly
"Task" in allowedTools is mandatory for spawning
Multiple Task calls in ONE response = parallel execution
AgentDefinition: description + systemPrompt + allowedTools

Workflow Enforcement

Gates query database β€” NOT Claude's output text
Canonical example: block process_refund until get_customer completes
Escalation handoff fields: customer_id + root_cause + amount + action
Rollback = snapshot + compensating transaction (both required)

Decomposition Pattern

Know steps before data? β†’ Prompt Chaining (fixed pipeline)
Steps emerge from findings? β†’ Dynamic Adaptive
Code review β†’ per-file parallel + cross-file integration pass
Open-ended: map structure β†’ rank impact β†’ adapt as discovered

Session Management

Context mostly valid β†’ --resume <name>
Few files changed β†’ resume + inform changed files
Many changes β†’ fresh session + structured summary
Divergent approaches β†’ fork_session() from shared baseline

🎯 950+ Architect's Subtle Nuance

The "Hidden" Loop Failure

Watch for implicit loop failure: When Claude returns text that looks like a solution but didn't actually call the final tool (like close_ticket or finalize_invoice). To score 950+, your architecture must enforce that certain user intents must terminate with specific tool signals, not just text runs. This is the difference between a prototype and a production-grade agent.

The "Fork" vs "Resume" Distinction

For the exam, remember: --resume is for continuation (linear time), while fork_session is for divergent exploration (branching time). Using --resume to compare two different implementation strategies is an architectural error because the first exploration contaminates the second. Always fork for comparisons.

πŸŽ‰ Domain 1 Complete β€” All 7 Task Statements Mastered

Domain 1 covers 27% of the exam. Mastering these 7 task statements puts you well on your way to a 950+ score. Proceed to Domain 2: Tool Design & MCP Integration.

Next: Deep Dive β†’ AI Fluency Framework