The agentic loop is the heartbeat of every autonomous Claude application. Understanding its lifecycle (how Claude decides what to do, when to stop, and how tool results feed back into reasoning) is the single most critical concept on the exam. This guide breaks it all down with real-world analogies, working code patterns, and the anti-patterns you must avoid.
Imagine Claude is a head chef in a restaurant kitchen. You (the orchestrator) hand the chef a task: "Prepare a three-course dinner for a guest with a nut allergy."
The chef doesn't complete the whole meal in one step. Instead, they work in a continuous loop: check what ingredients are available → prep the starter → taste and adjust → move to the main → check the allergy list before plating → finish with dessert → announce the meal is ready. At each step, the chef uses tools (the pantry, the stove, the recipe book) and feeds the result of each action back into their next decision.
The loop ends when the chef says "Service!" (end_turn), not when a timer goes off and not after a fixed number of steps. The goal drives the termination, not an arbitrary countdown.
This is exactly how an agentic loop works with Claude. The model receives a task, decides which tool to call, gets the result, reasons about what to do next, and keeps looping until it decides it's done. Your code is the kitchen manager: you handle the tool execution plumbing, but Claude drives the decision-making.
Every agentic loop, regardless of complexity, follows the same four-stage cycle. Understanding each stage precisely is what separates architects who pass this exam from those who don't.
Before diving into the loop mechanics, it's critical to understand how Agentic Loops fit into the overarching AI Fluency Framework. We interact with AI via three distinct modes:
Every iteration begins by constructing a request containing three things: a system prompt (the agent's role & instructions), the current conversation history (messages array), and a list of available tools. For subsequent iterations, this growing conversation history is the critical mechanism by which Claude maintains context (working memory).
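As a rough sketch of the three request ingredients described above, the helper below assembles them into keyword arguments shaped like those accepted by `client.messages.create` in the Anthropic Python SDK. The helper name `build_request`, the tool's description and input schema, and the model string in the usage lines are illustrative assumptions; no API call is made.

```python
from typing import Any

# Hypothetical tool definition. The tool name appears in this guide's exam
# scenario; the description and input schema are assumptions for illustration.
LOOKUP_ORDER_TOOL: dict[str, Any] = {
    "name": "lookup_order",
    "description": "Look up an order by its ID and return its current status.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}


def build_request(system: str, messages: list[dict], tools: list[dict],
                  model: str, max_tokens: int = 1024) -> dict[str, Any]:
    """Assemble the keyword arguments for one loop iteration."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": system,            # the agent's role and instructions
        "messages": list(messages),  # full history so far (copied, not shared)
        "tools": tools,              # tools Claude is allowed to request
    }


history = [{"role": "user", "content": "Where is order A-1001?"}]
request = build_request(
    system="You are a customer support agent.",
    messages=history,
    tools=[LOOKUP_ORDER_TOOL],
    model="claude-opus-4-5",
)
# With the SDK installed, this could be sent as client.messages.create(**request).
```

On later iterations only `history` changes: it grows by one assistant message and one user message per tool round, while the system prompt and tool list stay the same.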
When Claude receives the request, it processes the entire conversation history along with the tool definitions. This reasoning is holistic: it considers what the user originally asked, what it has already attempted, what the results of prior tool calls were, and what logical next step would move the task forward. This invisible reasoning emerges as either a tool_use block, a text response, or both.
Claude's response includes the stop_reason field. This is the single most important signal your control loop reads: what to do next is always determined by stop_reason, not by the presence of text content.
- "tool_use": Claude wants to execute one or more tools before continuing. The loop MUST continue.
- "end_turn": Claude has finished reasoning; the response is complete. Extract final text and END the loop.
- "max_tokens": The response was truncated. Treat this as a continuation or error, NOT a completed task.

When stop_reason == "tool_use", your orchestrator code reads the tool_use content blocks, calls the actual function/API/database, and collects results. Claude never directly executes tools; your application performs the action. Most critically, you must append the results formatted as tool_result blocks matched to the specific tool_use_id.
When Claude returns stop_reason = 'end_turn', it decides it has sufficient information to produce a complete, final response. Return this to the user and stop the loop. This is a model-driven decision, where Claude alone decides it is finished.
Claude acts as the brain (decides what tool to call and why). Your backend code acts as the hands (actually runs the tool and returns results). Claude trusts your tool results implicitly, which is why input validation in your tool layer matters so much for security.
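To make the "hands" side concrete, here is a minimal sketch of a tool layer that validates Claude's requested input before running anything. The in-memory `ORDERS` data, the `A-` order ID rule, and the names `run_tool` and `TOOLS` are assumptions for illustration; the `is_error` flag on a tool_result block is the API's way of reporting a failed tool call back to Claude.

```python
from typing import Any, Callable

# Hypothetical in-memory backend standing in for a real order database.
ORDERS = {"A-1001": "shipped"}


def lookup_order(tool_input: dict[str, Any]) -> str:
    return ORDERS.get(tool_input["order_id"], "not found")


def valid_order_input(tool_input: dict[str, Any]) -> bool:
    # Assumed rule: order IDs are strings that start with "A-".
    order_id = tool_input.get("order_id")
    return isinstance(order_id, str) and order_id.startswith("A-")


# Registry mapping each tool name to its (validator, handler) pair.
TOOLS: dict[str, tuple[Callable[[dict], bool], Callable[[dict], str]]] = {
    "lookup_order": (valid_order_input, lookup_order),
}


def run_tool(tool_use_id: str, name: str, tool_input: dict[str, Any]) -> dict[str, Any]:
    """Validate, execute, and wrap the outcome as a tool_result block."""
    result = {"type": "tool_result", "tool_use_id": tool_use_id}
    if name not in TOOLS:
        return {**result, "content": f"Unknown tool: {name}", "is_error": True}
    validator, handler = TOOLS[name]
    if not validator(tool_input):
        return {**result, "content": "Invalid input", "is_error": True}
    return {**result, "content": handler(tool_input)}
```

Returning an error block instead of raising lets Claude see what went wrong and choose a different action on the next iteration.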
You take the tool result and append it to the conversation history as a tool_result message with the matching tool_use_id. Then you send the entire updated history back to Claude as the next API request. Claude reads its own prior reasoning and the fresh tool result to decide the next action.
stop_reason is arguably the most important field in the entire Claude API response for agentic systems. It's the foundation of your loop's control flow. Let's be precise about every possible value:
| stop_reason Value | Meaning | Your Action |
|---|---|---|
| "tool_use" | Claude is requesting one or more tool calls. Inspect response.content for type:"tool_use" blocks. | Execute tools → append results → loop |
| "end_turn" | Claude decided the task is complete and produced a final answer. | Extract text response → EXIT loop |
| "max_tokens" | The response was cut off because it hit the max_tokens limit. | Handle truncation; do NOT treat as end_turn |
| "stop_sequence" | A custom stop sequence was hit (rarely used in agentic loops). | Application-specific handling |
The exam will test whether you know the exact logic: if stop_reason == "tool_use" → continue; if stop_reason == "end_turn" → stop. Any other approach (checking text content, counting iterations, parsing for "I'm done") violates the API contract and is an anti-pattern.
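The control-flow rule above can be sketched as a single lookup keyed on stop_reason. The `Action` enum and `next_action` function are hypothetical names introduced here; the four stop_reason strings are the ones this guide lists.

```python
from enum import Enum


class Action(Enum):
    RUN_TOOLS = "run_tools"          # execute tools, append results, loop again
    FINISH = "finish"                # extract the final text and exit the loop
    HANDLE_TRUNCATION = "truncated"  # never treat this as completion
    APP_SPECIFIC = "app_specific"    # a custom stop sequence was hit


_ACTIONS = {
    "tool_use": Action.RUN_TOOLS,
    "end_turn": Action.FINISH,
    "max_tokens": Action.HANDLE_TRUNCATION,
    "stop_sequence": Action.APP_SPECIFIC,
}


def next_action(stop_reason: str) -> Action:
    """Map a stop_reason to the loop's next action; fail loudly on unknown values."""
    try:
        return _ACTIONS[stop_reason]
    except KeyError:
        raise ValueError(f"Unhandled stop_reason: {stop_reason!r}") from None
```

Raising on an unrecognised value is a deliberate choice in this sketch: a new stop reason should surface as an error rather than be silently treated as completion.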
Claude has no persistent memory between API calls. Every request must carry the complete conversation history, like handing a new colleague a printed transcript of everything that happened before they joined the meeting. Claude reads the whole transcript to get up to speed, then contributes the next piece. Your app is the filing cabinet that stores and resends this transcript each time.
After each tool call, your code must append two new entries to the messages array before looping:
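- The assistant message (the tool_use content block with id, name, and input)
- The user message carrying the tool_result content block (with the matching tool_use_id and the actual result content)

```python
# After Claude responds with stop_reason = "tool_use".
# messages, response, and execute_tool are defined by your surrounding loop.
messages.append({
    "role": "assistant",
    "content": response.content,  # Contains the tool_use block(s)
})

# Collect every result into ONE user message, even for parallel tool calls
tool_results = []
for block in response.content:
    if block.type == "tool_use":
        result = execute_tool(block.name, block.input)
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,  # MUST match the tool_use block's id
            "content": str(result),
        })
messages.append({"role": "user", "content": tool_results})
```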
Claude can request multiple tools simultaneously in a single response (parallel tool calls). Each tool_use block has a unique id. When you return results, Claude uses the tool_use_id to match each result back to its specific tool request. Attaching a result to the wrong ID causes Claude to misinterpret results, a subtle but devastating bug.
A customer asks: "What's the status of my last three orders?" Claude decides to call get_order_status three times in parallel, each with a different order ID and a different tool_use_id (e.g., tool_01, tool_02, tool_03). Your code executes all three, then returns three tool_result blocks in a single user message, each mapped to its tool_use_id. Claude aggregates and produces a unified answer. This parallel pattern is a performance optimization you should know for the exam.
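The three-order example might look like the sketch below. Plain dicts stand in for the SDK's content block objects, and `ORDER_STATUS`, the order IDs, and `run_parallel` are made-up names for illustration. Running the calls on a thread pool is one option, not a requirement; what matters is that every result carries its own tool_use_id and all results go back in one user message.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical order data standing in for a real backend.
ORDER_STATUS = {"1001": "delivered", "1002": "in transit", "1003": "processing"}


def get_order_status(order_id: str) -> str:
    return ORDER_STATUS.get(order_id, "unknown")


# tool_use blocks as they might appear in a single assistant response.
tool_use_blocks = [
    {"type": "tool_use", "id": "tool_01", "name": "get_order_status",
     "input": {"order_id": "1001"}},
    {"type": "tool_use", "id": "tool_02", "name": "get_order_status",
     "input": {"order_id": "1002"}},
    {"type": "tool_use", "id": "tool_03", "name": "get_order_status",
     "input": {"order_id": "1003"}},
]


def run_parallel(blocks: list[dict]) -> dict:
    """Execute every requested tool concurrently and return ONE user message."""
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so outputs line up with blocks.
        outputs = list(pool.map(lambda b: get_order_status(**b["input"]), blocks))
    return {
        "role": "user",
        "content": [
            {"type": "tool_result", "tool_use_id": b["id"], "content": out}
            for b, out in zip(blocks, outputs)
        ],
    }


tool_result_message = run_parallel(tool_use_blocks)
```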
The exam distinguishes sharply between two architectural philosophies for agent decision-making. Understanding the tradeoffs is essential.
| Aspect | Model-Driven (Claude Decides) | Pre-Configured (Static Flowchart) |
|---|---|---|
| How "next step" is determined | Claude reasons about tool results, conversation context, and instructions to decide the next action dynamically | Your code checks conditions and routes to the next step via if/else or a state machine |
| Flexibility | High: handles unexpected inputs gracefully | Low: breaks on edge cases not anticipated in the flowchart |
| Predictability | Probabilistic: may vary across runs | Deterministic: same input → same path |
| Best for | Open-ended tasks: research, customer support, code review | Compliance-critical workflows: identity verification before financial ops |
| Risk | LLM may skip steps if not enforced programmatically | Cannot handle novel situations not encoded in the flowchart |
| Claude SDK equivalent | Agentic loop with rich system prompt + tools | Hooks + programmatic prerequisites (Task 1.4 topic) |
Model-driven is like GPS navigation. You state the destination, and the GPS dynamically computes the best route, re-routes if there's traffic, and adapts to road closures, but it might occasionally take a weird detour if its map data is wrong.
Pre-configured is like a fixed train route. It always stops at the same stations in the same order, which makes it completely predictable, but if your destination isn't on the line, you can't get there.
The best production systems combine both: use model-driven reasoning for the intelligence layer, and programmatic enforcement (hooks, gates) for critical compliance steps.
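One way to picture the hybrid approach is a gate in the tool dispatcher: Claude still chooses which tool to call, but code refuses a sensitive tool until its prerequisite has run. This is a plain-Python stand-in, not the Claude Agent SDK's hooks API; `GatedDispatcher`, `PREREQUISITES`, and the rule that `process_refund` requires `get_customer` are assumptions for illustration.

```python
from typing import Callable


class PrerequisiteError(Exception):
    """Raised when a gated tool is requested before its prerequisites have run."""


# Hypothetical policy: a refund may only run after the customer was looked up.
PREREQUISITES: dict[str, set[str]] = {"process_refund": {"get_customer"}}


class GatedDispatcher:
    """Runs tools for the agent loop while enforcing ordering rules in code."""

    def __init__(self, handlers: dict[str, Callable[[dict], str]]):
        self.handlers = handlers
        self.completed: set[str] = set()  # tools that have already succeeded

    def dispatch(self, name: str, tool_input: dict) -> str:
        missing = PREREQUISITES.get(name, set()) - self.completed
        if missing:
            # Deterministic enforcement: the model cannot skip this step.
            raise PrerequisiteError(f"{name} requires {sorted(missing)} first")
        result = self.handlers[name](tool_input)
        self.completed.add(name)
        return result


dispatcher = GatedDispatcher({
    "get_customer": lambda tool_input: "customer verified",
    "process_refund": lambda tool_input: "refund issued",
})
```

In a real loop, the orchestrator would probably catch `PrerequisiteError` and return its message to Claude as an error tool_result, so the model can call the missing tool and try again.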
Below is the canonical agentic loop implementation pattern. Study this structure carefully, because the exam tests whether you can identify correct vs incorrect implementations.
```python
import anthropic

client = anthropic.Anthropic()


def run_agent(user_task: str, tools: list) -> str:
    messages = [{"role": "user", "content": user_task}]

    while True:  # Loop continues until a return or raise
        # STEP 1: Send request to Claude with full history
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=8096,  # Set high enough for complex tool use responses
            tools=tools,
            messages=messages,
        )

        # STEP 2: Append Claude's response to history
        messages.append({"role": "assistant", "content": response.content})

        # STEP 3: Inspect stop_reason, the ONLY correct termination check
        if response.stop_reason == "end_turn":
            # Task complete: join all text blocks and return them.
            # Always returning here means the loop cannot spin on end_turn.
            return "".join(b.text for b in response.content if b.type == "text")

        elif response.stop_reason == "tool_use":
            # STEP 4: Execute each requested tool.
            # dispatch_tool is your own function that routes a tool name
            # to its implementation; it is not part of the SDK.
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    result = dispatch_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result),
                    })
            # Append all tool results as a single user message
            messages.append({"role": "user", "content": tool_results})
            # Loop continues, back to STEP 1

        elif response.stop_reason == "max_tokens":
            # Response was truncated: do NOT treat as end_turn.
            # Options: increase max_tokens, summarise context, or raise an error
            raise RuntimeError(
                "Response truncated (max_tokens hit). "
                "Increase token budget or reduce prompt size."
            )

        else:
            # stop_sequence or other unexpected stop reason
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")
```
You'll notice there's no iteration counter. That's intentional: the loop exits exactly when Claude decides the task is done, which is the correct pattern. The task statement explicitly says that setting arbitrary iteration caps as the primary stopping mechanism is an anti-pattern. That said, you may add a safety cap as a secondary failsafe for runaway loops, just not as the primary termination logic.
The exam explicitly calls out three anti-patterns. Memorise these, because they appear in distractor answers:
Checking if Claude's response text contains phrases like "I have completed the task" or "Done!" to decide when to stop. Claude is probabilistic: it might say "Done" mid-task or omit it at actual completion.
Using if iteration >= 10: break as your main exit condition. The agent may legitimately need 15 steps for complex tasks, and cutting it short produces incomplete results. Caps are only valid as emergency failsafes.
Assuming that if Claude's response contains a text block (not a tool_use block), the task must be done. Claude can produce a text explanation alongside a tool_use in the same response.
if response.stop_reason == "end_turn" is the only reliable termination signal. It is a structured field set by the API, not free text in Claude's reply, so checking it is unambiguous and immune to prompt drift.
If you want protection against infinite loops, add if iteration > MAX_STEPS: raise LoopLimitError() after the stop_reason check. It's a safety net, not the primary gate.
Iterate over response.content and check block.type, which is either "tool_use" or "text". Only treat stop_reason == "end_turn" as the terminal state.
```python
# WRONG: Parsing natural language
if "task complete" in response.content[0].text.lower():
    break  # DON'T DO THIS

# WRONG: Arbitrary cap as primary stop
for i in range(10):
    response = client.messages.create(...)
    # This exits after 10 rounds regardless

# WRONG: Checking for text as terminal indicator
if any(b.type == "text" for b in response.content):
    return response.content[0].text  # Text can exist alongside tool_use!

# CORRECT: Use stop_reason exclusively
while True:
    response = client.messages.create(...)
    if response.stop_reason == "end_turn":
        break  # The API told us we're done
    elif response.stop_reason == "tool_use":
        # Execute tools, append, continue
        ...
```
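The advice to iterate over response.content and check block.type can be sketched like this. The response is a `SimpleNamespace` stand-in shaped like an SDK response that carries text and a tool request together; `split_blocks` and `is_finished` are hypothetical helper names.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate text blocks from tool_use blocks within one response."""
    texts = [b.text for b in content if b.type == "text"]
    tool_calls = [b for b in content if b.type == "tool_use"]
    return texts, tool_calls


def is_finished(response) -> bool:
    """Completion depends on stop_reason alone, never on whether text is present."""
    return response.stop_reason == "end_turn"


# Stand-in response containing a text block AND a tool request.
response = SimpleNamespace(
    stop_reason="tool_use",
    content=[
        SimpleNamespace(type="text", text="Let me look up that order."),
        SimpleNamespace(type="tool_use", id="tool_01", name="lookup_order",
                        input={"order_id": "A-1001"}),
    ],
)

texts, tool_calls = split_blocks(response.content)
```

Here the response has text, yet `is_finished(response)` is false because stop_reason is "tool_use", which is exactly the case the third anti-pattern gets wrong.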
The primary exam scenario for this task is: "You are building a customer support resolution agent using the Claude Agent SDK. The agent handles high-ambiguity requests like returns, billing disputes, and account issues. It has access to your backend systems through custom MCP tools (get_customer, lookup_order, process_refund, escalate_to_human). Your target is 80%+ first-contact resolution while knowing when to escalate."
Questions test your ability to identify bugs where the agent: (1) stops too early before completing the task, (2) fails to append tool results correctly, causing Claude to re-request already-completed tools, (3) uses text parsing instead of stop_reason to detect completion, or (4) exits prematurely on a max_tokens truncation instead of handling it.
- "tool_use" vs "end_turn": these are the two values that drive your agentic loop. "max_tokens" must be handled separately (do not treat as end_turn). Never parse text content to decide loop termination.
- Append both the assistant's tool_use message AND the user's tool_result message with a matching tool_use_id before each loop iteration.
- Find all type: "tool_use" content blocks, execute each tool, collect all results, append as a single user message, then loop back. Never respond early without executing all requested tools.
- Claude may request several tools in one response (e.g., get_customer and lookup_order simultaneously). Handle all tool_use blocks, collect all results, return them in one user message. The tool_use_id links each result to its request.
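For the truncation bug mentioned above, one possible handling strategy is to retry the call with a larger token budget and fail loudly if that still does not help. The sketch below takes the API call as a `create` parameter so it runs without network access; `create_with_budget_retry`, `fake_create`, the doubling policy, and the 16000-token ceiling are all assumptions rather than documented limits.

```python
from types import SimpleNamespace


class TruncatedResponseError(RuntimeError):
    """Raised when the response is still truncated at the budget ceiling."""


def create_with_budget_retry(create, request, ceiling=16000):
    """Call create(**request); on max_tokens, retry with a doubled budget.

    Gives up with an error once the budget reaches the assumed ceiling, so a
    truncated response is never returned as if it were complete.
    """
    budget = request["max_tokens"]
    while True:
        response = create(**{**request, "max_tokens": budget})
        if response.stop_reason != "max_tokens":
            return response
        if budget >= ceiling:
            raise TruncatedResponseError(f"Still truncated at max_tokens={budget}")
        budget = min(budget * 2, ceiling)


# Stand-in for the API: pretends the answer needs at least 3000 output tokens.
budgets_tried = []


def fake_create(max_tokens, **_ignored):
    budgets_tried.append(max_tokens)
    reason = "end_turn" if max_tokens >= 3000 else "max_tokens"
    return SimpleNamespace(stop_reason=reason, content=[])


response = create_with_budget_retry(fake_create, {"max_tokens": 1024, "messages": []})
```

Retrying discards the truncated attempt, so in an agent loop this helper would be called before the assistant message is appended to the history. Summarising the context or raising immediately are equally valid choices.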