Real production systems can't rely on Claude's judgment alone for every step. Some stages require mandatory human approval, compliance gates, or verifiable prerequisites before proceeding. This task statement teaches you how to enforce those constraints programmatically, so that no clever prompt can convince Claude (or your system) to skip a mandatory step.
At airport security, no matter how persuasive you are, the guard cannot wave you through without a valid boarding pass and ID scan. The gate is programmatic: it checks a database, not your argument.
Multi-step workflow enforcement works the same way. Claude might reason that "since the customer has been a loyal user for 10 years, we could skip the identity verification step." But the enforcement gate in your code intercepts that tool call, checks whether identity verification has actually completed, and blocks progression if it hasn't, regardless of Claude's reasoning.
Gates enforce compliance through code, not conversation. This is why enforcement patterns exist as a distinct architectural concept from model-driven progression.
Understanding the distinction between these two modes of workflow control is foundational to this task statement and is directly tested on the exam.
| Dimension | Model-Driven | Rule-Enforced |
|---|---|---|
| Who decides next step | Claude reasons about context and chooses the next action | Your code checks a condition and either permits or blocks the next step |
| Can it be bypassed? | Yes, by providing Claude with persuasive reasoning | No, the programmatic gate ignores Claude's reasoning entirely |
| Best for | Flexible research tasks, open-ended workflows | Compliance steps, financial transactions, irreversible actions |
| Failure mode | Claude skips steps if distracted by strong context | Gate has a bug, causing false blocks or false passes |
| SDK implementation | Agentic loop with rich system prompt | Hooks intercept tool calls and enforce prerequisites |
| Audit trail | Limited: depends on Claude's output | Strong: gate records every check with timestamp and result |
Production systems use model-driven reasoning for the intelligence layer (Claude decides which tool to call, how to interpret results, what to research next) and rule-enforced gates for mandatory checkpoints (KYC verification before account funding, human approval before code deployment, legal review before contract signing). Never rely on Claude's judgment alone for irreversible or high-risk actions.
Every well-designed workflow stage has two programmatic definitions: what must be true before the stage begins, and what must be true for the stage to be considered complete. Without these, Claude can enter a stage prematurely or declare completion too early.
| Stage | Entry Condition (must be true to enter) | Exit Criteria (must be true to proceed) |
|---|---|---|
| 1. Collect Info | New session started | All required fields present AND validated (format, not empty) |
| 2. KYC Check | Stage 1 exit criteria met + fields_validated == True | kyc_status == "PASSED" in the database (not just Claude's assertion) |
| 3. Risk Assessment | kyc_status == "PASSED" | risk_tier assigned AND (if high) human approval recorded |
| 4. Account Setup | Risk assessment complete + approval if needed | account_id exists in database |
| 5. Welcome | account_id exists | Confirmation email sent |
The enforcement gate must query your system of record, not parse Claude's message to see if it said "KYC passed." Claude could be wrong, hallucinate, or be manipulated. The gate checks db.get_kyc_status(customer_id) == "PASSED". This is non-negotiable for production compliance systems.
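
As a minimal sketch of the entry conditions in the table above, each stage can be mapped to a predicate over a record read from the system of record. The `InMemoryDB` class, the `ENTRY_CONDITIONS` mapping, the stage keys, and the `can_enter` helper are hypothetical names introduced for illustration; only the field names come from the table.

```python
from typing import Callable, Dict


# Hypothetical stand-in for the system of record; a real gate would query a database.
class InMemoryDB:
    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def get_record(self, customer_id: str) -> dict:
        return self._records.get(customer_id, {})

    def update(self, customer_id: str, **fields) -> None:
        self._records.setdefault(customer_id, {}).update(fields)


def risk_review_done(rec: dict) -> bool:
    """Risk tier assigned, and human approval recorded if the tier is high."""
    tier = rec.get("risk_tier")
    if tier is None:
        return False
    return tier != "high" or rec.get("human_approved") is True


# Entry condition per stage, evaluated over stored state only (assumed stage keys).
ENTRY_CONDITIONS: Dict[str, Callable[[dict], bool]] = {
    "collect_info": lambda rec: True,  # a new session is enough
    "kyc_check": lambda rec: rec.get("fields_validated") is True,
    "risk_assessment": lambda rec: rec.get("kyc_status") == "PASSED",
    "account_setup": lambda rec: rec.get("kyc_status") == "PASSED"
    and risk_review_done(rec),
    "welcome": lambda rec: rec.get("account_id") is not None,
}


def can_enter(db: InMemoryDB, customer_id: str, stage: str) -> bool:
    """Decide from the database record, never from the model's message text."""
    return ENTRY_CONDITIONS[stage](db.get_record(customer_id))
```

Keeping the conditions in one mapping makes the workflow's rules reviewable in a single place, which helps when auditors ask what each gate actually checks.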
The exam guide states: "Implementing programmatic prerequisites that block downstream tool calls until prerequisite steps have completed." The concrete example is: blocking process_refund until get_customer has returned a verified customer ID.
This means your enforcement hook intercepts any call to process_refund, checks whether customer_verified == True in the workflow state store, and blocks it if not. This is a pre-tool hook, not a system prompt instruction. The hook fires before Claude's tool call reaches your backend.
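
The prerequisite check described above can be sketched without any SDK at all, as a dispatcher that sits between the model's tool call and the backend. Everything here other than the tool names `get_customer` and `process_refund` and the `customer_verified` flag is an assumption made for illustration: the in-memory `workflow_state`, the `PREREQUISITES` mapping, the stub tool bodies, and the `dispatch_tool` function.

```python
from typing import Any, Callable, Dict

# Hypothetical workflow state store keyed by session; a real one would be a database.
workflow_state: Dict[str, Dict[str, Any]] = {}

# State flag each gated tool needs before it may run (assumed mapping).
PREREQUISITES: Dict[str, str] = {"process_refund": "customer_verified"}


def get_customer(session_id: str, customer_id: str) -> dict:
    """Stub tool: records the verified customer in the state store."""
    workflow_state.setdefault(session_id, {}).update(
        customer_verified=True, customer_id=customer_id
    )
    return {"customer_id": customer_id, "verified": True}


def process_refund(session_id: str, amount: float) -> dict:
    """Stub tool standing in for the real refund backend."""
    return {"status": "REFUNDED", "amount": amount}


TOOLS: Dict[str, Callable[..., dict]] = {
    "get_customer": get_customer,
    "process_refund": process_refund,
}


def dispatch_tool(session_id: str, tool_name: str, **tool_input: Any) -> dict:
    """Run the prerequisite check before the tool call reaches the backend."""
    required_flag = PREREQUISITES.get(tool_name)
    state = workflow_state.get(session_id, {})
    if required_flag and not state.get(required_flag):
        # Blocked: this dict is what the model would read as the tool result.
        return {
            "is_error": True,
            "error": "PREREQUISITE_FAILED: call get_customer and verify the "
            f"customer before calling {tool_name}.",
        }
    return TOOLS[tool_name](session_id, **tool_input)
```

Because the flag is written only by the `get_customer` tool itself, nothing the model says in conversation can set it.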
SDK hooks are interceptors that fire before or after a tool call. The pre-tool hook is what makes enforcement gates work: it receives the tool call Claude wants to make, checks whether the required prerequisite is satisfied, and either allows or blocks the call.
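```python
# The module and class names below are kept as given in this example; treat
# them as illustrative and check the SDK documentation for the actual hook
# interface before relying on them.
from anthropic_agent_sdk import AgentHook, ToolCallContext


class WorkflowEnforcementHook(AgentHook):
    def __init__(self, state_store):
        self.state = state_store

    async def pre_tool_call(self, ctx: ToolCallContext):
        tool_name = ctx.tool_name
        customer_id = ctx.session_data.get("customer_id")

        # Enforce: setup_account requires KYC + risk assessment + human approval
        if tool_name == "setup_account":
            wf = self.state.get_workflow_state(customer_id)

            if not wf.get("kyc_verified"):
                # Block: the error is returned to Claude as the tool_result
                return ctx.block(
                    error="PREREQUISITE_FAILED: KYC verification not completed."
                          " Complete step 2 before account setup."
                )

            if wf.get("risk_tier") is None:
                # No tier assigned yet means the risk assessment has not run
                return ctx.block(
                    error="PREREQUISITE_FAILED: Risk assessment not completed."
                          " Complete step 3 before account setup."
                )

            if wf.get("risk_tier") == "high" and not wf.get("human_approved"):
                return ctx.block(
                    error="PREREQUISITE_FAILED: High-risk customer requires human approval."
                          " Awaiting approval from compliance team."
                )

        # Allow the tool call to proceed (all other tools, or all checks passed)
        return ctx.allow()
```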
When a gate blocks a tool call, the error message returned as tool_result is what Claude reads next. Make it specific and actionable: "KYC verification not completed. Complete step 2 first." rather than a generic "Prerequisite failed." This allows Claude to correctly inform the user about what's needed, rather than hallucinating an explanation.
Some decisions are too high-stakes, legally sensitive, or contextually ambiguous for any AI system to make autonomously. Human-in-the-loop (HiTL) checkpoints pause workflow execution and wait for explicit human approval before continuing.
- Financial transactions above a threshold (e.g., >$10,000)
- Irreversible changes (deletion, mass emails, contract signing)
- High-risk customer classifications
- Legal or regulatory exceptions
- Ambiguous refund decisions above policy limits
1. Workflow reaches HiTL checkpoint
2. System creates a pending_approval record in state store
3. Notification sent to human reviewer (email, Slack, dashboard)
4. Workflow suspends (session persists)
5. Human reviews and clicks Approve/Reject
6. State store updated: human_approved = true
7. Workflow resumes from checkpoint
```python
# state_store, notifications, session_manager, and handle_rejection are
# assumed to be provided elsewhere in the application.

async def handle_hitl_checkpoint(customer_id, risk_data, session):
    # 1. Suspend the workflow and persist its state
    await state_store.suspend_workflow(
        customer_id=customer_id,
        stage="risk_review",
        context={
            "risk_tier": risk_data["tier"],
            "risk_factors": risk_data["factors"],
            "session_id": session.id,
        },
    )

    # 2. Notify the human reviewer
    await notifications.send_review_request(
        reviewer="compliance-team@company.com",
        customer_id=customer_id,
        review_url=f"https://dashboard/review/{customer_id}",
        summary=risk_data["summary"],
    )

    # 3. Return status so Claude can tell the user what's happening
    return {
        "status": "AWAITING_HUMAN_REVIEW",
        "message": "Your application requires manual review. You will be notified within 24h.",
        "eta_hours": 24,
    }


# When the human decides (webhook callback):
async def on_human_approval(customer_id, approved: bool, reviewer_notes: str):
    if approved:
        await state_store.update(
            customer_id,
            {"human_approved": True, "reviewer_notes": reviewer_notes},
        )
        await session_manager.resume(customer_id)  # Resume the suspended session
    else:
        await handle_rejection(customer_id, reviewer_notes)
```
When a workflow suspends for human review, the session may be idle for hours or days. Design your state store to preserve the complete workflow context at the suspension point, not just a flag. On resume, the session must be reconstructed with the full history so that Claude doesn't lose context. Restarting from scratch wastes resources and confuses users who already completed earlier steps.
A handoff package is a structured data payload that carries all relevant state, context, and outputs from one workflow stage to the next. It ensures the receiving stage (or agent) has everything it needs without querying upstream systems again.
```jsonc
{
  "handoff_id": "ho_abc123",
  "from_stage": "kyc_verification",
  "to_stage": "risk_assessment",
  "completed_at": "2024-03-20T14:32:00Z",
  /* Business outputs from this stage */
  "outputs": {
    "kyc_status": "PASSED",
    "identity_confirmed": true,
    "verification_provider": "Jumio",
    "verification_reference": "jum_789xyz"
  },
  /* Carry-forward context from earlier stages */
  "customer_profile": {
    "customer_id": "cust_456",
    "name": "Aarav Shah",
    "dob": "1985-07-14",
    "country": "IN",
    "requested_product": "premium_savings"
  },
  /* Gate compliance evidence, for the audit trail */
  "gate_evidence": {
    "gate_id": "gate_kyc_exit",
    "checked_at": "2024-03-20T14:31:58Z",
    "checked_by": "WorkflowEnforcementHook v2.1",
    "result": "PASSED"
  }
}
```
| Handoff Package Field | Purpose | Required? |
|---|---|---|
| handoff_id | Unique ID for this specific state transition; enables replay and audit | Always |
| from_stage / to_stage | Explicit stage labelling; prevents a handoff being applied to the wrong stage | Always |
| outputs | Results produced by the completed stage | Always |
| customer_profile | Carry-forward context so the receiving stage doesn't need to re-fetch | Always |
| gate_evidence | Audit trail proving the gate was actually checked | Compliance systems |
| rollback_data | Snapshot of state before this stage ran; enables rollback | Reversible stages |
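
A receiving stage can validate the package before using it. The sketch below is one possible shape, assuming the field names from the table; the `HandoffPackage` dataclass and the `accept_handoff` function are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class HandoffPackage:
    handoff_id: str
    from_stage: str
    to_stage: str
    outputs: Dict[str, Any]
    customer_profile: Dict[str, Any]
    gate_evidence: Optional[Dict[str, Any]] = None  # compliance systems
    rollback_data: Optional[Dict[str, Any]] = None  # reversible stages


def accept_handoff(
    pkg: HandoffPackage, receiving_stage: str, compliance: bool = True
) -> Dict[str, Any]:
    """Validate a package at the receiving stage and return its working context."""
    if pkg.to_stage != receiving_stage:
        raise ValueError(
            f"handoff {pkg.handoff_id} targets {pkg.to_stage!r}, not {receiving_stage!r}"
        )
    if compliance and (pkg.gate_evidence or {}).get("result") != "PASSED":
        raise ValueError(f"handoff {pkg.handoff_id} lacks passing gate evidence")
    # Merge carry-forward context with the previous stage's outputs.
    return {**pkg.customer_profile, **pkg.outputs}
```

Rejecting a package whose `to_stage` does not match is what stops a handoff from being applied to the wrong stage.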
When a stage fails partway through, you need to undo any partial changes and return the system to a consistent state. This requires capturing a rollback snapshot before each stage begins and an explicit rollback procedure for each stage type.
Before running each stage, save the current workflow state and any affected DB records to a rollback_snapshots table. If the stage fails, restore from the snapshot. Tag each snapshot with the stage name and timestamp.
For stages that created external records (account created, email sent), a snapshot alone isn't enough. Define a compensating action: delete the account, send a cancellation email. The rollback procedure must call the compensating transaction, not just restore local state.
Roll back only the failed stage, not the entire workflow. If Stage 4 (Account Setup) fails, KYC (Stage 2) and Risk Assessment (Stage 3) don't need to be rolled back; just retry Stage 4. The state store tracks which stages are committed.
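
A minimal sketch of that behavior, assuming the state store exposes the set of committed stages: the runner skips anything already committed and stops at the first failure, so a retry re-runs only the failed stage. The `STAGE_ORDER` keys and the `run_workflow` function are hypothetical names.

```python
from typing import Callable, Dict, List, Set

# Assumed stage keys, in execution order.
STAGE_ORDER: List[str] = [
    "collect_info",
    "kyc_check",
    "risk_assessment",
    "account_setup",
    "welcome",
]


def run_workflow(stages: Dict[str, Callable[[], None]], committed: Set[str]) -> str:
    """Run stages in order, skipping committed ones; stop at the first failure."""
    for name in STAGE_ORDER:
        if name in committed:
            continue  # already committed: do not re-run or roll back
        try:
            stages[name]()
        except Exception:
            return name  # the failed stage; earlier stages stay committed
        committed.add(name)
    return "DONE"
```

Calling `run_workflow` again with the same `committed` set resumes at the failed stage rather than at stage 1.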
After rollback, the workflow must notify both Claude (via a tool_result error message describing what was rolled back) and the user (via a user-facing message). Claude can then re-attempt the failed stage or escalate to human support.
```python
async def execute_stage_with_rollback(stage_name, execute_fn, rollback_fn, state):
    # 1. Take a snapshot before executing
    snapshot = state.snapshot(stage_name)
    try:
        result = await execute_fn()
        state.mark_stage_complete(stage_name, result)
        return {"status": "SUCCESS", "result": result}
    except Exception as e:
        # 2. Execute the compensating transaction, then restore local state
        await rollback_fn(snapshot)
        state.restore_snapshot(snapshot)
        state.mark_stage_failed(stage_name, str(e))
        # 3. Return a structured error for Claude to act on
        return {
            "status": "ROLLED_BACK",
            "failed_stage": stage_name,
            "error": str(e),
            "rolled_back_to": snapshot["previous_stage"],
            "retry_allowed": True,
        }
```
Checking whether Claude's response contains "KYC is complete" instead of querying the database is unsafe: Claude can be wrong, context-confused, or manipulated through clever user prompts.
Writing "Do not call setup_account before KYC is done" in the system prompt relies on Claude following instructions, and a persuasive user can bypass it. Gates must be in code, not prompts.
Creating accounts, sending emails, or charging cards without a defined compensating transaction leaves you with partial, inconsistent state when these stages fail partway.
Suspending for human review without persisting session state means that when the human approves and the workflow resumes, Claude has no context and starts over, forcing the user to re-enter all their data.
All gate checks query the state store (database), never Claude's conversation. The state store is authoritative. Claude's beliefs about state are advisory only.
When a gate blocks, return specific, actionable messages: "KYC not completed: complete ID verification at step 2." This lets Claude provide correct user guidance without hallucinating explanations.
Include gate_evidence in every handoff package. This creates an immutable audit trail proving each gate was checked, when, and by which version of the enforcement hook.
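
A gate can produce its own evidence record at the moment it runs. The sketch below assumes the `gate_evidence` field names from the handoff example; the `audit_log` list and the `check_gate` function are hypothetical, and a production system would write to append-only storage rather than a Python list.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, List

# Hypothetical in-memory log; use append-only storage in practice.
audit_log: List[Dict[str, str]] = []


def check_gate(
    gate_id: str, hook_version: str, condition: Callable[[], bool]
) -> Dict[str, str]:
    """Evaluate a gate condition and record evidence whether it passes or fails."""
    evidence = {
        "gate_id": gate_id,
        "checked_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "checked_by": hook_version,
        "result": "PASSED" if condition() else "FAILED",
    }
    audit_log.append(evidence)
    return evidence
```

Recording failed checks as well as passing ones lets an auditor see blocked attempts, not just successful transitions.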
Design each stage so running it twice produces the same result (idempotency). This makes retry after rollback safe and prevents double-charges, duplicate accounts, or duplicate emails.
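
One common way to get this property is an idempotency key: the first execution stores its result under the key, and any repeat returns the stored result instead of running the side effect again. This is a single-process sketch with hypothetical names (`_results`, `run_once`); it is not safe under concurrent callers without a lock or a database uniqueness constraint.

```python
from typing import Callable, Dict

# Hypothetical result cache keyed by idempotency key; in production, a durable store.
_results: Dict[str, dict] = {}


def run_once(idempotency_key: str, action: Callable[[], dict]) -> dict:
    """Execute the action at most once per key; repeats return the stored result."""
    if idempotency_key in _results:
        return _results[idempotency_key]
    result = action()
    _results[idempotency_key] = result
    return result
```

A key built from the customer, stage, and operation (for example `"cust_456:account_setup:charge"`) stays stable across retries of the same stage.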
The primary exam scenario for Task 1.4 is: "You are building a customer support resolution agent using the Claude Agent SDK. The agent handles high-ambiguity requests like returns, billing disputes, and account issues. It has access to your backend systems through custom MCP tools (get_customer, lookup_order, process_refund, escalate_to_human)."
Questions will present scenarios where a compliance workflow was bypassed, a stage ran out of order, or a rollback didn't work. The answer always traces to: trusting Claude's output instead of querying state, gates in prompts not code, missing HiTL for high-risk actions, or missing rollback snapshot before a stage that performs irreversible writes. Also tested: decomposing multi-concern customer requests into parallel items then synthesizing a unified resolution.
- Blocking process_refund until get_customer has returned a verified customer ID is the canonical example from the exam guide.
- Include customer_id, root cause analysis, refund amount, and recommended action when escalating to human agents who lack access to the conversation transcript. This is the exam guide's exact required handoff content.
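
The escalation content can be enforced structurally so that a human agent never receives a partial summary. The `EscalationSummary` dataclass below is a hypothetical shape that covers the four items named above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EscalationSummary:
    customer_id: str
    root_cause: str
    refund_amount: float
    recommended_action: str

    def __post_init__(self) -> None:
        # Reject empty fields so the summary is always complete on its own.
        for name, value in asdict(self).items():
            if value in ("", None):
                raise ValueError(f"missing required handoff field: {name}")
```

The result of `asdict(summary)` can then be passed as the input to escalate_to_human, since the reviewer cannot fall back on the transcript.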