📚 Domain 1 · Task Statement 1.4

Implement Multi-Step Workflows with Enforcement & Handoff Patterns

📊 Domain Weight: 27% · ⭐ Critical for Production Systems · 🔗 Scenario: Customer Support Resolution Agent

Real production systems can't rely on Claude's judgment alone for every step. Some stages require mandatory human approval, compliance gates, or verifiable prerequisites before proceeding. This task statement teaches you how to enforce those constraints programmatically — so no clever prompt can convince Claude (or your system) to skip a mandatory step.

📋 Contents

Analogy — The Airport Security Checkpoint
Model-Driven vs Rule-Enforced Workflow Progression
Workflow Stages: Entry Conditions & Exit Criteria
Enforcement Gates & SDK Hooks
Human-in-the-Loop Checkpoints
Handoff Packages — Transferring State Between Stages
Rollback Mechanisms for Failed Stages
Anti-Patterns & Pro Tips
Summary & Exam Key Points

✈️ Analogy — The Airport Security Checkpoint

✈️ Analogy — You Cannot Charm Your Way Through Security

At airport security, no matter how persuasive you are, the guard cannot wave you through without a valid boarding pass and ID scan. The gate is programmatic — it checks a database, not your argument.

Multi-step workflow enforcement works the same way. Claude might reason that "since the customer has been a loyal user for 10 years, we could skip the identity verification step." But the enforcement gate in your code intercepts that tool call, checks whether identity verification has actually completed, and blocks progression if it hasn't — regardless of Claude's reasoning.

Gates enforce compliance through code, not conversation. This is why enforcement patterns exist as a distinct architectural concept from model-driven progression.

⚖️ Model-Driven vs Rule-Enforced Workflow Progression

Understanding the distinction between these two modes of workflow control is foundational to this task statement — and directly tested on the exam.

Dimension	Model-Driven	Rule-Enforced
Who decides next step	Claude reasons about context and chooses the next action	Your code checks a condition and either permits or blocks the next step
Can it be bypassed?	Yes — by providing Claude with persuasive reasoning	No — programmatic gate ignores Claude's reasoning entirely
Best for	Flexible research tasks, open-ended workflows	Compliance steps, financial transactions, irreversible actions
Failure mode	Claude skips steps if distracted by strong context	Gate has a bug → false blocks or false passes
SDK implementation	Agentic loop with rich system prompt	Hooks intercept tool calls and enforce prerequisites
Audit trail	Limited — depends on Claude's output	Strong — gate records every check with timestamp and result

💡 The Correct Architecture — Combine Both

Production systems use model-driven reasoning for the intelligence layer (Claude decides which tool to call, how to interpret results, what to research next) and rule-enforced gates for mandatory checkpoints (KYC verification before account funding, human approval before code deployment, legal review before contract signing). Never rely on Claude's judgment alone for irreversible or high-risk actions.

🏗️ Workflow Stages: Entry Conditions & Exit Criteria

Every well-designed workflow stage has two programmatic definitions: what must be true before the stage begins, and what must be true for the stage to be considered complete. Without these, Claude can enter a stage prematurely or declare completion too early.

Figure 1 — 5-Stage Financial Onboarding Workflow with Enforcement Gates

Defining Entry Conditions and Exit Criteria

Stage	Entry Condition (must be true to enter)	Exit Criteria (must be true to proceed)
1. Collect Info	New session started	All required fields present AND validated (format, not empty)
2. KYC Check	Stage 1 exit criteria met + `fields_validated == True`	`kyc_status == "PASSED"` in the database (not just Claude's assertion)
3. Risk Assessment	`kyc_status == "PASSED"`	`risk_tier` assigned AND (if high) human approval recorded
4. Account Setup	Risk assessment complete + approval if needed	`account_id` exists in database
5. Welcome	`account_id` exists	Confirmation email sent
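One way to make the table above checkable is to encode each stage as a pair of predicates over the workflow state. The following is a minimal sketch, assuming the state store can be read as a plain dictionary. The `Stage` class, the `STAGES` mapping, and the `session_started` and `confirmation_email_sent` keys are hypothetical names introduced for illustration, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

State = Dict[str, Any]


@dataclass(frozen=True)
class Stage:
    """A workflow stage with programmatic entry and exit checks (hypothetical)."""
    name: str
    entry: Callable[[State], bool]
    exit: Callable[[State], bool]


def risk_complete(state: State) -> bool:
    # A tier must be assigned; high-risk customers also need recorded approval.
    tier = state.get("risk_tier")
    if tier is None:
        return False
    return tier != "high" or state.get("human_approved") is True


STAGES = {
    "collect_info": Stage(
        "collect_info",
        entry=lambda s: s.get("session_started") is True,
        exit=lambda s: s.get("fields_validated") is True,
    ),
    "kyc_check": Stage(
        "kyc_check",
        entry=lambda s: s.get("fields_validated") is True,
        exit=lambda s: s.get("kyc_status") == "PASSED",
    ),
    "risk_assessment": Stage(
        "risk_assessment",
        entry=lambda s: s.get("kyc_status") == "PASSED",
        exit=risk_complete,
    ),
    "account_setup": Stage(
        "account_setup",
        entry=lambda s: s.get("kyc_status") == "PASSED" and risk_complete(s),
        exit=lambda s: s.get("account_id") is not None,
    ),
    "welcome": Stage(
        "welcome",
        entry=lambda s: s.get("account_id") is not None,
        exit=lambda s: s.get("confirmation_email_sent") is True,
    ),
}


def can_enter(stage_name: str, state: State) -> bool:
    """Return True only if the stage's entry condition holds in the given state."""
    return STAGES[stage_name].entry(state)


def is_complete(stage_name: str, state: State) -> bool:
    """Return True only if the stage's exit criteria hold in the given state."""
    return STAGES[stage_name].exit(state)
```

The `state` passed in should be loaded from the system of record at check time, so the predicates never see Claude's own claims about progress.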

⚠️ Critical: Gate Checks the Database, NOT Claude's Output

The enforcement gate must query your system of record — not parse Claude's message to see if it said "KYC passed." Claude could be wrong, hallucinate, or be manipulated. The gate checks db.get_kyc_status(customer_id) == "PASSED". This is non-negotiable for production compliance systems.

💡 Exam Guide — Programmatic Prerequisites for the Customer Support Agent

The exam guide states: "Implementing programmatic prerequisites that block downstream tool calls until prerequisite steps have completed." The concrete example is: blocking process_refund until get_customer has returned a verified customer ID.

This means your enforcement hook intercepts any call to process_refund, checks whether customer_verified == True in the workflow state store, and blocks it if not. This is a pre-tool hook — not a system prompt instruction. The hook fires before Claude's tool call reaches your backend.
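The prerequisite check described above can be sketched without any SDK at all. This example assumes an in-memory dictionary as the workflow state store and a hypothetical `pre_tool_gate` function that your tool dispatcher calls before forwarding a request; the function names and the return shape are illustrative, not the Agent SDK's hook interface.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory workflow state store, keyed by session ID.
# In production this would be a database, not a dict.
WORKFLOW_STATE: Dict[str, Dict[str, Any]] = {}

# Tools that require a verified customer before they may run.
REQUIRES_VERIFIED_CUSTOMER = {"process_refund"}


def record_customer_lookup(session_id: str, customer_id: Optional[str], verified: bool) -> None:
    """Called by the backend after get_customer returns; Claude never writes this state."""
    WORKFLOW_STATE[session_id] = {
        "customer_id": customer_id,
        "customer_verified": bool(customer_id) and verified,
    }


def pre_tool_gate(session_id: str, tool_name: str) -> Dict[str, Any]:
    """Decide whether a requested tool call may reach the backend."""
    if tool_name not in REQUIRES_VERIFIED_CUSTOMER:
        return {"allowed": True}

    state = WORKFLOW_STATE.get(session_id, {})
    if state.get("customer_verified") is True:
        return {"allowed": True}

    return {
        "allowed": False,
        "error": (
            "PREREQUISITE_FAILED: customer not verified. "
            "Call get_customer and confirm identity before process_refund."
        ),
    }
```

The key design choice is that `customer_verified` is written only by backend code handling the `get_customer` result, so nothing in the conversation can set it.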

🔒 Enforcement Gates & SDK Hooks

SDK hooks are interceptors that fire before or after a tool call. The pre-tool hook is what makes enforcement gates work: it receives the tool call Claude wants to make, checks whether the required prerequisite is satisfied, and either allows or blocks the call.

Figure 2 — Pre-Tool Hook Intercepting a Forbidden Jump

Python — Pre-Tool Hook Enforcing Prerequisites

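# Illustrative interface: the module, AgentHook, and ToolCallContext names are
# placeholders for this guide; consult your SDK's hook documentation for the real API.
from anthropic_agent_sdk import AgentHook, ToolCallContext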

class WorkflowEnforcementHook(AgentHook):
    def __init__(self, state_store):
        self.state = state_store

    async def pre_tool_call(self, ctx: ToolCallContext):
        tool_name = ctx.tool_name
        customer_id = ctx.session_data.get("customer_id")

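        # Enforce: setup_account requires KYC + risk assessment + human approval
        if tool_name == "setup_account":
            # Treat a missing workflow record as "nothing completed yet"
            wf = self.state.get_workflow_state(customer_id) or {}

            if not wf.get("kyc_verified"):
                # Block: the error is returned to Claude as the tool_result
                return ctx.block(
                    error="PREREQUISITE_FAILED: KYC verification not completed."
                         " Complete step 2 before account setup."
                )

            # Risk assessment must have assigned a tier before account setup
            if wf.get("risk_tier") is None:
                return ctx.block(
                    error="PREREQUISITE_FAILED: Risk assessment not completed."
                         " Complete step 3 before account setup."
                )

            if wf.get("risk_tier") == "high" and not wf.get("human_approved"):
                return ctx.block(
                    error="PREREQUISITE_FAILED: High-risk customer requires human approval."
                         " Awaiting approval from compliance team."
                )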

        # Allow the tool call to proceed
        return ctx.allow()

⭐ Pro Tip — Return Actionable Error Messages

When a gate blocks a tool call, the error message returned as tool_result is what Claude reads next. Make it specific and actionable: "KYC verification not completed. Complete step 2 first." — not a generic "Prerequisite failed." This allows Claude to correctly inform the user about what's needed, rather than hallucinating an explanation.
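As a sketch of what an actionable block message can look like, the helper below builds a Messages API style `tool_result` block with `is_error` set. It assumes your own agent loop constructs tool results; the `blocked_tool_result` function and its parameters are hypothetical.

```python
from typing import Any, Dict


def blocked_tool_result(tool_use_id: str, missing: str, next_step: str) -> Dict[str, Any]:
    """Build an error tool_result that names the unmet prerequisite and the fix."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "is_error": True,
        "content": f"PREREQUISITE_FAILED: {missing}. Next step: {next_step}.",
    }
```

For example, `blocked_tool_result("toolu_01", "KYC verification not completed", "complete step 2 (ID verification)")` gives Claude both the reason for the block and the step to suggest to the user.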

👤 Human-in-the-Loop Checkpoints

Some decisions are too high-stakes, legally sensitive, or contextually ambiguous for any AI system to make autonomously. Human-in-the-loop (HiTL) checkpoints pause workflow execution and wait for explicit human approval before continuing.

🔵 When to Use HiTL

• Financial transactions above a threshold (e.g., >$10,000)
• Irreversible changes (deletion, mass emails, contract signing)
• High-risk customer classifications
• Legal or regulatory exceptions
• Ambiguous refund decisions above policy limits
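The criteria above can be expressed as a deterministic predicate so that routing to a human never depends on Claude's judgment. This is a sketch only: `ProposedAction`, its fields, and the default `refund_policy_limit` of 500 are assumptions made for illustration, and the $10,000 threshold is the example value from the list.

```python
from dataclasses import dataclass
from typing import List, Optional

HITL_AMOUNT_THRESHOLD = 10_000.0  # example threshold from the list above


@dataclass
class ProposedAction:
    """Hypothetical description of an action the agent wants to take."""
    kind: str
    amount: float = 0.0
    irreversible: bool = False
    risk_tier: Optional[str] = None
    regulatory_exception: bool = False


def hitl_reasons(action: ProposedAction, refund_policy_limit: float = 500.0) -> List[str]:
    """Return every reason this action needs human approval (empty list means none)."""
    reasons = []
    if action.amount > HITL_AMOUNT_THRESHOLD:
        reasons.append("amount above threshold")
    if action.irreversible:
        reasons.append("irreversible change")
    if action.risk_tier == "high":
        reasons.append("high-risk customer")
    if action.regulatory_exception:
        reasons.append("legal or regulatory exception")
    if action.kind == "refund" and action.amount > refund_policy_limit:
        reasons.append("refund above policy limit")
    return reasons


def needs_human_review(action: ProposedAction, refund_policy_limit: float = 500.0) -> bool:
    return bool(hitl_reasons(action, refund_policy_limit))
```

Returning the list of reasons, rather than a bare boolean, gives the reviewer notification something concrete to display.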

🟠 HiTL Implementation Pattern

1. Workflow reaches HiTL checkpoint
2. System creates a pending_approval record in state store
3. Notification sent to human reviewer (email, Slack, dashboard)
4. Workflow suspends (session persists)
5. Human reviews and clicks Approve/Reject
6. State store updated with the decision (human_approved = true on approval)
7. Workflow resumes from checkpoint

Python — HiTL Checkpoint Pattern

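# Assumes application-provided async clients (state_store, notifications,
# session_manager) and a handle_rejection() coroutine defined elsewhere.
async def handle_hitl_checkpoint(customer_id, risk_data, session):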
    # 1. Suspend the workflow — persist state
    await state_store.suspend_workflow(
        customer_id=customer_id,
        stage="risk_review",
        context={
            "risk_tier": risk_data["tier"],
            "risk_factors": risk_data["factors"],
            "session_id": session.id
        }
    )
    # 2. Notify human reviewer
    await notifications.send_review_request(
        reviewer="compliance-team@company.com",
        customer_id=customer_id,
        review_url=f"https://dashboard/review/{customer_id}",
        summary=risk_data["summary"]
    )
    # 3. Return status — Claude tells user what's happening
    return {
        "status": "AWAITING_HUMAN_REVIEW",
        "message": "Your application requires manual review. You will be notified within 24h.",
        "eta_hours": 24
    }

# When human approves (webhook callback):
async def on_human_approval(customer_id, approved: bool, reviewer_notes: str):
    if approved:
        await state_store.update(customer_id, {"human_approved": True, "reviewer_notes": reviewer_notes})
        await session_manager.resume(customer_id)  # Resume the suspended session
    else:
        await handle_rejection(customer_id, reviewer_notes)

⭐ Pro Tip — Design for Session Resumption, Not Restart

When a workflow suspends for human review, the session may be idle for hours or days. Design your state store to preserve the complete workflow context at the suspension point, not just a flag. On resume, rebuild Claude's conversation from the stored history so no context is lost. Restarting from scratch wastes resources and confuses users who already completed earlier steps.
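A minimal sketch of what "complete workflow context" can mean in practice: serialize the stage, the full message history, and the workflow state together, then merge the reviewer's decision on resume. The function names and the payload layout are hypothetical; a real system would write the payload to durable storage.

```python
import json
from typing import Any, Dict, List


def serialize_suspension(stage: str, messages: List[Dict[str, Any]],
                         workflow_state: Dict[str, Any]) -> str:
    """Capture everything needed to resume: stage, full message history, and state."""
    return json.dumps({"stage": stage, "messages": messages, "workflow_state": workflow_state})


def restore_suspension(payload: str, decision: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the suspended context and merge in the reviewer's decision."""
    data = json.loads(payload)
    data["workflow_state"].update(decision)
    return data
```

The restored `messages` list can be passed back to the model unchanged, so the conversation continues from the suspension point instead of starting over.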

📦 Handoff Packages — Transferring State Between Stages

A handoff package is a structured data payload that carries all relevant state, context, and outputs from one workflow stage to the next. It ensures the receiving stage (or agent) has everything it needs without querying upstream systems again.

Anatomy of a Well-Structured Handoff Package

JSON — Financial Onboarding Handoff Package (Stage 2 → Stage 3)

{
  "handoff_id": "ho_abc123",
  "from_stage": "kyc_verification",
  "to_stage": "risk_assessment",
  "completed_at": "2024-03-20T14:32:00Z",
  "outputs": {
    "kyc_status": "PASSED",
    "identity_confirmed": true,
    "verification_provider": "Jumio",
    "verification_reference": "jum_789xyz"
  },
  "customer_profile": {
    "customer_id": "cust_456",
    "name": "Aarav Shah",
    "dob": "1985-07-14",
    "country": "IN",
    "requested_product": "premium_savings"
  },
  "gate_evidence": {
    "gate_id": "gate_kyc_exit",
    "checked_at": "2024-03-20T14:31:58Z",
    "checked_by": "WorkflowEnforcementHook v2.1",
    "result": "PASSED"
  }
}

Handoff Package Field	Purpose	Required?
`handoff_id`	Unique ID for this specific state transition — enables replay and audit	Always
`from_stage` / `to_stage`	Explicit stage labelling; prevents a handoff from being applied to the wrong stage	Always
`outputs`	Results produced by the completed stage	Always
`customer_profile`	Carry-forward context so receiving stage doesn't need to re-fetch	Always
`gate_evidence`	Audit trail proving the gate was actually checked	Compliance systems
`rollback_data`	Snapshot of state before this stage ran — enables rollback	Reversible stages
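The "Required?" column above lends itself to a validation step on the receiving side. The sketch below assumes the package arrives as a parsed dictionary; `validate_handoff` and its `compliance` flag are hypothetical names for illustration.

```python
from typing import Any, Dict, List

ALWAYS_REQUIRED = ("handoff_id", "from_stage", "to_stage", "outputs", "customer_profile")


def validate_handoff(package: Dict[str, Any], receiving_stage: str,
                     compliance: bool = False) -> List[str]:
    """Return a list of problems; an empty list means the package is acceptable."""
    problems = [f"missing field: {name}" for name in ALWAYS_REQUIRED if name not in package]

    # Reject a package addressed to a different stage.
    to_stage = package.get("to_stage")
    if to_stage is not None and to_stage != receiving_stage:
        problems.append(f"addressed to {to_stage}, not {receiving_stage}")

    # Compliance systems also need proof that the gate was checked.
    if compliance and "gate_evidence" not in package:
        problems.append("missing field: gate_evidence")

    return problems
```

A receiving stage would refuse to start when the returned list is non-empty, which turns the table's requirements into an entry condition.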

↩️ Rollback Mechanisms for Failed Stages

When a stage fails partway through, you need to undo any partial changes and return the system to a consistent state. This requires capturing a rollback snapshot before each stage begins and an explicit rollback procedure for each stage type.

📸 Snapshot-Before-Execute Pattern

Before running each stage, save the current workflow state and any affected DB records to a rollback_snapshots table. If the stage fails, restore from the snapshot. Tag each snapshot with the stage name and timestamp.

⛔ Compensating Transactions

For stages that created external records (account created, email sent), a snapshot alone isn't enough. Define a compensating action: delete the account, send a cancellation email. The rollback procedure must call the compensating transaction, not just restore local state.
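One possible shape for compensating transactions is to register an undo action at the moment each external side effect happens, then run the undo actions newest-first on failure. This is a synchronous sketch with in-memory stand-ins for the external systems; `CompensationLog`, `create_account_stage`, `accounts`, and `outbox` are all hypothetical.

```python
from typing import Callable, List


class CompensationLog:
    """Collects one undo action per external side effect (hypothetical helper)."""

    def __init__(self) -> None:
        self._undo_actions: List[Callable[[], None]] = []

    def register(self, undo: Callable[[], None]) -> None:
        self._undo_actions.append(undo)

    def compensate(self) -> int:
        """Run undo actions newest-first; return how many ran."""
        count = 0
        while self._undo_actions:
            undo = self._undo_actions.pop()
            undo()
            count += 1
        return count


# Stand-ins for external systems; real code would call the account service and mailer.
accounts = set()
outbox = []


def create_account_stage(log: CompensationLog, account_id: str, fail: bool = False) -> None:
    accounts.add(account_id)
    log.register(lambda: accounts.discard(account_id))

    outbox.append(f"welcome:{account_id}")
    # A sent email cannot be unsent, so the compensation is a follow-up message.
    log.register(lambda: outbox.append(f"cancellation:{account_id}"))

    if fail:
        raise RuntimeError("stage failed after external writes")
```

Registering the undo immediately after each write means a failure at any point leaves the log holding exactly the compensations that are needed.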

↩️ Partial Rollback (Stage-Level)

Roll back only the failed stage, not the entire workflow. If Stage 4 (Account Setup) fails, KYC (Stage 2) and Risk Assessment (Stage 3) don't need to be rolled back; just retry Stage 4. The state store tracks which stages are committed.
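Stage-level rollback depends on knowing which stages are already committed. A small ledger can track that, as in this sketch; `StageLedger` and the stage names in `STAGE_ORDER` are assumptions chosen to mirror the five-stage onboarding example.

```python
from typing import List, Optional, Set

STAGE_ORDER: List[str] = [
    "collect_info", "kyc_check", "risk_assessment", "account_setup", "welcome",
]


class StageLedger:
    """Tracks which stages are committed so a retry resumes at the failed stage."""

    def __init__(self) -> None:
        self.committed: Set[str] = set()
        self.failed: Optional[str] = None

    def commit(self, stage: str) -> None:
        self.committed.add(stage)
        if self.failed == stage:
            self.failed = None

    def mark_failed(self, stage: str) -> None:
        # Only the failed stage is affected; earlier commits are left untouched.
        self.committed.discard(stage)
        self.failed = stage

    def resume_point(self) -> Optional[str]:
        """First stage in order that is not committed, or None when all are done."""
        for stage in STAGE_ORDER:
            if stage not in self.committed:
                return stage
        return None
```

After `mark_failed("account_setup")`, `resume_point()` returns `"account_setup"` while the KYC and risk stages remain committed.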

🔔 Rollback Notification

After rollback, the workflow must notify both Claude (via a tool_result error message describing what was rolled back) and the user (via a user-facing message). Claude can then re-attempt the failed stage or escalate to human support.

Python — Stage Rollback Pattern

async def execute_stage_with_rollback(stage_name, execute_fn, rollback_fn, state):
    # 1. Take snapshot before executing
    snapshot = state.snapshot(stage_name)

    try:
        result = await execute_fn()
        state.mark_stage_complete(stage_name, result)
        return {"status": "SUCCESS", "result": result}

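    except Exception as e:
        # 2. Execute compensating transaction, then restore local state
        try:
            await rollback_fn(snapshot)
        except Exception as rollback_error:
            # Compensation itself failed: state may be inconsistent, so escalate
            # instead of inviting a blind retry
            state.mark_stage_failed(stage_name, f"{e}; rollback failed: {rollback_error}")
            return {
                "status": "ROLLBACK_FAILED",
                "failed_stage": stage_name,
                "error": str(e),
                "rollback_error": str(rollback_error),
                "retry_allowed": False
            }
        state.restore_snapshot(snapshot)
        state.mark_stage_failed(stage_name, str(e))

        # 3. Return structured error for Claude to act on
        return {
            "status": "ROLLED_BACK",
            "failed_stage": stage_name,
            "error": str(e),
            "rolled_back_to": snapshot["previous_stage"],
            "retry_allowed": True
        }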

⚠️ Anti-Patterns & Pro Tips

❌ Trusting Claude's Assertion as Gate Evidence

Checking if Claude's response contains "KYC is complete" instead of querying the database. Claude can be wrong, context-confused, or manipulated through clever user prompts.

❌ Gates in the System Prompt Only

Writing "Do not call setup_account before KYC is done" in the system prompt. This relies on Claude following instructions — a persuasive user can bypass it. Gates must be in code, not prompts.

❌ No Compensating Transaction for Irreversible Actions

Creating accounts, sending emails, or charging cards without a compensating transaction defined. When these stages fail partway, you're left with partial, inconsistent state.

❌ Blocking HiTL Without Session Persistence

Suspending for human review but not persisting session state. When the human approves and the workflow resumes, Claude has no context and starts over — forcing the user to re-enter all their data.

✅ State Store as Single Source of Truth

All gate checks query the state store (database), never Claude's conversation. The state store is authoritative. Claude's beliefs about state are advisory only.

✅ Actionable Gate Error Messages

When a gate blocks, return specific, actionable messages: "KYC not completed — complete ID verification at step 2." This lets Claude provide correct user guidance without hallucinating explanations.

✅ Gate Evidence in Handoff Packages

Include gate_evidence in every handoff package. This creates an audit trail showing that each gate was checked, when, and by which version of the enforcement hook. If the trail must be immutable, store the packages in append-only storage.

✅ Idempotent Stage Execution

Design each stage so running it twice produces the same result (idempotency). This makes retry after rollback safe and prevents double-charges, duplicate accounts, or duplicate emails.
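A common way to get idempotent stages is an idempotency key: the first run stores its result under the key, and any repeat returns the stored result instead of acting again. The sketch below keeps results in memory for brevity; `IdempotentRunner` and `stage_key` are hypothetical, and a production version would need durable storage with an atomic check-and-set.

```python
from typing import Any, Callable, Dict


class IdempotentRunner:
    """Caches each stage result under an idempotency key (in-memory sketch)."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def run(self, key: str, action: Callable[[], Any]) -> Any:
        # A repeated key returns the stored result instead of re-running the action.
        if key in self._results:
            return self._results[key]
        result = action()
        self._results[key] = result
        return result


def stage_key(customer_id: str, stage: str) -> str:
    """One key per customer and stage, so a retry maps to the same entry."""
    return f"{customer_id}:{stage}"
```

Because a failed action stores nothing, a retry after rollback runs the action again, while a retry after success is a no-op.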

📝 Summary & Exam Key Points

🎯 Exam Scenario — Customer Support Resolution Agent

The primary exam scenario for Task 1.4 is: "You are building a customer support resolution agent using the Claude Agent SDK. The agent handles high-ambiguity requests like returns, billing disputes, and account issues. It has access to your backend systems through custom MCP tools (get_customer, lookup_order, process_refund, escalate_to_human)."

Questions will present scenarios where a compliance workflow was bypassed, a stage ran out of order, or a rollback didn't work. The answer usually traces to one of four causes: trusting Claude's output instead of querying state, placing gates in prompts rather than code, omitting HiTL for high-risk actions, or failing to take a rollback snapshot before a stage that performs irreversible writes. Also tested: decomposing multi-concern customer requests into parallel items, then synthesizing a unified resolution.

Know the difference between programmatic enforcement and prompt-based guidance. When deterministic compliance is required (e.g., identity verification before financial operations), prompt instructions alone have a non-zero failure rate. Use programmatic hooks for guaranteed compliance.

Every stage needs entry conditions AND exit criteria checked programmatically against the state store — not inferred from Claude's output. Gates must verify actual database state.

Enforcement gates use SDK pre-tool hooks. The hook intercepts the tool call, queries the state store, and either allows or blocks — returning a structured error if blocked. Critically: blocking process_refund until get_customer has returned a verified customer ID is the canonical example from the exam guide.

Decompose multi-concern customer requests into distinct items, then investigate each in parallel using shared context before synthesizing a unified resolution. For example: a customer with 3 billing disputes — decompose into 3 separate invoice investigations run in parallel, then unify the resolution.
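The fan-out and fan-in shape of this pattern can be sketched with `asyncio.gather`. The investigation itself is a placeholder here: `investigate_invoice`, `resolve_disputes`, and the `pending_review` finding are hypothetical, and a real investigation would call the backend tools.

```python
import asyncio
from typing import Any, Dict, List


async def investigate_invoice(invoice_id: str, shared: Dict[str, Any]) -> Dict[str, Any]:
    """Placeholder investigation; a real one would call order and billing tools."""
    await asyncio.sleep(0)  # yield control, standing in for I/O
    return {
        "invoice_id": invoice_id,
        "customer_id": shared["customer_id"],
        "finding": "pending_review",
    }


async def resolve_disputes(invoice_ids: List[str], shared: Dict[str, Any]) -> Dict[str, Any]:
    # Fan out: one investigation per concern, all sharing the same customer context.
    findings = await asyncio.gather(
        *(investigate_invoice(invoice_id, shared) for invoice_id in invoice_ids)
    )
    # Fan in: combine per-item findings into a single resolution for the customer.
    return {
        "customer_id": shared["customer_id"],
        "items": list(findings),
        "item_count": len(findings),
    }
```

`asyncio.gather` returns results in the order the awaitables were passed, so the unified resolution lists the disputes in the order the customer raised them.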

Use structured handoff summaries when escalating to human agents. Compile the customer_id, root cause analysis, refund amount, and recommended action, because the human agent may lack access to the conversation transcript. This is the handoff content the exam guide specifies.
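The four fields named above map naturally onto a small validated record. This is a sketch, assuming the summary is rendered as plain text for the human agent; the `EscalationSummary` class, its validation rules, and the `render` layout are illustrative choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationSummary:
    """Self-contained summary for a human agent who cannot see the transcript."""
    customer_id: str
    root_cause: str
    refund_amount: float
    recommended_action: str

    def __post_init__(self) -> None:
        # Refuse to build a summary with blank text fields or a negative amount.
        for name in ("customer_id", "root_cause", "recommended_action"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        if self.refund_amount < 0:
            raise ValueError("refund_amount must not be negative")

    def render(self) -> str:
        return (
            f"Customer: {self.customer_id}\n"
            f"Root cause: {self.root_cause}\n"
            f"Refund amount: {self.refund_amount:.2f}\n"
            f"Recommended action: {self.recommended_action}"
        )
```

Validating at construction time means an incomplete summary fails inside the escalation tool rather than reaching the human agent with gaps.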

Handoff packages carry complete context between stages. Include outputs, carry-forward profile data, and gate evidence. Receiving agents/stages get everything they need without re-querying upstream systems.

Rollback requires snapshots AND compensating transactions. Restoring local state isn't enough for stages that created external records — you need a compensating action to undo the external effect (e.g., delete the account, send a cancellation email).

Gates in system prompts are insufficient. A persuasive user message can lead Claude to disregard a system prompt instruction. Only programmatic hooks in your backend code provide deterministic enforcement that cannot be bypassed through conversation.
