For mission-critical tasks (legal drafting, safety auditing), a single AI turn is a single point of failure. This task explores Parallel and Sequential ensembles. You will learn to architect systems that run multiple independent Claude instances and sequential review chains to eliminate hallucinations through consensus and criticism.
Imagine you have a complex medical condition. Standard Prompting is like asking one doctor for a diagnosis. If they missed one day of class, their advice might be wrong. Single point of failure.
Multi-Instance Architecture is like asking 3 independent doctors in 3 different cities. If they all arrive at the same diagnosis (Consensus), your confidence is near 100%.
Multi-Pass Architecture is like the 1st doctor doing the surgery, and a 2nd "Senior Specialist" watching the monitor and Saying, "Watch that artery!" (Critic/Reviewer).
Architects must decide if they need Variety (Parallel) or Depth (Sequential).
| Method | Structure | Best For | Goal |
|---|---|---|---|
| Parallel | Claude A, B, and C run in tandem. | Fast extraction, classification. | Consensus (remove outliers). |
| Sequential | A → B → C (Chain). | Summarization, coding, legal. | Refinement (improve quality). |
Run 3 instances of Claude-3-Sonnet with temperature: 0.7 using varied role prompts (e.g. "Skeptical Auditor" vs "Efficient Developer"). Compare results in your code.
If Instance 1 and 2 extract the number $500, but Instance 3 extracts $550, your aggregator code should discard Instance 3 as an outlier. This is a common pattern for high-accuracy financial extraction.
If you have limited tokens but need high accuracy, use **Self-Debate**. Prompt Claude to simulate two internal personas arguing over a complex decision before it outputs the final XML.
<thinking> - Persongen A: Suggests we use the 'strict' parser for security. - Persona B: Objects, saying it might break legacy data. - Synthesis: Resolve to use 'strict' but with a fallback handler. </thinking>
The "Consensus Algorithm" should escalate to a human if fewer than 2/3 of instances agree on a specific field value.
Assign a 2nd turn to a "Reviewer" prompt. Give the 1st agent's output and the original source to the 2nd larger instance (e.g. Claude-3-Opus) and say: "Flag any claim NOT supported by the source XML."
Best Practice: Use the larger Claude-3-Sonnet for the 1st draft, and the fast Claude-3-Haiku for checking simple syntax errors. For logical errors, the Checker must be at least as large as the Creator.
Running 3 instances with the exact same system prompt. Claude will arrive at the same hallucination. Fix: Vary the roles.
Using multi-instance for simple tasks. Fix: Only trigger if the 1st agent expresses low confidence.
Letting two agents argue back and forth without a termination condition. Fix: Limit to 2 passes.
Using Haiku to judge the reasoning of Sonnet. Haiku will miss the subtle logic.
Scenario: You're building an automated summarizer for legal contracts. The system MUST not miss any "Right of First Refusal" clauses. Accuracy is prioritize over cost.
Question: What is the most architecturally sound approach?
Correct Answer: B. A multi-pass review architecture provides a "Self-Check" system that significantly reduces omissions.