The most powerful MCP pattern: using Bedrock as a tool backend so Claude can invoke Claude — recursive AI augmentation. Also covers Knowledge Bases as MCP resources, streaming Bedrock responses through SSE, and caching for cost optimization.
Amazon Bedrock gives you access to 30+ foundation models (Claude, Llama, Titan, Mistral, Stable Diffusion) via a unified API. When you put Bedrock behind an MCP tool, you give your MCP clients access to specialized AI capabilities without those clients needing direct Bedrock credentials.
| Bedrock Feature | MCP Tool Type | Use Case |
|---|---|---|
| Converse API | Tool (invoke) | Chat with any model, Claude-in-Claude |
| Knowledge Bases | Resource (read) | RAG over your enterprise documents |
| InvokeModelWithResponseStream | Streaming Tool | Real-time Bedrock output in MCP SSE |
| Bedrock Agents | Tool (invoke) | Delegate to a pre-built Bedrock Agent |
| Titan Embeddings | Tool (compute) | Generate embeddings for semantic search |
The bedrock_converse MCP tool lets the orchestrating Claude call another Claude instance with a specialized system prompt. The inner Claude acts as a domain expert — code reviewer, legal document analyzer, translation engine — without polluting the outer conversation's context window.
import boto3, json from fastmcp import FastMCP mcp = FastMCP("BedrockMCP") bedrock = boto3.client("bedrock-runtime", region_name="us-east-1") SPECIALISTS = { "code_reviewer": "You are a senior software engineer. Review the provided code for bugs, security issues, and best practices. Be specific and actionable.", "legal_analyst": "You are a legal document analyst. Identify key clauses, obligations, risks, and red flags in the provided text.", "translator_es": "You are a professional Spanish translator. Translate the provided text naturally and accurately.", "data_scientist": "You are a senior data scientist. Analyze the provided data or code and suggest statistical insights and improvements.", } @mcp.tool() async def bedrock_specialist( specialist: str, content: str, model_id: str = "anthropic.claude-3-5-sonnet-20241022-v2:0" ) -> str: """Invoke a Bedrock specialist Claude with a domain-specific system prompt.""" if specialist not in SPECIALISTS: return f"Unknown specialist. Available: {list(SPECIALISTS.keys())}" system_prompt = SPECIALISTS[specialist] response = bedrock.converse( modelId=model_id, system=[{"text": system_prompt}], messages=[{"role": "user", "content": [{"text": content}]}], inferenceConfig={"maxTokens": 2000, "temperature": 0.3} ) return response["output"]["message"]["content"][0]["text"]
Claude in Claude Code calls bedrock_specialist("code_reviewer", my_code). The inner Bedrock Claude analyzes the code with a focused system prompt, returns findings. The outer Claude synthesizes and presents. Each has its own context — no cross-contamination.
Amazon Bedrock Knowledge Bases provide managed RAG (Retrieval Augmented Generation) over your documents — S3 PDFs, Confluence pages, SharePoint, databases. Expose them as MCP resources so Claude can retrieve relevant enterprise knowledge on demand.
bedrock_agent = boto3.client("bedrock-agent-runtime") KB_ID = "your-knowledge-base-id" @mcp.tool() async def knowledge_base_search(query: str, num_results: int = 5) -> str: """Search your Bedrock Knowledge Base with semantic search.""" resp = bedrock_agent.retrieve( knowledgeBaseId=KB_ID, retrievalQuery={"text": query}, retrievalConfiguration={ "vectorSearchConfiguration": {"numberOfResults": num_results} } ) results = [] for r in resp["retrievalResults"]: results.append({ "content": r["content"]["text"], "score": round(r["score"], 3), "source": r["location"].get("s3Location", {}).get("uri", "unknown") }) return json.dumps({"query": query, "results": results}, indent=2)
Bedrock's InvokeModelWithResponseStream lets you stream tokens as they're generated. Pipe this through your MCP SSE transport for a real-time typing effect on long responses — critical for user experience with 1000+ token outputs.
@mcp.tool() async def bedrock_stream(prompt: str, model_id: str = "anthropic.claude-3-5-sonnet-20241022-v2:0"): """Stream tokens from Bedrock through MCP SSE — real-time output.""" body = json.dumps({ "anthropic_version": "bedrock-2023-05-31", "max_tokens": 2000, "messages": [{"role": "user", "content": prompt}] }) stream_resp = bedrock.invoke_model_with_response_stream( modelId=model_id, body=body, contentType="application/json", accept="application/json" ) for event in stream_resp["body"]: chunk = json.loads(event["chunk"]["bytes"]) if chunk.get("type") == "content_block_delta": yield chunk["delta"].get("text", "") # streams each token chunk
Bedrock charges per token. Cache frequently requested results in DynamoDB with a TTL. A semantic cache (using vector similarity to match similar queries) can reduce costs by 40-70% for workloads with repetitive queries.
import hashlib, time dynamo = boto3.resource("dynamodb") cache_table = dynamo.Table("bedrock-response-cache") async def cached_bedrock(prompt: str, specialist: str, ttl_hours: int = 24) -> str: # Create deterministic cache key cache_key = hashlib.sha256(f"{specialist}:{prompt}".encode()).hexdigest() # Check cache cached = cache_table.get_item(Key={"cache_key": cache_key}).get("Item") if cached and cached["expiry"] > int(time.time()): return cached["response"] # Cache hit — $0 Bedrock cost # Cache miss — call Bedrock result = await bedrock_specialist(specialist, prompt) # Store in cache with TTL cache_table.put_item(Item={ "cache_key": cache_key, "response": result, "expiry": int(time.time()) + (ttl_hours * 3600) }) return result
A complete production deployment: ECS Fargate runs the Bedrock MCP server with an IAM role that has bedrock:InvokeModel, bedrock:Retrieve, and DynamoDB cache access. CloudFront + ALB handles SSE connections. X-Ray traces every Bedrock call for cost attribution.
4 questions on AWS Bedrock MCP patterns