📅 Day 22 · ⏱ 55 min · 🔥 Ascend · 🤖 Bedrock

AWS Bedrock MCP Server

A powerful MCP pattern: put Bedrock behind an MCP tool so that Claude can invoke Claude for specialized subtasks. This lesson also covers Knowledge Bases as MCP resources, streaming Bedrock responses through SSE, and response caching for cost optimization.

Claude calling Claude via MCP is not a gimmick — it's a superpower. The outer Claude (in Claude Code or Claude Desktop) handles the conversation and orchestration. The inner Claude (via Bedrock) handles specialized tasks: translation, code review, document analysis — each isolated, with its own system prompt and context window.
🤖 Bedrock Overview

Bedrock as MCP Tool Backend

Amazon Bedrock gives you access to 30+ foundation models (Claude, Llama, Titan, Mistral, Stable Diffusion) via a unified API. When you put Bedrock behind an MCP tool, you give your MCP clients access to specialized AI capabilities without those clients needing direct Bedrock credentials.

Bedrock Feature               | MCP Tool Type    | Use Case
------------------------------|------------------|-----------------------------------------
Converse API                  | Tool (invoke)    | Chat with any model, Claude-in-Claude
Knowledge Bases               | Resource (read)  | RAG over your enterprise documents
InvokeModelWithResponseStream | Streaming Tool   | Real-time Bedrock output in MCP SSE
Bedrock Agents                | Tool (invoke)    | Delegate to a pre-built Bedrock Agent
Titan Embeddings              | Tool (compute)   | Generate embeddings for semantic search
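
The Titan Embeddings row has no code elsewhere in this lesson, so here is a minimal sketch of what such a compute tool could wrap. It assumes the Titan Text Embeddings V2 model ID and its {"inputText": ...} request shape; the function names embed_text and cosine_similarity are illustrative, not part of any SDK. The client is passed in so the helper can be exercised without AWS access.

```python
import json
import math


def embed_text(client, text, model_id="amazon.titan-embed-text-v2:0"):
    """Return the embedding vector for `text` using a Bedrock runtime client."""
    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream; the vector lives under the "embedding" key
    payload = json.loads(response["body"].read())
    return payload["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

In a real server you would pass boto3.client("bedrock-runtime", region_name="us-east-1") as the client and register a thin wrapper with @mcp.tool().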
🔄 Claude-in-Claude

Converse API Tool — Claude-within-Claude

The bedrock_specialist MCP tool lets the orchestrating Claude call another Claude instance through the Converse API with a specialized system prompt. The inner Claude acts as a domain expert (code reviewer, legal document analyzer, translation engine) without polluting the outer conversation's context window.

import asyncio, boto3, json
from fastmcp import FastMCP

mcp     = FastMCP("BedrockMCP")
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# System prompt for each specialist the outer Claude can delegate to
SPECIALISTS = {
    "code_reviewer": "You are a senior software engineer. Review the provided code for bugs, security issues, and best practices. Be specific and actionable.",
    "legal_analyst": "You are a legal document analyst. Identify key clauses, obligations, risks, and red flags in the provided text.",
    "translator_es": "You are a professional Spanish translator. Translate the provided text naturally and accurately.",
    "data_scientist": "You are a senior data scientist. Analyze the provided data or code and suggest statistical insights and improvements.",
}

@mcp.tool()
async def bedrock_specialist(
    specialist: str,
    content: str,
    model_id: str = "anthropic.claude-3-5-sonnet-20241022-v2:0"
) -> str:
    """Invoke a Bedrock specialist Claude with a domain-specific system prompt."""
    if specialist not in SPECIALISTS:
        return f"Unknown specialist. Available: {list(SPECIALISTS.keys())}"

    system_prompt = SPECIALISTS[specialist]
    # boto3 is synchronous; run the call in a worker thread so the event loop stays responsive
    response = await asyncio.to_thread(
        bedrock.converse,
        modelId=model_id,
        system=[{"text": system_prompt}],
        messages=[{"role": "user", "content": [{"text": content}]}],
        inferenceConfig={"maxTokens": 2000, "temperature": 0.3}
    )
    # Join every text block instead of assuming the reply has exactly one
    blocks = response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)
💡
Real usage pattern

Claude in Claude Code calls bedrock_specialist("code_reviewer", my_code). The inner Bedrock Claude analyzes the code with a focused system prompt and returns its findings; the outer Claude synthesizes and presents them. Each has its own context, so neither conversation contaminates the other.
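
Beyond the reply text, a Converse response also carries a stop reason and token counts, which help the outer Claude spot truncated answers and help you track spend per specialist. The sketch below shows one way to read them; the helper names extract_text and summarize_usage are illustrative, and the field names follow the Converse response shape (output.message.content, stopReason, usage).

```python
def extract_text(response):
    """Join every text block in a Converse response message."""
    blocks = response["output"]["message"]["content"]
    return "".join(block["text"] for block in blocks if "text" in block)


def summarize_usage(response):
    """Return the stop reason and token counts for logging or cost tracking."""
    usage = response.get("usage", {})
    return {
        # "max_tokens" here means the reply was cut off by inferenceConfig.maxTokens
        "stop_reason": response.get("stopReason"),
        "input_tokens": usage.get("inputTokens", 0),
        "output_tokens": usage.get("outputTokens", 0),
    }
```

A tool could append a short note to its result when stop_reason is "max_tokens", so the orchestrating Claude knows the review or translation is incomplete.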

📚 Knowledge Bases

Knowledge Bases as MCP Resources

Amazon Bedrock Knowledge Bases provide managed RAG (Retrieval-Augmented Generation) over your documents: S3 PDFs, Confluence pages, SharePoint, databases. Expose them through MCP so Claude can retrieve relevant enterprise knowledge on demand. Because each search takes a free-form query, the example here registers retrieval as a tool rather than as a static resource.

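import asyncio

# Same region as the bedrock-runtime client; adjust if your Knowledge Base lives elsewhere
bedrock_agent = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
KB_ID = "your-knowledge-base-id"  # placeholder: replace with your Knowledge Base ID

@mcp.tool()
async def knowledge_base_search(query: str, num_results: int = 5) -> str:
    """Search your Bedrock Knowledge Base with semantic search."""
    # boto3 is synchronous; run the call in a worker thread so the event loop stays responsive
    resp = await asyncio.to_thread(
        bedrock_agent.retrieve,
        knowledgeBaseId=KB_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": num_results}
        }
    )
    results = []
    for r in resp["retrievalResults"]:
        results.append({
            "content": r["content"]["text"],
            # score and location are optional in the response, so read them defensively
            "score": round(r.get("score", 0.0), 3),
            "source": r.get("location", {}).get("s3Location", {}).get("uri", "unknown")
        })
    return json.dumps({"query": query, "results": results}, indent=2)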
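
Retrieval is only half of RAG: the passages still have to be handed to a model with instructions to stay grounded in them. As a rough sketch, the helper below turns the JSON string returned by knowledge_base_search into a prompt with numbered, source-tagged passages. The name build_grounded_prompt, the 0.4 score cutoff, and the 4000-character budget are all assumptions to tune for your own data.

```python
import json


def build_grounded_prompt(search_json, question, min_score=0.4, max_chars=4000):
    """Turn knowledge_base_search output into a prompt with numbered, cited passages."""
    results = json.loads(search_json)["results"]
    passages, used = [], 0
    for result in results:
        if result["score"] < min_score:
            continue  # drop weak matches instead of padding the context
        entry = f"[{len(passages) + 1}] ({result['source']}) {result['content']}"
        if used + len(entry) > max_chars:
            break  # stay inside the character budget
        passages.append(entry)
        used += len(entry)
    context = "\n".join(passages) if passages else "(no relevant passages found)"
    return (
        "Answer using only the passages below and cite them by number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The resulting string can be passed as the content argument of a specialist call, which keeps retrieval and generation as two separate, inspectable steps.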
⚡ Streaming

Streaming Bedrock via SSE MCP

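Bedrock's InvokeModelWithResponseStream lets you read tokens as they're generated. An MCP tool call still returns a single final result, so the tool below sends progress notifications over the SSE transport while the stream is arriving and then returns the full text. That keeps the client informed during long (1000+ token) responses instead of leaving it waiting in silence.

from fastmcp import Context

@mcp.tool()
async def bedrock_stream(
    prompt: str,
    ctx: Context,
    model_id: str = "anthropic.claude-3-5-sonnet-20241022-v2:0"
) -> str:
    """Stream tokens from Bedrock, report progress over MCP, and return the full text."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2000,
        "messages": [{"role": "user", "content": prompt}]
    })
    stream_resp = bedrock.invoke_model_with_response_stream(
        modelId=model_id, body=body,
        contentType="application/json", accept="application/json"
    )
    parts = []
    for event in stream_resp["body"]:
        if "chunk" not in event:
            continue  # skip anything that is not a payload chunk
        chunk = json.loads(event["chunk"]["bytes"])
        if chunk.get("type") == "content_block_delta":
            parts.append(chunk["delta"].get("text", ""))
            # Progress is counted in received chunks; it is a no-op if the client sent no progress token
            await ctx.report_progress(progress=len(parts))
    return "".join(parts)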
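
The request body above is specific to Anthropic models. The Converse API has a streaming counterpart, converse_stream, whose events have the same shape for every model, which may suit a server that exposes several model families. The sketch below only parses that event stream; the helper names are illustrative, and the event keys (contentBlockDelta, messageStop) follow the ConverseStream response format.

```python
def iter_converse_text(events):
    """Yield text deltas from a ConverseStream event iterable."""
    for event in events:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        text = delta.get("text")
        if text:
            yield text


def collect_converse_stream(events):
    """Consume the stream and return (full_text, stop_reason)."""
    parts, stop_reason = [], None
    for event in events:
        if "contentBlockDelta" in event:
            parts.append(event["contentBlockDelta"]["delta"].get("text", ""))
        elif "messageStop" in event:
            stop_reason = event["messageStop"].get("stopReason")
    return "".join(parts), stop_reason
```

With a real client the events come from bedrock.converse_stream(modelId=..., messages=[...])["stream"]; the parsing code stays the same whichever model you select.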
💰 Cost Optimization

Cost Optimization with DynamoDB Cache

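Bedrock charges per token. Cache frequently requested results in DynamoDB with a TTL so that repeated requests skip the model call. The cache in this section is exact-match: it hits only when the specialist and the prompt are identical. A semantic cache (using vector similarity to match similar queries) extends this to near-duplicates and can reduce costs by an estimated 40-70% for workloads with highly repetitive queries; actual savings depend on the hit rate.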

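import asyncio, hashlib, time

dynamo = boto3.resource("dynamodb")
cache_table = dynamo.Table("bedrock-response-cache")

async def cached_bedrock(prompt: str, specialist: str, ttl_hours: int = 24) -> str:
    # Create deterministic cache key (exact match on specialist + prompt)
    cache_key = hashlib.sha256(f"{specialist}:{prompt}".encode()).hexdigest()
    # Check cache; boto3 is synchronous, so run it in a worker thread
    item = await asyncio.to_thread(cache_table.get_item, Key={"cache_key": cache_key})
    cached = item.get("Item")
    # DynamoDB TTL deletion can lag, so compare the expiry explicitly
    if cached and cached["expiry"] > int(time.time()):
        return cached["response"]  # Cache hit: no Bedrock call, no token cost
    # Cache miss: call Bedrock.
    # Assumes bedrock_specialist is directly awaitable; if your FastMCP version
    # wraps decorated tools in a Tool object, call the undecorated function instead.
    result = await bedrock_specialist(specialist, prompt)
    # Store in cache; enable DynamoDB TTL on "expiry" so stale items are purged automatically
    await asyncio.to_thread(cache_table.put_item, Item={
        "cache_key": cache_key,
        "response": result,
        "expiry": int(time.time()) + (ttl_hours * 3600)
    })
    return result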
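
A hash key only matches byte-identical prompts. The semantic cache mentioned above compares embeddings instead, so rephrased questions can share one answer. Here is a minimal in-memory sketch of the idea: the SemanticCache class, the 0.92 default threshold, and the embed callable are all hypothetical, and a production version would keep vectors in a vector store rather than a Python list.

```python
import math


class SemanticCache:
    """In-memory cache that matches queries by embedding similarity."""

    def __init__(self, embed, threshold=0.92):
        self.embed = embed          # callable: str -> list[float]
        self.threshold = threshold  # hypothetical cutoff; tune per workload
        self.entries = []           # list of (vector, response) pairs

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, query):
        """Return the best cached response at or above the threshold, else None."""
        vector = self.embed(query)
        best_score, best_response = 0.0, None
        for cached_vector, response in self.entries:
            score = self._cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query, response):
        """Store a response under the embedding of its query."""
        self.entries.append((self.embed(query), response))
```

The threshold is the main design choice: set it too low and users receive answers to questions they did not ask, too high and the cache behaves like the exact-match version. The embed callable could be a Titan Embeddings call.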
🏗️ Architecture

Production Bedrock MCP Architecture

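A complete production deployment: ECS Fargate runs the Bedrock MCP server with an IAM task role that allows bedrock:InvokeModel, bedrock:InvokeModelWithResponseStream (needed for the streaming tool), bedrock:Retrieve, and read/write access to the DynamoDB cache table. CloudFront + ALB handle SSE connections. X-Ray traces every Bedrock call so that latency and usage can be attributed to individual requests.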

Bedrock MCP Production Stack
🤖 Claude Code (MCP Client) → ECS Fargate (Bedrock MCP Server) → 🧠 Amazon Bedrock (Claude / Knowledge Bases) and 🗄️ DynamoDB (Response Cache)
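
The permissions listed for the task role can be written down as a policy document. The following is a sketch under stated assumptions rather than a vetted policy: the function name build_task_role_policy is illustrative, the resource ARNs are placeholders built from values you supply, and the model wildcard should be narrowed to the models you actually call.

```python
import json


def build_task_role_policy(region, account_id, kb_id, cache_table):
    """Build a least-privilege IAM policy document for the MCP server's task role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "InvokeModels",
                "Effect": "Allow",
                "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
                # Placeholder scope: all Anthropic foundation models in the region
                "Resource": f"arn:aws:bedrock:{region}::foundation-model/anthropic.*",
            },
            {
                "Sid": "QueryKnowledgeBase",
                "Effect": "Allow",
                "Action": ["bedrock:Retrieve"],
                "Resource": f"arn:aws:bedrock:{region}:{account_id}:knowledge-base/{kb_id}",
            },
            {
                "Sid": "ResponseCache",
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:PutItem"],
                "Resource": f"arn:aws:dynamodb:{region}:{account_id}:table/{cache_table}",
            },
        ],
    }
```

Calling json.dumps(build_task_role_policy("us-east-1", "123456789012", "your-knowledge-base-id", "bedrock-response-cache"), indent=2) prints a document you can attach to the role; the account ID shown is a dummy value.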

🧠 Knowledge Check — Day 22

4 questions on AWS Bedrock MCP patterns

Q1/4
What is the key benefit of "Claude-within-Claude" via Bedrock MCP?
A. It's cheaper than a single Claude call
B. Each specialist Claude has its own system prompt and context window, keeping concerns isolated
C. Bedrock is faster than the Anthropic API
D. It bypasses the need for tool definitions
✅ B. Each inner Claude instance has its own system prompt tailored to a domain (code review, legal analysis, translation) and runs in its own isolated context window. The outer Claude orchestrates without polluting its own context.
Q2/4
What does Amazon Bedrock Knowledge Bases provide for MCP?
A. A way to store MCP server configurations
B. Managed RAG over enterprise documents: semantic search without building your own vector infrastructure
C. A backup for DynamoDB
D. Lambda function storage
✅ B. Bedrock Knowledge Bases handle embedding generation, vector storage (for example, OpenSearch Serverless), and retrieval, letting you expose semantic search over enterprise documents as a simple MCP tool without managing vector infrastructure.
Q3/4
How can you cut the cost of repeated Bedrock requests in an MCP server?
A. Use a smaller model
B. Cache responses in DynamoDB keyed by a hash of the prompt+specialist, with a TTL
C. Reduce max_tokens to 100
D. Use spot instances
✅ B. A response cache keyed by SHA-256(specialist+prompt) stores results in DynamoDB. Identical requests are served from the cache without a Bedrock call; matching merely similar requests requires a semantic cache. A 24-hour TTL balances freshness and cost savings.
Q4/4
Which Bedrock API call does this lesson use to stream tokens through an MCP SSE server?
A. bedrock.stream_model()
B. bedrock.invoke_model_with_response_stream(), consumed chunk by chunk inside the MCP tool
C. bedrock.converse_stream()
D. lambda.invoke_with_stream()
✅ B. invoke_model_with_response_stream() returns a streaming body. The tool iterates over the chunks and reads each content_block_delta as it arrives, so output can be relayed over the SSE transport while the model is still generating. (converse_stream() is the Converse API's streaming call, but it is not the one used in this lesson.)