Prompt Details
Model: claude-4-5-sonnet
Token size: 698
Example input
[AGENTS]: Router, Retriever, Generator, Validator
[WORKFLOW]: User → Router → Retriever → Generator → Validator → User
[DATA_TYPES]: Query (≤500 chars), Documents (<5KB each), Response (≤2000 chars)
[CONSTRAINTS]: p95 latency <3s, Retriever may fail, Validator timeout acceptable
Example output
# Multi-Agent RAG Communication Design
## 0) ASSUMPTIONS (if constraints incomplete)
- **Timeout defaults:**
  - Router→Retriever: 1000ms (retrieval is I/O bound, should be fast)
  - Retriever→Generator: 1500ms (document processing + LLM context preparation)
  - Generator→Validator: 800ms (validation is rule-based, fast)
  - Overall system timeout: 2800ms (stays within p95 <3s with buffer)
- **Retry strategy:**
  - Retriever: 2 retries with exponential backoff starting at 200ms (explicitly allowed to fail)
  - Generator: 1 retry (expensive operation, fast-fail preferred)
  - Validator: No retry (timeout acceptable per constraints)
- **Data size assumptions:**
  - Max 10 documents per retrieval (10 × 5KB = 50KB transfer)
  - Document batch processing acceptable (network efficiency)
  - Response is chunked if it exceeds 2000 chars (streaming not assumed available)
- **Concurrency:** Single request processing (no parallel agent calls unless stated)
- **Network:** Reliable internal network, <10ms latency between agents
- **Security:** Internal trusted zone (authentication handled at Router entry point)
**Confidence in assumptions: 7/10**
*(Would improve with: failure rate SLAs, cost constraints, scale targets)*
---
## 1) Connection Map
| Sender | Receiver | Purpose | Sync/Async | Timeout (ms) | Retry | Idempotent? | Priority |
|-----------|-----------|----------------------------------|------------|--------------|------------|-------------|----------|
| User | Router | Submit query | Sync | 3000 | None | Yes | High |
| Router | Retriever | Fetch relevant documents | Sync | 1000 | 2×200ms | Yes | High |
| Retriever | Generator | Provide context for generation | Sync | 1500 | 1×300ms | Yes | High |
| Generator | Validator | Validate generated response | Sync | 800 | None | Yes | Medium |
| Validator | Router | Return validation result | Sync | 100 | None | Yes | High |
| Router | User | Deliver final response | Sync | N/A | None | Yes | High |
**Notes:**
- All connections idempotent to support safe retries
- Retriever failure → Router falls back to "no documents" mode
- Validator timeout → Router accepts Generator output (degraded mode)
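The timeout and retry columns can be combined into a worst-case latency figure per hop. The sketch below is illustrative only: the `Hop` type and `worst_case_ms` helper are hypothetical names, backoff is assumed to double on each retry, and jitter is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    name: str
    timeout_ms: int      # per-attempt timeout from the Connection Map
    retries: int = 0     # retries after the first attempt
    backoff_ms: int = 0  # base delay; assumed to double on each retry


def worst_case_ms(hop: Hop) -> int:
    """Every attempt times out, plus the backoff waits between attempts."""
    attempts = hop.retries + 1
    waits = sum(hop.backoff_ms * 2 ** i for i in range(hop.retries))
    return attempts * hop.timeout_ms + waits


HOPS = [
    Hop("Router->Retriever", 1000, retries=2, backoff_ms=200),
    Hop("Retriever->Generator", 1500, retries=1, backoff_ms=300),
    Hop("Generator->Validator", 800),
    Hop("Validator->Router", 100),
]
OVERALL_BUDGET_MS = 2800  # overall system timeout from Section 0

total = sum(worst_case_ms(h) for h in HOPS)
for h in HOPS:
    print(f"{h.name}: {worst_case_ms(h)} ms")
print(f"total worst case: {total} ms (budget {OVERALL_BUDGET_MS} ms)")
```

Under these assumptions the hops sum to 7800ms, well above the 2800ms overall timeout, so the per-hop values only hold the target if a request-level deadline cuts retries short.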
---
## 2) Message Schemas (Top 3 Flows)
### Flow A: Router → Retriever (Query Expansion)
**Message Type:** request
```json
{
  "required": {
    "correlation_id": "uuid-v4",
    "query": "string (1-500 chars, sanitized)",
    "timestamp": "ISO8601",
    "max_documents": "integer (1-10)"
  },
  "optional": {
    "filters": {
      "date_range": "ISO8601 pair",
      "source_types": "array<string>",
      "min_relevance": "float (0.0-1.0)"
    },
    "retrieval_strategy": "enum: semantic | keyword | hybrid (default: semantic)"
  }
}
```
**Validation Rules:**
- Query must not contain SQL injection patterns (sanitized at Router)
- max_documents capped at 10 (cost control)
- filters.min_relevance defaults to 0.6 if unspecified
- timestamp must be within 5 seconds of current time (prevent replay)
**Versioning:** v1 (backward compatible until v2 adds multi-modal support)
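As a rough illustration, the rules above could be checked at the Retriever boundary as follows. This is a minimal sketch: `validate_retrieval_request` is a hypothetical helper, the current time is passed in so the check is testable, and SQL-pattern sanitization is assumed to have happened at the Router already.

```python
from datetime import datetime, timedelta, timezone

MAX_QUERY_CHARS = 500
MAX_DOCUMENTS = 10
DEFAULT_MIN_RELEVANCE = 0.6
MAX_CLOCK_SKEW = timedelta(seconds=5)


def validate_retrieval_request(msg: dict, now: datetime) -> dict:
    """Return a normalized copy of msg, or raise ValueError on the first violated rule."""
    for field in ("correlation_id", "query", "timestamp", "max_documents"):
        if field not in msg:
            raise ValueError(f"missing required field: {field}")

    if not 1 <= len(msg["query"]) <= MAX_QUERY_CHARS:
        raise ValueError("query must be 1-500 chars")

    # fromisoformat() accepts a trailing "Z" only on Python 3.11+, so normalize it.
    sent_at = datetime.fromisoformat(msg["timestamp"].replace("Z", "+00:00"))
    if abs(now - sent_at) > MAX_CLOCK_SKEW:
        raise ValueError("timestamp outside the 5-second replay window")

    if msg["max_documents"] < 1:
        raise ValueError("max_documents must be at least 1")

    normalized = dict(msg)
    normalized["max_documents"] = min(msg["max_documents"], MAX_DOCUMENTS)  # cap at 10
    filters = dict(msg.get("filters", {}))
    filters.setdefault("min_relevance", DEFAULT_MIN_RELEVANCE)
    normalized["filters"] = filters
    normalized.setdefault("retrieval_strategy", "semantic")
    return normalized
```

Capping `max_documents` instead of rejecting the request is one reading of "capped at 10"; a stricter service might return `QUERY_INVALID` instead.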
---
### Flow B: Retriever → Generator (Context Delivery)
**Message Type:** response
```json
{
  "required": {
    "correlation_id": "uuid-v4 (matches request)",
    "status": "enum: success | partial_failure | failure",
    "documents": "array<Document> (0-10 items)"
  },
  "optional": {
    "metadata": {
      "retrieval_time_ms": "integer",
      "total_candidates": "integer",
      "fallback_used": "boolean"
    },
    "warnings": "array<string>"
  }
}
```
**Document Schema:**
```json
{
  "doc_id": "string (unique)",
  "content": "string (<5KB)",
  "relevance_score": "float (0.0-1.0)",
  "source": "string",
  "chunk_index": "integer (for multi-chunk docs)"
}
```
**Validation Rules:**
- Empty documents array allowed (retrieval failure handled downstream)
- Content must be UTF-8 encoded, stripped of markdown/HTML if not requested
- Relevance scores sorted descending
- Total payload <55KB (10 docs × 5KB + metadata overhead)
**Versioning:** v1 (v2 will add embedding vectors for re-ranking)
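A receiving Generator might check these rules before building its prompt. The function below is a sketch under assumptions: `check_retriever_response` is a hypothetical name, 5KB is taken as 5 × 1024 bytes of UTF-8, and the payload size is approximated by re-serializing the message as JSON.

```python
import json

MAX_DOCS = 10
MAX_DOC_BYTES = 5 * 1024       # "<5KB" read as bytes of UTF-8 content
MAX_PAYLOAD_BYTES = 55 * 1024  # "<55KB" total payload


def check_retriever_response(msg: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the message passes."""
    problems = []
    if msg.get("status") not in {"success", "partial_failure", "failure"}:
        problems.append("status must be success, partial_failure, or failure")

    docs = msg.get("documents")
    if not isinstance(docs, list):
        return problems + ["documents must be an array (it may be empty)"]
    if len(docs) > MAX_DOCS:
        problems.append("more than 10 documents")

    for doc in docs:
        if len(doc["content"].encode("utf-8")) >= MAX_DOC_BYTES:
            problems.append(f"{doc['doc_id']}: content is not under 5KB")

    scores = [doc["relevance_score"] for doc in docs]
    if scores != sorted(scores, reverse=True):
        problems.append("documents are not sorted by relevance_score descending")

    # Approximation of the wire size: the message re-serialized as UTF-8 JSON.
    if len(json.dumps(msg, ensure_ascii=False).encode("utf-8")) >= MAX_PAYLOAD_BYTES:
        problems.append("total payload is not under 55KB")
    return problems
```

Returning all violations at once, instead of raising on the first, makes the result easy to attach to the `warnings` array.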
---
### Flow C: Generator → Validator (Quality Check)
**Message Type:** request
```json
{
  "required": {
    "correlation_id": "uuid-v4",
    "response_text": "string (1-2000 chars)",
    "source_documents": "array<string> (doc_ids from context)",
    "generation_metadata": {
      "model": "string",
      "temperature": "float (0.0-2.0)",
      "prompt_tokens": "integer",
      "completion_tokens": "integer"
    }
  },
  "optional": {
    "validation_rules": "array<string> (default: [factuality, toxicity, relevance])",
    "strict_mode": "boolean (default: false)"
  }
}
```
**Validation Rules:**
- response_text truncated at 2000 chars (hard limit)
- Must include at least 1 source_document if documents were retrieved
- strict_mode=true → validator blocks questionable outputs (vs. flag-and-pass)
**Versioning:** v1 (v2 adds citation verification)
---
## 3) Communication Patterns
### Message Delivery
- **Transport:** Direct HTTP/2 calls (internal service mesh)
- **Queue vs Direct:** Direct synchronous (sub-3s latency requirement incompatible with queue overhead)
- **Fallback:** Router maintains persistent connections; uses local cache on Retriever failure
### Correlation & Tracing
- **correlation_id:** UUIDv4 generated at Router, propagated through all hops
- **Distributed tracing:** OpenTelemetry span per agent call, each a child of the Router's root span; correlation_id is recorded as a span attribute
- **Logging:** Structured JSON logs with correlation_id for debuggability
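Within a single agent process, the correlation_id can be bound to the request context so that every log line carries it. The sketch below uses only the Python standard library; `JsonFormatter` and `start_request` are hypothetical names, and propagation between agents is assumed to happen through the message payloads, not through this mechanism.

```python
import contextvars
import json
import logging
import uuid

# Request-scoped correlation ID; contextvars keeps it isolated per thread or async task.
correlation_id_var = contextvars.ContextVar("correlation_id", default=None)


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object including the correlation ID."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "agent": record.name,
            "message": record.getMessage(),
            "correlation_id": correlation_id_var.get(),
        })


def start_request() -> str:
    """Called once at the Router: mint a UUIDv4 and bind it to the current context."""
    cid = str(uuid.uuid4())
    correlation_id_var.set(cid)
    return cid


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("router")
log.addHandler(handler)
log.setLevel(logging.INFO)

cid = start_request()
log.info("query received")  # emits a JSON line that includes the correlation_id
```

A downstream agent would call `correlation_id_var.set(...)` with the ID from the incoming message instead of minting a new one.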
### Backpressure & Ordering
- **Backpressure:** Router implements token bucket (100 req/sec), returns 429 with Retry-After
- **Ordering:** Not required (stateless request processing)
- **Concurrency:** Generator limited to 5 parallel requests (LLM API rate limits)
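The Router's token bucket could look roughly like this. It is a sketch, not the design's implementation: `TokenBucket` and `admit` are hypothetical names, the burst capacity is an assumption (the design only states 100 req/sec), and the clock is injectable so the behavior can be tested.

```python
import math
import time


class TokenBucket:
    def __init__(self, rate_per_sec: float, capacity: float, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity  # assumed burst size
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> float:
        """Return 0.0 if admitted, else seconds until the next token is available."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 0.0
        return (1 - self.tokens) / self.rate


def admit(bucket: TokenBucket) -> tuple[int, dict]:
    """Map the bucket decision to an HTTP status and headers."""
    wait = bucket.try_acquire()
    if wait == 0.0:
        return 200, {}
    # Retry-After is expressed in whole seconds, so round up.
    return 429, {"Retry-After": str(math.ceil(wait))}
```

With a rate of 100 and a capacity of 100, a burst of 100 requests is admitted at once and the next one receives 429 until tokens refill.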
### Batching
- Retriever can batch multiple document fetches internally (not exposed to Router)
- Generator does NOT batch (single query = single LLM call for context coherence)
---
## 4) State & Ownership
### Single Writer Ownership
| State | Owner | Mutation Pattern | Read Access |
|------------------------|-----------|----------------------------|---------------------|
| Query context | Router | Write once per request | All downstream |
| Document cache | Retriever | Write on cache miss | Retriever only |
| Generated response | Generator | Write once | Validator, Router |
| Validation results | Validator | Write once | Router only |
### Shared State & Synchronization
- **No shared mutable state** between agents (stateless design)
- **Request-scoped state:** Passed via message payloads only
- **Caching:** Retriever maintains local LRU cache (10k entries, 1hr TTL) - invalidation via cache key versioning
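For illustration, an LRU cache with per-entry TTL and versioned keys can be built on `collections.OrderedDict`. The class and the `cache_key` helper below are hypothetical; the `index_version` parameter stands in for whatever version marker the Retriever would use.

```python
import time
from collections import OrderedDict


class TtlLruCache:
    def __init__(self, max_entries=10_000, ttl_s=3600.0, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_s = ttl_s
        self.clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value), least recent first

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]      # expired: treat as a miss
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._data[key] = (self.clock() + self.ttl_s, value)
        self._data.move_to_end(key)
        if len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict the least recently used entry


def cache_key(query: str, index_version: str) -> str:
    """Versioned key: bumping index_version makes old entries unreachable."""
    return f"{index_version}:{query}"
```

Entries written under an old version are never read again and simply age out through the TTL or LRU eviction.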
### Conflict Resolution
- **N/A** - No concurrent writes to shared state
- **Retry conflicts:** Last-write-wins if duplicate correlation_id detected (idempotency key)
### Session Management
- **Stateless:** Each request independent (no conversation history)
- **Future:** If adding conversation state, Router owns session store (Redis), passes session_id downstream
---
## 5) Error Contracts
### Standard Error Envelope
```json
{
  "error": {
    "code": "string (see codes below)",
    "message": "string (human-readable)",
    "correlation_id": "uuid-v4",
    "timestamp": "ISO8601",
    "details": {
      "agent": "string (failing agent name)",
      "retryable": "boolean",
      "retry_after_ms": "integer (if retryable)"
    }
  }
}
```
### Error Codes
| Code | Agent | Meaning | Retry? | User Action |
|-------------------------|-----------|--------------------------------------|--------|---------------------------------|
| `QUERY_INVALID` | Router | Query validation failed | No | Fix query format |
| `RETRIEVAL_TIMEOUT` | Retriever | Document fetch exceeded 1000ms | Yes | Automatic (2 retries) |
| `RETRIEVAL_FAILED` | Retriever | All retrieval attempts exhausted | No | Fallback to no-context mode |
| `GENERATION_TIMEOUT` | Generator | LLM call exceeded 1500ms | Yes | Automatic (1 retry) |
| `GENERATION_FAILED` | Generator | LLM API error or quota exceeded | No | Return error to user |
| `VALIDATION_TIMEOUT` | Validator | Validation exceeded 800ms | No | Accept unvalidated response |
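| `VALIDATION_REJECTED`   | Validator | Response failed quality checks       | No     | Regenerate or return with warning |
| `RESPONSE_TOO_LONG`     | Validator | Response exceeds 2000-char limit     | No     | Truncate before validation      |
| `SYSTEM_OVERLOAD`       | Router    | Rate limit exceeded                  | Yes    | Backoff 1-5 seconds             |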
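A small helper can keep the envelope and the Retry? column consistent. This is a sketch: `make_error` is a hypothetical function, and the retryable set is copied from the rows marked "Yes" in the table.

```python
from datetime import datetime, timezone

# Codes marked "Yes" in the Retry? column; every other code is non-retryable.
RETRYABLE_CODES = {"RETRIEVAL_TIMEOUT", "GENERATION_TIMEOUT", "SYSTEM_OVERLOAD"}


def make_error(code, message, correlation_id, agent, retry_after_ms=None) -> dict:
    """Build the standard error envelope for a failing agent."""
    retryable = code in RETRYABLE_CODES
    details = {"agent": agent, "retryable": retryable}
    # retry_after_ms is only meaningful for retryable errors.
    if retryable and retry_after_ms is not None:
        details["retry_after_ms"] = retry_after_ms
    return {
        "error": {
            "code": code,
            "message": message,
            "correlation_id": correlation_id,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "details": details,
        }
    }
```

Deriving `retryable` from the code, instead of letting each caller set it, avoids envelopes that contradict the table.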
### Retry Logic
```python
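import random
import time


class RetryableError(Exception):
    """Raised by an agent call that is safe to retry."""


# Exponential backoff with jitter. max_retries counts retries after the
# first attempt, so "2 retries" means up to 3 calls.
def retry_with_backoff(func, max_retries, base_delay_ms):
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RetryableError:
            if attempt == max_retries:
                raise
            delay_ms = base_delay_ms * (2 ** attempt) + random.uniform(0, 100)
            time.sleep(delay_ms / 1000)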
```
### Circuit Breakers
- **Retriever:** Open circuit after 10 consecutive failures (60s cooldown)
- **Generator:** Open circuit after 5 failures in 30s window (120s cooldown)
- **Behavior when open:** Router immediately returns cached/fallback response
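The Retriever rule (open after N consecutive failures, then cool down) can be sketched as below. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, and letting a single trial call through after the cooldown (half-open) is an assumption the design does not state. The Generator's "5 failures in 30s" rule would need a sliding window and is not covered here.

```python
import time


class CircuitOpenError(Exception):
    """Raised instead of calling the agent while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=10, cooldown_s=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.consecutive_failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("circuit open; use fallback")
            # Cooldown elapsed: let this call through as a trial (assumed half-open).
        try:
            result = func()
        except Exception:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and resets the failure count.
        self.consecutive_failures = 0
        self.opened_at = None
        return result
```

The Router would catch `CircuitOpenError` and return its cached or fallback response immediately.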
### Graceful Degradation Paths
1. **Retriever fails** → Generator uses query only (no documents)
2. **Validator times out** → Return Generator output with `unvalidated: true` flag
3. **Generator fails** → Return templated response: "Unable to generate answer, please try again"
4. **All agents fail** → Return cached response (if available) or generic error
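Paths 1 to 3 above can be sketched as a single orchestration function. This is illustrative only: `handle_request` is a hypothetical name, the agents are passed in as plain callables, and the cached-response path (4) is left out because no cache interface is defined.

```python
FALLBACK_TEXT = "Unable to generate answer, please try again"


def handle_request(query, retrieve, generate, validate):
    """Run the pipeline, degrading instead of failing where the design allows it."""
    warnings = []

    try:
        documents = retrieve(query)
    except Exception:
        documents = []  # path 1: Generator uses the query only
        warnings.append("Document retrieval failed")

    try:
        text = generate(query, documents)
    except Exception:
        # Path 3: templated response when generation fails.
        warnings.append("Generation failed")
        return {"response_text": FALLBACK_TEXT, "unvalidated": True,
                "degraded": True, "warnings": warnings}

    unvalidated = False
    try:
        validate(text, documents)
    except TimeoutError:
        unvalidated = True  # path 2: return the output flagged as unvalidated
        warnings.append("Response not validated (timeout)")

    return {"response_text": text, "unvalidated": unvalidated,
            "degraded": bool(warnings), "warnings": warnings}
```

Only a validator timeout is tolerated here; any other validator error, such as a rejection, propagates to the caller.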
---
## 6) Sample Payloads (Top 3 Risks)
### Risk 1: Retriever Returns Empty Documents (Legitimate)
**Router → Retriever:**
```json
{
  "correlation_id": "a1b2c3d4-e5f6-4789-a0b1-c2d3e4f5a6b7",
  "query": "quantum entanglement in coffee makers",
  "timestamp": "2025-12-18T10:30:00Z",
  "max_documents": 10
}
```
**Retriever → Generator:**
```json
{
  "correlation_id": "a1b2c3d4-e5f6-4789-a0b1-c2d3e4f5a6b7",
  "status": "success",
  "documents": [],
  "metadata": {
    "retrieval_time_ms": 450,
    "total_candidates": 0,
    "fallback_used": false
  },
  "warnings": ["No relevant documents found for query"]
}
```
**Mitigation:** Generator detects empty array, generates response without grounding (with disclaimer).
---
### Risk 2: Generator Produces Oversized Response
**Generator → Validator (INVALID):**
```json
{
  "correlation_id": "f9e8d7c6-b5a4-4321-9f8e-7d6c5b4a3f2e",
  "response_text": "[2500 characters of text, exceeding the 2000-char limit]...",
  "source_documents": ["doc_123", "doc_456"],
  "generation_metadata": {
    "model": "claude-sonnet-4",
    "temperature": 0.7,
    "prompt_tokens": 1200,
    "completion_tokens": 650
  }
}
```
**Validator Response:**
```json
{
  "error": {
    "code": "RESPONSE_TOO_LONG",
    "message": "Response exceeds 2000 character limit (2500 chars)",
    "correlation_id": "f9e8d7c6-b5a4-4321-9f8e-7d6c5b4a3f2e",
    "details": {
      "agent": "Validator",
      "retryable": false
    }
  }
}
```
**Mitigation:** Generator implements pre-validation truncation at 1950 chars (safety margin).
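One way the Generator could apply this truncation is sketched below. `truncate_response` is a hypothetical helper; cutting at the last space and appending an ellipsis are assumptions, since the design only fixes the 1950-character margin.

```python
SAFE_LIMIT = 1950  # safety margin below the 2000-char hard limit


def truncate_response(text: str, limit: int = SAFE_LIMIT) -> str:
    """Return text unchanged if it fits, else cut it to at most `limit` chars."""
    if len(text) <= limit:
        return text
    cut = text[: limit - 1]  # leave room for the ellipsis character
    # Prefer cutting at the last space so a word is not split in half.
    head, sep, _ = cut.rpartition(" ")
    if sep:
        cut = head
    return cut.rstrip() + "…"
```

The result never exceeds `limit` characters, so the Validator's length check passes even for the 2500-character case above.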
---
### Risk 3: Cascading Timeouts (Worst Case)
**Timeline:**
- T+0ms: User → Router (query received)
- T+1050ms: Router → Retriever attempt 1 times out
- T+1300ms: Retriever retry 1 fails (attempt 2)
- T+1550ms: Retriever retry 2 fails (attempt 3); Router gives up
- T+1551ms: Router → Generator (empty-documents fallback)
- T+3100ms: Generator attempt 1 times out
- T+3500ms: Generator retry returns a response; validation times out
- **T+3501ms: Router returns unvalidated response to User**
**Router → User (Degraded Success):**
```json
{
  "correlation_id": "cascade-fail-example",
  "response_text": "Based on your query, here's what I know...",
  "metadata": {
    "latency_ms": 3501,
    "degraded": true,
    "warnings": [
      "Document retrieval failed after 3 attempts",
      "Response not validated (timeout)"
    ],
    "sources_used": []
  }
}
```
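**Mitigation:** At 3501ms this request misses the <3s target and the 2800ms overall timeout assumed in Section 0. The p95 objective holds only if fewer than 5% of requests take this long. Consider async validation in future.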
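A request-level deadline is one way to stop a cascade like this from overrunning the budget: each hop gets the smaller of its own timeout and whatever remains. The `Deadline` class below is a hypothetical sketch that works in integer milliseconds; the 2800ms figure is the overall timeout from Section 0.

```python
import time


def _now_ms() -> int:
    """Monotonic clock in whole milliseconds."""
    return time.monotonic_ns() // 1_000_000


class Deadline:
    def __init__(self, budget_ms: int, now_ms=_now_ms):
        self.now_ms = now_ms
        self.expires_at_ms = now_ms() + budget_ms

    def remaining_ms(self) -> int:
        return max(0, self.expires_at_ms - self.now_ms())

    def timeout_for(self, hop_timeout_ms: int) -> int:
        """Per-hop timeout clamped to what is left of the request budget."""
        return min(hop_timeout_ms, self.remaining_ms())


deadline = Deadline(budget_ms=2800)
retriever_timeout = deadline.timeout_for(1000)  # at most 1000 ms at the start
```

When `remaining_ms()` reaches 0, the Router would skip further retries and go straight to a degraded response.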
---
## 7) Risks & Anti-Patterns
### Tight Coupling Risks
- ❌ **Generator depends on Retriever document format:** If Retriever changes schema, Generator breaks
- ✅ **Mitigation:** Schema versioning + backward compatibility contract (Router transforms if needed)
- ❌ **Validator requires specific Generator model metadata:** Hard-codes model assumptions
- ✅ **Mitigation:** Validator uses model-agnostic heuristics (toxicity, factuality) rather than model-specific rules
### Single Points of Failure (SPOFs)
- ⚠️ **Router:** All requests flow through Router (unavoidable for orchestration, but...)
- ✅ **Mitigation:** Deploy 3+ Router replicas behind load balancer
- ⚠️ **Generator:** LLM API is external dependency
- ✅ **Mitigation:** Multi-model fallback (primary: Claude, fallback: GPT-4, last resort: cached responses)
### Schema Drift Risks
- ⚠️ **Unversioned schemas:** Teams independently update message formats
- ✅ **Mitigation:** Enforce versioning through the `Content-Type` header, e.g. `application/vnd.rag.v1+json`
- ⚠️ **Optional fields become required:** Breaking change disguised as minor update
- ✅ **Mitigation:** Schema registry with automated compatibility checks (Protobuf/Avro-style evolution rules)
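On the receiving side, the version can be read from the media type before the body is parsed. A minimal sketch, assuming the `application/vnd.rag.vN+json` naming shown above; `schema_version` and `SUPPORTED_VERSIONS` are hypothetical names.

```python
import re

_MEDIA_TYPE = re.compile(r"^application/vnd\.rag\.v(\d+)\+json$")
SUPPORTED_VERSIONS = {1}  # assumed: this agent only understands v1


def schema_version(content_type: str) -> int:
    """Extract the schema version from a Content-Type value, or raise ValueError."""
    # Drop parameters such as "; charset=utf-8" before matching.
    media_type = content_type.split(";", 1)[0].strip().lower()
    match = _MEDIA_TYPE.match(media_type)
    if match is None:
        raise ValueError(f"unversioned or unknown media type: {media_type}")
    version = int(match.group(1))
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported schema version: v{version}")
    return version
```

Rejecting unversioned payloads outright makes drift visible at the boundary instead of surfacing later as a parsing error.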
### Performance Anti-Patterns
- ❌ **Chatty communication:** Router makes N calls to Retriever for N documents
- ✅ **Current design:** Batched single call retrieving up to 10 documents
- ❌ **Synchronous blocking:** Each agent waits idly for downstream response
- ⚠️ **Partially mitigated:** Consider async patterns if adding multi-query support
### Data Consistency Issues
- ❌ **Stale cache:** Retriever returns outdated documents
- ✅ **Mitigation:** TTL-based expiration (1hr) + versioned cache keys
### Debugging & Observability Gaps
- ❌ **Lost correlation_id:** Impossible to trace requests across agents
- ✅ **Mitigation:** Enforced propagation via middleware + automated log aggregation
---
## 8) Confidence Ratings
| Decision | Confidence (1–10) | To Raise Confidence |
|-------------------------------------------|-------------------|-------------------------------------------------------------------------------------|
| Timeout values (1000/1500/800ms) | 7 | Load testing with realistic document sizes; measure p50/p95/p99 latencies |
| Retriever retry strategy (2×200ms) | 8 | Analyze historical failure patterns; tune backoff based on error types |
| 10-document limit | 6 | Benchmark Generator performance with varying context sizes (5/10/20 docs) |
| Direct HTTP vs message queue | 9 | Validate assumption that <3s latency rules out queues (test Kafka/SQS overhead) |
| Error code taxonomy | 7 | Collect metrics on error frequency; add codes for common failure modes |
| No conversation state (stateless) | 6 | Clarify if follow-up queries needed; impacts Router design significantly |
| Validator timeout acceptable | 5 | **Low confidence** - Quantify risk: % of responses that fail validation? |
| Circuit breaker thresholds (10/5 failures)| 6 | Simulate cascading failures; tune thresholds to balance availability vs. quality |
| Schema versioning strategy | 8 | Define rollout plan for v1→v2 (blue-green? canary?); test backward compatibility |
| Generator LLM fallback strategy | 5 | **Low confidence** - Need cost analysis & quality degradation metrics |
### Priority Improvements (Ordered by Impact)
1. **Load test entire pipeline** → Validate timeout assumptions (confidence: 7→9)
2. **Quantify Validator timeout risk** → Determine if degraded mode acceptable (confidence: 5→8)
3. **Benchmark document limit** → Optimize retrieval count vs. quality (confidence: 6→8)
4. **Define LLM fallback SLA** → Cost-quality tradeoff analysis (confidence: 5→8)
---
## VERIFICATION CHECKLIST ✅
- [x] Every connection has timeout + retry specified (see Connection Map)
- [x] Schemas include JSON examples with validation rules
- [x] Error envelope defined with standard codes
- [x] SPOFs identified (Router, Generator LLM API) with mitigations
- [x] Idempotency marked for all operations
- [x] State ownership explicitly assigned
- [x] Correlation strategy defined (UUIDv4 + OpenTelemetry)
- [x] Degradation paths documented (4 scenarios)
- [x] Sample payloads for risky edge cases provided
**Overall Design Confidence: 7.5/10**
*Ready for prototype implementation. Schedule load testing sprint after initial build.*
Designs robust communication contracts between AI agents. Outputs:
connection map (sync/async/timeouts), message schemas (fields + validation),
error contracts, state ownership rules, and sample payloads. Flags coupling
risks and single points of failure. Perfect for teams building RAG systems,
autonomous workflows, or any multi-agent architecture. Prevents cascade
failures before production.
✅ Schema definitions
✅ Retry + timeout logic
✅ PII/security patterns
✅ SPOF warnings
Added over 1 month ago
