Prompt Details

Model
(claude-4-5-sonnet)
Token size
698
Example input
[AGENTS]: Router, Retriever, Generator, Validator
[WORKFLOW]: User → Router → Retriever → Generator → Validator → User
[DATA_TYPES]: Query (≤500 chars), Documents (<5KB each), Response (≤2000 chars)
[CONSTRAINTS]: p95 latency <3s, Retriever may fail, Validator timeout acceptable
Example output
# Multi-Agent RAG Communication Design

## 0) ASSUMPTIONS (if constraints incomplete)

- **Timeout defaults:**
  - Router→Retriever: 1000ms (retrieval is I/O bound, should be fast)
  - Retriever→Generator: 1500ms (document processing + LLM context preparation)
  - Generator→Validator: 800ms (validation is rule-based, fast)
  - Overall system timeout: 2800ms (stays within p95 <3s with buffer)
- **Retry strategy:**
  - Retriever: 2 retries with 200ms exponential backoff (explicitly allowed to fail)
  - Generator: 1 retry (expensive operation, fast-fail preferred)
  - Validator: No retry (timeout acceptable per constraints)
- **Data size assumptions:**
  - Max 10 documents per retrieval (10 × 5KB = 50KB transfer)
  - Document batch processing acceptable (network efficiency)
  - Response chunks if exceeds 2000 chars (streaming not assumed available)
- **Concurrency:** Single request processing (no parallel agent calls unless stated)
- **Network:** Reliable internal network, <10ms latency between agents
- **Security:** Internal trusted zone (authentication handled at Router entry point)

**Confidence in assumptions: 7/10** *(Would improve with: failure rate SLAs, cost constraints, scale targets)*

---

## 1) Connection Map

| Sender    | Receiver  | Purpose                        | Sync/Async | Timeout (ms) | Retry   | Idempotent? | Priority |
|-----------|-----------|--------------------------------|------------|--------------|---------|-------------|----------|
| User      | Router    | Submit query                   | Sync       | 3000         | None    | Yes         | High     |
| Router    | Retriever | Fetch relevant documents       | Sync       | 1000         | 2×200ms | Yes         | High     |
| Retriever | Generator | Provide context for generation | Sync       | 1500         | 1×300ms | Yes         | High     |
| Generator | Validator | Validate generated response    | Sync       | 800          | None    | Yes         | Medium   |
| Validator | Router    | Return validation result       | Sync       | 100          | None    | Yes         | High     |
| Router    | User      | Deliver final response         | Sync       | N/A          | None    | Yes         | High     |

**Notes:**
- All connections idempotent to support safe retries
- Retriever failure → Router falls back to "no documents" mode
- Validator timeout → Router accepts Generator output (degraded mode)

---

## 2) Message Schemas (Top 3 Flows)

### Flow A: Router → Retriever (Query Expansion)

**Message Type:** request

```json
{
  "required": {
    "correlation_id": "uuid-v4",
    "query": "string (1-500 chars, sanitized)",
    "timestamp": "ISO8601",
    "max_documents": "integer (1-10)"
  },
  "optional": {
    "filters": {
      "date_range": "ISO8601 pair",
      "source_types": "array<string>",
      "min_relevance": "float (0.0-1.0)"
    },
    "retrieval_strategy": "enum: semantic | keyword | hybrid (default: semantic)"
  }
}
```

**Validation Rules:**
- Query must not contain SQL injection patterns (sanitized at Router)
- max_documents capped at 10 (cost control)
- filters.min_relevance defaults to 0.6 if unspecified
- timestamp must be within 5 seconds of current time (prevent replay)

**Versioning:** v1 (backward compatible until v2 adds multi-modal support)

---

### Flow B: Retriever → Generator (Context Delivery)

**Message Type:** response

```json
{
  "required": {
    "correlation_id": "uuid-v4 (matches request)",
    "status": "enum: success | partial_failure | failure",
    "documents": "array<Document> (0-10 items)"
  },
  "optional": {
    "metadata": {
      "retrieval_time_ms": "integer",
      "total_candidates": "integer",
      "fallback_used": "boolean"
    },
    "warnings": "array<string>"
  }
}
```

**Document Schema:**

```json
{
  "doc_id": "string (unique)",
  "content": "string (<5KB)",
  "relevance_score": "float (0.0-1.0)",
  "source": "string",
  "chunk_index": "integer (for multi-chunk docs)"
}
```

**Validation Rules:**
- Empty documents array allowed (retrieval failure handled downstream)
- Content must be UTF-8 encoded, stripped of markdown/HTML if not requested
- Relevance scores sorted descending
- Total payload <55KB (10 docs × 5KB + metadata overhead)

**Versioning:** v1 (v2 will add embedding vectors for re-ranking)

---

### Flow C: Generator → Validator (Quality Check)

**Message Type:** request

```json
{
  "required": {
    "correlation_id": "uuid-v4",
    "response_text": "string (1-2000 chars)",
    "source_documents": "array<string> (doc_ids from context)",
    "generation_metadata": {
      "model": "string",
      "temperature": "float (0.0-2.0)",
      "prompt_tokens": "integer",
      "completion_tokens": "integer"
    }
  },
  "optional": {
    "validation_rules": "array<string> (default: [factuality, toxicity, relevance])",
    "strict_mode": "boolean (default: false)"
  }
}
```

**Validation Rules:**
- response_text truncated at 2000 chars (hard limit)
- Must include at least 1 source_document if documents were retrieved
- strict_mode=true → validator blocks questionable outputs (vs. flag-and-pass)

**Versioning:** v1 (v2 adds citation verification)

---

## 3) Communication Patterns

### Message Delivery
- **Transport:** Direct HTTP/2 calls (internal service mesh)
- **Queue vs Direct:** Direct synchronous (sub-3s latency requirement incompatible with queue overhead)
- **Fallback:** Router maintains persistent connections; uses local cache on Retriever failure

### Correlation & Tracing
- **correlation_id:** UUIDv4 generated at Router, propagated through all hops
- **Distributed tracing:** OpenTelemetry span per agent call (parent span = correlation_id)
- **Logging:** Structured JSON logs with correlation_id for debuggability

### Backpressure & Ordering
- **Backpressure:** Router implements token bucket (100 req/sec), returns 429 with Retry-After
- **Ordering:** Not required (stateless request processing)
- **Concurrency:** Generator limited to 5 parallel requests (LLM API rate limits)

### Batching
- Retriever can batch multiple document fetches internally (not exposed to Router)
- Generator does NOT batch (single query = single LLM call for context coherence)

---

## 4) State & Ownership

### Single Writer Ownership

| State              | Owner     | Mutation Pattern       | Read Access       |
|--------------------|-----------|------------------------|-------------------|
| Query context      | Router    | Write once per request | All downstream    |
| Document cache     | Retriever | Write on cache miss    | Retriever only    |
| Generated response | Generator | Write once             | Validator, Router |
| Validation results | Validator | Write once             | Router only       |

### Shared State & Synchronization
- **No shared mutable state** between agents (stateless design)
- **Request-scoped state:** Passed via message payloads only
- **Caching:** Retriever maintains local LRU cache (10k entries, 1hr TTL) - invalidation via cache key versioning

### Conflict Resolution
- **N/A** - No concurrent writes to shared state
- **Retry conflicts:** Last-write-wins if duplicate correlation_id detected (idempotency key)
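
The last-write-wins rule for duplicate `correlation_id` values can be illustrated with a small sketch. This is not part of the generated design: `IdempotencyStore` and its methods are hypothetical names, and an in-memory dict stands in for whatever store a real deployment would use.

```python
from typing import Any, Dict, Optional


class IdempotencyStore:
    """Hypothetical in-memory map from correlation_id to the latest result."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}

    def record(self, correlation_id: str, result: Any) -> bool:
        """Store a result; return True if it overwrote an earlier write."""
        is_duplicate = correlation_id in self._results
        self._results[correlation_id] = result  # last write wins
        return is_duplicate

    def get(self, correlation_id: str) -> Optional[Any]:
        """Return the most recent result for this id, or None if unseen."""
        return self._results.get(correlation_id)


store = IdempotencyStore()
first = store.record("req-1", {"response_text": "draft A"})
second = store.record("req-1", {"response_text": "draft B"})  # retried request
```

Here the retried write replaces the first one, so a reader of `req-1` only ever sees the latest result.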
### Session Management
- **Stateless:** Each request independent (no conversation history)
- **Future:** If adding conversation state, Router owns session store (Redis), passes session_id downstream

---

## 5) Error Contracts

### Standard Error Envelope

```json
{
  "error": {
    "code": "string (see codes below)",
    "message": "string (human-readable)",
    "correlation_id": "uuid-v4",
    "timestamp": "ISO8601",
    "details": {
      "agent": "string (failing agent name)",
      "retryable": "boolean",
      "retry_after_ms": "integer (if retryable)"
    }
  }
}
```

### Error Codes

| Code                  | Agent     | Meaning                          | Retry? | User Action                       |
|-----------------------|-----------|----------------------------------|--------|-----------------------------------|
| `QUERY_INVALID`       | Router    | Query validation failed          | No     | Fix query format                  |
| `RETRIEVAL_TIMEOUT`   | Retriever | Document fetch exceeded 1000ms   | Yes    | Automatic (2 retries)             |
| `RETRIEVAL_FAILED`    | Retriever | All retrieval attempts exhausted | No     | Fallback to no-context mode       |
| `GENERATION_TIMEOUT`  | Generator | LLM call exceeded 1500ms         | Yes    | Automatic (1 retry)               |
| `GENERATION_FAILED`   | Generator | LLM API error or quota exceeded  | No     | Return error to user              |
| `VALIDATION_TIMEOUT`  | Validator | Validation exceeded 800ms        | No     | Accept unvalidated response       |
| `VALIDATION_REJECTED` | Validator | Response failed quality checks   | No     | Regenerate or return with warning |
| `SYSTEM_OVERLOAD`     | Router    | Rate limit exceeded              | Yes    | Backoff 1-5 seconds               |

### Retry Logic

```python
import random
import time


class RetryableError(Exception):
    """Raised by an agent call that is safe to retry."""


# Exponential backoff with jitter
def retry_with_backoff(func, max_retries, base_delay_ms):
    # One initial attempt plus up to max_retries retries
    for attempt in range(max_retries + 1):
        try:
            return func()
        except RetryableError:
            if attempt == max_retries:
                raise
            # Delay doubles on each retry, plus up to 100ms of jitter
            delay = base_delay_ms * (2 ** attempt) + random.uniform(0, 100)
            time.sleep(delay / 1000)
```

### Circuit Breakers
- **Retriever:** Open circuit after 10 consecutive failures (60s cooldown)
- **Generator:** Open circuit after 5 failures in 30s window (120s cooldown)
- **Behavior when open:** Router immediately returns cached/fallback response

### Graceful Degradation Paths
1. **Retriever fails** → Generator uses query only (no documents)
2. **Validator times out** → Return Generator output with `unvalidated: true` flag
3. **Generator fails** → Return templated response: "Unable to generate answer, please try again"
4. **All agents fail** → Return cached response (if available) or generic error

---

## 6) Sample Payloads (Top 3 Risks)

### Risk 1: Retriever Returns Empty Documents (Legitimate)

**Router → Retriever:**

```json
{
  "correlation_id": "a1b2c3d4-e5f6-4789-g0h1-i2j3k4l5m6n7",
  "query": "quantum entanglement in coffee makers",
  "timestamp": "2025-12-18T10:30:00Z",
  "max_documents": 10
}
```

**Retriever → Generator:**

```json
{
  "correlation_id": "a1b2c3d4-e5f6-4789-g0h1-i2j3k4l5m6n7",
  "status": "success",
  "documents": [],
  "metadata": {
    "retrieval_time_ms": 450,
    "total_candidates": 0,
    "fallback_used": false
  },
  "warnings": ["No relevant documents found for query"]
}
```

**Mitigation:** Generator detects empty array, generates response without grounding (with disclaimer).

---

### Risk 2: Generator Produces Oversized Response

**Generator → Validator (INVALID: response_text exceeds the 2000-character limit):**

```json
{
  "correlation_id": "x9y8z7w6-v5u4-3210-t2s3-r4q5p6o7n8m9",
  "response_text": "[2500 characters of text]...",
  "source_documents": ["doc_123", "doc_456"],
  "generation_metadata": {
    "model": "claude-sonnet-4",
    "temperature": 0.7,
    "prompt_tokens": 1200,
    "completion_tokens": 650
  }
}
```

**Validator Response:**

```json
{
  "error": {
    "code": "RESPONSE_TOO_LONG",
    "message": "Response exceeds 2000 character limit (2500 chars)",
    "correlation_id": "x9y8z7w6-v5u4-3210-t2s3-r4q5p6o7n8m9",
    "details": {
      "agent": "Validator",
      "retryable": false
    }
  }
}
```

**Mitigation:** Generator implements pre-validation truncation at 1950 chars (safety margin).
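
The pre-validation truncation mentioned in this mitigation could look roughly like the sketch below. The function name `truncate_response` and the choice to cut at the last space are assumptions added for illustration; the source only specifies the 1950-character margin.

```python
SAFETY_LIMIT = 1950  # margin below the 2000-char hard limit, per the mitigation above


def truncate_response(text: str, limit: int = SAFETY_LIMIT) -> str:
    """Return text unchanged if it fits, otherwise cut it to at most `limit` chars."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Assumption: prefer cutting at the last space so a word is not split.
    last_space = cut.rfind(" ")
    if last_space > 0:
        cut = cut[:last_space]
    return cut.rstrip()


oversized = "word " * 600  # 3000 characters
trimmed = truncate_response(oversized)
```

Cutting on a word boundary can return slightly fewer than 1950 characters, which keeps the result safely under the Validator's hard limit.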
---

### Risk 3: Cascading Timeouts (Worst Case)

**Timeline:**
- T+0ms: User → Router (query received)
- T+1050ms: Router → Retriever timeout (1st attempt fails)
- T+1300ms: Retriever retry timeout (2nd attempt fails)
- T+1550ms: Retriever retry timeout (3rd attempt fails, give up)
- T+1551ms: Router → Generator (empty documents fallback)
- T+3100ms: Generator → Validator timeout (1st attempt fails)
- T+3500ms: Generator retry timeout (2nd attempt fails)
- **T+3501ms: Router returns unvalidated response to User**

**Router → User (Degraded Success):**

```json
{
  "correlation_id": "cascade-fail-example",
  "response_text": "Based on your query, here's what I know...",
  "metadata": {
    "latency_ms": 3501,
    "degraded": true,
    "warnings": [
      "Document retrieval failed after 3 attempts",
      "Response not validated (timeout)"
    ],
    "sources_used": []
  }
}
```

**Mitigation:** At 3501ms this path breaches the 3s p95 target, which is acceptable only if such cascades are rare enough to fall outside the 95th percentile. Consider async validation in future.

---

## 7) Risks & Anti-Patterns

### Tight Coupling Risks
- ❌ **Generator depends on Retriever document format:** If Retriever changes schema, Generator breaks
  - ✅ **Mitigation:** Schema versioning + backward compatibility contract (Router transforms if needed)
- ❌ **Validator requires specific Generator model metadata:** Hard-codes model assumptions
  - ✅ **Mitigation:** Validator uses model-agnostic heuristics (toxicity, factuality) rather than model-specific rules

### Single Points of Failure (SPOFs)
- ⚠️ **Router:** All requests flow through Router (unavoidable for orchestration, but...)
  - ✅ **Mitigation:** Deploy 3+ Router replicas behind load balancer
- ⚠️ **Generator:** LLM API is external dependency
  - ✅ **Mitigation:** Multi-model fallback (primary: Claude, fallback: GPT-4, last resort: cached responses)

### Schema Drift Risks
- ⚠️ **Unversioned schemas:** Teams independently update message formats
  - ✅ **Mitigation:** Enforce versioning in `Content-Type: application/vnd.rag.v1+json`
- ⚠️ **Optional fields become required:** Breaking change disguised as minor update
  - ✅ **Mitigation:** Schema registry with automated compatibility checks (Protobuf/Avro-style evolution rules)

### Performance Anti-Patterns
- ❌ **Chatty communication:** Router makes N calls to Retriever for N documents
  - ✅ **Current design:** Batched single call retrieving up to 10 documents
- ❌ **Synchronous blocking:** Each agent waits idly for downstream response
  - ⚠️ **Partially mitigated:** Consider async patterns if adding multi-query support

### Data Consistency Issues
- ❌ **Stale cache:** Retriever returns outdated documents
  - ✅ **Mitigation:** TTL-based expiration (1hr) + versioned cache keys

### Debugging & Observability Gaps
- ❌ **Lost correlation_id:** Impossible to trace requests across agents
  - ✅ **Mitigation:** Enforced propagation via middleware + automated log aggregation

---

## 8) Confidence Ratings

| Decision                                   | Confidence (1–10) | To Raise Confidence                                                              |
|--------------------------------------------|-------------------|----------------------------------------------------------------------------------|
| Timeout values (1000/1500/800ms)           | 7                 | Load testing with realistic document sizes; measure p50/p95/p99 latencies        |
| Retriever retry strategy (2×200ms)         | 8                 | Analyze historical failure patterns; tune backoff based on error types           |
| 10-document limit                          | 6                 | Benchmark Generator performance with varying context sizes (5/10/20 docs)        |
| Direct HTTP vs message queue               | 9                 | Validate assumption that <3s latency rules out queues (test Kafka/SQS overhead)  |
| Error code taxonomy                        | 7                 | Collect metrics on error frequency; add codes for common failure modes           |
| No conversation state (stateless)          | 6                 | Clarify if follow-up queries needed; impacts Router design significantly         |
| Validator timeout acceptable               | 5                 | **Low confidence** - Quantify risk: % of responses that fail validation?         |
| Circuit breaker thresholds (10/5 failures) | 6                 | Simulate cascading failures; tune thresholds to balance availability vs. quality |
| Schema versioning strategy                 | 8                 | Define rollout plan for v1→v2 (blue-green? canary?); test backward compatibility |
| Generator LLM fallback strategy            | 5                 | **Low confidence** - Need cost analysis & quality degradation metrics            |

### Priority Improvements (Ordered by Impact)
1. **Load test entire pipeline** → Validate timeout assumptions (confidence: 7→9)
2. **Quantify Validator timeout risk** → Determine if degraded mode acceptable (confidence: 5→8)
3. **Benchmark document limit** → Optimize retrieval count vs. quality (confidence: 6→8)
4. **Define LLM fallback SLA** → Cost-quality tradeoff analysis (confidence: 5→8)

---

## VERIFICATION CHECKLIST ✅

- [x] Every connection has timeout + retry specified (see Connection Map)
- [x] Schemas include JSON examples with validation rules
- [x] Error envelope defined with standard codes
- [x] SPOFs identified (Router, Generator LLM API) with mitigations
- [x] Idempotency marked for all operations
- [x] State ownership explicitly assigned
- [x] Correlation strategy defined (UUIDv4 + OpenTelemetry)
- [x] Degradation paths documented (4 scenarios)
- [x] Sample payloads for risky edge cases provided

**Overall Design Confidence: 7.5/10**

*Ready for prototype implementation. Schedule load testing sprint after initial build.*
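
Before load testing, the timeout assumptions can be sanity-checked with simple arithmetic. The sketch below is an illustration, not part of the generated design: `Hop`, `worst_case_ms`, and `budget_report` are hypothetical names, the numbers are copied from the Connection Map, and it assumes every attempt consumes its full timeout, backoff doubles per retry, and jitter is ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Hop:
    name: str
    timeout_ms: int
    retries: int
    base_backoff_ms: int


def worst_case_ms(hop: Hop) -> int:
    """Every attempt times out; backoff doubles between retries (assumption)."""
    attempts = hop.retries + 1
    backoff = sum(hop.base_backoff_ms * (2 ** i) for i in range(hop.retries))
    return attempts * hop.timeout_ms + backoff


# Values taken from the Connection Map above.
HOPS = [
    Hop("Router->Retriever", 1000, 2, 200),
    Hop("Retriever->Generator", 1500, 1, 300),
    Hop("Generator->Validator", 800, 0, 0),
]
OVERALL_TIMEOUT_MS = 2800


def budget_report(hops: List[Hop], overall_ms: int) -> Tuple[Dict[str, int], int, bool]:
    """Return per-hop worst cases, their sum, and whether the sum fits the budget."""
    per_hop = {h.name: worst_case_ms(h) for h in hops}
    total = sum(per_hop.values())
    return per_hop, total, total <= overall_ms


per_hop, total, fits = budget_report(HOPS, OVERALL_TIMEOUT_MS)
print(per_hop, total, fits)
```

Under these assumptions the summed worst case is 7700ms, well above the 2800ms overall timeout, which suggests the Router would need to enforce the overall deadline itself rather than rely on per-hop timeouts alone.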

Multiagent Communication Designer

Designs robust communication contracts between AI agents. Outputs: connection map (sync/async/timeouts), message schemas (fields + validation), error contracts, state ownership rules, and sample payloads. Flags coupling risks and single points of failure. Perfect for teams building RAG systems, autonomous workflows, or any multi-agent architecture. Prevents cascade failures before production. ✅ Schema definitions ✅ Retry + timeout logic ✅ PII/security patterns ✅ SPOF warnings
Added over 1 month ago