Prompt Details

Model

(claude-4-5-sonnet)

Token size

2,372

Example input

[TASK_FLOW]: 1. Parse query (extract keywords, intent) 2. Retrieve from Source A (vector DB, depends on 1) 3. Retrieve from Source B (knowledge graph, depends on 1) 4. Retrieve from Source C (search API, depends on 1) 5. Aggregate results (merge + dedupe, depends on 2,3,4) 6. Generate response (LLM with context, depends on 5) 7. Validate citations (fact-check, depends on 6) 8. Format output (markdown, depends on 7) Notes: Steps 2, 3, 4 are fully independent (can run in parallel) [PERFORMANCE]: - Latency target: p95 <4s end-to-end - Cost constraint: Minimize LLM calls (only 1 generation pass) - Concurrency: Support 100 simultaneous requests [RELIABILITY]: - SLA: 99% uptime - Failure tolerance: Can proceed if 2 of 3 retrievers succeed - Partial results: Acceptable (flag to user) [CURRENT_APPROACH]: Sequential retrieval: Parse → A (wait) → B (wait) → C (wait) → Aggregate → Generate → Validate → Format Pain points: - Average latency: 5.2s (exceeds target) - p95 latency: 6.8s (very slow) - Single retriever timeout blocks entire pipeline

Example output

# Multi-Agent Orchestration Analysis: RAG Pipeline Optimization ## 0) Feasibility Check **Critical Path Minimum:** ~2,800ms (Parse → Parallel Retrieve [slowest] → Aggregate → Generate → Validate → Format) **Performance Target:** 4,000ms (p95) **Gap:** +1,200ms buffer (achievable ✅) **Breakdown Assumptions:** - Parse: 50ms - Parallel Retrieve (max of A/B/C): 1,500ms (assuming slowest retriever) - Aggregate: 200ms - Generate (LLM): 800ms - Validate: 150ms - Format: 100ms **Status:** Requirements are achievable with proper parallelization. The 1.2s buffer accommodates variance and network latency. --- ## 1) Dependency Analysis **Dependency Map:** ``` [1. Parse Query] | ┌───────┼───────┐ ↓ ↓ ↓ [2. Get A] [3. Get B] [4. Get C] ← PARALLEL (no dependencies) └───────┼───────┘ ↓ [5. Aggregate] ↓ [6. Generate] ↓ [7. Validate] ↓ [8. Format] ``` **Critical Path:** 1 → (2||3||4) → 5 → 6 → 7 → 8 **Estimated Total:** 2,800ms (with parallelization) **Parallelizable Steps:** Steps 2, 3, 4 (retrievers) **Potential Time Savings:** If sequential retrievers average 500ms each → saves ~1,000ms by parallelizing **Gating Steps:** - Step 1 (Parse): Gates all retrieval operations - Step 5 (Aggregate): Gates downstream generation pipeline - Step 6 (Generate): Most expensive operation, gates validation **Failure Points (Risk Assessment):** - **Step 2/3/4 (Retrievers)**: Impact = Medium | Mitigation = Allow 2/3 success, timeout wrappers - **Step 5 (Aggregate)**: Impact = High | Mitigation = Handle empty results gracefully - **Step 6 (Generate)**: Impact = Critical | Mitigation = Retry with exponential backoff, circuit breaker - **Step 7 (Validate)**: Impact = Low | Mitigation = Skip if timeout, proceed with warning flag - **Step 1/8 (Parse/Format)**: Impact = Low | Mitigation = Fast operations, minimal failure risk --- ## 2) Pattern Evaluation (Scored for THIS Case) ### Sequential Pipeline **Fits Because:** N/A - clearly inappropriate given parallel retrieval opportunity **Doesn't Fit Because:** Current approach causing 6.8s p95 latency; retrieval steps are independent and sequential execution wastes time **Latency Impact:** 3,500-4,000ms (optimistic) to 6,800ms (current reality) **Failure Handling:** Single point of failure - any retriever timeout blocks entire pipeline **Complexity:** Low (1/3) **Pattern Score: 2/10** (only avoids over-engineering, but fails all key requirements) --- ### Parallel Fan-out/Fan-in **Fits Because:** - Steps 2/3/4 are independent, perfect for concurrent execution - Aggregation step naturally acts as fan-in point - Meets 2/3 retriever tolerance requirement cleanly - Simple timeout + partial result handling **Doesn't Fit Because:** N/A - this is the ideal pattern **Latency Impact:** 2,800-3,200ms estimated (meets <4s target comfortably) **Failure Handling:** - Wait for best 2 of 3 retrievers (timeout after 2s per retriever) - Aggregate partial results with flag to user - Continue pipeline even if 1 retriever fails **Complexity:** Medium (2/3) - requires Promise.all/race logic, timeout wrappers **Pattern Score: 9/10** (loses 1 point for moderate aggregation complexity vs. simpler patterns) --- ### Hierarchical Manager/Workers **Fits Because:** Could model as Manager (orchestrator) → Workers (retrievers) **Doesn't Fit Because:** - Overkill for this simple dependency structure - Manager adds unnecessary coordination overhead - No need for work distribution logic (tasks are fixed, not dynamic) - Doesn't improve on fan-out pattern for this specific flow **Latency Impact:** 3,000-3,500ms (similar to fan-out but with manager overhead) **Failure Handling:** Manager retry logic adds latency; worker isolation is same as fan-out **Complexity:** High (3/3) - manager state machine, worker health tracking **Pattern Score: 5/10** (functional but over-engineered; adds complexity without benefit) --- ### Event-Driven Reactive **Fits Because:** Could emit events on parse completion → retrievers listen → aggregate on 3 results **Doesn't Fit Because:** - Adds latency (event queue overhead) - Request-response pattern better for user-facing API - Complexity of event infrastructure not justified - Harder to implement timeout guarantees - Doesn't align with synchronous user expectation **Latency Impact:** 3,200-4,000ms (event overhead + eventual consistency delays) **Failure Handling:** Event loss risk, complex compensation logic for partial results **Complexity:** High (3/3) - event bus, subscribers, dead letter queue **Pattern Score: 3/10** (architecturally interesting but mismatched to use case) --- ## 3) Recommendation **Primary Pattern:** **Parallel Fan-out/Fan-in** with sequential post-processing **Justification:** - ✅ **Meets PERFORMANCE:** Reduces latency from 6.8s → ~3s p95 (46% improvement), well under 4s target - ✅ **Meets RELIABILITY:** Built-in 2/3 retriever tolerance, explicit timeout handling per retriever - ✅ **Addresses CURRENT_APPROACH pain:** Eliminates sequential blocking; single retriever timeout no longer catastrophic - ⚠️ **Trade-off accepted:** Slightly more complex aggregation logic (checking which retrievers succeeded), but complexity is isolated and testable **Implementation Details:** **Timeout Rules:** - **Parse Query:** 100ms (rationale: simple regex/NLP parsing, should be <50ms but buffer for safety) - **Retriever A/B/C (each):** 2,000ms (rationale: allows for network latency + DB query; fails fast if unresponsive) - **Aggregate:** 500ms (rationale: merge + dedupe operations, should be <200ms with buffer) - **Generate (LLM):** 1,500ms (rationale: typical GPT-4 response time at p95) - **Validate:** 300ms (rationale: fact-check calls, may involve external API) - **Format:** 200ms (rationale: markdown conversion, minimal logic) **Aggregation Rules:** - **Wait for:** First 2 of 3 retrievers to complete OR 2,000ms timeout (whichever comes first) - **Minimum viable result:** At least 1 retriever success (2 preferred; if only 1 succeeds, flag to user as "limited results") - **Early termination:** If 2 retrievers succeed before 2s, don't wait for 3rd **Degradation Policy:** - **If 1 retriever fails:** Continue with 2 sources, add metadata: `{"sources": 2, "missing": ["Source X"]}` - **If 2 retrievers fail:** Continue with 1 source, prominent user warning: "⚠️ Limited results - 2 sources unavailable" - **If all 3 retrievers fail:** Return cached result (if available) OR user-friendly error: "Unable to retrieve information. Please try again." - **If Generate (LLM) fails:** Retry once with exponential backoff (wait 500ms), then return aggregated results without generated summary - **If Validate times out:** Skip validation, proceed with response + flag: `{"validated": false}` **State Management:** - **Parse agent** owns: Extracted keywords, intent classification - **Each retriever** owns: Independent result set (no shared state) - **Aggregator** owns: Merged results, deduplication cache (in-memory per request) - **Shared state:** None (stateless request/response model) --- ## 4) Trade-off Matrix (Score 1-5, 5=best for THIS use case) | Pattern | Latency | Cost | Reliability | Complexity | Overall Fit | |---------|---------|------|-------------|------------|-------------| | Sequential | 1 | 5 | 1 | 5 | 2.2 | | Parallel | 5 | 5 | 5 | 4 | 4.8 | | Hierarchical | 4 | 4 | 4 | 2 | 3.3 | | Event-Driven | 3 | 3 | 3 | 1 | 2.3 | **Scoring Rationale:** - **Parallel (4.8/5):** High on latency (parallelization cuts 50% time), cost (no extra LLM calls), reliability (graceful degradation). Slightly lower on complexity (timeout orchestration) but still reasonable. - **Hierarchical (3.3/5):** Decent latency/reliability but lower on complexity (overhead doesn't add value) and cost (manager coordination uses resources). - **Sequential (2.2/5):** Maximal simplicity/cost but fails latency and reliability requirements catastrophically. - **Event-Driven (2.3/5):** Poor fit across all dimensions for synchronous user-facing API. --- ## 5) Migration Plan **Current State:** Sequential retrieval with blocking operations: Parse → A → B → C → Aggregate → Generate → Validate → Format **Pain Points:** - 6.8s p95 latency (70% over target) - Single retriever timeout blocks entire pipeline - 30% of requests exceed 5s (poor UX) --- ### **Phase 1: Quick Wins (Week 1)** - ☐ **Change 1:** Add timeout wrapper to each retriever (2s limit) → Expected gain: Eliminate 10%+ hanging requests - ☐ **Change 2:** Implement aggregation logic to handle missing retrievers → Expected gain: Pipeline continues even if 1 source fails - ☐ **Change 3:** Add monitoring for per-retriever latency (p50/p95/p99) → Expected gain: Data-driven optimization in Phase 2 **Risk:** Low - these are defensive changes that don't alter core flow **Rollback:** Remove timeout wrappers; revert to "fail if any retriever fails" logic **Validation Checkpoint 1:** - **Measure:** % of requests completing within 5s - **Target:** 85% (up from current 70%) - **If below target:** Reduce timeout to 1.5s, investigate slowest retriever --- ### **Phase 2: Core Pattern Implementation (Week 2-3)** - ☐ **Step 1:** Replace sequential retrieval with `Promise.allSettled([A, B, C])` to execute in parallel - ☐ **Step 2:** Implement "wait for 2/3" logic using `Promise.race` + manual tracking - ☐ **Step 3:** Update aggregation to merge results from succeeded retrievers only (dedupe logic remains same) - ☐ **Step 4:** Add response metadata: `{"sources_used": 3, "latency_ms": 2800}` **Risk:** Medium - changes core orchestration logic **Rollback:** Feature flag to switch back to sequential mode; deploy with 10% traffic first (canary) **Validation Checkpoint 2:** - **Load test:** 100 concurrent requests - **Measure p95 latency:** Expect <3.5s (4s target with buffer) - **Measure error rate:** Expect <1% (99% SLA) - **Measure 2/3 retriever scenario:** Ensure pipeline completes successfully --- ### **Phase 3: Optimization & Resilience (Week 4+)** - ☐ **Fine-tune timeouts:** Based on Phase 2 data, adjust per-retriever timeout (may reduce to 1.5s if safe) - ☐ **Add circuit breaker:** For each retriever (open after 5 consecutive failures, retry after 30s) - ☐ **Monitoring dashboard:** Real-time metrics: latency histogram, retriever success rates, partial result frequency - ☐ **Retry logic:** For LLM generation step (1 retry with 500ms backoff) - ☐ **Caching layer:** Cache Parse results for identical queries (5-minute TTL) **Success Metrics:** - **Latency:** 6.8s p95 → 3.2s p95 (53% improvement) - **Reliability:** Current ~97% → 99.5% (account for partial results as success) - **Cost:** Neutral (no additional LLM calls; parallel retrievers use same infrastructure) --- ## 6) Confidence Assessment **Overall Design Confidence: 8/10** | Decision | Confidence (1-10) | To Raise Confidence, Need: | |----------|-------------------|----------------------------| | Pattern choice | 9 | N/A - clear fit for parallel retrieval | | Timeout values | 6 | Production timing data (p50/p95/p99) for each retriever under load | | Aggregation rules | 7 | Failure rate statistics: How often do retrievers fail individually? | | Cost estimate | 8 | Token usage logs to confirm LLM generation is only expense | | Migration risk | 7 | Staging environment test with 100 concurrent requests | **Biggest Assumption:** Each retriever completes within 2s at p95 under concurrent load (not validated yet) **Biggest Risk:** If slowest retriever is >2.5s p95, even parallel execution may not hit 4s target **Next Steps to De-risk:** 1. **Benchmark each retriever latency independently** under 100 concurrent requests (simulate fan-out load) 2. **Measure current failure rates** for each retriever over 7 days (determine if 2/3 tolerance is sufficient) 3. **Load test aggregation step** with 1,000-result merge to confirm 500ms timeout is adequate --- ## VERIFICATION ✅ - ✅ Critical path identified with timing estimate (2,800ms) - ✅ All 4 patterns evaluated with scores 1-10 - ✅ Recommended pattern has timeout + aggregation + degradation rules - ✅ Migration plan has 3 phases with validation checkpoints - ✅ Trade-off matrix shows scoring rationale - ✅ Confidence <8 items flagged with specific "need" (3 items flagged) - ✅ Requirements feasibility confirmed in section 0 (1.2s buffer available)

🌀 Claude

Multiagent Orchestration

Name: Multiagent Orchestration Claude Prompt
Brand: PromptBase
Price: 29.99 USD
Availability: InStock
Author: monna

@monna

$29.99

What do I get when I buy a prompt?

Add to Cart

Instant access

Commercial use

Money‑back

By purchasing this prompt, you agree to our terms of service

CLAUDE-4-5-SONNET

Selects the best orchestration pattern (sequential, parallel, hierarchical, event-driven, or hybrid) based on YOUR dependencies, latency targets, and failure tolerance. Outputs: dependency map with critical path, pattern scoring matrix (1-5), recommended approach with timeout/aggregation rules, and a 3-phase migration plan. Perfect for RAG systems, tool-using agents, or any multi-step AI workflow. ✅ Critical path analysis ✅ Pattern trade-offs ✅ Migration roadmap ✅ Confidence ratings

...more

Added over 1 month ago

Browse Marketplace