PromptBase

Prompt Details

Model: claude-4-6-sonnet
Token size: 1,118
Example input
[USER_QUERY]: What is the difference between the Pro plan and the Enterprise plan?

[RETRIEVED_CHUNKS]:
Chunk 1: Pro Plan – Includes advanced analytics, priority email support, and up to 10 team members.
Chunk 2: Enterprise Plan – Includes unlimited team members, dedicated account manager, SSO authentication, and custom SLAs.
Chunk 3: Billing Policy – All plans are billed monthly unless annual billing is selected.

[AGENT_RESPONSE]: The Pro plan includes analytics and email support for up to 10 users, while the Enterprise plan offers unlimited users, SSO authentication, and a dedicated account manager. Enterprise plans also include 24/7 phone support.

[EXPECTED_BEHAVIOR]: The answer should compare plan features and avoid mentioning features not present in documentation.
Example output
**Overall: 41/50 — pipeline mostly healthy, one clean hallucination to fix.**

The retrieval layer did its job well. Chunks 1 and 2 directly cover both sides of the comparison, and the model correctly ignored the billing chunk (Chunk 3) as irrelevant noise. The generated response is also well-structured and nearly complete.

The single failure is unambiguous: **"Enterprise plans also include 24/7 phone support"** has no grounding in any retrieved chunk. The model reached into its parametric memory — likely pattern-matching against common enterprise SaaS feature sets — and injected a feature the documentation never mentions. This is exactly the failure mode RAG is supposed to prevent.

**The fix is squarely in the generation layer:**

1. Tighten the system prompt with an explicit grounding constraint — something like *"only state facts present in the retrieved context; if unsupported, say so."*
2. Optionally layer in a post-generation NLI check that scores each response sentence against the context before it's served.

The retrieval and context setup don't need changes. You could also filter Chunk 3 out by raising your similarity threshold, but that's a minor quality-of-life improvement, not a correctness fix.
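The second fix above, a post-generation check that scores each response sentence against the retrieved context, can be sketched in a few lines. This sketch substitutes simple token overlap for a real NLI model as a cheap stand-in; the function name and the 0.5 overlap threshold are illustrative assumptions, not part of the prompt itself:

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase word/number tokens for a rough lexical comparison."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def ungrounded_sentences(answer: str, chunks: list[str],
                         threshold: float = 0.5) -> list[str]:
    """Flag answer sentences with little lexical support in the context.

    A sentence is flagged when fewer than `threshold` of its tokens appear
    anywhere in the retrieved chunks. A production check would replace this
    overlap score with an entailment (NLI) model.
    """
    context = set().union(*(_tokens(c) for c in chunks))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer.strip()):
        words = _tokens(sentence)
        if words and len(words & context) / len(words) < threshold:
            flagged.append(sentence)
    return flagged
```

Run against the example interaction above, this flags only the "24/7 phone support" sentence, matching the audit's finding.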

Search Quality Fixers

Instant access
Usage rights: Commercial use
Money-back guarantee
Retrieval-Augmented Generation (RAG) systems fail in subtle ways: irrelevant chunks, missing context, hallucinated answers, or poorly grounded responses. When something goes wrong, it’s often unclear whether the problem came from retrieval, context selection, or generation. This prompt acts as a structured RAG audit framework that analyzes a full interaction — query → retrieved chunks → generated answer — and evaluates the pipeline across five critical quality dimensions: Retrieval relevance
...more
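A minimal sketch of the interaction record such an audit consumes (query, retrieved chunks, generated answer, expected behavior), assembled into the bracketed-tag format shown in the example input. The class, field names, and helper are illustrative assumptions, not the prompt's actual schema:

```python
from dataclasses import dataclass


@dataclass
class RagInteraction:
    """One query → retrieved chunks → generated answer interaction to audit."""
    query: str                    # the user's question
    retrieved_chunks: list[str]   # context passed to the model
    answer: str                   # what the model generated
    expected_behavior: str = ""   # optional rubric for the audit


def to_prompt_fields(interaction: RagInteraction) -> str:
    """Render the interaction in the bracketed-tag format the example input uses."""
    chunks = " ".join(f"Chunk {i + 1}: {c}"
                      for i, c in enumerate(interaction.retrieved_chunks))
    return (f"[USER_QUERY]: {interaction.query} "
            f"[RETRIEVED_CHUNKS]: {chunks} "
            f"[AGENT_RESPONSE]: {interaction.answer} "
            f"[EXPECTED_BEHAVIOR]: {interaction.expected_behavior}")
```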
Added over 1 month ago