PromptBase

Prompt Details

Model
(claude-4-6-sonnet)
Token size
2,469
Example input
[USE_CASE]: Legal document review and contract analysis platform for law firms [MODELS_AVAILABLE]: Claude 4.6 Sonnet, GPT5.4, Mistral 8x7B
Example output
Here is the complete routing system design.

---

## ⚠️ Pre-Flight: Inference Flags

Before routing is built, three conflicts must be surfaced:

**Flag 1 — Data Residency Risk (CRITICAL for legal):** All three models in the stack are cloud-hosted. Legal document review for law firms almost certainly involves privileged client communications, PII, and potentially regulated data under GDPR or jurisdiction-specific bar rules. Sending this data to OpenAI's or Anthropic's cloud APIs may violate attorney-client privilege protections and data processing agreements. **Mitigation built into this system:** all requests pass through a document classification pre-filter that strips or pseudonymises identifying information before transmission, and the system design flags a compliance review before any client data reaches an external API.

**Flag 2 — No Fast Triage Model:** None of the three models is a lightweight inference model. For low-complexity tasks (simple clause lookup, document type detection), routing everything to a full-size model adds unnecessary latency and cost. **Mitigation:** Mistral 8x7B is designated the triage/fast-lane model for low-complexity tasks; its MoE architecture makes it faster than its parameter count suggests.

**Flag 3 — No Fine-Tuned Legal Model:** The stack contains only general-purpose models. Legal reasoning has well-known failure modes in general LLMs (hallucinated case citations, misapplied jurisdiction). **Mitigation:** the system design includes a legal citation verification layer and confidence-gated escalation for any output containing statute references.
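The Flag 1 mitigation (pseudonymising identifiers before transmission) could look roughly like this. This is a minimal regex-based sketch; the patterns, token format, and function names are illustrative assumptions, and a production system would use an NER model rather than regexes alone:

```python
import hashlib
import re

# Illustrative sketch only: pseudonymise obvious identifiers before any
# document leaves the firm's boundary. A real deployment would layer an
# NER model on top of these patterns.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def pseudonymise(text: str) -> tuple[str, dict]:
    """Replace matches with stable tokens; return text plus a reversal map
    kept inside the firm's boundary for re-identifying model output."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def repl(m, label=label):
            token = f"[{label}_{hashlib.sha256(m.group().encode()).hexdigest()[:8]}]"
            mapping[token] = m.group()
            return token
        text = pattern.sub(repl, text)
    return text, mapping
```

Because the tokens are content-hashed, the same identifier maps to the same token across chunks, so cross-references inside a document survive pseudonymisation.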
---

## Section 1 — Routing System Vision

A single-model setup for a legal platform fails in three specific ways: it cannot cost-effectively handle the enormous variance in task complexity between "extract the governing law clause" and "analyse cross-jurisdictional risk across a 200-page merger agreement"; it applies the same latency profile to tasks that need realtime responses (in-session Q&A) and those that tolerate batch processing (overnight contract audits); and it provides no graceful degradation path when a model is unavailable.

The routing layer solves this by matching each incoming request to the model best suited to its specific combination of task type, complexity, context length, and cost tolerance — targeting a 45–60% reduction in API spend compared to routing everything to GPT-5.4, while maintaining or improving quality scores on high-stakes tasks by ensuring they always reach the most capable model. This architecture handles up to approximately 50,000 daily requests before requiring horizontal scaling of the classifier microservice.

The core routing principle for this use case: **legal precision beats latency**. Unlike a customer support platform, where a fast good-enough answer wins, law firms will tolerate 3–5 second responses to get provably correct clause extraction and risk flags. The routing logic is calibrated accordingly.

---

## Section 2 — Task Classification Layer

**Task Taxonomy** (derived from the legal platform use case):

- `contract_review` — Full document analysis: identify parties, obligations, term dates, governing law, termination provisions. Example: "Review this SaaS MSA and flag non-standard clauses." Complexity typically medium–high. Context: long (20k–100k tokens). Output: structured JSON + prose.
- `clause_extraction` — Targeted extraction of specific named provisions from a known document. Example: "Pull all indemnification clauses from this NDA." Complexity low–medium. Context: medium. Output: structured list.
- `risk_flagging` — Identify provisions that deviate from firm templates or carry unusual liability exposure. Example: "Flag any unlimited liability provisions." Complexity medium–high. Requires comparison to baseline templates. Output: structured + prose.
- `summarisation_comparison` — Produce plain-English summaries or compare two contract versions. Example: "Summarise the key changes between v1 and v2 of this employment agreement." Complexity medium. Output: prose.
- `legal_qa_drafting` — Answer specific legal questions or draft new language. Example: "Draft a mutual NDA governing law clause for California." Complexity high for drafting. Output: prose or structured clause.

**Complexity Scoring Signals:**

- Low: single clause target, document under 5 pages, no cross-document reference, no jurisdiction-specific analysis required
- Medium: multi-clause review, 5–30 pages, single jurisdiction, template comparison
- High: multi-document analysis, 30+ pages, cross-jurisdictional, novel clause drafting, adversarial risk analysis

**Classification Method:** Hybrid — a lightweight intent classifier (fine-tuned DistilBERT or similar, deployed locally) handles task_type assignment, combined with rule-based complexity scoring (page count + keyword signals like "compare", "draft", "jurisdiction"). The local classifier avoids sending document content to an external API before the routing decision is made.

**Output Schema:**

```json
{
  "task_type": "contract_review",
  "complexity": "high",
  "latency_requirement": "standard",
  "context_length": 42000,
  "output_format": "mixed",
  "jurisdiction_signal": true,
  "citation_risk": false
}
```

---

## Section 3 — Model Capability Registry

**Claude Sonnet 4.6**

- Strength: instruction-following precision, structured output reliability, long-context coherence across 200k-token windows. Best model in this stack for maintaining legal accuracy across a full MSA without drift.
- Weakness: slower than Mistral on simple extraction; not consistently superior to GPT-5.4 on open-ended drafting tasks.
- Cost tier: medium (middle of the stack)
- Latency: medium (2–4s typical)
- Optimal task match: `contract_review` (all complexity), `risk_flagging` (medium–high), any task with context > 30k tokens
- Avoidance: do not route `clause_extraction` (low complexity) — wasteful; do not route `legal_qa_drafting` (novel drafting) as primary — GPT-5.4 drafts more fluently.
- Failure signature: on very long documents, watch for repetition in the final 20% of output — a signal that context compression is degrading; flag when output length drops suddenly relative to input length.

**GPT-5.4**

- Strength: open-ended generation quality, multi-document synthesis, novel clause drafting. Best choice when the output will be read by a partner rather than parsed by a system.
- Weakness: highest cost; can be verbose on structured tasks where JSON precision is needed; latency spikes under load.
- Cost tier: high
- Latency: medium–slow (3–6s, higher variance)
- Optimal task match: `legal_qa_drafting` (drafting mode), `summarisation_comparison` (high stakes), escalation path for any high-complexity task where Claude's confidence score is below threshold
- Avoidance: do not route repetitive extraction tasks — the cost profile makes it economically indefensible at scale; do not route tasks requiring strict JSON output format without explicit formatting constraints in the prompt.
- Failure signature: watch for hallucinated statute citations (e.g. citing a real-sounding but non-existent section number); implement a citation verification step before any GPT-5.4 output is delivered.

**Mistral 8x7B**

- Strength: fastest inference in this stack, cheapest per token, reliable for structured extraction tasks on well-defined schemas. The MoE architecture means activated parameters per token are actually closer to 13B, keeping latency low.
- Weakness: weakest reasoning on complex multi-hop legal logic; context window limitations (32k in standard deployment); highest hallucination rate on jurisdiction-specific questions.
- Cost tier: low
- Latency: fast (<1.5s typical)
- Optimal task match: `clause_extraction` (low–medium), `legal_qa_drafting` (simple Q&A mode, not drafting), document triage and pre-classification
- Avoidance: never route `risk_flagging` or `contract_review` on documents > 15k tokens — context limit and reasoning quality both fail here; never route any task where `jurisdiction_signal: true` without an escalation check.
- Failure signature: output format drift on longer prompts — watch for JSON schema violations; confidence degrades rapidly after ~20k input tokens.

---

## Section 4 — Routing Decision Engine

**Primary Routing Matrix:**

| task_type | complexity | latency_req | Primary Model | Backup |
|---|---|---|---|---|
| `contract_review` | low | any | Mistral 8x7B | Claude Sonnet 4.6 |
| `contract_review` | medium | any | Claude Sonnet 4.6 | GPT-5.4 |
| `contract_review` | high | any | Claude Sonnet 4.6 | GPT-5.4 |
| `clause_extraction` | low | realtime | Mistral 8x7B | Claude Sonnet 4.6 |
| `clause_extraction` | medium | any | Mistral 8x7B | Claude Sonnet 4.6 |
| `risk_flagging` | low | any | Claude Sonnet 4.6 | GPT-5.4 |
| `risk_flagging` | medium/high | any | Claude Sonnet 4.6 | GPT-5.4 |
| `summarisation_comparison` | any | standard | Claude Sonnet 4.6 | GPT-5.4 |
| `summarisation_comparison` | high | batch | GPT-5.4 | Claude Sonnet 4.6 |
| `legal_qa_drafting` | low | realtime | Mistral 8x7B | Claude Sonnet 4.6 |
| `legal_qa_drafting` | high/drafting | any | GPT-5.4 | Claude Sonnet 4.6 |

**Cost-Quality Threshold:** A cheaper model is acceptable when: complexity is low AND output_format is structured (not prose) AND the task has been seen before with high confidence from that model (rolling 7-day quality score ≥ 0.88).
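The routing matrix can be encoded as a plain lookup table. A minimal sketch in Python, where the model identifier strings and the default escalation pairing for unseen combinations are assumptions, and latency tiers are omitted for brevity:

```python
# Sketch of the primary routing matrix as a lookup table.
# Keys: (task_type, complexity); values: (primary_model, backup_model).
# Latency tiers and "medium/high" row collapsing are simplified here.
ROUTING_MATRIX = {
    ("contract_review", "low"): ("mistral-8x7b", "claude-sonnet-4-6"),
    ("contract_review", "medium"): ("claude-sonnet-4-6", "gpt-5.4"),
    ("contract_review", "high"): ("claude-sonnet-4-6", "gpt-5.4"),
    ("clause_extraction", "low"): ("mistral-8x7b", "claude-sonnet-4-6"),
    ("clause_extraction", "medium"): ("mistral-8x7b", "claude-sonnet-4-6"),
    ("risk_flagging", "low"): ("claude-sonnet-4-6", "gpt-5.4"),
    ("risk_flagging", "medium"): ("claude-sonnet-4-6", "gpt-5.4"),
    ("risk_flagging", "high"): ("claude-sonnet-4-6", "gpt-5.4"),
    ("summarisation_comparison", "medium"): ("claude-sonnet-4-6", "gpt-5.4"),
    ("summarisation_comparison", "high"): ("gpt-5.4", "claude-sonnet-4-6"),
    ("legal_qa_drafting", "low"): ("mistral-8x7b", "claude-sonnet-4-6"),
    ("legal_qa_drafting", "high"): ("gpt-5.4", "claude-sonnet-4-6"),
}

def route(task_type: str, complexity: str) -> tuple[str, str]:
    # Unknown combinations escalate to the safest pairing by default,
    # matching the "legal precision beats latency" principle.
    return ROUTING_MATRIX.get(
        (task_type, complexity), ("claude-sonnet-4-6", "gpt-5.4")
    )
```

Keeping the matrix as data rather than branching logic makes routing policy changes auditable and versionable, which matters for the compliance trail discussed later.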
Quality must win when: `jurisdiction_signal: true`, context_length > 40k, or the requesting attorney has marked the matter as "high stakes" in the platform.

**Confidence Scoring:** The router assigns a confidence of 0.0–1.0 based on: classifier certainty on task_type (weighted 0.4), match quality between task signals and the selected model's known strength profile (weighted 0.4), and the historical quality score of that model on the nearest task cluster (weighted 0.2). If confidence < 0.65, the task escalates to the next-tier model automatically.

**Decision Output Schema:**

```json
{
  "primary_model": "claude-sonnet-4-6",
  "backup_model": "gpt-5.4",
  "routing_reason": "high_complexity_contract_review_long_context",
  "estimated_cost": 0.048,
  "estimated_latency": "3-5s",
  "confidence": 0.87,
  "compliance_flag": "strip_pii_before_transmission"
}
```

---

## Section 5 — Multi-Step Workflow Routing

**Workflow 1: Full Contract Review with Risk Report**

This is the most common high-complexity task. It decomposes into four sub-tasks:

1. **Document triage** → Mistral 8x7B: detect document type, page count, jurisdiction signals; extract party names. Fast, cheap, structured output.
2. **Clause inventory** → Mistral 8x7B or Claude Sonnet 4.6 (depending on document length): extract all named provisions into a structured schema.
3. **Risk analysis** → Claude Sonnet 4.6: compare extracted clauses against firm templates, flag deviations, score each clause for risk severity.
4. **Executive summary** → Claude Sonnet 4.6 (or GPT-5.4 if partner-facing): synthesise findings into a readable risk memo.

Context passing: the triage output (document metadata + clause inventory) is passed as a structured prefix to the risk analysis prompt, keeping the full document in context only for step 3, where it is needed.
State management: the orchestrator holds a `workflow_state` object with sub-task status, and the clause inventory JSON is the checkpoint — if step 3 fails, the system retries from the last successful inventory, not from scratch.

**Workflow 2: Cross-Document Comparison**

1. **Per-document summarisation** → Claude Sonnet 4.6 (run in parallel for each document)
2. **Structured diff generation** → Claude Sonnet 4.6: given two summaries + original documents, identify changed provisions
3. **Change significance ranking** → GPT-5.4: produce a human-readable memo ranking changes by legal significance

If step 2 fails (e.g., context overflow), the system falls back to comparing summaries only and flags the output as "summary-level comparison — full diff unavailable."

---

## Section 6 — Cost Optimization Layer

**Cost routing rules specific to the legal platform task distribution:** Given the estimated distribution (~30% contract review, ~25% clause extraction, ~20% risk flagging, ~15% summarisation, ~10% Q&A/drafting), and routing as defined above, approximately 45–50% of requests route to Mistral 8x7B, 40–45% to Claude Sonnet 4.6, and only 10–15% to GPT-5.4. This yields a blended cost approximately 50% below routing everything to GPT-5.4.

**Caching strategy:** Cache key structure: `hash(task_type + complexity + document_fingerprint + prompt_template_version)`. The document fingerprint is a truncated SHA-256 of the document content. Cache TTL: 24 hours for extraction tasks (document content doesn't change), 1 hour for Q&A tasks (prompt variations matter more). Cache invalidation triggers: document version change, prompt template update, or model version change. Do not cache any output where `jurisdiction_signal: true` and the jurisdiction's law has a known recent update (maintained via a legal update feed).
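The cache key structure above can be sketched directly; `PROMPT_TEMPLATE_VERSION` is a hypothetical config value standing in for whatever versioning scheme the template store uses:

```python
import hashlib

# Sketch of the cache key described in the caching strategy. The
# document fingerprint is a truncated SHA-256 of the content, as
# specified; PROMPT_TEMPLATE_VERSION is a hypothetical config value.
PROMPT_TEMPLATE_VERSION = "v3.2.1"

def cache_key(task_type: str, complexity: str, document: str) -> str:
    fingerprint = hashlib.sha256(document.encode("utf-8")).hexdigest()[:16]
    raw = f"{task_type}:{complexity}:{fingerprint}:{PROMPT_TEMPLATE_VERSION}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()
```

Because the template version is part of the key, a prompt template update invalidates the cache automatically rather than requiring an explicit purge.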
**Token reduction:** Before sending to Claude Sonnet 4.6 or GPT-5.4, run a document pre-processor that: removes boilerplate headers/footers repeated across all pages, deduplicates repeated standard clauses flagged by the clause inventory, and summarises exhibits and schedules unless they were explicitly requested.

**Budget enforcement:** Per-request cost cap: $0.25. If estimated cost exceeds the cap: first attempt to downgrade one tier (e.g., swap GPT-5.4 for Claude Sonnet 4.6). If the downgrade still exceeds the cap: queue for batch processing and notify the attorney. Hard reject only if the request comes from a free-tier account.

---

## Section 7 — Performance Monitoring System

**KPIs:**

1. **Routing accuracy rate** — measured as the % of routed requests where the assigned model's quality score exceeds the task-type threshold (≥ 0.88 for extraction, ≥ 0.82 for review, ≥ 0.78 for drafting). Measured via automated output scoring + weekly human spot-check.
2. **Blended cost per request** — rolling 24h average across all task types. Alert threshold: > $0.09 average (signals routing is over-escalating to expensive models).
3. **p95 latency per task type** — measured at the API response level. Alert: `contract_review` p95 > 8s, `clause_extraction` p95 > 3s.
4. **Model quality drift score** — rolling 7-day quality score per model per task type. Alert: any model's score drops > 0.05 in a 48h window, indicating model degradation or prompt regression.
5. **Fallback trigger rate** — % of requests that activate the backup model. Alert: > 8% triggers a routing policy review.

**A/B testing framework:** Shadow routing — 5% of traffic is double-routed (primary model + challenger model) and outputs are compared via automated quality scoring. No user sees the challenger output. New routing policies require 72h of shadow testing before promotion.
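The shadow-routing mechanism can be sketched as below. `call_model`, `score_output`, and `log` are hypothetical stand-ins for the model gateway, the automated quality scorer, and the comparison log; only the sampling logic is from the design above:

```python
import random

# Sketch of shadow routing: 5% of requests are also sent to a
# challenger model for offline score comparison. The user only ever
# sees the primary model's output.
SHADOW_RATE = 0.05

def handle_request(request, primary, challenger, call_model, score_output, log):
    primary_output = call_model(primary, request)
    if random.random() < SHADOW_RATE:
        challenger_output = call_model(challenger, request)
        log({
            "primary": primary,
            "challenger": challenger,
            "primary_score": score_output(primary_output),
            "challenger_score": score_output(challenger_output),
        })
    return primary_output
```

A 72h promotion gate would then compare the logged score pairs before any routing policy change goes live.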
---

## Section 8 — Fallback & Redundancy System

**Failure detection triggers:** Timeout (> 30s for any model), HTTP 5xx from the model API, output confidence score < 0.45 (measured by a lightweight output quality classifier), JSON schema validation failure on structured tasks.

**Fallback cascade:**

- Level 1 (primary fails): route to the `backup_model` from the routing decision schema, with a stripped-down prompt if the original was near the context limit.
- Level 2 (backup fails): serve from cache if a similar request was answered in the last 24 hours; if not, route to whichever of the remaining two models has the lowest current error rate.
- Level 3 (all models degraded): return a structured partial response — whatever sub-tasks completed successfully — with an explicit flag that the response is incomplete. Notify the attorney via in-platform alert.
- Graceful degradation: for `clause_extraction`, fall back to a regex-based rule system that extracts common clause headers without LLM inference — lower quality but deterministic.

**Circuit breaker:** If a model returns errors or sub-threshold quality scores on > 15% of requests in any 10-minute window, it is taken offline from routing for 5 minutes. Auto-recovery: re-admit after a single successful health check probe.

**Data residency fallback note:** If a compliance scan detects that a document contains unredacted personally identifiable client information, the request is held and the attorney is asked to confirm transmission consent before any cloud model call is made. This is non-negotiable for legal platforms under GDPR and most bar association data handling guidelines.
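The circuit breaker can be sketched as a sliding-window error counter. This minimal sketch uses a fixed cooldown in place of the health-check probe for re-admission, and the injectable clock is an assumption for testability:

```python
import time
from collections import deque

# Sketch of the circuit breaker: a model is taken offline for 5 minutes
# if more than 15% of requests in a 10-minute window fail or score
# below threshold. Re-admission via health probe is omitted here.
WINDOW_S = 600
COOLDOWN_S = 300
ERROR_RATE_LIMIT = 0.15

class CircuitBreaker:
    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.events = deque()      # (timestamp, ok: bool)
        self.open_until = 0.0      # model offline until this time

    def record(self, ok: bool):
        now = self.clock()
        self.events.append((now, ok))
        # Drop events that have aged out of the 10-minute window.
        while self.events and self.events[0][0] < now - WINDOW_S:
            self.events.popleft()
        failures = sum(1 for _, e_ok in self.events if not e_ok)
        if self.events and failures / len(self.events) > ERROR_RATE_LIMIT:
            self.open_until = now + COOLDOWN_S

    def available(self) -> bool:
        return self.clock() >= self.open_until
```

The routing engine would consult `available()` before selecting a model and skip to the backup when the breaker is open.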
---

## Section 9 — Self-Learning Routing Layer

**Feedback signals:** Attorney-provided ratings (thumbs up/down + optional tag: "missed clause", "wrong jurisdiction", "format error"), task completion signals (did the attorney use the output or re-run the request?), cost overruns (requests that exceeded the $0.25 cap), and automated quality scorer deltas (divergence between predicted and actual quality).

**Routing improvement loop:** Weekly batch re-training of model quality scores per task_type cluster, using the past 7 days of feedback. Routing weights update every Monday at 00:00 UTC with a 10% maximum change cap per cycle (prevents overcorrection from a single bad week). Routing policy changes are versioned (semantic versioning) and stored with their A/B test results.

**Model drift detection:** A rolling Wilcoxon signed-rank test compares the current 7-day quality distribution against the prior 30-day baseline for each model × task_type pair. If p < 0.05 with a negative direction, drift is flagged for human review.

---

## Section 10 — Routing System Blueprint

**The single most dangerous routing error for this use case:** Routing a high-complexity, multi-jurisdiction contract review to Mistral 8x7B because the classifier underestimates complexity. This produces a confidently wrong risk analysis that a busy associate may not catch — the legal and reputational consequences for a law firm are severe. Mitigation: any task with `jurisdiction_signal: true` or context_length > 25k is hard-blocked from Mistral regardless of complexity scoring.

**Biggest risk in this specific stack:** All three models are cloud-hosted, making every call a potential privilege breach. This risk dwarfs cost and quality concerns. Mitigation: the document pre-processor with PII stripping must be deployed before the routing layer, not after.

**Self-learning priority (first 90 days):** Focus the improvement loop on Mistral quality score calibration.
The most common routing failure will be Mistral being over-trusted on extraction tasks that are actually medium-complexity. Getting that threshold right generates the highest ROI — it either saves cost (right level of escalation) or prevents quality failures (under-escalation caught).

**Implementation sequence:**

1. Deploy the local classifier (no external API calls required — safe to build immediately)
2. Implement the PII stripping pre-processor (legal requirement — must exist before any model call)
3. Wire Claude Sonnet 4.6 as the sole model with static routing (simplest working system)
4. Add Mistral for low-complexity extraction tasks (first cost saving)
5. Add GPT-5.4 for drafting tasks and as the high-confidence escalation path
6. Deploy monitoring and fallback logic
7. Enable the self-learning loop once 30 days of quality data is accumulated

**Architecture recommendation:** Microservices. The classifier, routing engine, PII processor, and model gateway should be separate deployable units. The legal platform's compliance and audit requirements mean you need to be able to swap, update, or independently audit each component — a monolithic embedded router makes that impossible. The gateway layer should log every routing decision with the full decision schema for compliance audit trails.
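The hard-block mitigation from Section 10 (no Mistral when a jurisdiction signal is present or the context exceeds 25k tokens) can be sketched as a guard in the routing engine; the function name and dict-based classification input are illustrative assumptions:

```python
# Sketch of the Section 10 hard-block guard: regardless of the
# complexity score, tasks with a jurisdiction signal or long context
# must never reach Mistral 8x7B.
MISTRAL_CONTEXT_CAP = 25_000

def mistral_allowed(classification: dict) -> bool:
    if classification.get("jurisdiction_signal"):
        return False
    if classification.get("context_length", 0) > MISTRAL_CONTEXT_CAP:
        return False
    return True
```

Running this guard after classification but before matrix lookup means a classifier that underestimates complexity can never produce the dangerous routing described above.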
🌀 Claude

Multi-LLM Routing Architect

Instant access
Usage rights: Commercial use
Money-back guarantee
Intelligent Model Orchestration from 2 Inputs

✅ Generates model-specific capability profiles with cost/latency/strength tradeoffs — not generic "use GPT-4 for hard tasks"
✅ Builds multi-step workflow routing for complex tasks that need different models at different stages
✅ Includes fallback logic with circuit breakers, confidence thresholds, and automatic rerouting
✅ Outputs routing configuration with built-in monitoring rules — includes guidelines for tracking performance and adjusting routing
Added 1 week ago