Prompt Details

Model

(claude-4-6-sonnet)

Token size

999

Example input

[WORKFLOW]: Claude Code on a 40-file repo, runs tests every prompt, full test output goes back into chat [MONTHLY_SPEND]: $180/mo [PAIN_POINT]: tests

Example output

Good. Current pricing confirmed. Now computing the audit against the workflow. --- **WORKFLOW CLASS:** Coding Assistant (Agentic loop variant) · **AUDIT CONFIDENCE: MEDIUM** *(spend is known; per-prompt token breakdown is not — sizes inferred from class norms)* Pricing basis used: Sonnet 4.6 at $3.00 input / $15.00 output per million tokens (assumed model; adjust if Opus). --- ``` ═══ TOKEN LEAK MAP ═══ ``` --- **▌LEAK #1 — L2 RAW DUMPS (Test Output)** **▌SEVERITY: CRITICAL** **▌EXPOSURE: CONFIRMED (partial — spend known, token split estimated)** **▌IMPACT:** A full pytest/jest run on a 40-file repo typically produces 8,000–25,000 tokens of raw output (stack traces, pass/fail lines, timing data, deprecation warnings). At ~15,000 tokens per run, 20 prompts/day, 30 days, Sonnet 4.6 input rate: ``` 15,000 tokens × 20 runs/day × 30 days = 9,000,000 input tokens 9M × $3.00/MTok = $27.00/month in test output alone ``` Of your $180/mo, this is plausibly **15–25% of total spend** on content the model mostly ignores. **▌WHERE:** The raw test runner stdout being pasted wholesale into the chat context after every prompt. **▌WHY IT BURNS:** Every passing test line is a paid input token — you're billing Claude to read "✓ UserService › should return 404" hundreds of times per session. **▌FIX:** Pipe test output through a pre-processor before it hits the context: failures only, with file + line + error message, stripping timing rows, deprecation warnings, and all passing tests. A 10-line shell wrapper (`pytest --tb=short -q 2>&1 | grep -E "FAILED|ERROR|assert"`) cuts this to 200–800 tokens in most runs. **▌MEASURE:** Log `input_tokens` per API call before and after the filter. Target: test output block drops below 1,000 tokens per run. --- **▌LEAK #2 — L1 REPEATED CONTEXT (Full Repo Codebase in Context)** **▌SEVERITY: HIGH** **▌EXPOSURE: ESTIMATED** **▌IMPACT:** A 40-file repo fed into context each turn (common in Claude Code's auto-add behavior) at ~300 tokens/file = ~12,000 tokens of stable code per prompt. Formula: ``` 12,000 tokens × {UNKNOWN prompts/day} × 30 days × $3.00/MTok = ${UNKNOWN} ``` If 20 prompts/day: 12,000 × 20 × 30 = 7,200,000 tokens → **$21.60/mo** for files that haven't changed. **▌WHERE:** Auto-included file context — files added to conversation that are unchanged between turns. **▌WHY IT BURNS:** Unchanged source files are re-tokenized and re-billed on every single turn even when the prompt is "fix the typo in line 42." **▌FIX:** Enable prompt caching on the stable file block. Prompt caching cuts input costs by up to 90%. Alternatively, scope context aggressively: only include files touched by the last git diff, not the full repo. **▌MEASURE:** Check Claude Code's token counter (visible in the UI) — note the "files in context" token count vs. "prompt + test output" token count per session. --- **▌LEAK #3 — L4 UNCACHED LOOKUPS (Stable System Prompt / Instructions)** **▌SEVERITY: HIGH** **▌EXPOSURE: ESTIMATED** **▌IMPACT:** Claude Code's system prompt (coding rules, repo conventions, tool descriptions) is typically 2,000–6,000 tokens, re-sent on every API call. Formula: ``` 4,000 tokens × {UNKNOWN prompts/day} × 30 days × $3.00/MTok = ${UNKNOWN} ``` At 20 prompts/day: **$7.20/mo** from a block that never changes. **▌WHERE:** The system prompt / CLAUDE.md instructions block. **▌WHY IT BURNS:** Static instructions have a 0% novelty rate but a 100% billing rate without caching. **▌FIX:** Add a `cache_control: {"type": "ephemeral"}` breakpoint after the system prompt block. The prompt caching overhead on Sonnet 4.6 is approximately 497 tokens per request — negligible against a 4,000-token system prompt. Cache writes cost 1.25× once; subsequent reads cost 10% of standard input price. **▌MEASURE:** Look for `cache_read_input_tokens` in API response metadata. If it's 0 on repeat calls, caching is not active. --- **▌LEAK #4 — L3 OVERSIZED OUTPUT (Verbose Code + Commentary)** **▌SEVERITY: MODERATE** **▌EXPOSURE: SPECULATIVE** **▌IMPACT:** Cannot calculate — no output token data provided. **▌WHERE:** Claude Code responses that re-emit entire files when only a function changed, or that produce lengthy "here's what I changed and why" narration before and after diffs. **▌WHY IT BURNS:** Output tokens cost 5× input tokens — a 1,000-token explanation costs as much as 5,000 tokens of re-read context. **▌FIX:** Add to CLAUDE.md: "Return only the changed functions/lines as unified diffs. Do not re-emit unchanged code. Suppress change summaries unless explicitly asked." **▌MEASURE:** Track `output_tokens` per call in API logs. Flag any response over 2,000 tokens and check whether the full file was returned. --- **▌LEAK #5 — L5B REDUNDANT INSTRUCTIONS (Test-Run Trigger Instructions)** **▌SEVERITY: MODERATE** **▌EXPOSURE: SPECULATIVE** **▌IMPACT:** Cannot calculate — prompt text not provided. **▌WHERE:** Any inline instructions telling Claude to "run tests after every change" that appear in the system prompt AND are repeated in individual user turns. **▌WHY IT BURNS:** If "always run tests and show output" appears in both CLAUDE.md and in each prompt, you're paying to instruct the same behavior twice every call. **▌FIX:** Audit CLAUDE.md vs. your typical prompt text. Remove any instruction from the prompt that already exists verbatim in the system prompt. **▌MEASURE:** Diff your system prompt against your 10 most recent user prompts. Count repeated instruction phrases. --- ``` ═══ VERDICT ═══ ``` **TOP FIX:** L2 — filter test output to failures only before it enters context. This is the pain point you named, it's the most concretely calculable leak, and a shell one-liner fixes it this afternoon. Estimated reduction: **$20–$40/mo** on a $180 base. **QUICK WIN:** Enable prompt caching on the system prompt / CLAUDE.md block (Leak #3). It takes under 15 minutes if you have API access, and the 90% read discount applies from the very next call. At 20+ prompts/day the payback is immediate. **NOT A LEAK:** Running tests on every prompt. This *feels* expensive and is the stated pain point — but the waste is in *how the output is returned to context*, not in running the tests themselves. The test execution is the right behavior; the raw dump is the leak. Don't suppress tests; suppress noise.

🌀 Claude

Cost Leak Detector

Name: Cost Leak Detector Claude Prompt
Brand: PromptBase
Price: 19.99 USD
Availability: InStock
Rating: 5 (1 reviews)
Author: monna

@monna

$19.99

What do I get when I buy a prompt?

Add to Cart

Instant access

Commercial use

Money‑back

By purchasing this prompt, you agree to our terms of service

CLAUDE-4-6-SONNET

✅ Finds the 6 classic waste patterns: repeated context, raw log dumps, oversized outputs, uncached lookups, prompt bloat, retrieval overfetch ✅ Ranks by monthly cost exposure (waste × frequency), not raw token size — the way real AI FinOps teams prioritize ✅ Classifies your workflow first (chatbot / RAG / agent / coding) so the audit targets the right leak types ✅ Gives a specific fix per leak, plus a confidence rating — no fabricated savings numbers

...more

Added over 1 month ago

Reviews for this prompt (1)

Browse Marketplace