Prompt Details
Model
(claude-4-6-sonnet)
Token size
999
Example input
[WORKFLOW]: Claude Code on a 40-file repo, runs tests every prompt, full test output goes back into chat
[MONTHLY_SPEND]: $180/mo
[PAIN_POINT]: tests
Example output
Good. Current pricing confirmed. Now computing the audit against the workflow.
---
**WORKFLOW CLASS:** Coding Assistant (Agentic loop variant) · **AUDIT CONFIDENCE: MEDIUM**
*(spend is known; per-prompt token breakdown is not — sizes inferred from class norms)*
Pricing basis used: Sonnet 4.6 at $3.00 input / $15.00 output per million tokens (assumed model; adjust if Opus).
---
```
═══ TOKEN LEAK MAP ═══
```
---
**▌LEAK #1 — L2 RAW DUMPS (Test Output)**
**▌SEVERITY: CRITICAL**
**▌EXPOSURE: CONFIRMED (partial — spend known, token split estimated)**
**▌IMPACT:** A full pytest/jest run on a 40-file repo typically produces 8,000–25,000 tokens of raw output (stack traces, pass/fail lines, timing data, deprecation warnings). At ~15,000 tokens per run, 20 prompts/day, 30 days, Sonnet 4.6 input rate:
```
15,000 tokens × 20 runs/day × 30 days = 9,000,000 input tokens
9M × $3.00/MTok = $27.00/month in test output alone
```
Of your $180/mo, this is plausibly **15–25% of total spend** on content the model mostly ignores.
**▌WHERE:** The raw test runner stdout being pasted wholesale into the chat context after every prompt.
**▌WHY IT BURNS:** Every passing test line is a paid input token — you're billing Claude to read "✓ UserService › should return 404" hundreds of times per session.
**▌FIX:** Pipe test output through a pre-processor before it hits the context: failures only, with file + line + error message, stripping timing rows, deprecation warnings, and all passing tests. A 10-line shell wrapper (`pytest --tb=short -q 2>&1 | grep -E "FAILED|ERROR|assert"`) cuts this to 200–800 tokens in most runs.
**▌MEASURE:** Log `input_tokens` per API call before and after the filter. Target: test output block drops below 1,000 tokens per run.
---
**▌LEAK #2 — L1 REPEATED CONTEXT (Full Repo Codebase in Context)**
**▌SEVERITY: HIGH**
**▌EXPOSURE: ESTIMATED**
**▌IMPACT:** A 40-file repo fed into context each turn (common in Claude Code's auto-add behavior) at ~300 tokens/file = ~12,000 tokens of stable code per prompt. Formula:
```
12,000 tokens × {UNKNOWN prompts/day} × 30 days × $3.00/MTok = ${UNKNOWN}
```
If 20 prompts/day: 12,000 × 20 × 30 = 7,200,000 tokens → **$21.60/mo** for files that haven't changed.
**▌WHERE:** Auto-included file context — files added to conversation that are unchanged between turns.
**▌WHY IT BURNS:** Unchanged source files are re-tokenized and re-billed on every single turn even when the prompt is "fix the typo in line 42."
**▌FIX:** Enable prompt caching on the stable file block. Prompt caching cuts input costs by up to 90%. Alternatively, scope context aggressively: only include files touched by the last git diff, not the full repo.
**▌MEASURE:** Check Claude Code's token counter (visible in the UI) — note the "files in context" token count vs. "prompt + test output" token count per session.
---
**▌LEAK #3 — L4 UNCACHED LOOKUPS (Stable System Prompt / Instructions)**
**▌SEVERITY: HIGH**
**▌EXPOSURE: ESTIMATED**
**▌IMPACT:** Claude Code's system prompt (coding rules, repo conventions, tool descriptions) is typically 2,000–6,000 tokens, re-sent on every API call. Formula:
```
4,000 tokens × {UNKNOWN prompts/day} × 30 days × $3.00/MTok = ${UNKNOWN}
```
At 20 prompts/day: **$7.20/mo** from a block that never changes.
**▌WHERE:** The system prompt / CLAUDE.md instructions block.
**▌WHY IT BURNS:** Static instructions have a 0% novelty rate but a 100% billing rate without caching.
**▌FIX:** Add a `cache_control: {"type": "ephemeral"}` breakpoint after the system prompt block. The prompt caching overhead on Sonnet 4.6 is approximately 497 tokens per request — negligible against a 4,000-token system prompt. Cache writes cost 1.25× once; subsequent reads cost 10% of standard input price.
**▌MEASURE:** Look for `cache_read_input_tokens` in API response metadata. If it's 0 on repeat calls, caching is not active.
---
**▌LEAK #4 — L3 OVERSIZED OUTPUT (Verbose Code + Commentary)**
**▌SEVERITY: MODERATE**
**▌EXPOSURE: SPECULATIVE**
**▌IMPACT:** Cannot calculate — no output token data provided.
**▌WHERE:** Claude Code responses that re-emit entire files when only a function changed, or that produce lengthy "here's what I changed and why" narration before and after diffs.
**▌WHY IT BURNS:** Output tokens cost 5× input tokens — a 1,000-token explanation costs as much as 5,000 tokens of re-read context.
**▌FIX:** Add to CLAUDE.md: "Return only the changed functions/lines as unified diffs. Do not re-emit unchanged code. Suppress change summaries unless explicitly asked."
**▌MEASURE:** Track `output_tokens` per call in API logs. Flag any response over 2,000 tokens and check whether the full file was returned.
---
**▌LEAK #5 — L5B REDUNDANT INSTRUCTIONS (Test-Run Trigger Instructions)**
**▌SEVERITY: MODERATE**
**▌EXPOSURE: SPECULATIVE**
**▌IMPACT:** Cannot calculate — prompt text not provided.
**▌WHERE:** Any inline instructions telling Claude to "run tests after every change" that appear in the system prompt AND are repeated in individual user turns.
**▌WHY IT BURNS:** If "always run tests and show output" appears in both CLAUDE.md and in each prompt, you're paying to instruct the same behavior twice every call.
**▌FIX:** Audit CLAUDE.md vs. your typical prompt text. Remove any instruction from the prompt that already exists verbatim in the system prompt.
**▌MEASURE:** Diff your system prompt against your 10 most recent user prompts. Count repeated instruction phrases.
---
```
═══ VERDICT ═══
```
**TOP FIX:** L2 — filter test output to failures only before it enters context. This is the pain point you named, it's the most concretely calculable leak, and a shell one-liner fixes it this afternoon. Estimated reduction: **$20–$40/mo** on a $180 base.
**QUICK WIN:** Enable prompt caching on the system prompt / CLAUDE.md block (Leak #3). It takes under 15 minutes if you have API access, and the 90% read discount applies from the very next call. At 20+ prompts/day the payback is immediate.
**NOT A LEAK:** Running tests on every prompt. This *feels* expensive and is the stated pain point — but the waste is in *how the output is returned to context*, not in running the tests themselves. The test execution is the right behavior; the raw dump is the leak. Don't suppress tests; suppress noise.
$19.99-50%
$10.00
SPECIAL PROMOTION — Offer ends 13 June
By purchasing this prompt, you agree to our terms of service
CLAUDE-4-6-SONNET
✅ Finds the 6 classic waste patterns: repeated context, raw log dumps, oversized outputs, uncached lookups, prompt bloat, retrieval overfetch
✅ Ranks by monthly cost exposure (waste × frequency), not raw token size — the way real AI FinOps teams prioritize
✅ Classifies your workflow first (chatbot / RAG / agent / coding) so the audit targets the right leak types
✅ Gives a specific fix per leak, plus a confidence rating — no fabricated savings numbers
...more
Added 12 hours ago
