Prompt Details
Model
(claude-4-6-sonnet)
Token size
3,509
Example input
[SYSTEM_DESCRIPTION]: Proprietary credit scoring model from third‑party vendor. Outputs: approval/rejection + score 0–1000. No architecture disclosed. Only high‑level marketing docs available (treated as VENDOR‑CLAIM).
[OBSERVABLE_INPUTS]: age (int), income (float), employment_type (categorical), credit_history_years (float), existing_debt_ratio (float).
[OBSERVABLE_OUTPUTS]: decision (approve/reject), score (0–1000), no confidence probabilities, no feature attributions.
[ANOMALY_OR_AUDIT_TRIGGER]: Application A (score 720) rejected; application B (score 690) approved — same employment_type and income, similar credit_history_years, different existing_debt_ratio.
Example output
## BLACK BOX AUDIT REPORT
---
### Executive Summary
The system under audit is a proprietary third-party credit scoring model that produces an approval/rejection decision alongside a numeric score (0–1000), with no architecture disclosure and no feature-level attribution. The system exhibits **high opacity** across all Lipton dimensions: no internal mechanism is observable, and vendor documentation is limited to marketing materials. The primary audit trigger — an apparent score-decision inversion where a higher-scoring applicant was rejected while a lower-scoring applicant was approved — constitutes a meaningful anomaly requiring direct explanation, as it suggests the score and the decision boundary may not share a monotonic relationship, or that `existing_debt_ratio` operates as a hard-rule threshold independent of aggregate score.
---
### 0. Signal Sufficiency
**SUFFICIENT.**
Five distinct input features are present. An anomaly is provided with two complete input/output pairs and associated score values. Signal inventory meets minimum threshold for full audit. Note: absence of confidence probabilities and feature attributions reduces inference resolution; this is flagged throughout.
---
### 1. Observable Space
**Input Features:**
| Feature | Type | Observed Range / Notes |
|---|---|---|
| `age` | Integer | Not specified; assumed adult (18+) per lending norms |
| `income` | Float | Identical across A and B — controlled variable |
| `employment_type` | Categorical | Identical across A and B — controlled variable |
| `credit_history_years` | Float | "Similar" across A and B — near-controlled, minor variance possible |
| `existing_debt_ratio` | Float | **Differs** between A and B — primary differentiating feature |
**Output Dimensions:**
| Output | Type | Observed Range |
|---|---|---|
| `decision` | Binary (approve/reject) | {approve, reject} |
| `score` | Integer | 0–1000 nominal; only 690 and 720 observed |
**Missing Signal:**
- No confidence or probability scores provided — decision certainty is unobservable.
- No feature attribution or partial scores.
- Score-to-decision mapping rule (cutoff, banding, or model) is undisclosed.
**Confidence/Probability Audit:**
*Not applicable* — no probability outputs are exposed. The integer `score` (0–1000) is present but its relationship to decision is non-transparent and potentially non-monotonic (see anomaly). **The score should be treated as an ordinal signal only, not as a calibrated probability proxy**, until the score-to-decision function is disclosed or reverse-engineered through additional observations.
---
### 2. Opacity Classification
*(Framework: Lipton 2018)*
**Simulatability:** NO
A human cannot trace the transformation from the five input features to the output score and decision for any single application. The scoring function is undisclosed, and marketing documentation does not describe computational steps.
*Evidence: No architecture documentation; anomaly demonstrates that score and decision are not straightforwardly linked.*
**Decomposability:** NO
No sub-components of the model are described or exposed. It is unknown whether the model contains separate scorecards, rule layers, or ensemble components.
*Evidence: Vendor documentation limited to marketing claims; no sub-model outputs available.*
**Algorithmic Transparency:** NO (undisclosed)
{ASSUMPTION — TYPE: VENDOR-CLAIM} The vendor provides only high-level marketing materials. Training algorithm, feature weighting methodology, cutoff logic, and model update cadence are entirely undisclosed.
*Evidence: System description states "no architecture disclosed."*
**Temporal Transparency** *(MONNA-Analysis-2026):* NO
It is unknown whether this model version is current, when it was last updated, or whether score distributions have drifted. No model version identifier or changelog is available.
*Evidence: Marketing-only documentation; no versioning disclosed.*
**Opacity Profile: {Simulatability: NO | Decomposability: NO | Algorithmic Transparency: NO | Temporal Transparency: NO}**
**This system is fully opaque** because no internal mechanism, sub-component, or algorithm is documented, and the observable anomaly demonstrates that even the score output does not straightforwardly predict the decision output — indicating either a non-monotonic decision boundary, a secondary rule layer, or a threshold that operates independently of aggregate score.
---
### 3. Interpretability Assessment
**Selected Method: Behavioral Boundary Testing** *(MONNA-Analysis-2026)*
**Justification:** The system's Opacity Profile is fully opaque across all four dimensions (NO/NO/NO/NO). No model internals, attention weights, gradients, or activations are accessible. Provided the system can be queried as a black box (applications submitted and decisions observed), which is assumed here and not yet confirmed, the two existing input/output pairs with controlled variables (`income`, `employment_type`, near-identical `credit_history_years`) give a starting point: systematic perturbation of `existing_debt_ratio`, and secondarily of `age` and `credit_history_years`, can map the decision boundary without any internal access.
**Why alternatives are excluded:**
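- **SHAP:** Tree-based and gradient-based variants require model internals, which are unavailable. Model-agnostic KernelSHAP needs only query access, but at a query volume not yet confirmed to be available.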
- **LIME:** Requires ability to submit perturbed inputs programmatically at scale — feasible if API access is granted, but not yet confirmed; elevate to LIME if query access is established.
- **Attention Probing:** Requires transformer architecture with exposed attention weights — no architectural evidence supports this.
- **Grad-CAM:** Vision-based method; not applicable to tabular credit data.
---
### 4. Inferred Properties
---
{ASSUMPTION 1 — TYPE: SYSTEM | Evidence Strength: HIGH}
**Evidence:** Application A scores 720, Application B scores 690. A is rejected; B is approved. Income, employment_type, and credit_history_years are controlled or near-controlled. `existing_debt_ratio` is the sole confirmed differentiating variable.
**Inferred property:** The decision boundary is **not purely a monotonic function of the aggregate score**. Either (a) a hard rule or threshold on `existing_debt_ratio` overrides the aggregate score, or (b) the decision and score are produced by independent sub-models that can diverge.
**If wrong:**
- Impact: Audit conclusion invalid; the anomaly's cause would remain unidentified, creating compliance risk under fair lending obligations.
- Scope: Entire feature class — any applicant whose `existing_debt_ratio` approaches an undisclosed threshold could be misclassified relative to their score.
- Detail: If the score and decision are produced by the same monotonic function, then some other undisclosed input feature (not in the observable set) must be driving the divergence — which is itself a compliance risk.
**Validation:** Submit 10 synthetic applications holding all features constant except `existing_debt_ratio`, varied in uniform steps from 0.1 to 0.9. Observe: (a) at what ratio value the decision flips approve → reject, and (b) whether the score also shifts at that boundary or continues on a different trajectory.
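The sweep described in this validation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the vendor's interface: `query_model` is a hypothetical callable standing in for whatever submission channel is actually available, and `toy_model` (with its arbitrary 0.45 cap) and the values in `base_app` are invented only to make the sketch runnable.

```python
from typing import Callable, Dict, List, Optional, Tuple

Application = Dict[str, object]
Result = Tuple[str, int]       # (decision, score)
Row = Tuple[float, str, int]   # (existing_debt_ratio, decision, score)


def uniform_ratios(lo: float = 0.1, hi: float = 0.9, n: int = 10) -> List[float]:
    """Return n evenly spaced ratios from lo to hi, inclusive."""
    step = (hi - lo) / (n - 1)
    return [round(lo + k * step, 4) for k in range(n)]


def sweep_debt_ratio(
    query_model: Callable[[Application], Result],
    base: Application,
    ratios: List[float],
) -> List[Row]:
    """Query once per ratio, holding every other feature fixed."""
    rows = []
    for ratio in ratios:
        app = dict(base, existing_debt_ratio=ratio)
        decision, score = query_model(app)
        rows.append((ratio, decision, score))
    return rows


def first_flip(rows: List[Row]) -> Optional[Tuple[float, float]]:
    """Return (last approved ratio, first rejected ratio), or None if no flip."""
    for (r0, d0, _), (r1, d1, _) in zip(rows, rows[1:]):
        if d0 == "approve" and d1 == "reject":
            return (r0, r1)
    return None


# Hypothetical stand-in for the audited system. NOT the real model.
def toy_model(app: Application) -> Result:
    ratio = float(app["existing_debt_ratio"])
    score = round(800 - 200 * ratio)
    return ("reject" if ratio > 0.45 else "approve", score)


# Invented feature values, for illustration only.
base_app: Application = {
    "age": 40, "income": 55000.0, "employment_type": "salaried",
    "credit_history_years": 8.0, "existing_debt_ratio": 0.0,
}
ratios = uniform_ratios()
rows = sweep_debt_ratio(toy_model, base_app, ratios)
bracket = first_flip(rows)  # interval that contains the decision flip
```

Printing `rows` alongside `bracket` makes point (b) of the validation visible: whether the score moves sharply at the flip or continues on its own trajectory.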
---
{ASSUMPTION 2 — TYPE: SYSTEM | Evidence Strength: MEDIUM}
**Evidence:** Among the reported comparators, `existing_debt_ratio` is the only variable confirmed to differ; `age` is not reported for either application and remains uncontrolled. The sharpness of the divergence (approve vs. reject, with a score *advantage* to the rejected applicant) is consistent with threshold-based rule logic rather than smooth probabilistic weighting.
**Inferred property:** The system likely contains **at least one hard-rule or cutoff layer** (e.g., a debt-ratio cap) applied post-scoring, which can override the aggregate score. This is common in regulation-compliant credit systems that enforce bright-line exclusion criteria.
**If wrong:**
- Impact: Deployment decision unsafe — if no hard rule exists, the anomaly implies a hidden feature or model non-determinism, both of which require immediate escalation.
- Scope: System-wide — any hard cutoff applies to the entire applicant population, not a single case.
- Detail: Misidentifying the mechanism as a rule layer when it is actually a hidden feature would lead to an incomplete audit and unresolved compliance exposure.
**Validation:** Request vendor documentation on "knockout" or "hard-stop" criteria in the decision pipeline. If unavailable, submit applications with `existing_debt_ratio` at narrow increments around the suspected threshold (e.g., ±0.02 steps) to detect a step-function boundary in decision output.
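A fine-grained probe around a suspected threshold might look like the sketch below. It assumes a hypothetical `decide` callable that returns the decision for a given `existing_debt_ratio` with all other features held fixed; `toy_decide` and its 0.45 cap are invented for illustration.

```python
from typing import Callable, List, Tuple


def fine_grid(center: float, step: float = 0.02, n_each_side: int = 5) -> List[float]:
    """Ratios at center +/- k*step, clipped to the range [0, 1]."""
    points = [center + k * step for k in range(-n_each_side, n_each_side + 1)]
    return [round(min(1.0, max(0.0, p)), 4) for p in points]


def count_flips(decisions: List[str]) -> int:
    """Number of adjacent positions where the decision changes."""
    return sum(1 for a, b in zip(decisions, decisions[1:]) if a != b)


def single_flip_in_window(
    decide: Callable[[float], str], center: float
) -> Tuple[bool, List[Tuple[float, str]]]:
    """True if the decision changes exactly once across the window."""
    observed = [(ratio, decide(ratio)) for ratio in fine_grid(center)]
    flips = count_flips([decision for _, decision in observed])
    return flips == 1, observed


# Hypothetical stand-in with an invented cap at 0.45. NOT the real system.
def toy_decide(ratio: float) -> str:
    return "reject" if ratio > 0.45 else "approve"


is_step, observed = single_flip_in_window(toy_decide, center=0.44)
```

A single clean flip is consistent with a cutoff but does not prove one, since a smooth model with a fixed score cutoff would look the same. Several flips inside a narrow window would instead point toward non-determinism or an uncontrolled input.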
---
{ASSUMPTION 3 — TYPE: VENDOR-CLAIM | Evidence Strength: LOW}
**Evidence:** Vendor marketing materials describe the model as a "credit scoring" system. The 0–1000 output range resembles a scorecard-style scale, although it does not match the 300–850 range used by current FICO and VantageScore models, so no specific scoring convention can be assumed.
**Inferred property:** The score may follow an industry-conventional direction (higher = lower credit risk), but this directional convention is not confirmed by the anomaly — in which A scores higher than B but is rejected.
**If wrong:**
- Impact: Compliance risk — if score direction is inverted or non-standard, all threshold-based monitoring and fair lending analysis built on score ordinal assumptions would be invalid.
- Scope: System-wide.
- Detail: Any audit conclusion using "720 > 690 implies lower risk" as a premise must be suspended until score directionality is confirmed.
**Validation:** Request vendor confirmation of score directionality. Cross-validate by submitting a clearly low-risk synthetic profile and a clearly high-risk synthetic profile; confirm which receives higher score.
---
{ASSUMPTION 4 — TYPE: HYBRID | Evidence Strength: LOW}
*AUTO-HYBRID: Evidence Strength is LOW for a putative SYSTEM inference.*
**Sub-assumption A {SYSTEM}:** The five observable features (`age`, `income`, `employment_type`, `credit_history_years`, `existing_debt_ratio`) are the **complete input set** received by the model — no hidden features are silently ingested via the API.
- Verify with: Vendor technical team / API contract documentation / DBA review of data pipeline.
- If wrong: The anomaly may be explained by an undisclosed sixth feature (e.g., bureau-sourced data, postal code, device fingerprint), making `existing_debt_ratio` a coincidental correlate rather than the causal variable. Audit conclusion would be invalid.
- Validation A: Request full API schema and data dictionary. Review all fields transmitted in the API call, not only those confirmed in the integration spec.
**Sub-assumption B {DATA}:** The two applications' feature values as submitted to this audit are **accurate transcriptions** of what was actually transmitted to the model — no preprocessing, imputation, or encoding step altered the values in transit.
- Verify with: Data engineering / integration owner; review ETL logs.
- If wrong: The true `existing_debt_ratio` values ingested by the model may differ from those observed, meaning the apparent anomaly does not exist in the model's actual input space.
- Validation B: Pull raw API request logs for both applications A and B and compare transmitted payloads against reported feature values.
**Resolution order:** Validate A before B. If the input set is incomplete (A fails), B is irrelevant to root cause analysis.
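Validations A and B both reduce to comparing the feature values reported to the audit against the payload actually transmitted. A minimal sketch of that comparison follows; the field names beyond the five observable inputs and all values shown are invented for illustration, and the real log format is unknown.

```python
import math
from typing import Dict, List, Tuple

MISSING = "<missing>"


def _same(a: object, b: object, rel_tol: float = 1e-9) -> bool:
    """Compare two values, using a tolerance when both are numeric."""
    numeric = (int, float)
    both_numeric = (
        isinstance(a, numeric) and isinstance(b, numeric)
        and not isinstance(a, bool) and not isinstance(b, bool)
    )
    if both_numeric:
        return math.isclose(a, b, rel_tol=rel_tol)
    return a == b


def diff_payload(
    reported: Dict[str, object], transmitted: Dict[str, object]
) -> List[Tuple[str, object, object]]:
    """List (field, reported value, transmitted value) for every mismatch.

    A field present on only one side is reported with MISSING on the other,
    which surfaces undisclosed extra inputs (Sub-assumption A) as well as
    altered values (Sub-assumption B).
    """
    mismatches = []
    for key in sorted(set(reported) | set(transmitted)):
        r = reported.get(key, MISSING)
        t = transmitted.get(key, MISSING)
        if not _same(r, t):
            mismatches.append((key, r, t))
    return mismatches


# Invented example values; the audit input does not state the real ones.
reported = {
    "age": 41, "income": 52000.0, "employment_type": "salaried",
    "credit_history_years": 7.5, "existing_debt_ratio": 0.42,
}
transmitted = dict(reported, existing_debt_ratio=0.47, postal_code="00000")

mismatches = diff_payload(reported, transmitted)
# [("existing_debt_ratio", 0.42, 0.47), ("postal_code", "<missing>", "00000")]
```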
---
{ASSUMPTION 5 — TYPE: FAMILY-INFERENCE | Evidence Strength: MEDIUM}
**Evidence:** The system is described as a third-party proprietary credit scoring model with a 0–1000 output range and a binary decision output, consistent with the class of scorecard and ensemble models (logistic regression scorecards, gradient boosting, or hybrid rule-ML systems) commonly deployed in consumer credit.
**Inferred property:** Models in this family typically assign **non-linear, interaction-sensitive weights** to debt-related features (`existing_debt_ratio`, `credit_history_years`). A high debt ratio may receive disproportionate negative weight relative to other features, which could produce the observed score-decision divergence if the score and decision layers weight features differently.
**If wrong:**
- Impact: Interpretability method selection may be sub-optimal; if the model is a simple linear scorecard, feature weight recovery is more tractable than assumed.
- Scope: Entire feature class.
- Detail: Overestimating model complexity could lead to unnecessarily resource-intensive audit procedures.
**Validation:** Submit edge-case applications with extreme values on each feature independently (all others held at median). A linear model will produce proportional score shifts; a non-linear model will produce disproportionate shifts at extremes. This distinguishes model class without white-box access.
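One way to operationalise the linear-versus-non-linear distinction is to check whether the scores observed along a single-feature sweep stay close to the straight line through the endpoints. The sketch below assumes scores have already been collected; the two score series are invented for illustration.

```python
from typing import List, Sequence


def is_approximately_linear(
    xs: Sequence[float], ys: Sequence[float], tol: float = 1.0
) -> bool:
    """True if every point lies within tol of the chord through the endpoints.

    tol defaults to 1.0 because the score is reported as an integer, so
    rounding alone can move a point by up to half a unit.
    """
    x0, x1 = xs[0], xs[-1]
    y0, y1 = ys[0], ys[-1]
    slope = (y1 - y0) / (x1 - x0)
    return all(abs(y - (y0 + slope * (x - x0))) <= tol for x, y in zip(xs, ys))


# Invented score series for one feature, with all others held at median.
xs: List[float] = [0.1, 0.3, 0.5, 0.7, 0.9]
linear_scores = [780, 740, 700, 660, 620]   # constant shift per step
curved_scores = [780, 770, 740, 650, 400]   # penalty accelerates at extremes

linear_result = is_approximately_linear(xs, linear_scores)
curved_result = is_approximately_linear(xs, curved_scores)
```

A single-feature sweep cannot reveal interaction effects, and clipping at the ends of the 0–1000 scale can make a linear model look curved, so a failed check is a prompt for further probing rather than a verdict on model class.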
---
### 5. Refused Inferences
**{REFUSED: Cannot determine the functional form or weight assigned to `existing_debt_ratio` within the scoring algorithm}**
- Requires: White-box access to model coefficients, feature importance rankings, or SHAP/LIME output from vendor-side execution.
- Why this matters: Without knowing whether `existing_debt_ratio` enters the model as a continuous weighted feature, a threshold trigger, or an interaction term, it is impossible to determine whether the anomaly reflects intentional design, a calibration error, or a proxy-discrimination pathway. This is the pivotal question for both the anomaly explanation and any fair lending assessment.
**{REFUSED: Cannot determine whether the decision output is generated by the same model component as the score, or by a separate rule engine operating downstream}**
- Requires: Vendor architectural documentation distinguishing scoring sub-model from decision logic; alternatively, a sufficiently dense set of behavioral observations to statistically reverse-engineer whether score and decision diverge systematically or only at specific feature values.
- Why this matters: If score and decision are produced by independent components, the score value is not a reliable proxy for rejection risk, and any SLA, appeal process, or regulatory reporting built on score thresholds is structurally compromised. This finding would also change the interpretability method — LIME applied to the decision output would need to be run independently from any score-explanation effort.
---
### 5.5 Red Flag Summary
**🚩 Non-determinism (unconfirmed, structurally suspected):** Cannot be ruled out. Only two observations are available. If additional submissions of identical inputs produce different outputs, a stochastic or session-dependent element is present. *Requires further testing.*
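The repeat-submission test mentioned here can be sketched as below. `query_model` is a hypothetical callable for the submission channel, and the two toy models are invented stand-ins, one stable and one deliberately unstable.

```python
from collections import Counter
from itertools import count
from typing import Callable, Dict, Tuple

Application = Dict[str, object]
Result = Tuple[str, int]  # (decision, score)


def repeat_query(
    query_model: Callable[[Application], Result], app: Application, n: int = 20
) -> Counter:
    """Submit the same application n times and tally distinct outputs."""
    return Counter(query_model(dict(app)) for _ in range(n))


def is_deterministic(outcomes: Counter) -> bool:
    """True if every submission returned the same (decision, score) pair."""
    return len(outcomes) == 1


# Hypothetical stand-ins. NOT the real system.
def stable_model(app: Application) -> Result:
    return ("approve", 690)


_calls = count()


def flaky_model(app: Application) -> Result:
    # Alternates its decision on successive calls to imitate instability.
    return ("approve", 690) if next(_calls) % 2 == 0 else ("reject", 690)


app: Application = {"existing_debt_ratio": 0.4}  # invented value
stable_outcomes = repeat_query(stable_model, app)
flaky_outcomes = repeat_query(flaky_model, app)
```

Identical outputs across repeats rule out only within-session randomness; time-dependent or version-dependent behaviour would need repeats spread over a longer period.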
**🚩 Calibration failure / Score-Decision Inversion (CONFIRMED):** Application A (score 720) is rejected; Application B (score 690) is approved. This is the primary anomaly. The score does not predict the decision in a monotonically consistent direction for these two cases. This means the score **cannot be treated as a reliable ordinal proxy for approval likelihood** without further validation. If the vendor represents the score as an approval-likelihood indicator in client-facing documentation, this constitutes a potential misrepresentation.
**Proxy features:** `age` is present in the input space as a direct input. Under ECOA, age is a prohibited basis for credit decisions, and Regulation B permits its use in a scoring system only under narrow conditions. Its role in the scoring function is entirely undisclosed.
*Direct compliance risk: cannot confirm or deny whether `age` receives weight in violation of applicable regulations without white-box access or vendor attestation.*
**Reasoning inconsistency (CONFIRMED as structural):** The decision outputs contradict what the score outputs imply under conventional credit scoring logic (higher score = lower risk = approve). No justification is provided by the system. This is not merely an absence of explanation: it is an affirmative inconsistency between the score ordering and the decision ordering that the same system produced for two applicants.
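As more observations accumulate, the score-decision inversion flagged above can be checked mechanically across every pair. The sketch below uses the two observations from the audit trigger and assumes the conventional direction (higher score is better), which Assumption 3 marks as unconfirmed.

```python
from itertools import combinations
from typing import List, NamedTuple, Tuple


class Observation(NamedTuple):
    app_id: str
    score: int
    decision: str  # "approve" or "reject"


def find_inversions(observations: List[Observation]) -> List[Tuple[str, str]]:
    """Return (higher-scored rejected id, lower-scored approved id) pairs."""
    inversions = []
    for x, y in combinations(observations, 2):
        high, low = (x, y) if x.score > y.score else (y, x)
        if (
            high.score > low.score
            and high.decision == "reject"
            and low.decision == "approve"
        ):
            inversions.append((high.app_id, low.app_id))
    return inversions


# The two observations from the audit trigger.
observations = [
    Observation("A", 720, "reject"),
    Observation("B", 690, "approve"),
]
inversions = find_inversions(observations)  # [("A", "B")]
```

The share of inverted pairs across a larger sample would indicate whether the inversion is isolated or systematic.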
---
### 6. Adversarial Hypotheses
**H1 — Non-malicious technical explanation: Tiered decision architecture**
The vendor's system may use the score for risk ranking but apply a separate, hard-rule decision layer that enforces regulatory or policy-mandated debt-to-income thresholds. Application A's `existing_debt_ratio` may exceed a hard cap (e.g., 0.50 DTI) that disqualifies applicants regardless of aggregate score. Application B falls below that cap and is approved despite a lower score. The score-decision inversion would then be a feature, not a defect — by design.
*Evidence needed to rule out:* Vendor confirmation of hard-stop criteria and their thresholds; submission of synthetic applications with `existing_debt_ratio` at granular increments around the suspected boundary.
**H2 — Data or process explanation outside model control: Input preprocessing divergence**
The feature values reported for this audit may not match what the model actually received. If the integration layer applies different preprocessing to `existing_debt_ratio` for applications A and B (e.g., different denominator in ratio calculation, stale bureau data pull, encoding error), the model may have received meaningfully different inputs than reported — making the apparent anomaly an artifact of the data pipeline, not the model.
*Evidence needed to rule out:* Raw API request logs for both applications, compared byte-for-byte against reported feature values; ETL audit trail from application intake to model API call.
---
### 7. Audit Conclusions
**Primary finding:** The score-decision inversion between Applications A and B is a **confirmed structural anomaly** that cannot be explained by the observable input/output space alone. The most parsimonious hypothesis is a hard-rule decision layer operating on `existing_debt_ratio` independently of the aggregate score — but this cannot be confirmed without vendor disclosure or systematic behavioral boundary testing.
**Confidence in primary finding:** MEDIUM. The controlled-variable structure of the anomaly strongly implicates `existing_debt_ratio` as the causal feature, but the mechanism (threshold rule vs. hidden feature vs. independent model component) is unresolved.
**Recommended next action:**
1. **Immediate:** Request vendor disclosure of (a) whether a separate decision rule layer exists, (b) any hard-stop criteria applied post-scoring, and (c) the complete API input schema.
2. **Short-term:** Execute Behavioral Boundary Testing on `existing_debt_ratio` across 10–15 synthetic applications to empirically locate the decision boundary.
3. **Compliance escalation:** Flag `age`, a prohibited basis under ECOA whose weight in the model is undisclosed, for legal review pending vendor attestation.
4. **Conditional:** If vendor disclosure is refused or incomplete, escalate to regulatory authority as a potential transparency violation under applicable consumer credit law.
---
### 8. Communication Tier
**Technical Lead:**
The observable anomaly — Application A (score 720, rejected) vs. Application B (score 690, approved) with `existing_debt_ratio` as the sole confirmed differentiating variable — is structurally consistent with a two-component architecture in which an aggregate score model and a separate hard-rule decision engine operate in parallel or in sequence, with the latter capable of overriding the former. The score and decision outputs are not monotonically consistent, meaning the score cannot be used as a reliable decision proxy for audit, appeal, or monitoring purposes without vendor confirmation of the score-to-decision mapping function. Priority test: systematic perturbation of `existing_debt_ratio` in 0.05 increments across synthetic applications to locate the decision boundary; concurrent API log review to confirm no hidden input features are present in the transmitted payload.
**Executive/Compliance:**
This system presents **HIGH transparency risk**: a higher-scoring applicant was rejected while a lower-scoring applicant was approved, the score-to-decision logic is entirely undisclosed, and `age`, a prohibited basis under ECOA, has an unknown role in scoring. An immediate vendor disclosure request and legal review are required before further reliance on this system for credit decisions.
**Client/Litigation:**
| | |
|---|---|
| **What we can prove** | Application A received a higher score (720) than Application B (690) and was rejected while B was approved; the only confirmed input difference is `existing_debt_ratio`; the system provides no explanation for this divergence. |
| **What we cannot prove** | The internal mechanism causing the inversion; whether `age` influenced the outcome; whether the model is the same version that was in place at time of decision; whether the reported input values match what the model actually received. |
| **What access resolves it** | Vendor disclosure of decision rule architecture and hard-stop criteria; raw API request logs for both applications; model version identifier and changelog; feature weight or importance documentation sufficient to assess `age`'s role in scoring. |
Black Box Auditor™ is an advanced interpretability prompt for Claude that turns raw input/output pairs into an AI audit report. It screens signal sufficiency, builds an opacity profile, analyzes confidence behavior, and recommends one method (SHAP, LIME, Attention, Grad‑CAM, or Behavioral Testing) while tagging every inference and refused conclusion for engineers and data teams.
Added 1 week ago
