Prompt Details

Model

(claude-4-6-sonnet)

Token size

6,308

Example input

[CORPUS]:

From: sarah.jones@acme.com
To: it-helpdesk@acme.com
Subject: RE: Laptop provisioning request

Hi team, following up on ticket #HD-4421. The laptop assigned to the new hire has not been configured with VPN client or endpoint protection software. The device was delivered without the standard build image applied. Per our onboarding policy, all devices must pass the endpoint compliance check before being handed to the employee.

Best regards, Sarah Jones | IT Operations | Acme Corp sarah.jones@acme.com | +1 (212) 555-0198

This email and any attachments are confidential. If you are not the intended recipient, please delete this message immediately and notify the sender. Acme Corp disclaims any liability for loss or damage arising from this communication.

---

From: it-helpdesk@acme.com
To: sarah.jones@acme.com
Subject: RE: RE: Laptop provisioning request

Hi Sarah, the device was imaged but the VPN client deployment failed due to a Group Policy conflict on the user's OU. We are re-running the SCCM deployment task. Endpoint protection pushed successfully. Expected resolution: end of business today.

Best regards, IT Helpdesk | Acme Corp it-helpdesk@acme.com | +1 (212) 555-0100

This email and any attachments are confidential. If you are not the intended recipient, please delete this message immediately and notify the sender. Acme Corp disclaims any liability for loss or damage arising from this communication.

[DOMAIN]: IT endpoint management and device provisioning within an enterprise IT operations function. Scope covers device imaging, software deployment, and compliance verification for new hire onboarding. Excludes network infrastructure and server management. Stakes: HIGH.

[INTENDED_USE]: SKOS taxonomy extraction for an IT knowledge base

Example output

I'll run the Corpus Quality Auditor™ v1.1 on your inputs now.

---

**PRE-FLIGHT CHECK**

- `{CORPUS}`: Present. Contains readable natural language prose. Two email messages detected.
- `{DOMAIN}`: Present and specific.
- `{INTENDED_USE}`: Specified.
- `{STAKES}`: Parsed from `{DOMAIN}` field → **HIGH**

⚠ **THIN CORPUS FLAG**: Corpus contains 2 email messages with fewer than 5 distinct topical statements across the full thread. Audit proceeds; scores will reflect limited content. Expand corpus before extraction for higher-confidence results.

---

### SECTION 1 — CORPUS PROFILE

```
CORPUS TYPE:         emails
CORPUS SEGMENTS:     2 messages (1 thread, 2 turns)
DOMAIN:              IT endpoint management and device provisioning,
                     enterprise IT operations; new hire onboarding scope
STAKES:              HIGH
INTENDED USE:        SKOS taxonomy extraction for an IT knowledge base
LANGUAGE:            English
APPROXIMATE LENGTH:  short (<1000 words)
VISIBLE STRUCTURE:   2-message email thread. Message 1: IT Operations
                     reporting a provisioning gap. Message 2: IT Helpdesk
                     responding with root cause and resolution timeline.
                     Each message contains a 4-line signature block and
                     a 3-line legal disclaimer.
```

```
ALIGNMENT: ALIGNED
EVIDENCE:  Both messages directly address device provisioning, VPN
           deployment, endpoint protection, and compliance verification
           for a new hire — all within the stated domain scope.
```

---

### SECTION 2 — BOILERPLATE INVENTORY

```
BOILERPLATE TYPE       EXAMPLE FROM CORPUS                           FREQUENCY
────────────────────────────────────────────────────────────────────────────
Signature block        "Sarah Jones | IT Operations | Acme Corp      Appears in
                       sarah.jones@acme.com | +1 (212) 555-0198"     both messages
Legal disclaimer       "This email and any attachments are           Appears in
                       confidential. If you are not the intended     both messages
                       recipient, please delete this message
                       immediately and notify the sender. Acme
                       Corp disclaims any liability for loss or
                       damage arising from this communication."
Email thread prefix    "RE: RE: Laptop provisioning request"         1 instance
```

**Estimated boilerplate proportion:** ~35% of corpus word count.

**Evidence:** Each message contains a 4-line signature block and a 3-line legal disclaimer. Across 2 messages of roughly 60–70 substantive words each, the boilerplate blocks account for approximately 7–8 lines per message against ~5–6 lines of substantive content. The legal disclaimer alone is repeated verbatim in both messages.

---

### SECTION 3 — DIMENSION SCORES

```
┌─────────────────────────────────────────────────────────────┐
│ DIMENSION 1 — SUBJECT COHERENCE           Score: 5 / 5      │
├─────────────────────────────────────────────────────────────┤
│ Evidence:   Both messages address a single provisioning     │
│             incident: device imaging, VPN deployment,       │
│             endpoint protection, and compliance check.      │
│             Vocabulary is consistent throughout.            │
│ Key signal: "standard build image", "endpoint compliance    │
│             check", "SCCM deployment task", "Group Policy   │
│             conflict" — all within a single domain thread.  │
│ Risk:       None from coherence. Domain is tightly scoped.  │
└─────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│ DIMENSION 2 — BOILERPLATE DENSITY     Score: 2 / 5 {BLOCKING}│
├──────────────────────────────────────────────────────────────┤
│ Evidence:   ~35% of corpus is boilerplate (see Section 2).   │
│             Legal disclaimer repeated verbatim in both       │
│             messages; signature block in both.               │
│ Key signal: "This email and any attachments are              │
│             confidential…" — 3-line block appearing twice,   │
│             contributing no domain concepts.                 │
│ Risk:       At HIGH stakes, extraction will surface          │
│             disclaimer phrases ("confidential",              │
│             "intended recipient", "disclaims liability") as  │
│             candidate taxonomy terms. Signal-to-noise ratio  │
│             is materially degraded relative to corpus size.  │
└──────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ DIMENSION 3 — DOMAIN COVERAGE             Score: 2 / 5      │
│                                           {BLOCKING}        │
├─────────────────────────────────────────────────────────────┤
│ Covered:    Device imaging, VPN client deployment, endpoint │
│             protection software, Group Policy/OU            │
│             configuration, SCCM deployment, onboarding      │
│             compliance check — one incident thread.         │
│ Missing:    Asset inventory and tracking, device lifecycle  │
│             management, hardware procurement, MDM/UEM       │
│             platforms, patch management, BitLocker/disk     │
│             encryption, certificate management, helpdesk    │
│             ticketing workflow, escalation procedures,      │
│             compliance reporting, offboarding/device        │
│             return. None of these sub-areas have any        │
│             representation in the corpus.                   │
│ Risk:       A taxonomy extracted from this corpus would     │
│             appear to define IT endpoint management as      │
│             exclusively a VPN + SCCM deployment domain.     │
│             Coverage gaps will look like intentional scope  │
│             decisions to downstream consumers.              │
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ DIMENSION 4 — DUPLICATION DENSITY         Score: 4 / 5      │
├─────────────────────────────────────────────────────────────┤
│ Evidence:   Legal disclaimer is repeated verbatim in both   │
│             messages, but this is accounted for in          │
│             Dimension 2. No substantive content is          │
│             duplicated. No thread-quoting observed          │
│             (Message 2 does not re-quote Message 1).        │
│ Key signal: Disclaimer repetition is the only duplication   │
│             detected; substantive content is unique         │
│             across both turns.                              │
│ Risk:       Minimal frequency bias. "Confidential" and      │
│             related disclaimer terms may receive slightly   │
│             inflated frequency weight — minor at this scale.│
└─────────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────────┐
│ DIMENSION 5 — EXTRACTION SIGNAL STRENGTH  Score: 3 / 5      │
├─────────────────────────────────────────────────────────────┤
│ Evidence:   The substantive content is technically precise  │
│             and concept-rich for its length: "Group Policy  │
│             conflict", "SCCM deployment task", "endpoint    │
│             compliance check", "build image". However, the  │
│             corpus is a single incident thread, not a       │
│             domain knowledge artifact. Relationships between│
│             concepts are implicit (no definitions, no       │
│             process documentation, no policy text).         │
│ Key signal: "The VPN client deployment failed due to a      │
│             Group Policy conflict on the user's OU" — good  │
│             concept density, but concept relationships are  │
│             described situationally, not definitionally.    │
│ Risk:       Concept yield will be narrow. Extraction will   │
│             produce a handful of valid leaf-node terms with │
│             weak hierarchical structure above them.         │
└─────────────────────────────────────────────────────────────┘

CORPUS READINESS SCORE: 16 / 25
```

**Stakes adjustment:** STAKES = HIGH. Dimensions 2 and 3 each score 2/5 → both are **BLOCKING** findings.

---

### SECTION 4 — FINDINGS & REMEDIATION

---

```
FINDING:  Boilerplate Density — Score 2/5
SEVERITY: BLOCKING

OBSERVED:
  Both messages contain an identical 3-line legal disclaimer and a
  4-line signature block. Combined, these account for approximately
  35% of total corpus word count. Example from Message 1:
  "This email and any attachments are confidential. If you are not
  the intended recipient, please delete this message immediately and
  notify the sender. Acme Corp disclaims any liability for loss or
  damage arising from this communication."

IMPACT ON EXTRACTION:
  A SKOS taxonomy extracted from this corpus without cleanup will
  produce candidate preferred labels or scope notes drawn from
  disclaimer vocabulary: "confidential", "intended recipient",
  "notify the sender", "disclaims liability". These are legal
  boilerplate terms, not IT endpoint management concepts. At this
  corpus size (~130 substantive words), boilerplate is not diluted by
  volume — it constitutes a near-equal competitor to domain signal.

REMEDIATION:
  Step 1: Apply a pre-processing pass to strip all email signature
          blocks (name, title, company, phone, email address lines).
  Step 2: Strip the legal disclaimer block — identify it by the
          phrase "This email and any attachments are confidential"
          and remove through end of the paragraph.
  Step 3: Strip thread prefixes (RE:, FWD:) from subject lines.
  Step 4: Validate by reviewing the cleaned corpus — confirm that
          remaining content is entirely substantive prose.

RE-AUDIT NEEDED: YES
```

---

```
FINDING:  Domain Coverage — Score 2/5
SEVERITY: BLOCKING

OBSERVED:
  The corpus contains a single 2-message email thread addressing one
  provisioning incident (VPN deployment failure for one device). The
  following sub-areas of the stated domain — IT endpoint management
  and device provisioning for new hire onboarding — have zero
  representation: asset inventory, device lifecycle management,
  hardware procurement, MDM/UEM platforms, disk encryption, patch
  management, certificate provisioning, helpdesk ticketing workflow,
  escalation procedures, compliance reporting, offboarding/device
  return. The corpus also contains no policy documents, process
  guides, runbooks, or knowledge base articles.

IMPACT ON EXTRACTION:
  A SKOS taxonomy extracted from this corpus will have exactly three
  populated branches: VPN deployment, endpoint protection, and device
  imaging. All other sub-areas of the domain will be absent. To
  downstream consumers of the taxonomy, this will appear to reflect
  deliberate scope decisions rather than a corpus gap. Hierarchical
  depth above the leaf nodes (e.g., a meaningful "Device
  Provisioning" → "Software Deployment" → "VPN Configuration"
  structure) cannot be established from 2 emails alone.

REMEDIATION:
  Step 1: Identify the full intended scope of the IT knowledge base
          this taxonomy will serve. Map the sub-domains explicitly
          (e.g., imaging, software deployment, compliance, asset
          tracking, offboarding).
  Step 2: Expand the corpus with source material covering each
          sub-domain: runbooks, SOPs, policy documents, prior
          helpdesk ticket archives (resolved tickets across incident
          types), KB articles, onboarding checklists.
  Step 3: Target a minimum of 5–10 substantive documents or 20+
          distinct incident records across the full domain scope
          before re-running this audit.
  Step 4: Validate coverage by checking that each intended sub-domain
          is represented by at least 2–3 documents with substantive
          concept density before proceeding to extraction.

RE-AUDIT NEEDED: YES
```

---

```
FINDING:  Extraction Signal Strength — Score 3/5
SEVERITY: MEDIUM

OBSERVED:
  The substantive content is technically precise but situational.
  Concepts are named in the context of one incident ("the VPN client
  deployment failed due to a Group Policy conflict on the user's OU")
  rather than defined or described in terms of process or policy. No
  definitions, no hierarchical relationships, no process descriptions
  are present.

IMPACT ON EXTRACTION:
  Concept yield will be limited to approximately 8–12 valid terms
  (VPN client, endpoint protection, build image, SCCM, Group Policy,
  OU, endpoint compliance check, device provisioning). Hierarchical
  structure above these terms cannot be inferred from the corpus —
  the extraction prompt would need to impose hierarchy rather than
  derive it. This is acceptable as a seed term set but not as a basis
  for a taxonomy with meaningful broader/narrower relationships.

REMEDIATION:
  Step 1: Add process documentation (runbooks, SOPs) to the corpus —
          these explicitly name concepts and their relationships,
          which is what produces hierarchy in extraction.
  Step 2: Add policy or compliance documents — these define terms and
          establish scope, producing scope notes and definitions
          alongside preferred labels.
  Step 3: After corpus expansion (per Coverage remediation), signal
          strength should improve organically. Re-evaluate at that
          point rather than treating this as a standalone fix.

RE-AUDIT NEEDED: YES (subsumed by Coverage re-audit)
```

---

**Refused inferences:**

```
REFUSED: That the SCCM deployment failure is a recurring or systemic
         issue in this environment.
REASON:  The corpus contains one incident. No prior tickets, trend
         data, or escalation history is present to support a pattern
         inference.

REFUSED: That the corpus represents the full scope of IT operations
         activity at Acme Corp.
REASON:  Two emails from one thread are not a representative sample.
         The domain specification is broader than what a single
         provisioning incident can evidence.
```

---

### SECTION 5 — READINESS VERDICT

```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  CORPUS READINESS VERDICT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  SCORE:          16 / 25
  STAKES:         HIGH
  BLOCKING FLAGS: 2 (Boilerplate Density: 2/5; Domain Coverage: 2/5)
  VERDICT:        HALT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  DIMENSION SUMMARY:
    Coherence:    5/5  GO
    Boilerplate:  2/5  BLOCKING
    Coverage:     2/5  BLOCKING
    Duplication:  4/5  GO
    Signal:       3/5  CAUTION

  Status definitions:
    GO       → score 4–5 (no action needed)
    CAUTION  → score 3 (finding raised in Section 4; non-blocking)
    BLOCKING → score 1–2 at applicable stakes threshold
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

**Verdict rationale:** Two BLOCKING findings drive this HALT. The corpus is too small and too narrow to support taxonomy extraction for the stated domain scope — a single incident thread cannot produce a representative IT endpoint management taxonomy. Boilerplate constitutes approximately a third of the corpus by word count, a proportion that becomes extraction noise rather than signal at this volume. Address both BLOCKING findings — strip boilerplate and substantially expand the corpus with multi-document, multi-sub-domain source material — then re-run this audit before proceeding.

---

### SECTION 6 — DOWNSTREAM HANDOFF

**Verdict is HALT — do not proceed to extraction.**

**For SKOS Taxonomy Extractor™ v1.1:** Do not pass this corpus to the extractor. Extraction will produce a narrow, boilerplate-contaminated term set that looks like a valid taxonomy but reflects corpus noise and a single incident rather than domain structure. After remediating both BLOCKING findings and re-running this audit to a GO or CONDITIONAL GO verdict, the recommended `{DEPTH}` based on current Dimension 5 signal score (3/5) is **shallow (2 levels)**. If corpus expansion improves signal to 4/5, upgrade to **standard (3 levels)**.

**For RAG pipeline ingestion:** Boilerplate must be stripped before chunking. Legal disclaimers and signature blocks in chunks will degrade retrieval precision — queries about VPN deployment will surface chunks containing "disclaims any liability" as co-located text. The coverage gap is the larger concern for RAG: a knowledge base built from this corpus will silently fail on any query outside the VPN/SCCM/imaging sub-domain. No retrieval strategy compensates for absent source material.

**For ontology design (Prompt Ontology Designer™ v1.0):** Domain coverage gaps here are extensive. Asset inventory, patch management, disk encryption, MDM platforms, offboarding — all are expected ontology classes for this domain that have no corpus evidence. Plan for substantial manual ontology extension. The corpus can contribute a small, reliable seed set of leaf-node concepts (VPN client, SCCM, Group Policy OU, endpoint protection, build image, compliance check) but cannot drive class hierarchy design.
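The boilerplate remediation steps in the example output (strip signatures, strip the disclaimer, strip thread prefixes, then validate) can be sketched as a small pre-processing pass. This is a minimal illustration and not part of the prompt itself. The function names are hypothetical, and it assumes that each message ends with its signature and disclaimer, that signatures open with a sign-off such as "Best regards,", and that the disclaimer always opens with the phrase quoted in the audit.

```python
import re

# Assumption: every disclaimer opens with the phrase quoted in the audit.
DISCLAIMER_START = "This email and any attachments are confidential"
# Assumption: signatures open with a sign-off such as "Best regards,".
SIGNOFF = re.compile(r"\b(?:best|kind)\s+regards,", re.IGNORECASE)
# One or more leading reply/forward markers: "RE:", "FW:", "FWD:".
THREAD_PREFIX = re.compile(r"^(?:(?:RE|FWD?)\s*:\s*)+", re.IGNORECASE)


def strip_subject_prefix(subject: str) -> str:
    """Step 3: drop leading RE:/FWD: markers from a subject line."""
    return THREAD_PREFIX.sub("", subject).strip()


def strip_boilerplate(body: str) -> str:
    """Steps 1-2: cut the body at the first sign-off or disclaimer marker."""
    cut = len(body)
    signoff = SIGNOFF.search(body)
    if signoff:
        cut = min(cut, signoff.start())
    disclaimer = body.find(DISCLAIMER_START)
    if disclaimer != -1:
        cut = min(cut, disclaimer)
    return body[:cut].rstrip()


def is_clean(text: str) -> bool:
    """Step 4: report whether either boilerplate marker survives."""
    return DISCLAIMER_START not in text and SIGNOFF.search(text) is None


message = (
    "Hi Sarah, the device was imaged but the VPN client deployment failed. "
    "Best regards, IT Helpdesk | Acme Corp it-helpdesk@acme.com "
    "This email and any attachments are confidential. If you are not the "
    "intended recipient, please delete this message immediately."
)
cleaned = strip_boilerplate(message)
print(cleaned)
print(strip_subject_prefix("RE: RE: Laptop provisioning request"))
```

Cutting at the first marker is deliberately blunt: it would also drop any quoted reply that follows a signature, so a thread with inline quoting needs per-message splitting first. The `is_clean` check only covers the two markers above and does not replace the manual review that Step 4 calls for.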


Corpus Quality Auditor 1

Name: Corpus Quality Auditor 1 Claude Prompt
Brand: PromptBase
Price: 29.99 USD
Availability: InStock
Author: monna


This prompt audits your corpus before you extract from it. It scores five quality dimensions, flags specific problems with actionable remediation steps, and tells you whether your corpus is ready to extract from — or what needs to be fixed first.
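The scoring arithmetic behind the example verdict can be sketched in a few lines. This is an illustration under stated assumptions, not the prompt's actual logic: the listing only shows the HIGH-stakes case, where scores of 1 or 2 are BLOCKING, so `blocking_max` is a hypothetical parameter for other stakes levels, and the rule that any CAUTION yields CONDITIONAL GO is a guess based on the verdict names that appear in the example output.

```python
from dataclasses import dataclass


@dataclass
class Readiness:
    total: int
    max_total: int
    statuses: dict
    blocking: list
    verdict: str


def dimension_status(score: int, blocking_max: int = 2) -> str:
    """Map a 1-5 score to GO / CAUTION / BLOCKING.

    blocking_max=2 mirrors the HIGH-stakes example; other values are
    hypothetical because the listing does not show other stakes levels.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"score must be between 1 and 5, got {score}")
    if score >= 4:
        return "GO"
    if score <= blocking_max:
        return "BLOCKING"
    return "CAUTION"


def readiness(scores: dict, blocking_max: int = 2) -> Readiness:
    """Sum the dimension scores and derive an overall verdict."""
    statuses = {name: dimension_status(s, blocking_max) for name, s in scores.items()}
    blocking = [name for name, status in statuses.items() if status == "BLOCKING"]
    if blocking:
        verdict = "HALT"
    elif "CAUTION" in statuses.values():
        verdict = "CONDITIONAL GO"  # assumption: not defined in the listing
    else:
        verdict = "GO"
    return Readiness(sum(scores.values()), 5 * len(scores), statuses, blocking, verdict)


# Scores taken from the example output on this page.
example = readiness(
    {"Coherence": 5, "Boilerplate": 2, "Coverage": 2, "Duplication": 4, "Signal": 3}
)
print(f"{example.total} / {example.max_total}", example.verdict, example.blocking)
```

With the example scores this reproduces the reported 16 / 25 total, the two blocking dimensions, and the HALT verdict.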


Added 6 days ago
