Prompt Details
Model: claude-4-6-sonnet
Token size: 6,308
Example input
[CORPUS]: From: sarah.jones@acme.com To: it-helpdesk@acme.com Subject: RE: Laptop provisioning request Hi team, following up on ticket #HD-4421. The laptop assigned to the new hire has not been configured with VPN client or endpoint protection software. The device was delivered without the standard build image applied. Per our onboarding policy, all devices must pass the endpoint compliance check before being handed to the employee. Best regards, Sarah Jones | IT Operations | Acme Corp sarah.jones@acme.com | +1 (212) 555-0198 This email and any attachments are confidential. If you are not the intended recipient, please delete this message immediately and notify the sender. Acme Corp disclaims any liability for loss or damage arising from this communication. --- From: it-helpdesk@acme.com To: sarah.jones@acme.com Subject: RE: RE: Laptop provisioning request Hi Sarah, the device was imaged but the VPN client deployment failed due to a Group Policy conflict on the user's OU. We are re-running the SCCM deployment task. Endpoint protection pushed successfully. Expected resolution: end of business today. Best regards, IT Helpdesk | Acme Corp it-helpdesk@acme.com | +1 (212) 555-0100 This email and any attachments are confidential. If you are not the intended recipient, please delete this message immediately and notify the sender. Acme Corp disclaims any liability for loss or damage arising from this communication.
[DOMAIN]: IT endpoint management and device provisioning within an enterprise IT operations function. Scope covers device imaging, software deployment, and compliance verification for new hire onboarding. Excludes network infrastructure and server management. Stakes: HIGH.
[INTENDED_USE]: SKOS taxonomy extraction for an IT knowledge base
Example output
I'll run the Corpus Quality Auditor™ v1.1 on your inputs now.
---
**PRE-FLIGHT CHECK**
- `{CORPUS}`: Present. Contains readable natural language prose. Two email messages detected.
- `{DOMAIN}`: Present and specific.
- `{INTENDED_USE}`: Specified.
- `{STAKES}`: Parsed from `{DOMAIN}` field → **HIGH**
⚠ **THIN CORPUS FLAG**: Corpus contains 2 email messages with fewer than 5 distinct topical statements across the full thread. Audit proceeds; scores will reflect limited content. Expand corpus before extraction for higher-confidence results.
---
### SECTION 1 — CORPUS PROFILE
```
CORPUS TYPE:        emails
CORPUS SEGMENTS:    2 messages (1 thread, 2 turns)
DOMAIN:             IT endpoint management and device provisioning,
                    enterprise IT operations; new hire onboarding scope
STAKES:             HIGH
INTENDED USE:       SKOS taxonomy extraction for an IT knowledge base
LANGUAGE:           English
APPROXIMATE LENGTH: short (<1000 words)
VISIBLE STRUCTURE:  2-message email thread. Message 1: IT Operations
                    reporting a provisioning gap. Message 2: IT Helpdesk
                    responding with root cause and resolution timeline.
                    Each message contains a 4-line signature block and a
                    3-line legal disclaimer.
```
```
ALIGNMENT: ALIGNED
EVIDENCE:  Both messages directly address device provisioning, VPN
           deployment, endpoint protection, and compliance verification
           for a new hire, all within the stated domain scope.
```
---
### SECTION 2 — BOILERPLATE INVENTORY
```
BOILERPLATE TYPE      EXAMPLE FROM CORPUS                           FREQUENCY
────────────────────────────────────────────────────────────────────────────
Signature block       "Sarah Jones | IT Operations | Acme Corp      Both messages
                       sarah.jones@acme.com | +1 (212) 555-0198"
Legal disclaimer      "This email and any attachments are           Both messages
                       confidential. If you are not the intended
                       recipient, please delete this message
                       immediately and notify the sender. Acme
                       Corp disclaims any liability for loss or
                       damage arising from this communication."
Email thread prefix   "RE: RE: Laptop provisioning request"         Both subject lines
```
**Estimated boilerplate proportion:** ~35% of corpus word count.
**Evidence:** Each message contains a 4-line signature block and a 3-line legal disclaimer. Across 2 messages of roughly 60–70 substantive words each, the boilerplate blocks account for approximately 7–8 lines per message against ~5–6 lines of substantive content. The legal disclaimer alone is repeated verbatim in both messages.
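Once the boilerplate blocks are known, the word-count estimate can be reproduced mechanically. The sketch below is minimal and assumes two things: the blocks are available as exact strings, and whitespace-separated tokens are close enough to "words". `boilerplate_share` and `word_count` are hypothetical helpers, not part of the auditor prompt. The toy messages are not the audited corpus.

```python
def word_count(text: str) -> int:
    # Whitespace-delimited tokens: a rough stand-in for "words".
    return len(text.split())


def boilerplate_share(messages: list[str], blocks: list[str]) -> float:
    """Fraction of all words that sit inside known, verbatim boilerplate blocks.

    Assumes blocks do not overlap; overlapping blocks would be double-counted.
    """
    total = sum(word_count(m) for m in messages)
    if total == 0:
        return 0.0
    noise = sum(
        m.count(block) * word_count(block)  # every verbatim occurrence counts
        for m in messages
        for block in blocks
    )
    return noise / total


# Toy example (not the audited corpus): one 3-word disclaimer in each message.
msgs = [
    "Fix the VPN client. Confidential notice here.",
    "VPN fixed. Confidential notice here.",
]
print(boilerplate_share(msgs, ["Confidential notice here."]))  # 0.5
```

Because the function matches only verbatim text, a disclaimer with different line wrapping would be missed. Normalizing whitespace before counting is a cheap way to close that gap.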
---
### SECTION 3 — DIMENSION SCORES
```
┌─────────────────────────────────────────────────────────────┐
│ DIMENSION 1 — SUBJECT COHERENCE Score: 5 / 5 │
├─────────────────────────────────────────────────────────────┤
│ Evidence: Both messages address a single provisioning │
│ incident: device imaging, VPN deployment, │
│ endpoint protection, and compliance check. │
│ Vocabulary is consistent throughout. │
│ Key signal: "standard build image", "endpoint compliance │
│ check", "SCCM deployment task", "Group Policy │
│ conflict" — all within a single domain thread. │
│ Risk: None from coherence. Domain is tightly scoped. │
└─────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ DIMENSION 2 — BOILERPLATE DENSITY Score: 2 / 5 {BLOCKING}│
├──────────────────────────────────────────────────────────────┤
│ Evidence: ~35% of corpus is boilerplate (see Section 2). │
│ Legal disclaimer repeated verbatim in both │
│ messages; signature block in both. │
│ Key signal: "This email and any attachments are │
│ confidential…" — 3-line block appearing twice, │
│ contributing no domain concepts. │
│ Risk: At HIGH stakes, extraction will surface │
│ disclaimer phrases ("confidential", │
│ "intended recipient", "disclaims liability") as │
│ candidate taxonomy terms. Signal-to-noise ratio │
│ is materially degraded relative to corpus size. │
└──────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ DIMENSION 3 — DOMAIN COVERAGE Score: 2 / 5 │
│ {BLOCKING} │
├─────────────────────────────────────────────────────────────┤
│ Covered: Device imaging, VPN client deployment, endpoint │
│ protection software, Group Policy/OU │
│ configuration, SCCM deployment, onboarding │
│ compliance check — one incident thread. │
│ Missing: Asset inventory and tracking, device lifecycle │
│ management, hardware procurement, MDM/UEM │
│ platforms, patch management, BitLocker/disk │
│ encryption, certificate management, helpdesk │
│ ticketing workflow, escalation procedures, │
│ compliance reporting, offboarding/device │
│ return. None of these sub-areas have any │
│ representation in the corpus. │
│ Risk: A taxonomy extracted from this corpus would │
│ appear to define IT endpoint management as │
│ exclusively a VPN + SCCM deployment domain. │
│ Coverage gaps will look like intentional scope │
│ decisions to downstream consumers. │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ DIMENSION 4 — DUPLICATION DENSITY Score: 4 / 5 │
├─────────────────────────────────────────────────────────────┤
│ Evidence: Legal disclaimer is repeated verbatim in both │
│ messages, but this is accounted for in │
│ Dimension 2. No substantive content is │
│ duplicated. No thread-quoting observed │
│ (Message 2 does not re-quote Message 1). │
│ Key signal: Disclaimer repetition is the only duplication │
│ detected; substantive content is unique │
│ across both turns. │
│ Risk: Minimal frequency bias. "Confidential" and │
│ related disclaimer terms may receive slightly │
│ inflated frequency weight — minor at this scale.│
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ DIMENSION 5 — EXTRACTION SIGNAL STRENGTH Score: 3 / 5 │
├─────────────────────────────────────────────────────────────┤
│ Evidence: The substantive content is technically precise │
│ and concept-rich for its length: "Group Policy │
│ conflict", "SCCM deployment task", "endpoint │
│ compliance check", "build image". However, the │
│ corpus is a single incident thread, not a │
│ domain knowledge artifact. Relationships between│
│ concepts are implicit (no definitions, no │
│ process documentation, no policy text). │
│ Key signal: "The VPN client deployment failed due to a │
│ Group Policy conflict on the user's OU" — good │
│ concept density, but concept relationships are │
│ described situationally, not definitionally. │
│ Risk: Concept yield will be narrow. Extraction will │
│ produce a handful of valid leaf-node terms with │
│ weak hierarchical structure above them. │
└─────────────────────────────────────────────────────────────┘
CORPUS READINESS SCORE: 16 / 25
```
**Stakes adjustment:** STAKES = HIGH. Dimensions 2 and 3 each score 2/5 → both are **BLOCKING** findings.
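Dimension 4 treats verbatim repetition separately from boilerplate. The sketch below shows one way to detect repetition across messages. It assumes a naive punctuation-based sentence splitter and exact string matching. `repeated_sentences` is a hypothetical helper, and the messages are a toy example, not the audited corpus.

```python
import re
from collections import defaultdict


def repeated_sentences(messages: list[str]) -> dict[str, list[int]]:
    """Map each sentence found verbatim in 2+ messages to those message indices."""
    seen = defaultdict(set)
    for idx, msg in enumerate(messages):
        # Naive split: sentence-final punctuation followed by whitespace.
        for sent in re.split(r"(?<=[.!?])\s+", msg):
            sent = sent.strip()
            if sent:
                seen[sent].add(idx)
    return {s: sorted(ix) for s, ix in seen.items() if len(ix) > 1}


msgs = [
    "VPN client deployment failed. Acme Corp disclaims any liability.",
    "Endpoint protection pushed successfully. Acme Corp disclaims any liability.",
]
print(repeated_sentences(msgs))
# {'Acme Corp disclaims any liability.': [0, 1]}
```

Re-quoted thread text usually carries `>` markers or altered wrapping. A production check would normalize those before comparing sentences.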
---
### SECTION 4 — FINDINGS & REMEDIATION
---
```
FINDING: Boilerplate Density — Score 2/5
SEVERITY: BLOCKING
OBSERVED: Both messages contain an identical 3-line legal disclaimer
and a 4-line signature block. Combined, these account for approximately
35% of total corpus word count. Example from Message 1:
"This email and any attachments are confidential. If you are not the
intended recipient, please delete this message immediately and notify
the sender. Acme Corp disclaims any liability for loss or damage
arising from this communication."
IMPACT ON EXTRACTION:
A SKOS taxonomy extracted from this corpus without cleanup will
produce candidate preferred labels or scope notes drawn from
disclaimer vocabulary: "confidential", "intended recipient",
"notify the sender", "disclaims liability". These are legal
boilerplate terms, not IT endpoint management concepts. At this
corpus size (~130 substantive words), boilerplate is not diluted
by volume — it constitutes a near-equal competitor to domain
signal.
REMEDIATION:
Step 1: Apply a pre-processing pass to strip all email signature
        blocks (name, title, company, phone, email address lines).
Step 2: Strip the legal disclaimer block. Identify it by the
        phrase "This email and any attachments are confidential"
        and remove it through the end of the paragraph.
Step 3: Strip thread prefixes (RE:, FWD:) from subject lines.
Step 4: Validate by reviewing the cleaned corpus and confirm that
        the remaining content is entirely substantive prose.
RE-AUDIT NEEDED: YES
```
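Steps 1 to 3 can be sketched as a small pre-processing pass. The sketch is minimal and rests on two assumptions:

- Every signature begins at the closing salutation `Best regards,`, which is true of both messages in this corpus.
- The disclaimer paragraph ends at a blank line or at the end of the message.

All helper names are hypothetical. Step 4 remains a manual review.

```python
import re

DISCLAIMER_START = "This email and any attachments are confidential"
CLOSING = "Best regards,"  # assumed start of every signature block

# One or more leading reply/forward prefixes: "RE: ", "RE: RE: ", "Fwd: ", "FW: ".
PREFIX_RE = re.compile(r"^(?:(?:re|fwd?)\s*:\s*)+", re.IGNORECASE)


def strip_subject_prefixes(subject: str) -> str:
    # Step 3: remove thread prefixes from the subject line.
    return PREFIX_RE.sub("", subject).strip()


def strip_disclaimer(body: str) -> str:
    # Step 2: cut from the disclaimer's opening phrase to the end of its
    # paragraph (assumed to end at a blank line, or at the end of the text).
    start = body.find(DISCLAIMER_START)
    if start == -1:
        return body
    end = body.find("\n\n", start)
    return body[:start] if end == -1 else body[:start] + body[end:]


def strip_signature(body: str) -> str:
    # Step 1: assume the signature runs from the closing salutation onward.
    idx = body.find(CLOSING)
    return body if idx == -1 else body[:idx]


def clean_message(subject: str, body: str) -> tuple[str, str]:
    body = strip_signature(strip_disclaimer(body))
    return strip_subject_prefixes(subject), body.strip()


subject, body = clean_message(
    "RE: RE: Laptop provisioning request",
    "Hi Sarah, the device was imaged.\n"
    "Best regards,\nIT Helpdesk | Acme Corp\n"
    "This email and any attachments are confidential. Acme Corp disclaims "
    "any liability for loss or damage arising from this communication.",
)
print(subject)  # Laptop provisioning request
print(body)     # Hi Sarah, the device was imaged.
```

The disclaimer pass runs independently of the signature pass. That way it still works on a message that has no salutation.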
---
```
FINDING: Domain Coverage — Score 2/5
SEVERITY: BLOCKING
OBSERVED: The corpus contains a single 2-message email thread
addressing one provisioning incident (VPN deployment failure for one
device). The following sub-areas of the stated domain — IT endpoint
management and device provisioning for new hire onboarding — have
zero representation: asset inventory, device lifecycle management,
hardware procurement, MDM/UEM platforms, disk encryption, patch
management, certificate provisioning, helpdesk ticketing workflow,
escalation procedures, compliance reporting, offboarding/device
return. The corpus also contains no policy documents, process guides,
runbooks, or knowledge base articles.
IMPACT ON EXTRACTION:
A SKOS taxonomy extracted from this corpus will have at most three
populated branches: VPN deployment, endpoint protection, and device
imaging. All other sub-areas of the domain will be absent. To
downstream consumers of the taxonomy, this will appear to reflect
deliberate scope decisions rather than a corpus gap. Hierarchical
depth above the leaf nodes (e.g., a meaningful "Device Provisioning"
→ "Software Deployment" → "VPN Configuration" structure) cannot be
established from 2 emails alone.
REMEDIATION:
Step 1: Identify the full intended scope of the IT knowledge base
        this taxonomy will serve. Map the sub-domains explicitly
        (e.g., imaging, software deployment, compliance, asset
        tracking, offboarding).
Step 2: Expand the corpus with source material covering each
        sub-domain: runbooks, SOPs, policy documents, prior
        helpdesk ticket archives (resolved tickets across incident
        types), KB articles, onboarding checklists.
Step 3: Target a minimum of 5–10 substantive documents or 20+
        distinct incident records across the full domain scope
        before re-running this audit.
Step 4: Validate coverage by checking that each intended sub-domain
        is represented by at least 2–3 documents with substantive
        concept density before proceeding to extraction.
RE-AUDIT NEEDED: YES
```
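Step 4's coverage check can be made explicit once each document carries sub-domain tags. The sketch below is minimal and assumes those tags come from manual labelling or a separate classifier, which is not shown. The sub-domain names reuse Step 1's examples, and the email tags are a hypothetical labelling. `min_docs=2` is the low end of the 2–3 documents the step asks for.

```python
from collections import Counter


def coverage_gaps(doc_tags: dict[str, set[str]], subdomains: list[str],
                  min_docs: int = 2) -> dict[str, int]:
    """Return each intended sub-domain that has fewer than `min_docs` documents."""
    counts = Counter(tag for tags in doc_tags.values() for tag in tags)
    return {sd: counts[sd] for sd in subdomains if counts[sd] < min_docs}


# Sub-domains taken from Step 1's examples; the tags below are a
# hypothetical manual labelling of the two emails in this corpus.
SUBDOMAINS = ["imaging", "software deployment", "compliance",
              "asset tracking", "offboarding"]
doc_tags = {
    "email-1": {"imaging", "compliance"},
    "email-2": {"imaging", "software deployment"},
}
print(coverage_gaps(doc_tags, SUBDOMAINS))
# {'software deployment': 1, 'compliance': 1, 'asset tracking': 0, 'offboarding': 0}
```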
---
```
FINDING: Extraction Signal Strength — Score 3/5
SEVERITY: MEDIUM
OBSERVED: The substantive content is technically precise but
situational. Concepts are named in the context of one incident
("the VPN client deployment failed due to a Group Policy conflict
on the user's OU") rather than defined or described in terms of
process or policy. No definitions, no hierarchical relationships,
no process descriptions are present.
IMPACT ON EXTRACTION:
Concept yield will be limited to approximately 8–12 valid terms
(VPN client, endpoint protection, build image, SCCM, Group Policy,
OU, endpoint compliance check, device provisioning). Hierarchical
structure above these terms cannot be inferred from the corpus —
the extraction prompt would need to impose hierarchy rather than
derive it. This is acceptable as a seed term set but not as a
basis for a taxonomy with meaningful broader/narrower relationships.
REMEDIATION:
Step 1: Add process documentation (runbooks, SOPs) to the corpus.
        These explicitly name concepts and their relationships,
        which is what produces hierarchy in extraction.
Step 2: Add policy or compliance documents. These define terms
        and establish scope, producing scope notes and definitions
        alongside preferred labels.
Step 3: After corpus expansion (per Coverage remediation), signal
        strength should improve organically. Re-evaluate at that
        point rather than treating this as a standalone fix.
RE-AUDIT NEEDED: YES (subsumed by Coverage re-audit)
```
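The seed-term outcome described above can be pictured as a flat SKOS concept scheme:

- Each term becomes a `skos:Concept` with a `skos:prefLabel`.
- Each concept is attached to the scheme with `skos:topConceptOf`.
- No `skos:broader` links are emitted, because the corpus gives no basis for them.

The sketch below is minimal and uses plain string formatting rather than an RDF library. The `kb:` namespace and the helper names are hypothetical, and labels are assumed to contain no double quotes.

```python
import re

BASE = "http://example.org/it-kb/"  # hypothetical namespace for the knowledge base


def slug(label: str) -> str:
    # Lowercase and collapse runs of non-alphanumerics into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def seed_scheme_turtle(labels: list[str]) -> str:
    """Emit a flat SKOS concept scheme: every term is a top concept, no hierarchy."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix kb: <{BASE}> .",
        "",
        "kb:seed a skos:ConceptScheme .",
    ]
    for label in labels:
        # No skos:broader/skos:narrower: the corpus gives no basis for them.
        lines.append(
            f'kb:{slug(label)} a skos:Concept ; skos:prefLabel "{label}"@en ; '
            "skos:topConceptOf kb:seed ."
        )
    return "\n".join(lines)


SEED = ["VPN client", "endpoint protection", "build image", "SCCM",
        "Group Policy", "OU", "endpoint compliance check", "device provisioning"]
print(seed_scheme_turtle(SEED))
```

Any broader/narrower structure added later would have to be imposed by a human or by the extraction prompt, which is exactly the limitation this finding describes.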
---
**Refused inferences:**
```
REFUSED: That the VPN client deployment failure (attributed to a Group
         Policy conflict) is a recurring or systemic issue in this
         environment.
REASON:  The corpus contains one incident. No prior tickets, trend
         data, or escalation history is present to support a pattern
         inference.
REFUSED: That the corpus represents the full scope of IT operations
         activity at Acme Corp.
REASON:  Two emails from one thread are not a representative sample.
         The domain specification is broader than what a single
         provisioning incident can evidence.
```
---
### SECTION 5 — READINESS VERDICT
```
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
CORPUS READINESS VERDICT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
SCORE: 16 / 25
STAKES: HIGH
BLOCKING FLAGS: 2 (Boilerplate Density: 2/5; Domain Coverage: 2/5)
VERDICT: HALT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
DIMENSION SUMMARY:
Coherence: 5/5 GO
Boilerplate: 2/5 BLOCKING
Coverage: 2/5 BLOCKING
Duplication: 4/5 GO
Signal: 3/5 CAUTION
Status definitions:
GO → score 4–5 (no action needed)
CAUTION → score 3 (finding raised in Section 4; non-blocking)
BLOCKING → score 1–2 at applicable stakes threshold
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```
**Verdict rationale:** Two BLOCKING findings drive this HALT. The corpus is too small and too narrow to support taxonomy extraction for the stated domain scope — a single incident thread cannot produce a representative IT endpoint management taxonomy. Boilerplate constitutes approximately a third of the corpus by word count, a proportion that becomes extraction noise rather than signal at this volume. Address both BLOCKING findings — strip boilerplate and substantially expand the corpus with multi-document, multi-sub-domain source material — then re-run this audit before proceeding.
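The score line, dimension statuses, and HALT decision follow mechanically from the Section 3 scores. The sketch below is minimal and assumes:

- HIGH stakes, the only threshold this audit applies, where a score of 1–2 is BLOCKING.
- One BLOCKING dimension is enough to halt, as the rationale implies.

The names are hypothetical. The GO versus CONDITIONAL GO criteria are not modelled because the audit does not state them.

```python
SCORES = {"Coherence": 5, "Boilerplate": 2, "Coverage": 2,
          "Duplication": 4, "Signal": 3}


def status(score: int) -> str:
    # Status definitions from the verdict block at the HIGH-stakes threshold
    # (1-2 blocks). Thresholds for other stakes levels are not shown.
    if score >= 4:
        return "GO"
    if score == 3:
        return "CAUTION"
    return "BLOCKING"


statuses = {dim: status(s) for dim, s in SCORES.items()}
total = sum(SCORES.values())                  # five dimensions x 5 points = 25 max
must_halt = "BLOCKING" in statuses.values()   # assumption: any BLOCKING forces HALT

print(f"SCORE: {total} / 25")  # SCORE: 16 / 25
print(statuses)
# {'Coherence': 'GO', 'Boilerplate': 'BLOCKING', 'Coverage': 'BLOCKING', 'Duplication': 'GO', 'Signal': 'CAUTION'}
print(must_halt)  # True
```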
---
### SECTION 6 — DOWNSTREAM HANDOFF
**Verdict is HALT — do not proceed to extraction.**
**For SKOS Taxonomy Extractor™ v1.1:**
Do not pass this corpus to the extractor. Extraction will produce a narrow, boilerplate-contaminated term set that looks like a valid taxonomy but reflects corpus noise and a single incident rather than domain structure. After remediating both BLOCKING findings and re-running this audit to a GO or CONDITIONAL GO verdict, the recommended `{DEPTH}` based on current Dimension 5 signal score (3/5) is **shallow (2 levels)**. If corpus expansion improves signal to 4/5, upgrade to **standard (3 levels)**.
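The `{DEPTH}` guidance is effectively a lookup gated on the verdict. The sketch below is minimal, and `recommended_depth` is hypothetical. Only signal scores 3 and 4 are mapped, because those are the only ones the handoff note covers.

```python
DEPTH_BY_SIGNAL = {3: "shallow (2 levels)", 4: "standard (3 levels)"}


def recommended_depth(verdict: str, signal_score: int) -> str:
    """Pick the extractor's {DEPTH} value from the Dimension 5 score."""
    if verdict not in ("GO", "CONDITIONAL GO"):
        raise ValueError(f"verdict is {verdict}: remediate and re-audit first")
    if signal_score not in DEPTH_BY_SIGNAL:
        # Only scores 3 and 4 are mapped in the handoff note; don't guess others.
        raise ValueError(f"no documented depth for signal score {signal_score}")
    return DEPTH_BY_SIGNAL[signal_score]


print(recommended_depth("CONDITIONAL GO", 3))  # shallow (2 levels)
```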
**For RAG pipeline ingestion:**
Boilerplate must be stripped before chunking. Legal disclaimers and signature blocks in chunks will degrade retrieval precision — queries about VPN deployment will surface chunks containing "disclaims any liability" as co-located text. The coverage gap is the larger concern for RAG: a knowledge base built from this corpus will silently fail on any query outside the VPN/SCCM/imaging sub-domain. No retrieval strategy compensates for absent source material.
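A pre-indexing check can catch disclaimer text that survived cleanup. The sketch below is minimal. It uses a naive fixed-size word chunker as a stand-in for the real pipeline's chunker, and it reuses disclaimer phrases quoted in this audit as markers. All names are hypothetical.

```python
DISCLAIMER_MARKERS = (
    "this email and any attachments are confidential",
    "intended recipient",
    "disclaims any liability",
)


def chunk_words(text: str, size: int = 50) -> list[str]:
    # Naive fixed-size word windows; a stand-in for whatever chunker the pipeline uses.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def contaminated_chunks(chunks: list[str]) -> list[int]:
    # Indices of chunks that still carry disclaimer text.
    return [i for i, chunk in enumerate(chunks)
            if any(marker in chunk.lower() for marker in DISCLAIMER_MARKERS)]


text = ("We are re-running the SCCM deployment task. "
        "Acme Corp disclaims any liability for loss.")
chunks = chunk_words(text, size=8)
print(contaminated_chunks(chunks))  # [1]
```

A marker split across a chunk boundary escapes this check. That is one reason to strip boilerplate before chunking rather than filter afterwards.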
**For ontology design (Prompt Ontology Designer™ v1.0):**
Domain coverage gaps here are extensive. Asset inventory, patch management, disk encryption, MDM platforms, offboarding — all are expected ontology classes for this domain that have no corpus evidence. Plan for substantial manual ontology extension. The corpus can contribute a small, reliable seed set of leaf-node concepts (VPN client, SCCM, Group Policy OU, endpoint protection, build image, compliance check) but cannot drive class hierarchy design.
This prompt audits your corpus before you extract from it. It scores five quality dimensions, flags specific problems with actionable remediation steps, and tells you whether your corpus is ready to extract from — or what needs to be fixed first.
Added 6 days ago
