Prompt Details

Model

(claude-4-6-sonnet)

Token size

1,800

Example input

[USE_CASE]: Store customer accounts, support tickets, products, and issue relationships. The graph must answer questions such as: "Which customers are affected by this bug?", "What issues are commonly reported together?", and "What product areas generate the most escalations?"

[QUERY_PATTERNS]: Find all tickets connected to a specific product feature within 3 hops. Identify customers affected by related incidents. Discover recurring issue clusters across products. Trace escalation paths from customer complaint to engineering resolution.

[SCALE_AND_WRITE]: 15 million nodes, 80 million edges, read-heavy with continuous ticket ingestion.

[CONSTRAINTS]: Team already knows SQL and graph basics. Sub-second traversal latency required. Cloud deployment allowed. Moderate budget.

Example output

The document was provided inline in the conversation. I'll apply the system prompt directly to the inputs.

---

## 🧠 Knowledge Graph Backend Recommendation

---

**VERDICT: F1 (network\_pg)** — Multi-hop ticket/product/customer traversals with sub-second latency and named-relationship semantics are the canonical property graph workload.

**CONFIDENCE: 87%** — Scale (15M nodes / 80M edges) sits comfortably above the F3 threshold and the query shapes are traversal-first; the only uncertainty is whether issue clustering will demand ML-grade vector similarity at query time (which would elevate F8 from FLAG to partial co-VERDICT).

**TAXONOMY NOTE: R1 drove the verdict** — Query shape (multi-hop traversals with relationship-typed paths, 3-hop fan-out, escalation path tracing) maps cleanly to Cypher-class property graphs before scale is even considered. R3 was evaluated and eliminated: 15M nodes at even 500 bytes average = ~7.5 GB nodes alone, plus 80M edges, puts this well past in-process F3 territory. R3 requires stating the threshold: F3 was ruled out at the ~10M-node server-backed preference boundary, and the 80M-edge count makes in-process memory pressure prohibitive on commodity hardware.

**HYBRID FLAG: F8 as FLAG (not VERDICT).** Query pattern 3 — *"Discover recurring issue clusters across products"* — has a retrieval shape consistent with agent-memory patterns (semantic similarity over ticket text + graph context). If issue clustering is implemented as vector-similarity over ticket embeddings joined to graph traversal, F8 retrieval patterns apply *on top of* the F1 storage layer. Storage family remains F1; the retrieval layer (e.g., embedding ticket descriptions and doing ANN search to seed graph traversal) follows F8 conventions. This is **F8 FLAG, not F8 VERDICT** because the primary design drivers — latency-sensitive multi-hop traversal, relationship-typed paths, write throughput from ticket ingestion — are not retrieval-architecture concerns.

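
The "ANN search to seed graph traversal" pattern described in the hybrid flag can be illustrated with a minimal, in-memory Python sketch. Everything here is an assumption for illustration: the toy embeddings, the adjacency dictionary, and the helper names `seed_tickets` and `expand` are hypothetical, and brute-force cosine ranking stands in for a real ANN index. It is not how any particular F1 product implements the pattern.

```python
from collections import deque
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def seed_tickets(query_vec, embeddings, k=2):
    """Brute-force stand-in for ANN search: top-k ticket ids by similarity."""
    ranked = sorted(
        embeddings,
        key=lambda ticket: cosine(query_vec, embeddings[ticket]),
        reverse=True,
    )
    return ranked[:k]


def expand(graph, seeds, max_hops=3):
    """Breadth-first expansion from the seeds; returns {node: hop distance}."""
    dist = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not walk past the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical toy data: ticket embeddings and an undirected adjacency map.
EMBEDDINGS = {"T1": [1.0, 0.0], "T2": [0.9, 0.1], "T3": [0.0, 1.0]}
GRAPH = {
    "T1": ["FeatureA"],
    "T2": ["FeatureA", "CustX"],
    "T3": ["FeatureB"],
    "FeatureA": ["T1", "T2", "ProductP"],
    "FeatureB": ["T3", "ProductP"],
    "CustX": ["T2"],
    "ProductP": ["FeatureA", "FeatureB"],
}

seeds = seed_tickets([1.0, 0.05], EMBEDDINGS, k=2)
neighborhood = expand(GRAPH, seeds, max_hops=2)
```

In this sketch the vector step only chooses where traversal starts; the graph step does the rest, which mirrors the split between retrieval layer and storage family described above.
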
---

### TRADE-OFF MATRIX

*(Verdict = F1 · Runner-ups = F3 · F2)*

```
+------------------+------------------+------------------+------------------+
| Dimension        | F1 (network_pg)  | F3 (embedded)    | F2 (rdf/SPARQL)  |
+------------------+------------------+------------------+------------------+
| Query fit        | 5                | 2                | 3                |
| Traversal depth  | 5                | 2                | 3                |
| Write pattern    | 4                | 2                | 3                |
| Ops burden       | 3                | 5                | 2                |
| Team fit         | 4                | 3                | 2                |
| Vector hybrid*   | 4                | 2                | 2                |
+------------------+------------------+------------------+------------------+
```

**Row justifications:**

- **Query fit** — Cypher pattern-match syntax is purpose-built for "find all tickets linked to feature X within N hops" and escalation path queries; F3 has no query language (imperative traversal only); F2's SPARQL property paths work but are verbose for this shape.
- **Traversal depth** — F1 server-backed graphs maintain adjacency indexes that sustain sub-second 3-hop fan-out at this edge count; F3 would require loading the full 80M-edge graph into RAM on a single process, which exceeds practical constraints; F2 traversal is possible but SPARQL property paths are not optimized for low-latency multi-hop in most open-source implementations.
- **Write pattern** — Continuous ticket ingestion (write-heavy on the edge side) is well-handled by F1 ACID transaction support; F3 in-process structures are not designed for concurrent ingestion; F2 triple stores have adequate write paths but graph update semantics are more complex.
- **Ops burden** — F1 managed cloud options (e.g., {GENERAL KNOWLEDGE — verify current versions}: Neo4j Aura, Amazon Neptune, Memgraph Cloud) reduce ops; F3 has near-zero ops since there's no server, but this is moot given scale; F2 managed options are fewer and less mature in cloud.
- **Team fit** — SQL-familiar teams learn Cypher quickly (it is SQL-adjacent in intent, graph-native in syntax); F3 requires Python graph traversal code with no declarative layer; F2 requires SPARQL and RDF modeling, a steeper shift.
- **Vector hybrid** *(F8 FLAG applies)* — F1 products increasingly offer native vector index support alongside graph traversal ({GENERAL KNOWLEDGE — verify current versions}: Neo4j vector index, Kuzu columnar+vector roadmap), making the F8 FLAG pattern implementable without a separate vector store; F3 would require a separate ANN library with manual join logic; F2 has limited native vector support.

---

### MIGRATION FORECAST

**Pivot 1: Issue clustering evolves into a fully AI-driven semantic search product** (agents querying "find tickets semantically similar to this bug report + their graph neighborhood") → **cost to move: LOW–MED.** F1 with a native vector index already supports this pattern. Migration cost is low if the chosen F1 product supports vector indexes; medium if a dedicated vector store (e.g., Weaviate, Qdrant) needs to be introduced alongside and kept in sync. F8 would become co-VERDICT at this point.

**Pivot 2: Compliance or semantic interoperability requirement forces RDF/OWL ontology** (e.g., integrating with an industry-standard product taxonomy using subclass reasoning) → **cost to move: HIGH.** Moving from a property graph model to a triple store model requires full schema re-modeling, ETL rewrites, and Cypher-to-SPARQL query rewrites. The team's SQL familiarity advantage is lost. Avoid this pivot by designing the F1 schema with ontology alignment in mind from day one if this risk is visible.

---

### VERDICT TAXONOMY

**FIT WITH HYBRID FLAG** — F1 is a clean fit for the primary workload; F8 retrieval patterns are a relevant secondary concern for issue-clustering queries and should be addressed as a retrieval layer design decision, not a storage family change.

---

### FALSIFICATION BLOCK

- **F-1:** If benchmark traversal tests on your target F1 product show >1 second p95 latency for 3-hop fan-out queries on the 80M-edge graph *before* any query optimization, the sub-second constraint is at risk and you must evaluate whether columnar graph engines (e.g., Kuzu — {GENERAL KNOWLEDGE — verify current versions}) or a purpose-built engine better fits the latency budget.
- **F-2:** If query pattern 3 (issue clustering) is found in practice to require vector similarity as the *primary* access path (i.e., most queries start with "find semantically similar tickets" rather than "start from this known ticket and traverse"), F8 becomes co-VERDICT and the storage choice should be re-evaluated around vector-first retrieval with graph as a secondary enrichment step.
- **F-3:** If ticket ingestion throughput exceeds ~50,000 writes/second sustained and the team measures write latency degradation under load on the chosen F1 product, the continuous ingestion assumption needs re-architecting (e.g., async edge materialization via a queue) — not a family change, but a falsification of the write-pattern score as given.
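
The F-1 check (p95 latency above 1 second for 3-hop queries) can be sketched as a small Python harness. This is a sketch under stated assumptions, not a benchmark methodology: `run_query` is a hypothetical callable that you would wire to your own database driver, the function names are illustrative, and the nearest-rank percentile definition is one of several in common use.

```python
import time


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def measure(run_query, params_list):
    """Time run_query(params) once per entry; returns latencies in seconds."""
    latencies = []
    for params in params_list:
        start = time.perf_counter()
        run_query(params)  # hypothetical: executes one 3-hop traversal
        latencies.append(time.perf_counter() - start)
    return latencies


def violates_budget(latencies, budget_seconds=1.0):
    """True when the p95 latency exceeds the sub-second budget from F-1."""
    return p95(latencies) > budget_seconds
```

A typical use would be to call `measure` with a representative sample of start nodes against a full-size copy of the graph, then pass the result to `violates_budget`; a `True` result is the trigger condition F-1 describes.
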
