PromptBase
Prompt Details

Model
claude-4-5-sonnet
Token size
986
Example input
[DATA_SOURCES]: POS systems (MySQL), eCommerce API (Shopify), marketing spend CSVs, customer loyalty DB
[ANALYTICS_NEEDS]: Sales performance dashboards, customer segmentation, marketing ROI reports
[WAREHOUSE_PLATFORM]: Snowflake (cloud), integrated with Tableau
[GOVERNANCE_REQUIREMENTS]: PCI compliance, access control by department, lineage tracking, encryption at rest
Example output
```json { "architecture_overview": { "platform": "Snowflake", "rationale": "Cloud-native separation of storage/compute, excellent scalability, native Tableau integration, strong security features for PCI compliance, automatic encryption at rest", "architecture_pattern": "Hub-and-spoke dimensional model with layered ELT approach", "data_flow": "Sources → Raw Layer (staged data) → Integration Layer (conformed dimensions) → Presentation Layer (dimensional marts) → Tableau", "layers": { "raw_database": "RAW_DB - Immutable source system replicas, minimal transformation, full audit trail", "integration_database": "INTEGRATION_DB - Conformed dimensions, master data management, data quality rules applied", "presentation_database": "ANALYTICS_DB - Star schema dimensional marts optimized for business domains", "governance_database": "GOVERNANCE_DB - Metadata, lineage, data quality metrics, audit logs" }, "compute_strategy": { "etl_warehouse": "Medium warehouse for ELT processing (auto-suspend after 5min idle)", "analytics_warehouse": "Large multi-cluster warehouse for Tableau queries (auto-scale 1-3 clusters)", "data_science_warehouse": "XL warehouse for ad-hoc analysis (manual start, used on-demand)" }, "storage_strategy": "Time-travel enabled (90 days), fail-safe for PCI compliance, micro-partitioning for query performance" }, "schema_design": { "modeling_approach": "Star schema with selective snowflaking for large dimensions", "dimensional_marts": { "sales_mart": { "fact_tables": { "fact_sales_transactions": { "grain": "One row per transaction line item", "measures": [ "transaction_amount", "quantity_sold", "discount_amount", "tax_amount", "cost_of_goods", "profit_amount" ], "dimension_keys": [ "date_key", "time_key", "product_key", "customer_key", "store_key", "payment_method_key", "promotion_key" ], "degenerate_dimensions": [ "transaction_id", "receipt_number", "order_id" ], "partitioning": "Clustered by (date_key, store_key) for query performance", "retention": "7 
years for PCI compliance" }, "fact_daily_sales_summary": { "grain": "One row per product per store per day", "purpose": "Aggregate fact table for faster dashboard queries", "measures": [ "total_transactions", "total_revenue", "total_quantity", "total_profit", "avg_transaction_value", "unique_customers" ], "refresh_strategy": "Daily incremental, full rebuild monthly" } }, "dimensions": { "dim_date": { "type": "Type 0 (static)", "attributes": [ "date_key (PK)", "full_date", "day_of_week", "day_of_month", "month_name", "quarter", "fiscal_period", "is_weekend", "is_holiday", "holiday_name" ], "pre_populated": "10 years (2020-2030)" }, "dim_time": { "type": "Type 0 (static)", "grain": "Minute level", "attributes": [ "time_key (PK)", "hour", "minute", "time_of_day_segment", "is_business_hours" ] }, "dim_product": { "type": "Type 2 SCD (track history)", "natural_key": "product_sku", "attributes": [ "product_key (PK surrogate)", "product_sku (NK)", "product_name", "product_description", "brand", "category_level_1", "category_level_2", "category_level_3", "unit_cost", "unit_price", "is_active", "effective_date", "end_date", "is_current_flag" ], "scd_tracking": "Price changes, category changes tracked historically", "indexing": "Clustered on product_key, indexed on product_sku" }, "dim_customer": { "type": "Type 2 SCD (track loyalty tier changes)", "natural_key": "customer_id", "attributes": [ "customer_key (PK surrogate)", "customer_id (NK)", "customer_email (ENCRYPTED - PCI)", "customer_name (MASKED for non-PCI users)", "loyalty_tier", "customer_segment", "registration_date", "demographic_attributes", "effective_date", "end_date", "is_current_flag" ], "security": "Column-level encryption for PII, masking policies by role", "scd_tracking": "Loyalty tier and segment changes preserved" }, "dim_store": { "type": "Type 1 SCD (overwrite)", "attributes": [ "store_key (PK)", "store_id (NK)", "store_name", "store_format", "region", "district", "address", "square_footage", 
"manager_name", "opening_date" ] }, "dim_payment_method": { "type": "Type 1 SCD", "attributes": [ "payment_method_key (PK)", "payment_type", "payment_provider", "is_online", "pci_category" ] }, "dim_promotion": { "type": "Type 2 SCD", "attributes": [ "promotion_key (PK)", "promotion_code", "promotion_name", "discount_type", "discount_value", "start_date", "end_date", "is_active" ] } } }, "customer_mart": { "fact_tables": { "fact_customer_behavior": { "grain": "One row per customer per day with activity", "measures": [ "visit_count", "transaction_count", "total_spend", "avg_basket_value", "loyalty_points_earned", "loyalty_points_redeemed" ], "purpose": "Customer segmentation and RFM analysis" } }, "shared_dimensions": "Reuses dim_customer, dim_date from sales_mart (conformed dimensions)" }, "marketing_mart": { "fact_tables": { "fact_marketing_spend": { "grain": "One row per campaign per channel per day", "measures": [ "spend_amount", "impressions", "clicks", "conversions", "attributed_revenue" ], "dimension_keys": [ "date_key", "campaign_key", "channel_key", "customer_segment_key" ] }, "fact_marketing_roi": { "grain": "One row per campaign summarizing full campaign performance", "type": "Accumulated snapshot fact", "measures": [ "total_spend", "total_attributed_revenue", "roi_percentage", "customer_acquisition_cost", "customers_acquired" ] } }, "dimensions": { "dim_campaign": { "type": "Type 2 SCD", "attributes": [ "campaign_key (PK)", "campaign_id", "campaign_name", "campaign_type", "start_date", "end_date", "budget" ] }, "dim_channel": { "type": "Type 1 SCD", "attributes": [ "channel_key (PK)", "channel_name", "channel_type", "is_digital", "cost_model" ] } } } }, "conformed_dimensions": [ "dim_date (shared across all marts)", "dim_customer (single version of truth)", "dim_product (if needed in marketing for product-specific campaigns)" ], "naming_conventions": { "facts": "fact_<business_process>", "dimensions": "dim_<entity>", "keys": "<table>_key for surrogate, 
<entity>_id for natural keys", "measures": "Descriptive names with units implied (amount, count, percentage)" } }, "etl_strategy": { "architecture_pattern": "ELT (Extract-Load-Transform) leveraging Snowflake compute", "orchestration_tool": "Apache Airflow (Astronomer Cloud or MWAA)", "transformation_framework": "dbt (data build tool) for SQL-based transformations", "data_flow_layers": { "layer_1_extraction": { "raw_landing": "RAW_DB schema - exact replicas of source systems", "extraction_methods": { "pos_mysql": { "tool": "Fivetran or Airbyte connector", "frequency": "Every 15 minutes (near real-time)", "method": "CDC (Change Data Capture) using binary logs", "tables": [ "transactions", "transaction_items", "payments", "stores", "employees" ], "volume": "~50K transactions/day, 200K line items", "fallback": "Full snapshot nightly if CDC fails" }, "shopify_api": { "tool": "Custom Python extractor (Airflow DAG)", "frequency": "Hourly", "method": "REST API with incremental timestamp filter", "endpoints": [ "orders", "customers", "products", "inventory_levels" ], "rate_limiting": "2 calls/second, retry with exponential backoff", "error_handling": "Dead letter queue for failed records, alert on 3 consecutive failures" }, "marketing_spend_csv": { "tool": "Airflow S3 sensor + Snowpipe", "frequency": "Daily (uploaded by marketing team)", "method": "S3 bucket monitoring → Snowpipe auto-ingest", "validation": "Schema validation, row count checks, date range validation", "file_format": "CSV with header row, pipe-delimited, UTF-8 encoding", "archival": "Move to archive bucket after successful load" }, "loyalty_database": { "tool": "Fivetran PostgreSQL connector", "frequency": "Every 30 minutes", "method": "Incremental replication using updated_at timestamp", "tables": [ "customers", "loyalty_transactions", "loyalty_tiers", "points_balance" ], "deduplication": "Window function on load_timestamp in staging" } }, "data_quality_raw": { "checks": [ "Schema validation (column names, 
types)", "Row count thresholds (alert if <10% or >200% of average)", "Null checks on critical fields", "Duplicate primary key detection" ], "action_on_failure": "Quarantine bad records, alert data engineering team, continue processing good records" } }, "layer_2_staging": { "staging_schema": "RAW_DB.STAGING", "purpose": "Light transformations, data type conversions, deduplication, basic cleansing", "transformations": { "deduplication": "Window functions partitioned by natural key, ordered by extraction timestamp", "type_casting": "String dates → DATE/TIMESTAMP, numeric strings → NUMERIC", "null_handling": "Replace system nulls with NULL, standardize empty strings", "key_standardization": "Trim whitespace, upper/lowercase normalization" }, "retention": "7 days rolling window for debugging", "implementation": "dbt snapshots for point-in-time analysis" }, "layer_3_integration": { "integration_schema": "INTEGRATION_DB", "purpose": "Create conformed dimensions, apply business rules, master data management", "dimension_processing": { "scd_type_2_logic": { "implementation": "dbt macros with snapshot functionality", "process": [ "1. Compare incoming records with current dimension state", "2. Identify changes in tracked attributes", "3. Close out old record (set end_date, is_current_flag=FALSE)", "4. Insert new record (effective_date=today, is_current_flag=TRUE)", "5. 
Generate new surrogate key for changed records" ], "tracked_attributes": { "dim_product": ["unit_price", "category"], "dim_customer": ["loyalty_tier", "customer_segment"], "dim_promotion": ["discount_value", "is_active"] } }, "scd_type_1_logic": { "implementation": "MERGE statement (upsert)", "process": "Update in place, no history tracking" }, "dimension_defaults": { "unknown_member": "Insert -1 record for unknown/missing foreign keys", "not_applicable": "Insert -2 record for N/A scenarios" } }, "business_rules": { "customer_segmentation": "RFM scoring based on recency, frequency, monetary value", "product_hierarchy": "Standardize category mappings across POS and eCommerce", "date_adjustments": "Convert all timestamps to store local time, then to UTC for storage", "revenue_recognition": "Apply returns and refunds logic, exclude employee transactions" }, "data_quality_integration": { "referential_integrity": "Validate all foreign keys exist in dimensions before fact loading", "business_rule_validation": "Transaction amounts >$0, dates within valid ranges, required fields populated", "cross_source_reconciliation": "Compare POS totals with financial system daily", "anomaly_detection": "Statistical outlier detection on sales amounts, transaction counts" } }, "layer_4_presentation": { "presentation_schema": "ANALYTICS_DB", "purpose": "Denormalized dimensional models optimized for BI queries", "fact_loading": { "strategy": "Incremental append-only for transaction facts", "process": [ "1. Identify new/changed source records since last load (watermark)", "2. Join with current dimension surrogate keys", "3. Apply business transformations and calculations", "4. Validate data quality rules", "5. Load into fact table", "6. 
Update watermark metadata" ], "idempotency": "All loads are idempotent - can safely re-run without duplicates", "late_arriving_facts": "Accept facts up to 7 days late, backfill with correct dimension keys", "late_arriving_dimensions": "Type 2 SCD handles this - facts join to dimension based on effective date range" }, "aggregate_tables": { "fact_daily_sales_summary": { "refresh": "Daily at 6 AM, incremental for previous day", "full_rebuild": "Monthly for data quality and compaction" }, "materialized_views": [ "mv_sales_by_category_month - Pre-aggregated for dashboard performance", "mv_customer_ltv - Lifetime value calculations cached", "mv_inventory_snapshot - Current inventory state for operations" ] }, "query_optimization": { "clustering_keys": { "fact_sales_transactions": "CLUSTER BY (date_key, store_key)", "fact_customer_behavior": "CLUSTER BY (customer_key, date_key)", "fact_marketing_spend": "CLUSTER BY (date_key, campaign_key)" }, "search_optimization": "Enable on dimension tables for faster point lookups on natural keys", "result_caching": "24-hour cache for identical queries (Snowflake automatic)" } } }, "orchestration_design": { "airflow_dags": { "dag_extract_pos": { "schedule": "*/15 * * * * (every 15 minutes)", "tasks": [ "check_source_availability", "extract_cdc_changes", "load_to_raw", "run_staging_transformations", "data_quality_checks", "update_watermark" ], "sla": "30 minutes", "retries": 3, "alerts": "Email + Slack on failure" }, "dag_extract_shopify": { "schedule": "0 * * * * (hourly)", "similar_structure": "As above, adapted for API extraction" }, "dag_daily_dimension_refresh": { "schedule": "0 2 * * * (2 AM daily)", "tasks": [ "process_dim_product_scd2", "process_dim_customer_scd2", "process_dim_promotion_scd2", "process_dim_store_scd1", "validate_dimension_integrity", "update_dimension_metadata" ], "dependencies": "Waits for all extraction DAGs to complete" }, "dag_daily_fact_refresh": { "schedule": "0 3 * * * (3 AM daily)", "tasks": [ 
"load_fact_sales_transactions_incremental", "load_fact_customer_behavior_incremental", "load_fact_marketing_spend_daily", "refresh_fact_daily_sales_summary", "cross_source_reconciliation", "update_fact_metadata" ], "dependencies": "Waits for dimension refresh DAG" }, "dag_weekly_full_refresh": { "schedule": "0 4 * * 0 (Sunday 4 AM)", "purpose": "Full rebuild for data quality, handles late-arriving data", "tasks": [ "full_dimension_rebuild", "full_fact_rebuild_last_30_days", "rebuild_aggregate_tables", "comprehensive_data_quality_scan" ] } }, "dbt_project_structure": { "models/staging": "Source-specific staging models (one per source table)", "models/integration": "Conformed dimensions, SCD logic", "models/marts/sales": "Sales dimensional mart", "models/marts/customer": "Customer dimensional mart", "models/marts/marketing": "Marketing dimensional mart", "models/metrics": "Metric definitions for semantic layer", "tests": "dbt tests for data quality (unique, not_null, relationships, custom tests)", "macros": "Reusable SQL for SCD, data quality, transformations" }, "monitoring_and_alerting": { "airflow_monitoring": "Task duration tracking, failure alerts, SLA monitoring", "dbt_monitoring": "Test failures, model build times, freshness checks", "snowflake_monitoring": "Query performance, warehouse utilization, credit consumption", "data_quality_dashboard": "Daily freshness, row counts, test pass rates, anomaly detections", "alert_channels": [ "Email for critical failures", "Slack #data-engineering for all alerts", "PagerDuty for production down scenarios" ] } }, "error_handling_and_recovery": { "principles": [ "Fail fast - detect issues early in pipeline", "Isolate failures - quarantine bad data, continue processing good data", "Automated retry - exponential backoff for transient errors", "Manual intervention - clear runbooks for data engineering team" ], "scenarios": { "source_system_unavailable": "Retry 3 times, skip cycle, alert team, continue next cycle", 
"data_quality_failure": "Quarantine failing records to error table, load clean records, create incident ticket", "schema_change_in_source": "Halt pipeline, alert team immediately, require manual review before proceeding", "duplicate_records": "Deduplicate using business rules (most recent timestamp wins), log duplicates for investigation", "missing_dimension_keys": "Use 'Unknown' dimension member, log for investigation, backfill when resolved" }, "recovery_procedures": { "backfill_process": "Parameterized Airflow DAGs accept date ranges for historical reload", "point_in_time_recovery": "Snowflake time-travel for accidental deletions or bad loads", "full_rebuild_capability": "Can rebuild entire warehouse from source systems in <24 hours if catastrophic failure" } }, "performance_optimization": { "incremental_loading": "Process only changed records using watermarks (updated_at timestamps, CDC)", "parallel_processing": "Airflow DAGs fan out source extractions, dbt models parallelize automatically", "batch_sizing": "Tune batch sizes for API calls (1000 records/call) and warehouse inserts (10K rows/batch)", "compression": "Snowflake automatic compression, typically 4-6x reduction", "partitioning": "Micro-partitions automatically managed by Snowflake based on clustering keys" } }, "governance_framework": { "pci_compliance_implementation": { "scope": "Payment card data in POS transactions requires PCI DSS compliance", "data_classification": { "cardholder_data": [ "Primary Account Number (PAN) - NOT STORED (tokenized at source)", "Cardholder name - ENCRYPTED", "Payment method details - MASKED for non-PCI roles" ], "sensitive_authentication_data": "NEVER STORED (CVV, PIN, magnetic stripe data)" }, "encryption_at_rest": { "implementation": "Snowflake automatic encryption (AES-256) for all data", "key_management": "Snowflake-managed keys with periodic automatic rotation", "future_enhancement": "Customer-managed keys (Tri-Secret Secure) for additional control" }, 
"encryption_in_transit": { "implementation": "TLS 1.2+ for all connections to Snowflake", "certificate_validation": "Enforce certificate validation in all connectors" }, "tokenization": { "implementation": "Payment processor tokenizes PANs at point of capture", "storage": "Only tokens stored in data warehouse, never full PANs", "detokenization": "Only authorized payment reconciliation users can detokenize via secure API" }, "data_masking": { "dynamic_masking_policies": { "mask_payment_card": "Show last 4 digits only (****-****-****-1234) for non-PCI roles", "mask_customer_email": "Show first character and domain only (j***@example.com)", "mask_customer_name": "Show initials only for analysts (J.D.)" }, "role_based_unmasking": "Only PCI_COMPLIANCE_ROLE and FINANCE_ADMIN_ROLE see unmasked data" }, "network_segmentation": { "snowflake_network_policy": "Restrict connections to corporate VPN and approved BI tools only", "ip_whitelist": "Airflow orchestration IPs, Tableau server IP, VPN egress IPs", "future": "PrivateLink for dedicated network connection to Snowflake" }, "retention_and_disposal": { "pci_data_retention": "7 years per payment industry requirements", "automated_purge": "After 7 years, secure deletion with cryptographic shredding", "audit_logging": "Retain audit logs for 10 years for compliance evidence" } }, "access_control_by_department": { "role_based_access_control": { "principle": "Least privilege - users get minimum access needed for job function", "role_hierarchy": { "EXECUTIVE_ROLE": { "access": "All marts, all metrics, masked PII", "granted_to": "C-level executives, VPs", "restrictions": "Cannot see unmasked PII/PCI data" }, "FINANCE_ROLE": { "access": "Sales mart (full), customer mart (masked PII), marketing mart (read-only)", "granted_to": "Finance team, FP&A analysts", "restrictions": "No access to customer PII details" }, "MARKETING_ROLE": { "access": "Marketing mart (full), customer mart (aggregates only), sales mart (summary level)", 
"granted_to": "Marketing team, demand generation", "restrictions": "Cannot see individual transaction details or customer PII" }, "SALES_OPS_ROLE": { "access": "Sales mart (full), customer mart (full), store operations data", "granted_to": "Sales operations, regional managers", "restrictions": "No access to marketing spend details" }, "ANALYST_ROLE": { "access": "All marts (read-only), masked PII, pre-built dashboards", "granted_to": "Business analysts across departments", "restrictions": "Cannot create schemas or modify data" }, "DATA_ENGINEER_ROLE": { "access": "All databases (full), administrative functions", "granted_to": "Data engineering team", "restrictions": "Production changes require peer review" }, "PCI_COMPLIANCE_ROLE": { "access": "Unmasked payment data, audit logs, compliance reports", "granted_to": "Compliance officer, auditors (temporary access only)", "restrictions": "All access logged and reviewed quarterly" }, "TABLEAU_SERVICE_ACCOUNT": { "access": "Read-only on presentation layer, uses ANALYST_ROLE privileges", "security": "Service account with rotated credentials, restricted to Tableau server IP" } }, "role_assignment_process": { "request": "Submit access request ticket with business justification", "approval": "Manager approval + data governance approval", "provisioning": "Data engineering grants role, documented in access registry", "review": "Quarterly access review, annual re-certification for PCI roles" } }, "row_level_security": { "regional_segmentation": { "implementation": "Row access policies based on user's assigned region", "example": "REGIONAL_MANAGER_ROLE only sees stores in their region", "table": "Secure views with WHERE region = CURRENT_USER_REGION()" }, "customer_segmentation": { "implementation": "Data science team sees anonymized customer_key only", "marketing_team": "Can see aggregate customer segments, not individual customers" } }, "column_level_security": { "implementation": "Masking policies attached to sensitive 
columns", "sensitive_columns": [ "customer_email (EMAIL_MASK_POLICY)", "customer_name (NAME_MASK_POLICY)", "payment_method_details (PAYMENT_MASK_POLICY)", "customer_phone (PHONE_MASK_POLICY)" ], "policy_attachment": "Policies follow columns even in views and clones" } }, "data_lineage_tracking": { "tooling": "Snowflake Object Dependencies + dbt docs + Monte Carlo Data Observability", "lineage_scope": { "source_to_target": "Full lineage from source systems → raw → staging → integration → presentation", "column_level_lineage": "Track which source columns feed which target columns through transformations", "bi_lineage": "Extend to Tableau dashboards showing which reports use which tables/columns" }, "implementation": { "dbt_lineage": { "dbt_docs": "Automatically generates lineage DAG from dbt models", "access": "Hosted on internal portal, searchable by table/column name", "documentation": "Inline dbt model documentation explains business logic" }, "snowflake_metadata": { "access_history": "Query SNOWFLAKE.ACCOUNT_USAGE.ACCESS_HISTORY for usage patterns", "object_dependencies": "View direct dependencies between objects", "query_history": "Analyze which queries touch sensitive data" }, "monte_carlo": { "data_catalog": "Automated discovery and cataloging of all data assets", "impact_analysis": "Identify downstream dependencies before schema changes", "lineage_visualization": "Interactive lineage graph from source to BI" } }, "use_cases": { "impact_analysis": "Before changing a source schema, identify all downstream impacts", "root_cause_analysis": "When data quality issue found, trace back to source", "compliance_reporting": "Demonstrate data flow for PCI audit", "deprecation_planning": "Identify unused tables/columns for cleanup" } }, "data_quality_framework": { "quality_dimensions": [ "Accuracy - data reflects reality", "Completeness - no missing required data", "Consistency - data agrees across sources", "Timeliness - data is fresh and current", "Validity - data 
conforms to business rules", "Uniqueness - no unintended duplicates" ], "implementation_layers": { "source_quality_checks": { "location": "Raw layer immediately after extraction", "checks": [ "Schema validation - columns match expected structure", "Row count thresholds - alert if anomalous", "Null checks on primary keys", "Date range validation - no future dates for historical data" ], "action": "Quarantine and alert, do not stop pipeline" }, "staging_quality_checks": { "location": "Staging layer after basic transformations", "checks": [ "Data type validation - numeric fields are numeric", "Referential integrity - foreign keys exist", "Business rule validation - amounts > 0, valid state codes", "Duplicate detection - same natural key multiple times" ], "action": "Fail pipeline if critical, warn if minor" }, "integration_quality_checks": { "location": "Integration layer before dimension loading", "checks": [ "Dimension SCD logic correctness - no overlapping date ranges", "Conformed dimension consistency - customer records match across sources", "Master data quality - product hierarchy is valid", "Cross-source reconciliation - POS totals match GL totals" ], "action": "Halt pipeline, require manual investigation" }, "presentation_quality_checks": { "location": "Presentation layer after fact loading", "checks": [ "Fact-dimension relationship integrity - all FKs exist", "Aggregate reconciliation - fact totals match summary tables", "Historical consistency - yesterday's data unchanged", "Business KPI validation - revenue within expected range" ], "action": "Alert business users, create incident" } }, "dbt_testing_strategy": { "generic_tests": [ "unique - primary keys and natural keys", "not_null - required fields", "relationships - foreign key integrity", "accepted_values - categorical fields" ], "custom_tests": [ "test_sales_amount_positive - revenue > 0", "test_transaction_date_range - dates within valid range", "test_customer_email_format - valid email format", 
"test_daily_row_count_anomaly - statistical outlier detection" ], "test_execution": "Every dbt run, failures break pipeline", "test_documentation": "Each test documents what it checks and why it matters" }, "monitoring_and_alerting": { "monte_carlo_monitoring": { "freshness_checks": "Alert if tables not updated within expected SLA", "volume_anomalies": "ML-based detection of unusual row count changes", "schema_changes": "Alert on schema drift in source systems", "distribution_shifts": "Detect changes in data distributions (e.g., sudden spike in returns)" }, "custom_sql_monitors": { "daily_reconciliation": "Compare POS revenue to GL revenue", "customer_count_stability": "Alert if customer count drops >5%", "inventory_accuracy": "Compare warehouse inventory to POS inventory", "marketing_attribution": "Verify attributed revenue <= total revenue" }, "dashboard": "Executive data quality dashboard showing test pass rates, freshness, anomalies", "alerting": "Slack #data-quality channel, email for critical issues" } }, "audit_logging_and_compliance": { "snowflake_audit_logging": { "access_history": "Log all queries, who ran them, what data accessed, when", "retention": "10 years in SNOWFLAKE.ACCOUNT_USAGE views", "monitoring": "Weekly report of PCI data access for compliance review" }, "sensitive_data_access_tracking": { "implementation": "Query ACCESS_HISTORY for any access to PCI-flagged tables/columns", "alerting": "Real-time alert if unmasked PCI data accessed by non-authorized role", "review_process": "Quarterly review of all PCI data access with compliance officer" }, "change_tracking": { "dbt_version_control": "All transformation code in Git, peer-reviewed PRs", "airflow_dag_history": "All DAG changes version controlled", "schema_change_log": "Document all DDL changes with business justification", "access_change_log": "Audit trail of all role grants/revokes" }, "compliance_reporting": { "pci_dss_reports": [ "Quarterly: Access log review for PCI data", "Quarterly: 
Data encryption verification", "Annually: Full compliance assessment with external auditor" ], "internal_reports": [ "Monthly: Data quality scorecard", "Monthly: Pipeline SLA performance", "Quarterly: Access review and recertification" ] } }, "metadata_management": { "data_catalog": { "tool": "Monte Carlo Data Catalog + dbt docs", "contents": { "technical_metadata": "Tables, columns, data types, relationships, lineage", "business_metadata": "Business definitions, owners, data stewards, usage notes", "operational_metadata": "Load frequencies, SLAs, dependencies, freshness" }, "governance": "Data stewards responsible for maintaining business metadata" }, "documentation_standards": { "dimension_documentation": "Business definition, source system, SCD type, key attributes", "fact_documentation": "Business process, grain, measures, calculation logic", "metric_documentation": "Business definition, formula, ownership, approved use cases", "dashboard_documentation": "Purpose, audience, refresh schedule, data sources" }, "data_stewardship": { "roles": { "executive_sponsor": "CFO - overall data program accountability", "data_governance_lead": "Director of BI - policies and standards", "domain_stewards": [ "Sales domain - VP Sales", "Customer domain - VP Customer Success", "Marketing domain - CMO", "Finance domain - Controller" ], "technical_stewards": "Data engineering team - implementation and tooling" }, "responsibilities": [ "Define and maintain business glossary", "Approve access requests for their domain", "Ensure data quality in their domain", "Review and approve schema changes", "Participate in quarterly governance meetings" ] } }, "security_best_practices": { "principle_of_least_privilege": "Users get minimum access needed, no more", "separation_of_duties": "Developers cannot approve their own production changes", "defense_in_depth": "Multiple layers of security (network, authentication, authorization, encryption)", "regular_access_reviews": "Quarterly review of all 
user access, annual for PCI", "secure_credential_management": "All service account credentials in AWS Secrets Manager with rotation", "mfa_enforcement": "Multi-factor authentication required for all users", "session_timeouts": "Idle sessions timeout after 30 minutes", "security_training": "Annual security awareness training for all users with data access" } }, "bi_enablement_and_adoption": { "tableau_integration": { "connection": "Snowflake Partner Connect or native Snowflake connector", "authentication": "OAuth for individual users, service account for published data sources", "data_sources": { "certified_data_sources": [ "Sales Analysis - points to ANALYTICS_DB.sales_mart", "Customer Insights - points to ANALYTICS_DB.customer_mart", "Marketing Performance - points to ANALYTICS_DB.marketing_mart" ], "governance": "Only certified data sources approved for production dashboards" }, "performance_optimization": { "live_connection": "Use Snowflake compute, leverage query result caching", "extracts": "Avoid extracts, use live connections for fresh data", "aggregate_tables": "Point to pre-aggregated tables for large dashboards", "custom_sql": "Minimize custom SQL, use published data sources where possible" } }, "semantic_layer": { "implementation": "dbt Semantic Layer (dbt metrics)", "benefits": [ "Single source of truth for metric definitions", "Consistent calculations across all BI tools", "Version controlled metric logic in dbt", "Reduces burden on analysts to calculate metrics correctly" ], "key_metrics": { "revenue_metrics": [ "Total Revenue", "Revenue Growth %", "Average Transaction Value", "Revenue per Customer" ], "customer_metrics": [ "Customer Lifetime Value (LTV)", "Customer Acquisition Cost (CAC)", "Churn Rate", "Active Customers" ], "marketing_metrics": [ "Return on Ad Spend (ROAS)", "Cost per Acquisition (CPA)", "Marketing Attributed Revenue", "Campaign ROI" ], "operational_metrics": [ "Inventory Turn", "Sales per Square Foot", "Transactions per Hour", 
"Average Basket Size" ] }, "governance": "Metrics owned by domain stewards, changes require approval" }, "pre_built_dashboards": { "executive_dashboard": { "audience": "C-level, VPs", "refresh": "Real-time (live connection)", "content": "KPI scorecards, trend charts, exception reports", "security": "Restricted to EXECUTIVE_ROLE" }, "sales_performance_dashboard": { "audience": "Sales ops, regional managers", "refresh": "Hourly", "content": "Sales by store/region/product, YoY comparisons, targets vs actuals", "security": "SALES_OPS_ROLE, row-level security by region" }, "customer_segmentation_dashboard": { "audience": "Marketing, customer success", "refresh": "Daily", "content": "RFM segments, cohort analysis, churn predictions, LTV", "security": "MARKETING_ROLE, masked PII" }, "marketing_roi_dashboard": { "audience": "Marketing team, CMO", "refresh": "Daily", "content": "Campaign performance, channel effectiveness, attribution analysis, spend vs revenue", "security": "MARKETING_ROLE" } }, "self_service_enablement": { "analyst_training": { "onboarding": "2-day data warehouse training for new analysts", "curriculum": [ "Dimensional modeling concepts", "Understanding star schemas", "Navigating the data catalog", "Using certified Tableau data sources", "Best practices for performant queries", "Data governance and security policies" ], "ongoing": "Monthly office hours, quarterly advanced training sessions" }, "documentation": { "data_dictionary": "Business definitions for all tables and columns", "dashboard_user_guides": "How to use each pre-built dashboard", "faq": "Common questions and troubleshooting", "video_tutorials": "Short videos on common analysis tasks" }, "support_model": { "tier_1": "Slack #data-questions channel for quick questions", "tier_2": "Email data-support for complex requests", "tier_3": "Data engineering for technical issues or new data source requests" }, "feedback_loop": { "quarterly_user_survey": "Gather feedback on data quality, dashboard 
usefulness, support quality", "dashboard_usage_analytics": "Track which dashboards are used, by whom, how often", "feature_requests": "Prioritize backlog based on user needs and business value" } }, "data_literacy_program": { "goal": "Build data-driven culture across the organization", "initiatives": [ "Lunch-and-learn sessions on key metrics and how to interpret them", "Data champion program - power users in each department", "Executive dashboards in every leadership meeting", "Celebrate wins driven by data insights" ] } }, "implementation_roadmap": { "phase_1_foundation": { "duration": "Weeks 1-6", "deliverables": [ "Snowflake account setup with security policies", "Network access policies and role hierarchy configured", "RAW_DB established with source extractions for POS and Shopify", "Airflow environment setup with basic extraction DAGs", "dbt project initialized with staging models", "Initial data quality checks implemented" ], "success_criteria": "Data flowing from source systems to RAW layer with quality checks" }, "phase_2_dimensional_models": { "duration": "Weeks 7-12", "deliverables": [ "Sales mart dimensional model (facts and dimensions) implemented", "SCD Type 2 logic for key dimensions (product, customer)", "Integration layer transformations in dbt", "Presentation layer with optimized fact/dimension tables", "Data quality tests in dbt covering all tables", "Lineage documentation in dbt docs" ], "success_criteria": "Sales mart queryable with accurate historical data" }, "phase_3_bi_integration": { "duration": "Weeks 13-16", "deliverables": [ "Tableau connected to Snowflake with certified data sources", "3 pre-built dashboards (executive, sales performance, customer segmentation)", "User training completed for pilot group", "Documentation and user guides published", "Semantic layer (dbt metrics) for key KPIs" ], "success_criteria": "Business users running reports and dashboards independently" }, "phase_4_marketing_and_governance": { "duration": "Weeks 
17-22", "deliverables": [ "Marketing mart dimensional model implemented", "Marketing spend CSV ingestion automated", "Full PCI compliance implementation and audit", "Access controls and masking policies fully enforced", "Monte Carlo data observability deployed", "Quarterly governance processes established" ], "success_criteria": "Marketing ROI reports available, PCI audit passed" }, "phase_5_optimization_and_scale": { "duration": "Weeks 23+", "deliverables": [ "Query performance tuning based on usage patterns", "Additional aggregate tables for dashboard performance", "Advanced analytics enablement (data science workloads)", "Self-service data exploration tools", "Continuous improvement based on user feedback" ], "success_criteria": "Warehouse scales to support growing user base and data volumes" } }, "success_metrics_and_kpis": { "technical_kpis": { "pipeline_reliability": "99.5% uptime for daily refreshes", "data_freshness": "Sales data <1 hour old, marketing data <24 hours old", "query_performance": "95th percentile query <10 seconds for dashboards", "data_quality": "95%+ test pass rate on dbt tests", "warehouse_cost": "Monthly cost within budget ($X per TB data stored, $Y compute credits)" }, "business_kpis": { "user_adoption": "75% of analysts using warehouse within 6 months", "dashboard_usage": "Pre-built dashboards used daily by 80% of target audience", "self_service": "50% reduction in ad-hoc data requests to data engineering", "decision_speed": "Reduce time to generate monthly reports from 5 days to 1 day", "insights_value": "Document 5+ business decisions informed by warehouse data in first year" } } } ``` This comprehensive data warehouse architecture provides a scalable, governed, and performant analytics platform for your retail organization. The design prioritizes: 1. **PCI Compliance** - Encryption, masking, access controls, and audit logging throughout 2. **Dimensional Modeling Best Practices** - Star schemas optimized for analytical queries 3. 
**Reliable ELT Pipelines** - Robust orchestration with error handling and monitoring
4. **Role-Based Access Control** - Department-specific access with data masking
5. **Data Quality** - Multi-layer validation from source to presentation
6. **Lineage Tracking** - Full visibility from source systems through to BI dashboards
7. **BI Enablement** - Tableau integration with semantic layer and pre-built dashboards

The phased implementation approach allows you to deliver value incrementally while building a solid foundation for future growth.
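The marketing metrics named in the semantic layer (ROAS, CPA, campaign ROI) reduce to simple ratios. A minimal sketch of those formulas, with illustrative sample figures that are not part of the spec:

```python
# Illustrative definitions of the marketing KPIs from the semantic layer.
# Exact metric logic should live in dbt and be owned by domain stewards.

def roas(attributed_revenue: float, ad_spend: float) -> float:
    """Return on Ad Spend: attributed revenue per dollar of spend."""
    return attributed_revenue / ad_spend

def cpa(ad_spend: float, acquisitions: int) -> float:
    """Cost per Acquisition: spend divided by customers acquired."""
    return ad_spend / acquisitions

def campaign_roi(attributed_revenue: float, ad_spend: float) -> float:
    """Campaign ROI: net return as a fraction of spend."""
    return (attributed_revenue - ad_spend) / ad_spend

print(roas(50_000, 10_000))          # 5.0
print(cpa(10_000, 250))              # 40.0
print(campaign_roi(50_000, 10_000))  # 4.0
```

Encoding these once in the dbt Semantic Layer, rather than in each Tableau workbook, is what gives the "single source of truth" benefit the design calls out.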
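Phase 2 calls for SCD Type 2 logic on the product and customer dimensions. The pattern is: when a tracked attribute changes, end-date the current row and append a new current row, preserving history. A minimal in-memory sketch (column names like `effective_from` and `is_current` are illustrative; in practice this would be a dbt snapshot or a Snowflake `MERGE`):

```python
from datetime import date

def scd2_apply(history: list[dict], incoming: dict, today: date) -> list[dict]:
    """Apply one incoming customer record to an SCD Type 2 history table."""
    current = next((r for r in history
                    if r["customer_id"] == incoming["customer_id"]
                    and r["is_current"]), None)
    if current and current["segment"] == incoming["segment"]:
        return history  # no tracked change; history untouched
    if current:
        # Close out the old version of the row.
        current["is_current"] = False
        current["effective_to"] = today
    # Append the new current version.
    history.append({**incoming, "effective_from": today,
                    "effective_to": None, "is_current": True})
    return history

hist = [{"customer_id": 1, "segment": "bronze",
         "effective_from": date(2024, 1, 1),
         "effective_to": None, "is_current": True}]
hist = scd2_apply(hist, {"customer_id": 1, "segment": "gold"}, date(2024, 6, 1))
print(len(hist))  # 2: one closed-out historical row, one current row
```

This is what makes the phase 2 success criterion ("accurate historical data") possible: point-in-time joins against the dimension pick the row whose effective window covers the fact date.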
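The customer segmentation dashboard is built on RFM (recency/frequency/monetary) segments. One common convention is threshold-based scoring per dimension; the thresholds and segment names below are hypothetical, not taken from the spec:

```python
# Illustrative RFM scoring. Real thresholds would be derived from the
# customer mart (e.g. quantiles of the customer base), not hard-coded.

def rfm_segment(recency_days: int, frequency: int, monetary: float) -> str:
    """Score each dimension 1-3, then map score combinations to a segment."""
    r = 3 if recency_days <= 30 else 2 if recency_days <= 90 else 1
    f = 3 if frequency >= 10 else 2 if frequency >= 3 else 1
    m = 3 if monetary >= 500 else 2 if monetary >= 100 else 1
    if r == 3 and f == 3:
        return "champion"        # recent and frequent buyer
    if r == 1 and f >= 2:
        return "at_risk"         # used to buy often, gone quiet
    return "regular"

print(rfm_segment(14, 12, 800.0))   # champion
print(rfm_segment(200, 5, 300.0))   # at_risk
```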
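The governance section masks PII by role (e.g. the customer segmentation dashboard shows "masked PII" to `MARKETING_ROLE`). In Snowflake this is a native dynamic masking policy; the behavior can be sketched in plain Python for illustration (the allowlisted role name here is hypothetical):

```python
# Sketch of role-conditional masking, mimicking a Snowflake masking policy:
# allowlisted roles see the raw value, everyone else sees a redacted form.

UNMASKED_ROLES = {"MARKETING_ADMIN"}  # hypothetical privileged role

def mask_email(value: str, role: str) -> str:
    """Return the raw email for privileged roles, a masked form otherwise."""
    if role in UNMASKED_ROLES:
        return value
    local, _, domain = value.partition("@")
    return local[0] + "***@" + domain

print(mask_email("jane.doe@example.com", "ANALYST_ROLE"))      # j***@example.com
print(mask_email("jane.doe@example.com", "MARKETING_ADMIN"))   # jane.doe@example.com
```

Defining the policy once at the column level, rather than per dashboard, is what keeps the PCI controls consistent across Tableau and ad-hoc queries alike.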
Compressed Data Warehouse Architect

CLAUDE-4-5-SONNET
Token-compressed system designing data warehouses through schema modeling, ETL pipelines, query optimization, data governance, and BI integration enabling centralized analytics, reporting, and data-driven decision-making at scale.
Added over 1 month ago