PromptBase

Prompt Details

Model
Chat - (gpt-5.2)
Token size
301
Example input
[DATA_SOURCE]: AWS S3 (JSON remittance logs, hourly) + Kafka (CDC from PostgreSQL)
[TARGET_SYSTEM]: Snowflake finance-grade analytics warehouse
[KEY_TRANSFORMS]: Currency normalization, sanctions enrichment, PII tokenization, deduplication, SCD Type 2
[SLA_AND_QA]: 99.5% availability, <10-min latency, 0.1% error tolerance, automated validation, 24-hr SLA
Example output
Below is an implementation-grade specification aligned to 99.5% monthly availability, <10-minute settlement latency, ≤0.1% reconciliation error tolerance, automated Great Expectations validation, and 24-hour incident resolution SLA.

---

1) YAML Pipeline Specification

pipeline:
  name: remittance_analytics_pipeline
  version: 1.0
  owner: data_platform
  availability_target: 99.5_percent_monthly
  settlement_latency_sla: "<10_minutes"
  reconciliation_error_tolerance: "0.1_percent"

sources:
  s3_batch:
    type: aws_s3
    format: json
    schedule: hourly
    path: s3://remittance-logs/YYYY/MM/DD/HH/
    schema_registry: internal_glue_catalog
    expected_volume_per_hour: 50k-200k_records
  kafka_cdc:
    type: kafka
    topic: payments.cdc.postgres
    connector: debezium_postgres
    delivery_semantics: at_least_once
    expected_tps: 200-500

staging_layer:
  storage: snowflake_raw
  tables:
    - raw_remittance_logs
    - raw_payments_cdc
  ingestion_mode:
    s3_batch: snowpipe_auto_ingest
    kafka_cdc: snowflake_kafka_connector

canonical_model:
  dimensions:
    dim_customer:
      surrogate_key: customer_sk
      natural_key: customer_id
      scd_type: 2
      effective_from: timestamp
      effective_to: timestamp
      current_flag: boolean
    dim_currency:
      surrogate_key: currency_sk
      natural_key: currency_code
    dim_sanctions_status:
      surrogate_key: sanctions_sk
      attributes: [match_score, list_name, screened_at]
  facts:
    fact_transaction:
      grain: one_row_per_unique_transaction_event
      composite_dedupe_key:
        - transaction_id
        - event_type
        - event_timestamp
      measures:
        - amount_original
        - amount_usd
        - fx_rate_applied
        - fee_amount
      fks:
        - customer_sk
        - currency_sk
        - sanctions_sk

transformations:
  currency_normalization:
    priority_order:
      - use_fx_rate_from_transaction_if_present
      - else_lookup_internal_daily_fx_table
      - else_flag_for_manual_review
    rule: "amount_usd = amount_original * fx_rate"
    rounding: 2_decimal_bankers_rounding
  pii_tokenization:
    method: sha256_with_salt
    fields: [customer_name, email, phone, id_number]
    token_store: vault_hsm_backed
  sanctions_enrichment:
    provider: ofac_un_consolidated_list
    match_logic: fuzzy_name_match_threshold_0.85
  duplicate_suppression:
    strategy: merge_on_composite_key

data_quality:
  framework: great_expectations
  checks:
    - non_null: [transaction_id, amount_original, currency_code]
    - range_check: amount_original > 0
    - uniqueness: composite_dedupe_key
    - fx_rate_reasonable_range: 0.0001 - 1000

cdc_strategy:
  snapshot_mode: initial_snapshot_then_incremental
  watermark_column: updated_at

backfill:
  strategy: time_range_replay_from_s3_archives
  idempotent_merge: true

schema_evolution:
  approach: additive_only_non_breaking
  contract_tests: required_before_deploy
  backward_compatibility_window: 30_days

idempotency:
  guarantee: merge_into_with_deterministic_hash_key
  hash_key: sha256(transaction_id||event_type||event_timestamp)

observability:
  metrics:
    - ingestion_lag_seconds
    - duplicate_rate_percent
    - fx_missing_rate_percent
    - dq_failure_rate
  alerting: pagerduty_on_threshold_breach

security:
  encryption:
    at_rest: snowflake_tde
    in_transit: tls_1_2_plus
  access_control: role_based_access_control
  pii_access: masked_views

retention_policy:
  raw: 90_days
  curated: 7_years_finance_requirement

fault_tolerance:
  retry_policy: exponential_backoff_5_retries
  dlq_topic: payments_dead_letter
  replay_supported: true

---

2) CSV-Ready ETL Mapping Table (Sample Rows)

source_system,source_field,target_table,target_field,transformation_rule,example_source,example_target
S3_JSON,txn_id,fact_transaction,transaction_id,Direct Map,"TX123","TX123"
S3_JSON,amount,fact_transaction,amount_original,"Cast Decimal(18,4)","1500.50","1500.50"
S3_JSON,currency,fact_transaction,currency_sk,Lookup dim_currency,"NGN","23"
S3_JSON,amount,fact_transaction,amount_usd,"amount_original * fx_rate","1500.50 NGN","1.92"
Kafka_CDC,customer_email,dim_customer,email_token,"SHA256(email+salt)","user@email.com","A94A8FE5..."
Kafka_CDC,updated_at,dim_customer,effective_from,Direct Map,"2026-02-28T09:00:00Z","2026-02-28T09:00:00Z"
Kafka_CDC,name,dim_sanctions_status,match_score,FuzzyMatch(),"Ali Musa","0.91"
S3_JSON,txn_id,fact_transaction,hash_key,"SHA256(txn_id||event_type||event_ts)","TX123|SETTLED|ts","8C9F..."

---

3) Runnable Runbook

A. Snowflake Table DDL (Canonical Layer)

CREATE TABLE dim_customer (
  customer_sk NUMBER AUTOINCREMENT,
  customer_id STRING,
  email_token STRING,
  effective_from TIMESTAMP,
  effective_to TIMESTAMP,
  current_flag BOOLEAN
);

CREATE TABLE fact_transaction (
  transaction_id STRING,
  event_type STRING,
  event_timestamp TIMESTAMP,
  amount_original NUMBER(18,4),
  amount_usd NUMBER(18,4),
  fx_rate_applied NUMBER(18,6),
  hash_key STRING,
  customer_sk NUMBER
);

B. Idempotent MERGE (Duplicate Suppression)

MERGE INTO fact_transaction tgt
USING staging_transactions src
ON tgt.hash_key = src.hash_key
WHEN NOT MATCHED THEN
  INSERT (...columns...)
  VALUES (...values...);

C. Great Expectations CLI

great_expectations checkpoint run remittance_checkpoint

D. Backfill Replay

aws s3 cp s3://remittance-logs/2026/01/01/ ./ --recursive
snowflake ingest --stage raw_stage --pattern ".*2026/01/01.*"

E. Performance Benchmark Target

500 TPS sustained streaming ingest
<2 min raw-to-curated transform
≤0.05% duplicate rate
DQ runtime <5 min per batch

---

4) Executive Summary

Architecture Summary

Hybrid batch (S3 hourly) + streaming (Kafka CDC) pipeline into Snowflake using deterministic merge logic and canonical dimensional modeling.

Estimated Timeline

Design & Contracts: 3 weeks
Build & Integrations: 6–8 weeks
Hardening & Compliance Validation: 4 weeks
Total: ~12–15 weeks

Estimated Monthly Cost (Rough Order)

Snowflake: $8k–$20k (compute + storage, mid-volume)
Kafka infra: $3k–$8k
S3 + Data transfer: $1k–$3k
Sanctions API: $2k–$10k depending on provider
Total: $14k–$41k/month

Commercial Risks

FX rate dependency inconsistencies
Sanctions false positives (operational overhead)
CDC schema drift breaking ingestion
Network constraints increasing lag

---

Acceptance Criteria

1. End-to-end latency <10 min P95
2. 99.5% uptime monthly
3. Reconciliation variance ≤0.1%
4. 100% PII tokenized before curated layer
5. DQ failure rate <1%

---

Automated Tests

Unit
Hash key deterministic across runs
FX conversion correct to 2 decimal places
Tokenization irreversible

Integration
CDC replay produces no duplicates
Backfill merge does not override current_flag incorrectly
Sanctions enrichment logs match_score ≥ threshold

---

Security & Compliance Checklist

TLS enforced end-to-end
AES-256 encryption at rest
Role-based access + least privilege
Column masking for PII
Key rotation every 90 days
Audit logs immutable (7 years retention)
Data residency confirmed

---

Operations Implementation Checklist

[ ] Create Snowflake roles & warehouses
[ ] Configure Kafka connector with DLQ
[ ] Deploy Great Expectations suites
[ ] Configure PagerDuty alerts
[ ] Run synthetic load test
[ ] Document replay playbook
[ ] Validate reconciliation report

---

Realistic Limitations

99.5% availability excludes upstream outages
Fuzzy sanctions matching can create false positives
Near-real-time latency dependent on Snowflake warehouse sizing
Backfill reprocessing may temporarily breach SLA

---

If desired, I can next produce:

Full ER diagram (logical + physical)
Detailed SCD Type 2 merge SQL
Detailed CDC event envelope schema
Cost-optimized variant for constrained African network environments
Terraform IaC blueprint
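
The idempotency and currency rules named in the example output (a deterministic SHA-256 hash key over transaction_id, event_type, and event_timestamp; an insert-only merge on that key; and amount_usd = amount_original * fx_rate with 2-decimal banker's rounding) can be illustrated with a minimal Python sketch. It is not part of the prompt's output. The function names, the in-memory dict standing in for fact_transaction, and the "|" delimiter between key parts are all assumptions made for illustration.

```python
import hashlib
from decimal import Decimal, ROUND_HALF_EVEN


def hash_key(transaction_id: str, event_type: str, event_timestamp: str) -> str:
    """Deterministic SHA-256 over the composite dedupe key.

    Assumption: parts are joined with "|" so that adjacent fields
    cannot run together and collide.
    """
    raw = "|".join([transaction_id, event_type, event_timestamp])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def to_usd(amount_original: Decimal, fx_rate: Decimal) -> Decimal:
    """amount_usd = amount_original * fx_rate, rounded half-even to 2 places."""
    return (amount_original * fx_rate).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_EVEN
    )


def merge_idempotent(target: dict, staged: list) -> int:
    """Mimic MERGE ... WHEN NOT MATCHED THEN INSERT on hash_key.

    `target` is a hypothetical in-memory stand-in for fact_transaction,
    keyed by hash_key. Returns the number of rows inserted.
    """
    inserted = 0
    for row in staged:
        key = hash_key(
            row["transaction_id"], row["event_type"], row["event_timestamp"]
        )
        if key not in target:  # already-seen events are skipped, not updated
            target[key] = {**row, "hash_key": key}
            inserted += 1
    return inserted


if __name__ == "__main__":
    fact = {}
    batch = [
        {
            "transaction_id": "TX123",
            "event_type": "SETTLED",
            "event_timestamp": "2026-02-28T09:00:00Z",
        }
    ]
    print(merge_idempotent(fact, batch))  # first load inserts the row
    print(merge_idempotent(fact, batch))  # replaying the same batch inserts nothing
```

Because the key depends only on the event's own fields, replaying a batch (for example during a backfill) leaves the target unchanged, which is the property the "CDC replay produces no duplicates" test in the example output is meant to check.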

Etl Schema Stabilizer

Instant access
Usage rights: Commercial use
Money-back guarantee
Built for engineers and product teams, this prompt produces an implementation-ready data pipeline specification that shortens design cycles and reduces integration risk. It tackles ambiguous mappings, schema drift, brittle backfills, and compliance gaps by generating ER/field-level mappings, deterministic transforms, CDC/backfill strategies, and repair playbooks. Ideal for migrations, analytics platforms, regulatory compliance projects, and vendor evaluations where reproducible specs, contract..
Added over 1 month ago