Data Engineer AI Prompts

35 prompts (30 standalone prompts, 4 chains, 1 template) across 5 categories, spanning Beginner to Advanced.

AI prompts for data engineers covering ETL and ELT pipelines, data warehouses, data modeling, infrastructure design, schema contracts, orchestration, and data quality validation.


Pipeline Design

10 prompts
01 · Pipeline Design · Advanced · Prompt

Backfill Strategy

Design a safe, efficient backfill strategy for re-processing historical data in this pipeline.

Pipeline: {{pipeline_description}}
Data range to backfill: {{date_range}}
Estimated data volume: {{volume}}
Downstream dependencies: {{downstream_tables}}

1. Backfill isolation:
   - Never write backfill output to the production table directly during processing
   - Write to a staging table or partition-isolated location first
   - Swap into production atomically after validation
2. Partitioned backfill approach:
   - Process one date partition at a time to limit blast radius
   - Use a date loop: for each date in the range, submit an independent job
   - Parallelism: how many partitions can safely run in parallel without overloading the source system or cluster?
   - Checkpoint completed partitions: re-running the backfill skips already-completed dates
3. Source system protection:
   - Throttle extraction queries to avoid overwhelming the source (use LIMIT/offset pagination or time-boxed micro-batches)
   - Schedule the backfill during low-traffic hours if the source is OLTP
   - Use read replicas if available
4. Downstream impact management:
   - Notify downstream consumers before starting the backfill
   - If downstream tables are materialized from this table, suspend their refresh until the backfill is complete
   - After the backfill: re-run downstream tables in dependency order
5. Validation before cutover:
   - Row count: does the backfilled output match expected counts?
   - Key uniqueness: no duplicate primary keys in the output
   - Metric spot check: compare aggregated metrics for a sample of dates to the source system
6. Rollback plan:
   - If validation fails: what is the procedure to restore the previous state?

Return: backfill execution script, validation checks, downstream notification template, and rollback procedure.
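The partitioned, checkpointed loop in step 2 can be sketched in Python. This is a minimal sketch: `completed` stands in for a checkpoint store (e.g. a metadata table) and `run_partition` for the per-date job; both names are assumptions, not part of the prompt.

```python
from datetime import date, timedelta

def backfill(start: date, end: date, completed: set, run_partition) -> list:
    """Re-process one date partition at a time, skipping checkpointed dates,
    so a re-run of the backfill resumes where it left off."""
    ran = []
    day = start
    while day <= end:
        if day not in completed:    # checkpoint: skip already-completed dates
            run_partition(day)      # writes to staging, not production
            completed.add(day)      # commit the checkpoint only after success
            ran.append(day)
        day += timedelta(days=1)
    return ran
```

A real version would persist `completed` transactionally and cap how many `run_partition` jobs run in parallel, per step 2's parallelism question.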
02 · Pipeline Design · Beginner · Prompt

DAG Design for Airflow

Design and implement an Airflow DAG for this pipeline.

Pipeline requirements: {{pipeline_requirements}}
Schedule: {{schedule}}
Dependencies: {{upstream_dependencies}}

1. DAG structure best practices:
   - Use the @dag decorator with an explicit schedule, catchup=False, and max_active_runs=1
   - Set a meaningful dag_id, description, and tags
   - Define default_args with retries=3, retry_delay=timedelta(minutes=5), retry_exponential_backoff=True
   - Set execution_timeout per task to prevent hung tasks from blocking slots
2. Task design:
   - Keep tasks small and single-responsibility (one logical operation per task)
   - Use the TaskFlow API (@task decorator) for Python tasks; it is cleaner than PythonOperator
   - Use KubernetesPodOperator for heavy workloads to isolate dependencies and resources
   - Avoid pulling large datasets into Airflow worker memory; use Spark/dbt/SQL for heavy transforms
3. Dependencies and branching:
   - Define task dependencies with the >> and << operators
   - Use BranchPythonOperator for conditional logic
   - Use TriggerDagRunOperator for cross-DAG dependencies (prefer sensors for blocking waits)
4. Idempotency:
   - All tasks must be safely re-runnable
   - Use the execution_date in file paths and table partition filters to scope each run
5. Alerting:
   - on_failure_callback: send a Slack alert with the DAG name, task, execution_date, and log URL
   - SLA miss callback: alert if the DAG does not complete within its SLA
6. Testing:
   - Unit test task logic separately from the DAG
   - Use `airflow dags test dag_id execution_date` for local DAG runs

Return: complete DAG code with all operators, dependencies, and alerting.
03 · Pipeline Design · Intermediate · Prompt

dbt Project Structure

Design the dbt project structure for a data warehouse with {{num_source_systems}} source systems.

1. Directory structure:
```
models/
  staging/        # 1:1 with source tables, light cleaning only
    source_a/
    source_b/
  intermediate/   # Business logic, joining staging models
  marts/
    core/         # Shared dimension and fact tables
    finance/      # Domain-specific marts
    marketing/
tests/
  generic/        # Custom generic tests
  singular/       # One-off data quality tests
macros/
seeds/
snapshots/
```
2. Model naming conventions:
   - Staging: stg_{source}__{entity} (e.g. stg_salesforce__accounts)
   - Intermediate: int_{entity}_{verb} (e.g. int_orders_pivoted)
   - Marts: {entity} or dim_{entity} / fct_{entity}
3. Materialization strategy:
   - Staging: view (always fresh, no storage cost)
   - Intermediate: ephemeral or table depending on complexity
   - Marts: table or incremental depending on size and freshness requirements
4. Sources configuration (sources.yml):
   - Define all source tables with freshness checks
   - freshness: warn_after: {count: 12, period: hour}, error_after: {count: 24, period: hour}
5. Model configuration (schema.yml):
   - Document every model and column
   - Apply generic tests: unique, not_null, accepted_values, relationships
6. Incremental models:
   - Use unique_key for the merge strategy
   - Filter with the is_incremental() macro: WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
   - Handle late-arriving data with a lookback window
7. CI/CD integration:
   - dbt build --select state:modified+ on PRs (only changed models and their downstream)
   - dbt source freshness for source freshness checks

Return: directory structure, naming convention guide, materialization decision matrix, and CI/CD integration config.
04 · Pipeline Design · Intermediate · Prompt

Incremental Load Design

Design a robust incremental data load pattern for this source table.

Source: {{source_table}} in {{source_db}}
Target: {{target_table}} in {{target_db}}
Update pattern: {{update_pattern}} (append-only / append-and-update / full CRUD)

1. Watermark management:
   - Store the last successful watermark (max updated_at or max id) in a dedicated metadata table
   - Commit the watermark only after a successful write to the target, never before
   - Handle clock skew: use watermark = max(updated_at) - safety_margin (e.g. 5 minutes) to catch late-arriving rows
2. Incremental query:
   - SELECT * FROM source WHERE updated_at > {{last_watermark}} AND updated_at <= {{current_run_time}}
   - Use a closed upper bound (the current run time) to make the window deterministic and replayable
   - Add an ORDER BY to ensure a consistent extraction order
3. Merge/upsert to target:
   - Use a MERGE statement (or equivalent) matching on the primary key
   - Handle inserts (new rows), updates (changed rows), and optionally soft deletes (is_deleted flag)
   - Never INSERT blindly; always upsert to maintain idempotency
4. Hard delete handling:
   - If the source supports deletes and CDC is not available: run a full key scan periodically (e.g. daily) and soft-delete rows absent from the source
   - Add deleted_at and is_current columns to the target table
5. Backfill procedure:
   - To re-process a date range: set the watermark back to the range start and re-run
   - Ensure the merge logic is idempotent so the backfill does not create duplicates
6. Schema change handling:
   - Before each run, compare the source schema to the last known schema
   - Alert on new, removed, or type-changed columns before proceeding

Return: watermark table DDL, incremental query, merge statement, delete handling, and backfill procedure.
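The watermark rules in steps 1 and 2 can be sketched as two small helpers. SAFETY_MARGIN and the function names are assumptions for illustration; the idempotent merge in step 3 absorbs the overlap that the margin deliberately creates.

```python
from datetime import datetime, timedelta

SAFETY_MARGIN = timedelta(minutes=5)  # assumed clock-skew margin

def extraction_window(last_watermark: datetime, run_time: datetime):
    """Deterministic, replayable window for:
    WHERE updated_at > lower AND updated_at <= upper."""
    return last_watermark, run_time

def next_watermark(max_updated_at: datetime) -> datetime:
    """Computed after a successful target write; backs off by the safety
    margin so late-arriving rows are re-caught on the next run."""
    return max_updated_at - SAFETY_MARGIN
```

Committing `next_watermark(...)` to the metadata table is the last step of a run, never the first.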
05 · Pipeline Design · Beginner · Prompt

Ingestion Pattern Selector

Recommend the right data ingestion pattern for this source system.

Source system: {{source_system}}
Data characteristics: {{data_characteristics}}
Latency requirement: {{latency_requirement}}
Volume: {{volume}}

Evaluate and recommend from these patterns:

1. Full load:
   - Re-ingest the entire source table on every run
   - When to use: small tables (<1M rows), no reliable change tracking, sources that cannot support incremental queries
   - Drawbacks: expensive, slow, high source system load
2. Incremental load (timestamp-based):
   - Query rows where updated_at > last_watermark
   - When to use: the source has a reliable updated_at column, append-and-update workloads
   - Drawbacks: misses hard deletes, requires a reliable timestamp column
3. Change Data Capture (CDC):
   - Read from the database transaction log (Debezium, AWS DMS, Fivetran)
   - When to use: need to capture deletes, near-real-time latency, high-volume OLTP source
   - Drawbacks: requires log access, more complex infrastructure
4. Event streaming:
   - The source publishes events to Kafka/Kinesis; the pipeline consumes them
   - When to use: an event-driven architecture already exists, sub-minute latency is needed
   - Drawbacks: requires event producer instrumentation
5. API polling:
   - Call a REST/GraphQL API on a schedule
   - When to use: no database access, SaaS sources (Salesforce, HubSpot)
   - Drawbacks: rate limits, pagination, no deletes

Return: recommended pattern with rationale, drawbacks to be aware of, and implementation checklist.
06 · Pipeline Design · Advanced · Prompt

Lambda vs Kappa Architecture

Evaluate whether this use case calls for a Lambda architecture or a Kappa architecture.

Use case: {{use_case_description}}
Latency requirements: {{latency}}
Historical reprocessing need: {{reprocessing_need}}
Team size and complexity tolerance: {{team_constraints}}

1. Lambda architecture:
   - Two separate pipelines: batch (accurate, slow) and speed (fast, approximate)
   - A serving layer merges the batch and speed views
   - Pros: handles historical reprocessing naturally, the speed layer can be simpler
   - Cons: two codebases for the same logic (duplication and drift risk), higher operational complexity
   - When to choose: if batch and streaming have genuinely different business logic, or if batch accuracy is non-negotiable and streaming is additive
2. Kappa architecture:
   - A single streaming pipeline for everything
   - Reprocessing = replay from the beginning of the message log with a new consumer group
   - Pros: single codebase, simpler operations, no view merging
   - Cons: requires a long-retention message log, the streaming system must handle batch-scale replay, stateful processing is more complex
   - When to choose: when batch and streaming logic are identical and the team wants to minimize operational surface area
3. Decision framework:
   - Is the processing logic identical for batch and streaming? → Kappa
   - Do you need to reprocess years of history frequently? → Check whether Kappa replay is cost-effective
   - Is your team small? → Kappa (less to maintain)
   - Do you have complex, different historical vs real-time logic? → Lambda
   - Is the latency requirement < 1 minute AND accuracy critical? → Lambda with micro-batch
4. Recommended architecture for this use case:
   - State the recommendation clearly with rationale
   - Identify the top 2 risks of the chosen approach and mitigations

Return: architecture comparison, decision framework applied to this use case, recommendation, and risk register.
07 · Pipeline Design · Beginner · Prompt

Pipeline Architecture Review

Review this data pipeline architecture and identify weaknesses.

Pipeline description: {{pipeline_description}}

Evaluate across these dimensions and flag each finding as Critical / Warning / Info:

1. Reliability:
   - Is there a single point of failure? What happens if any one component goes down?
   - Are retries implemented with exponential backoff and jitter?
   - Is there a dead-letter queue or error sink for failed records?
   - Are downstream consumers protected from upstream failures (circuit breaker)?
2. Idempotency:
   - Can the pipeline be safely re-run without producing duplicate data?
   - Is the write operation an upsert/merge rather than append-only?
   - If append-only, is there a deduplication step downstream?
3. Observability:
   - Are row counts logged at every stage (source, after transformation, at sink)?
   - Is there alerting on pipeline failure, SLA breach, and anomalous record counts?
   - Can you trace a single record from source to destination?
4. Scalability:
   - Will the design hold at 10× the current data volume?
   - Are there any sequential bottlenecks that cannot be parallelized?
5. Maintainability:
   - Is business logic separated from infrastructure concerns?
   - Are transformations testable in isolation?

Return: issue list with severity, impact, and specific remediation for each finding.
08 · Pipeline Design · Advanced · Chain

Pipeline Design Chain

Step 1: Requirements gathering. Define: source systems and their characteristics (volume, velocity, format, update pattern), latency SLA (batch / micro-batch / real-time), downstream consumers and their needs, and any compliance or data residency constraints.

Step 2: Ingestion pattern selection. For each source, select the appropriate ingestion pattern (full load, incremental, CDC, streaming, API polling) with rationale. Identify which sources need CDC and what infrastructure that requires.

Step 3: Processing layer design. Choose the processing technology (dbt, Spark, Flink, SQL) for each transformation layer. Define the medallion layers (Bronze/Silver/Gold or equivalent) and what transformations happen at each layer.

Step 4: Storage and partitioning. Design the storage layout for each layer. Define the partitioning strategy, file format (Parquet/Delta/Iceberg), and retention policy. Estimate storage cost.

Step 5: Orchestration design. Design the DAG structure. Define dependencies between pipelines, the scheduling strategy, an SLA per pipeline, the retry policy, and alerting.

Step 6: Reliability and observability. Define: row count reconciliation checks, data freshness monitoring, lineage tracking, alerting thresholds, and an incident response procedure.

Step 7: Write the pipeline design document: architecture diagram (text), technology choices with rationale, data flow description, SLA commitments, known risks, and estimated build timeline.
09 · Pipeline Design · Intermediate · Prompt

Spark Job Optimization

Optimize this Spark job for performance, cost, and reliability.

Current job: {{job_description}}
Current runtime: {{current_runtime}}
Current cost: {{current_cost}}

1. Diagnose first with the Spark UI:
   - Identify the stages with the longest duration
   - Check for data skew: are some partitions processing 10× more data than others?
   - Check shuffle volume: large shuffles are the most common performance killer
   - Check for spill: memory spill to disk indicates insufficient executor memory
2. Partitioning optimization:
   - Repartition before a join or aggregation to the right number of partitions: num_partitions = total_data_size_MB / 128 (targeting 128MB partitions)
   - Use repartition(n, key_column) to co-locate related records and reduce shuffle
   - Use coalesce() to reduce the partition count before writing (avoids a full shuffle)
3. Join optimization:
   - Broadcast join: use for any table < {{broadcast_threshold_mb}}MB; it eliminates the shuffle entirely
   - Sort-merge join (default): ensure both sides are partitioned and sorted on the join key
   - Skew join: handle skewed keys by salting (append a random suffix to the key)
4. Data skew handling:
   - Identify skewed keys: GROUP BY join_key ORDER BY COUNT(*) DESC LIMIT 20
   - Salt skewed keys: join_key_salted = concat(join_key, '_', floor(rand() * N))
   - Process skewed keys separately and union with the normal results
5. Caching strategy:
   - cache() / persist() DataFrames used more than once in the same job
   - Use MEMORY_AND_DISK_SER for large DataFrames that don't fit in memory
   - Unpersist cached DataFrames when no longer needed
6. Configuration tuning:
   - spark.sql.adaptive.enabled=true (AQE): enables runtime partition coalescing and join strategy switching
   - spark.sql.adaptive.skewJoin.enabled=true: automatically handles skewed joins
   - Executor memory = (node_memory - overhead) / executors_per_node

Return: diagnosis procedure, optimization implementations with estimated impact, and configuration recommendations.
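The partition-count arithmetic and the salting trick can be sketched without Spark. These helper names are assumptions for illustration; in a real job the salted column is built with Spark SQL expressions, and the small side of a salted join must be exploded with all N suffixes to match.

```python
import random

def target_partitions(total_size_mb: float, partition_mb: int = 128) -> int:
    """Partition count targeting ~128MB per partition (assumed target size)."""
    return max(1, round(total_size_mb / partition_mb))

def salt_key(join_key: str, n_salts: int) -> str:
    """Spread a hot key across n_salts buckets to break up a skewed partition."""
    return f"{join_key}_{random.randrange(n_salts)}"
```

For example, a 10GB (10240MB) input targets 80 partitions; rows sharing a hot `join_key` land in up to `n_salts` different partitions after salting.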
10 · Pipeline Design · Intermediate · Prompt

Streaming Pipeline Design

Design a streaming data pipeline for processing {{event_type}} events from {{source}} to {{destination}}.

Throughput requirement: {{throughput}} events/sec
Latency requirement: end-to-end < {{latency_target}}

1. Message broker configuration (Kafka / Kinesis):
   - Topic partitioning: number of partitions = max_throughput / throughput_per_partition
   - Partition key: choose a key that distributes load evenly AND ensures ordering where required
   - Retention: at least 7 days, to allow replay from any point in the last week
   - Replication factor: 3 for production (tolerates 2 broker failures)
2. Consumer design:
   - Consumer group: one per logical pipeline, to enable independent replay
   - Offset commit strategy: commit after a successful write to the destination (at-least-once delivery)
   - Idempotent consumer: handle duplicate messages at the destination by deduplicating on event_id
   - Backpressure: limit the consumer fetch size and processing batch to control memory usage
3. Stream processing (Flink / Spark Structured Streaming / Kafka Streams):
   - Windowing: tumbling window of {{window_size}} for aggregations
   - Watermark: allow late events up to {{late_arrival_tolerance}} before closing a window
   - State management: checkpoint every 60 seconds for fault tolerance
4. Exactly-once semantics:
   - Kafka transactions + idempotent producers for source-to-broker
   - Transactional writes to the destination (or idempotent upserts)
   - Checkpoint-based recovery to avoid reprocessing
5. Dead letter queue:
   - Route unparseable or schema-invalid messages to a DLQ topic
   - Alert when the DLQ growth rate exceeds {{dlq_threshold}} messages/min
6. Monitoring:
   - Consumer lag per partition (alert if lag > {{lag_threshold}})
   - Processing latency (time from event timestamp to destination write)
   - Throughput (events/sec in and out)

Return: architecture diagram (text), configuration recommendations, processing code skeleton, and monitoring setup.
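The idempotent-consumer idea in step 2 reduces to: skip any event_id already written. A minimal sketch, with `seen_ids` standing in for a dedup store at the destination and `write` for the sink call (both assumptions of the sketch):

```python
def process_batch(events, seen_ids: set, write) -> int:
    """At-least-once consumption made effectively exactly-once by
    deduplicating on event_id at the destination."""
    written = 0
    for event in events:
        eid = event["event_id"]
        if eid in seen_ids:      # duplicate delivery: skip
            continue
        write(event)
        seen_ids.add(eid)        # record the id only after a successful write
        written += 1
    return written
```

Offsets would be committed only after this batch succeeds, matching the commit strategy above.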

Data Quality

8 prompts
01 · Data Quality · Intermediate · Prompt

Data Lineage Tracking

Implement column-level data lineage tracking for this data platform.

1. Lineage metadata model:
   - Node types: source_system, table, column, transformation, pipeline_run
   - Edge types: table_derives_from, column_derives_from, transformation_reads, transformation_writes
   - Lineage table DDL:
```sql
CREATE TABLE column_lineage (
  target_table VARCHAR,
  target_column VARCHAR,
  source_table VARCHAR,
  source_column VARCHAR,
  transformation_description VARCHAR,
  pipeline_name VARCHAR,
  recorded_at TIMESTAMP
)
```
2. Automated lineage extraction:
   - For dbt: parse dbt's manifest.json; it records model-level lineage from ref() and source() calls (for column-level lineage, parse each model's compiled SQL)
   - For Spark SQL: parse the SQL AST to extract table and column references
   - For Airflow DAGs: extract lineage from task input/output datasets (OpenLineage / Marquez)
3. Lineage use cases:
   - Impact analysis: "if I change this source column, which downstream tables and reports are affected?"
   - Root cause analysis: "this report column has wrong values; trace back to the source"
   - Compliance: "which tables contain data derived from PII column X?"
4. Lineage UI (if building custom):
   - Graph visualization: nodes are tables/columns, edges are derivation relationships
   - Search: find all downstream consumers of a given column
   - Highlight the path from a source column to a final report metric
5. OpenLineage integration:
   - Emit OpenLineage events from Airflow and Spark jobs
   - Store them in Marquez or forward to a data catalog (DataHub, Atlan, Alation)

Return: lineage metadata DDL, automated extraction script for dbt, impact analysis query, and PII propagation query.
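Impact analysis over the lineage model is a graph traversal along column_derives_from edges. A minimal sketch; the `(source, target)` pair representation and the fully qualified names in the example are assumptions, not part of the prompt:

```python
from collections import defaultdict

def downstream_impact(edges, start):
    """Return every column transitively derived from `start`.
    `edges` is a list of (source_column, target_column) pairs,
    e.g. ("raw.orders.amount", "marts.fct_orders.revenue")."""
    graph = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)
    affected, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in affected:   # avoid revisiting shared ancestors
                affected.add(nxt)
                stack.append(nxt)
    return affected
```

The PII propagation query is the same traversal started from each PII-tagged source column.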
02 · Data Quality · Advanced · Chain

Data Quality Framework Chain

Step 1: DQ requirements. Identify the top 10 most critical tables in the platform. For each, define: the business impact of bad data, the acceptable error rate, and the SLA for detecting and resolving issues.

Step 2: Test implementation. For each critical table, implement the full test suite: schema, freshness, row count reconciliation, business rule validation, and statistical anomaly detection.

Step 3: Severity and routing. Classify each test by severity (blocking vs warning) and define the alert routing: who gets notified, by which channel, and within what time window.

Step 4: DQ scorecard. Build a daily DQ scorecard: overall pass rate, test pass rate per table, trend over time, and highlight tables with declining quality.

Step 5: Incident workflow. Define the incident workflow for DQ failures: detection → acknowledgment → investigation → root cause → fix → post-mortem. Define SLAs per severity.

Step 6: Feedback loop. Create a mechanism for analysts to report suspected data quality issues. Triage reported issues, trace them to root cause, and update tests to prevent recurrence.

Step 7: Write the DQ framework document: test inventory, severity matrix, alert routing, SLAs, incident workflow, and the governance process for adding new tests.
03 · Data Quality · Beginner · Prompt

Data Quality Test Suite

Write a comprehensive data quality test suite for the table {{table_name}}. Use dbt tests, Great Expectations, or SQL assertions (specify preference: {{testing_tool}}).

1. Schema tests (run on every load):
   - All NOT NULL columns contain no nulls
   - The primary key is unique and not null
   - Foreign keys reference valid records in parent tables
   - Categorical columns contain only accepted values
   - Numeric columns are within expected ranges (no negative IDs, no future dates)
2. Freshness tests (run on every load):
   - MAX(updated_at) is within {{freshness_threshold}} hours of the current time
   - The row count is within [mean ± 3σ] of the historical daily row count
   - No date partition has zero rows (empty partitions indicate pipeline failure)
3. Consistency tests (run daily):
   - The row count in this table matches the row count in the source system (reconciliation)
   - The SUM of key measures matches source system totals (financial reconciliation)
   - No duplicate rows on the natural key
4. Business rule tests (run daily):
   - Specific rules from the domain: {{business_rules}}
   - Example: order_total = SUM(line_items) for all orders
   - Example: all active customers have at least one contact record
5. Test severity levels:
   - ERROR: test failure blocks downstream tables from running
   - WARN: test failure logs a warning but does not block
   - Assign each test to the appropriate severity level

Return: complete test suite code, severity assignments, and a test execution schedule.
04 · Data Quality · Intermediate · Prompt

Duplicate Detection at Scale

Implement scalable duplicate detection for a large table ({{table_size}} rows) with a compound natural key.

Natural key: {{natural_key_columns}}
Table: {{table_name}}

1. Exact duplicate detection (fast):
```sql
SELECT {{natural_key_columns}}, COUNT(*) AS cnt
FROM {{table_name}}
GROUP BY {{natural_key_columns}}
HAVING COUNT(*) > 1
ORDER BY cnt DESC
LIMIT 100
```
   - Run this query and report: total duplicate groups, total excess rows, and sample examples
2. Near-duplicate detection for string keys:
   - Phonetic matching: Soundex or Metaphone for name fields
   - MinHash LSH for large-scale fuzzy deduplication (scales to billions of rows)
   - Blocking: reduce the comparison space by only comparing within the same first-letter group or zip code
3. Deduplication strategy:
   - Deterministic: define a priority rule for which record to keep (most recent, most complete, specific source)
   - Use ROW_NUMBER() to select the canonical record per natural key:
```sql
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY {{natural_key_columns}}
           ORDER BY {{priority_column}} DESC
         ) AS rn
  FROM {{table_name}}
)
SELECT * FROM ranked WHERE rn = 1
```
4. Prevention (upstream fix):
   - Add a UNIQUE constraint if the warehouse supports it
   - Add a pre-load duplicate check that blocks the load if duplicates are detected in the incoming batch
   - Instrument the source system write path to prevent duplicates at origin
5. Monitoring:
   - Add a daily duplicate count metric to the DQ dashboard
   - Alert if the duplicate count increases day-over-day

Return: exact duplicate query, ROW_NUMBER deduplication query, near-duplicate detection approach, and prevention implementation.
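The ROW_NUMBER() pattern in step 3 has a direct in-memory analogue, useful for the pre-load duplicate check in step 4. A minimal sketch; rows are assumed to be dicts, and the column names in the usage are hypothetical:

```python
def deduplicate(rows, key_cols, priority_col):
    """Keep, per natural key, the row with the highest priority_col value
    (e.g. the most recent updated_at) -- the Python analogue of
    ROW_NUMBER() ... ORDER BY priority DESC, then rn = 1."""
    best = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key not in best or row[priority_col] > best[key][priority_col]:
            best[key] = row
    return list(best.values())
```

If `len(deduplicate(batch, ...)) < len(batch)`, the incoming batch contains duplicates and the load can be blocked before it reaches the warehouse.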
05 · Data Quality · Intermediate · Prompt

Pipeline Anomaly Detection

Build statistical anomaly detection for this data pipeline's operational metrics.

Metrics to monitor: row counts, processing time, error rate, and key business measure totals.

1. Baseline computation (run weekly):
   - For each metric and each day-of-week, compute the mean, standard deviation, and 5th/95th percentiles from the last 90 days
   - Store baselines in a metadata table: metric_name, day_of_week, mean, std, p5, p95, computed_at
2. Anomaly detection rules (run after each pipeline execution):
   - Statistical: flag if today's value is outside mean ± {{sigma}}σ (e.g. 3σ)
   - Percentage change: flag if the WoW change exceeds {{pct_threshold}}% for the same day of week
   - Absolute minimum: flag if row count = 0 (hard rule, always an error)
   - Absolute maximum: flag if row count > {{hard_cap}} (possible runaway job or data duplication)
3. Seasonal adjustment:
   - Normalize metrics by day-of-week (Monday typically has different volumes than Friday)
   - For businesses with monthly seasonality: also normalize by week-of-month
4. Metric-level thresholds:
   - Different thresholds per metric: row counts may tolerate ±20%, revenue totals should only tolerate ±1%
   - Store thresholds in a configuration table for easy adjustment without code changes
5. Alert routing:
   - Route anomalies to the appropriate team based on metric type (data team vs business team)
   - Include context in the alert: current value, expected range, historical chart link
   - Suppress duplicate alerts: do not re-alert the same anomaly within 4 hours

Return: baseline computation SQL, anomaly detection queries, threshold configuration table, and alert routing logic.
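The core statistical rule (step 2) is a sigma check against the baseline. A minimal sketch; it assumes the caller has already filtered `history` to the matching day-of-week samples, per the seasonal-adjustment step:

```python
from statistics import mean, stdev

def is_anomalous(value: float, history: list, sigma: float = 3.0) -> bool:
    """Flag `value` if it falls outside mean ± sigma * std of `history`.
    `history` must contain at least two samples for stdev to be defined."""
    mu = mean(history)
    sd = stdev(history)
    return abs(value - mu) > sigma * sd
```

In production the mean and std would come from the pre-computed baseline table rather than being recomputed per check.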
06 · Data Quality · Beginner · Prompt

Row Count Reconciliation

Build a row count reconciliation framework to verify data completeness across pipeline stages.

1. Count capture at each stage:
   - Extract: rows read from the source
   - After filter: rows meeting the extraction criteria
   - After transformation: rows output from each major transformation step
   - Load: rows written to the target
   - Store all counts in a reconciliation metadata table: pipeline_run_id, stage, table_name, row_count, timestamp
2. Reconciliation checks:
   - Source to target: abs(source_count - target_count) / source_count < {{tolerance}} (e.g. 0.001 = 0.1% tolerance)
   - Explain expected differences: deduplication, filtering, and type-specific exclusions
   - For CDC pipelines: verify inserts + updates + deletes = total source changes
3. Historical comparison:
   - Compare today's count to the same day last week (day-of-week adjusted)
   - Alert if the count differs by more than 2σ from the rolling 30-day average for that day
   - Hard alert if the count is 0 (an empty load is almost always a pipeline error)
4. Metadata table DDL:
```sql
CREATE TABLE pipeline_reconciliation (
  run_id VARCHAR,
  pipeline_name VARCHAR,
  stage VARCHAR,
  table_name VARCHAR,
  expected_count BIGINT,
  actual_count BIGINT,
  variance_pct DECIMAL(10,4),
  status VARCHAR, -- PASS / WARN / FAIL
  run_timestamp TIMESTAMP
)
```
5. Alerting:
   - FAIL: variance > {{fail_threshold}}% → block downstream, page on-call
   - WARN: variance > {{warn_threshold}}% → log a warning, notify the data team
   - PASS: variance within tolerance → log and continue

Return: reconciliation metadata table DDL, count capture code, comparison queries, and alerting logic.
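The PASS / WARN / FAIL classification in step 5 can be sketched as one function. The default thresholds are illustrative placeholders for {{warn_threshold}} and {{fail_threshold}}, not values from the prompt:

```python
def reconcile(source_count: int, target_count: int,
              warn_pct: float = 0.1, fail_pct: float = 1.0):
    """Classify a source-to-target count comparison.
    Returns (status, variance_pct)."""
    if source_count == 0:
        return "FAIL", 100.0   # empty load: almost always a pipeline error
    variance_pct = abs(source_count - target_count) / source_count * 100
    if variance_pct > fail_pct:
        return "FAIL", variance_pct
    if variance_pct > warn_pct:
        return "WARN", variance_pct
    return "PASS", variance_pct
```

The returned tuple maps directly onto the `status` and `variance_pct` columns of the pipeline_reconciliation table above.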
07 · Data Quality · Intermediate · Prompt

Schema Drift Detection

Implement automated schema drift detection to catch upstream schema changes before they break the pipeline.

1. Schema snapshot:
   - After each successful run, save the source schema to a metadata table: column_name, data_type, is_nullable, ordinal_position, table_name, snapshot_date
   - Schema fingerprint: compute a hash of the sorted column list and types for quick change detection
2. Drift detection (run before each pipeline execution). Compare the current source schema against the last known good schema:
   - NEW columns: the column exists in the current schema but not in the snapshot
   - REMOVED columns: the column exists in the snapshot but not in the current schema
   - TYPE CHANGES: the column exists in both but the data type has changed
   - RENAME: a column was removed and a new column added with a similar name; flag as a possible rename
   - REORDERING: column ordinal positions changed (matters for positional file formats like CSV)
3. Severity classification:
   - BREAKING changes (block the pipeline): a removed column that is used downstream; a type change that is not backwards compatible (VARCHAR to INT)
   - WARNING changes (log and continue): a new column added (schema evolution; may need to be added to downstream tables); type widening (INT to BIGINT, VARCHAR(50) to VARCHAR(255))
   - INFO: ordinal position change only; a new column not used downstream
4. Automated response:
   - BREAKING: halt the pipeline, alert on-call, create a ticket
   - WARNING: continue the pipeline, send a non-urgent notification to the data team
   - Update the schema snapshot only after a successful run

Return: schema snapshot table DDL, drift detection query, severity classification logic, and alert templates.
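The three core drift categories in step 2 are set operations over two schema snapshots. A minimal sketch assuming each schema is a `{column_name: data_type}` mapping; rename and reorder detection are deliberately left out:

```python
def diff_schemas(snapshot: dict, current: dict) -> dict:
    """Detect NEW, REMOVED, and TYPE CHANGE drift between the last known
    good schema and the current source schema."""
    return {
        "new": sorted(set(current) - set(snapshot)),
        "removed": sorted(set(snapshot) - set(current)),
        "type_changed": sorted(
            col for col in set(snapshot) & set(current)
            if snapshot[col] != current[col]
        ),
    }
```

Severity classification then maps each bucket (plus downstream-usage metadata) onto BREAKING / WARNING / INFO.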
08 · Data Quality · Advanced · Prompt

SLA Monitoring for Pipelines

Build an SLA monitoring system for data pipeline delivery commitments.

Pipelines to monitor: {{pipeline_list}}
SLA targets: {{sla_targets}} (e.g. 'orders table available by 06:00 UTC daily')

1. SLA definition table:
```sql
CREATE TABLE pipeline_slas (
  pipeline_name VARCHAR,
  table_name VARCHAR,
  sla_type VARCHAR,          -- 'availability' or 'freshness'
  sla_deadline TIME,         -- time by which data must be available
  sla_timezone VARCHAR,
  warn_minutes_before INT,   -- warn this many minutes before breach
  owner_team VARCHAR,
  slack_channel VARCHAR
)
```
2. SLA tracking (run every 5 minutes):
   - For availability SLAs: has the pipeline completed successfully since the last scheduled run?
   - For freshness SLAs: is MAX(updated_at) in the target table within the SLA window?
   - Record each check: pipeline_name, check_time, status (ON_TIME / AT_RISK / BREACHED)
3. Early warning system:
   - AT_RISK: the pipeline is running but has not completed, with {{warn_minutes}} minutes or fewer remaining before the SLA deadline
   - Estimate: based on current progress, will it complete in time?
   - Alert the pipeline owner with the estimated completion time
4. SLA breach handling:
   - BREACHED: the SLA deadline has passed and the data is not available
   - Page the on-call data engineer
   - Notify downstream consumers automatically
   - Log the breach duration for SLA reporting
5. SLA reporting (weekly):
   - SLA compliance rate per pipeline (target: ≥ 99.5%)
   - Average delay for late pipelines
   - Top 3 pipelines by breach frequency
   - MTTD and MTTR per pipeline

Return: SLA definition table DDL, monitoring query, early warning logic, breach alert template, and weekly SLA report query.
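The status logic of steps 2-4 for an availability SLA can be sketched as a single check. This sketch assumes `now` and `deadline` are already in the SLA's timezone and that the deadline falls on the current date; cross-midnight deadlines would need extra handling:

```python
from datetime import datetime, time

def sla_status(now: datetime, deadline: time,
               completed: bool, warn_minutes: int = 30) -> str:
    """ON_TIME / AT_RISK / BREACHED for today's availability SLA."""
    deadline_dt = datetime.combine(now.date(), deadline)
    if completed:
        return "ON_TIME"
    if now >= deadline_dt:
        return "BREACHED"
    minutes_left = (deadline_dt - now).total_seconds() / 60
    return "AT_RISK" if minutes_left <= warn_minutes else "ON_TIME"
```

`warn_minutes` plays the role of the warn_minutes_before column in the pipeline_slas table above.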

Data Warehouse Patterns

8 prompts
01 · Data Warehouse Patterns · Advanced · Prompt

Data Vault Design

Design a Data Vault 2.0 model for integrating {{num_sources}} source systems on the entity: {{core_entity}}.

1. Hub table:
   - One Hub per core business entity (Customer, Product, Order, etc.)
   - Columns: hash_key (PK, SHA-256 of the business key), business_key, load_date, record_source
   - No descriptive attributes: Hubs contain only the business key
   - Business key: the natural identifier used by the business (customer_id, order_number)
   - Hash key: a deterministic hash of the business key, which enables parallel loading without sequences
2. Satellite tables:
   - One or more Satellites per Hub, each containing descriptive attributes from one source
   - Columns: hash_key (FK to Hub), load_date, load_end_date (NULL = current), record_source, plus the descriptive columns
   - Split Satellites by: rate of change (fast-changing vs slow-changing attributes kept separate), source system, sensitivity level
   - load_end_date pattern: NULL for the current record, populated when a new record supersedes it
3. Link tables:
   - Represent many-to-many relationships between Hubs
   - Columns: link_hash_key (PK), hash_key_hub_a (FK), hash_key_hub_b (FK), load_date, record_source
   - Never delete from Links: relationships are historical facts
4. Business Vault:
   - Computed Satellites: derived business rules applied on top of the raw Vault
   - Bridge tables: pre-joined structures for performance
   - Point-in-time (PIT) tables: a snapshot of the active satellite records at each date, which avoids complex timestamp joins in queries
5. Loading patterns:
   - Hubs: INSERT new business keys only (never update)
   - Satellites: INSERT new records; close the previous record by setting load_end_date
   - Links: INSERT new relationships only
   - All loads are insert-only: no updates, no deletes

Return: Hub, Satellite, and Link DDLs, loading SQL for each component, and PIT table design.
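The deterministic hash key from step 1 can be sketched in a few lines. The `||` delimiter and uppercase-trim normalization are assumptions; whatever rules are chosen must be fixed platform-wide, or the same business key hashed by two loaders will not match:

```python
import hashlib

def hub_hash_key(*business_key_parts: str) -> str:
    """SHA-256 hash key over the (possibly composite) business key.
    Deterministic, so independent loaders can compute it in parallel
    without coordinating on a sequence."""
    normalized = "||".join(p.strip().upper() for p in business_key_parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

The delimiter also keeps composite keys unambiguous: ("a", "b") and ("ab",) hash differently.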
Data Warehouse PatternsIntermediatePrompt
02

Fact Table Loading Pattern

Implement a robust fact table loading pattern for {{fact_table}} using a {{loading_strategy}} approach.

1. Pre-load validation:
- Check source row count vs expected range (alert if < 80% or > 120% of yesterday's count)
- Verify all required foreign keys exist in their dimension tables (referential integrity check)
- Check for duplicate natural keys in the incoming batch
- Validate numeric measure ranges (no negative revenue, no impossible quantities)
2. Surrogate key lookup:
- Never store natural keys in the fact table — always look up the surrogate key from the dimension
- For each foreign key: JOIN to dimension on natural key WHERE is_current = TRUE
- For late-arriving dimensions: look up the surrogate key valid at the event time (point-in-time lookup)
- Late-arriving facts: if the dimension record did not exist at event time, use an 'unknown' placeholder record (surrogate_key = -1)
3. Incremental load to fact table:
- Partition by date and load one partition at a time
- Use INSERT OVERWRITE for the current partition (idempotent, safe to re-run)
- Never UPDATE rows in a fact table — append or overwrite partitions only
4. Post-load validation:
- Row count reconciliation: source count = fact table insert count
- Measure totals reconciliation: SUM(revenue) in source = SUM(revenue) in fact for the loaded date
- No NULL surrogate keys in the output (all dimension lookups resolved)
5. Audit columns:
- Add to every fact table: load_timestamp, pipeline_run_id, source_system

Return: pre-load validation queries, surrogate key lookup logic, incremental load SQL, and post-load reconciliation queries.
Data Warehouse PatternsIntermediatePrompt
03

Medallion Architecture Design

Design a medallion (Bronze / Silver / Gold) architecture for this data platform.

Data sources: {{source_systems}}
Consumers: {{downstream_consumers}}
Platform: {{platform}}

1. Bronze layer (raw ingest):
- Store data exactly as received from the source — no transformation, no business logic
- Schema: source columns + metadata columns (ingested_at, source_file, pipeline_run_id)
- File format: Parquet or Delta (preserve original data types)
- Partitioning: by ingestion date (not event date — you want to find what was loaded when)
- Retention: keep all data indefinitely — Bronze is your audit trail and replay source
- Access: restricted to data engineers only
2. Silver layer (cleansed, conformed):
- Clean and standardize: fix types, normalize casing, handle nulls, apply business rules
- Deduplicate: one row per natural key per valid state
- Conform: common naming conventions, standard date formats, unified entity IDs across sources
- Add: valid_from / valid_to for SCD2 entities, data quality score per row
- Partitioning: by event date (not ingestion date) for time-series data
- Access: data engineers and data scientists
3. Gold layer (business-ready):
- Aggregated, joined, and modeled for specific use cases: star schemas, wide flat tables, aggregated metrics
- Optimized for query performance: partitioned, clustered, materialized
- Documented: every table and column has a business description
- Access: analysts, BI tools, applications
4. Cross-layer governance:
- Lineage: track which Gold tables derive from which Silver tables, and which Silver tables derive from which Bronze sources
- SLA: Bronze = 30 min from source, Silver = 1 hour, Gold = 2 hours
- Testing: Bronze (schema only), Silver (schema + row counts + nulls), Gold (schema + business rules + reconciliation)

Return: layer definitions, DDL templates for each layer, lineage tracking approach, and SLA commitments.
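The Bronze rule of "source columns plus metadata columns, nothing else" can be sketched as a small wrapper. The function name and the dict-per-record shape are hypothetical; the invariant it illustrates is that source fields pass through untouched.

```python
import uuid
from datetime import datetime, timezone

def to_bronze(records, source_file, run_id=None):
    """Attach the Bronze metadata columns (ingested_at, source_file,
    pipeline_run_id) to raw records without modifying source fields."""
    run_id = run_id or str(uuid.uuid4())
    ingested_at = datetime.now(timezone.utc).isoformat()
    return [
        {**rec,                      # source columns, exactly as received
         "ingested_at": ingested_at,
         "source_file": source_file,
         "pipeline_run_id": run_id}
        for rec in records
    ]
```

A single run_id per batch is what later lets you replay or audit one ingestion without touching the rest of the layer.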
Data Warehouse PatternsIntermediatePrompt
04

Partitioning Strategy

Design the optimal partitioning and clustering strategy for this data warehouse table.

Table: {{table_name}}
Approximate size: {{table_size}}
Query patterns: {{query_patterns}}
Warehouse platform: {{platform}} (BigQuery / Snowflake / Redshift / Databricks / Trino)

1. Partitioning:
- Partition by the column most frequently used in WHERE filters
- For time-series data: partition by date (daily partitions for tables < 1TB, monthly for larger)
- For non-time data: partition by a low-cardinality column (region, status, product_category)
- Avoid over-partitioning: partitions should be > 100MB each to avoid small-file problems
- Avoid under-partitioning: each partition should be a meaningful data subset to skip files effectively
2. Clustering / sort keys:
- After partitioning, cluster by the next most common filter column (e.g. customer_id, product_id)
- Cluster by columns used in JOIN conditions to collocate related rows
- For Snowflake: choose cluster keys with high cardinality and low correlation with insert order
- For BigQuery: cluster up to 4 columns in order of filter frequency
- For Redshift: SORTKEY on the main time column, DISTKEY on the most common join key
3. Partition pruning validation:
- Write a test query using EXPLAIN to confirm partition pruning is occurring
- Alert if a query scans > {{max_scan_ratio}}% of partitions (indicates missing partition filter)
4. Maintenance:
- For Delta/Iceberg: OPTIMIZE (compaction) and VACUUM (remove deleted files) on a schedule
- For Redshift: VACUUM and ANALYZE after large loads
- Monitor partition statistics: flag partitions with unusually high or low row counts

Return: partitioning and clustering DDL, partition pruning test query, and maintenance schedule.
Data Warehouse PatternsAdvancedPrompt
05

Query Performance Tuning

Tune this slow data warehouse query for performance.

Query: {{slow_query}}
Current runtime: {{current_runtime}}
Target runtime: {{target_runtime}}
Platform: {{platform}}

Work through these optimizations in order:
1. Execution plan analysis:
- Run EXPLAIN ANALYZE (or platform equivalent)
- Identify the most expensive operations: full table scans, hash joins on large tables, sorts on large datasets
- Check estimated vs actual row counts — large divergence indicates stale statistics
2. Filter pushdown:
- Ensure WHERE clause filters on partitioned/clustered columns appear as early as possible
- Check if filters are being applied before or after a JOIN — move them before the JOIN
- Replace HAVING with WHERE where possible (filter before aggregation)
3. Join optimization:
- Order JOINs from smallest to largest result set
- Use broadcast/replicate hint for small dimension tables
- Check for accidental cartesian products (missing JOIN conditions)
- Replace correlated subqueries with JOINs or window functions
4. Aggregation optimization:
- Pre-aggregate before joining to reduce row count going into the join
- Use approximate aggregations (APPROX_COUNT_DISTINCT) where exact precision is not required
- Push GROUP BY to a subquery before the outer SELECT
5. Materialization:
- If this query runs frequently: materialize it as a table and schedule refresh
- Create a summary table at the right grain to avoid full re-aggregation each time
6. Statistics:
- Run ANALYZE TABLE to refresh statistics if the query plan looks wrong
- Check column statistics: histograms for skewed columns, NDV for join columns

Return: annotated execution plan, specific rewrites for each optimization applied, and before/after runtime comparison.
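As an illustration of the "replace correlated subqueries with window functions" rewrite: a latest-order-per-customer lookup done in one pass with ROW_NUMBER instead of one subquery execution per row. SQLite stands in for the warehouse and the schema is invented, but the rewrite shape carries over to any engine with window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (customer_id TEXT, order_date TEXT, amount REAL);
INSERT INTO orders VALUES
  ('C1', '2024-01-01', 10.0),
  ('C1', '2024-02-01', 30.0),
  ('C2', '2024-01-15', 20.0);
""")

# Instead of: WHERE order_date = (SELECT MAX(order_date) FROM orders o2
#                                 WHERE o2.customer_id = o.customer_id)
# rank rows once per partition and keep rank 1.
rows = conn.execute("""
    SELECT customer_id, order_date, amount FROM (
        SELECT o.*,
               ROW_NUMBER() OVER (PARTITION BY customer_id
                                  ORDER BY order_date DESC) AS rn
        FROM orders o
    ) WHERE rn = 1
""").fetchall()
```

The correlated form re-scans the table per outer row; the window form is a single scan plus a partitioned sort, which is what the execution plan should show after the rewrite.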
Data Warehouse PatternsBeginnerTemplate
06

Slowly Changing Dimension

Implement a Type 2 Slowly Changing Dimension (SCD2) for the table {{dim_table}} in {{database_type}}.

Natural key: {{natural_key}}
Tracked attributes (trigger new version): {{tracked_columns}}
Non-tracked attributes (overwrite in place): {{non_tracked_columns}}

1. Table design:
- Add columns: surrogate_key (BIGINT IDENTITY), valid_from (DATE), valid_to (DATE), is_current (BOOLEAN)
- valid_to for current rows = '9999-12-31' (sentinel value)
- is_current = TRUE for current rows (redundant but improves query performance)
2. Initial load:
- INSERT all rows with valid_from = first_seen_date, valid_to = '9999-12-31', is_current = TRUE
3. Incremental merge logic — for each incoming row:
a. NEW RECORD (natural key not in dim): INSERT with valid_from = today, valid_to = '9999-12-31', is_current = TRUE
b. CHANGED RECORD (tracked columns differ from current version):
   - UPDATE existing current row: valid_to = today - 1, is_current = FALSE
   - INSERT new row: valid_from = today, valid_to = '9999-12-31', is_current = TRUE
c. UNCHANGED RECORD: no action
d. DELETED RECORD (exists in dim but not in source): optionally set is_current = FALSE
4. Point-in-time query:
SELECT * FROM {{dim_table}} WHERE {{natural_key}} = 'X' AND valid_from <= '{{as_of_date}}' AND valid_to > '{{as_of_date}}'
5. Current records query:
SELECT * FROM {{dim_table}} WHERE is_current = TRUE
(Always faster than the date range query — index on is_current)
6. Non-tracked attribute updates: UPDATE current row in-place, no new version needed

Return: CREATE TABLE DDL, MERGE statement, point-in-time query, and current records query.
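A runnable sketch of the incremental merge from step 3, using SQLite and a single tracked attribute (`city`); the table and column names are illustrative. The close-then-insert sequence is case (b), and the early return is case (c).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim (
    surrogate_key INTEGER PRIMARY KEY AUTOINCREMENT,
    nk TEXT, city TEXT,
    valid_from TEXT, valid_to TEXT, is_current INTEGER);
INSERT INTO dim (nk, city, valid_from, valid_to, is_current)
VALUES ('C1', 'Oslo', '2024-01-01', '9999-12-31', 1);
""")

def scd2_upsert(conn, nk, city, today):
    cur = conn.execute(
        "SELECT surrogate_key, city FROM dim WHERE nk = ? AND is_current = 1",
        (nk,)).fetchone()
    if cur and cur[1] == city:
        return                     # (c) unchanged: no action
    if cur:                        # (b) changed: close the current version
        conn.execute(
            "UPDATE dim SET valid_to = date(?, '-1 day'), is_current = 0 "
            "WHERE surrogate_key = ?", (today, cur[0]))
    # (a)/(b): insert the new current version
    conn.execute(
        "INSERT INTO dim (nk, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, '9999-12-31', 1)", (nk, city, today))

scd2_upsert(conn, "C1", "Bergen", "2024-06-01")   # change of tracked column
scd2_upsert(conn, "C1", "Bergen", "2024-07-01")   # unchanged: no new version
```

After the change, the Oslo row is closed with valid_to = '2024-05-31' and the Bergen row is current, so the point-in-time query from step 4 returns Oslo for any as-of date before June.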
Data Warehouse PatternsBeginnerPrompt
07

Star Schema Design

Design a star schema for this business process: {{business_process}}

Source data: {{source_tables}}
Key business questions to answer: {{business_questions}}

1. Fact table design:
- Identify the grain: what does one row represent? (e.g. one order line, one daily session, one claim)
- State the grain explicitly — this is the most important design decision
- Numeric measures: what is being measured? (revenue, quantity, duration, count)
- Additive vs semi-additive vs non-additive measures:
  - Additive: sum across all dimensions (revenue, quantity)
  - Semi-additive: can sum across some dimensions but not time (account balance)
  - Non-additive: cannot sum at all (ratios, percentages — store numerator and denominator instead)
- Foreign keys: one surrogate key per dimension
- Degenerate dimensions: order_number, invoice_number (store in fact, no separate dim)
2. Dimension tables:
- For each dimension: list the descriptive attributes
- Surrogate key (integer) as primary key — never use the source system natural key as PK
- Include the source natural key as an attribute for traceability
- Slowly changing dimension type per attribute: Type 1 (overwrite), Type 2 (version), Type 3 (keep prior)
3. Dimension hierarchies:
- Identify rollup hierarchies within each dimension (product → category → department)
- Flatten hierarchy into the dimension table (denormalized) for query performance
4. Date dimension:
- Always include a date dimension — never join on raw date columns
- Generate one row per day for a 10-year range minimum
- Include: date_key, full_date, year, quarter, month, week, day_of_week, is_weekend, is_holiday, fiscal_period

Return: fact table DDL, dimension table DDLs, date dimension generation SQL, and ER diagram (text).
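The date dimension generation from step 4 can be sketched in a few lines. Holiday and fiscal-period columns are omitted here because they depend on the business calendar; everything else is derivable from the date itself.

```python
from datetime import date, timedelta

def build_date_dim(start: date, end: date):
    """One row per day, inclusive, with the derivable date attributes."""
    rows, d = [], start
    while d <= end:
        rows.append({
            "date_key": int(d.strftime("%Y%m%d")),  # smart key: 20240101
            "full_date": d.isoformat(),
            "year": d.year,
            "quarter": (d.month - 1) // 3 + 1,
            "month": d.month,
            "week": d.isocalendar()[1],             # ISO week number
            "day_of_week": d.isoweekday(),          # 1 = Mon .. 7 = Sun
            "is_weekend": d.isoweekday() >= 6,
        })
        d += timedelta(days=1)
    return rows
```

For a 10-year range this is only ~3,650 rows, so regenerating the whole dimension on each change is cheaper than maintaining it incrementally.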
Data Warehouse PatternsAdvancedChain
08

Warehouse Design Chain

Step 1: Requirements — identify the business processes to model, the grain of each fact table, the key business questions to answer, and the consumers (BI tools, DS teams, apps).
Step 2: Source analysis — profile each source table: row counts, key columns, update patterns, data quality issues, and join relationships. Identify integration challenges (different customer IDs across systems).
Step 3: Dimensional model design — design the star schema(s): fact tables with grain and measures, dimension tables with attributes and SCD type per column. Draw the ER diagram.
Step 4: Physical design — choose partitioning, clustering, file format, and materialization strategy for each table. Estimate storage size and query cost at expected query volume.
Step 5: Loading design — design the loading pattern for each table: full load vs incremental vs SCD2 merge. Write the key SQL statements.
Step 6: Testing plan — define data quality tests for each table: row count checks, uniqueness, not-null, referential integrity, and business rule validation.
Step 7: Document the warehouse design: data model diagram, table catalog (name, description, grain, owner), loading schedule, SLA, and known limitations.

Data Contracts

5 prompts
Data ContractsAdvancedPrompt
01

Breaking Change Migration

Design a safe migration process for this breaking schema change.

Breaking change: {{change_description}} (e.g. renaming column customer_id to account_id, or changing the grain from order to order_line)
Affected table: {{table_name}}
Known consumers: {{consumer_list}}

1. Impact assessment:
- Query the data lineage graph to find ALL consumers of the affected table and column
- For each consumer: team, table/report name, how the breaking column is used, migration effort (Low/Medium/High)
- Identify any external consumers (APIs, applications) that cannot be migrated centrally
2. Migration strategy — Expand and Contract (strangler fig pattern):
Phase 1 — Expand (add, don't remove):
- Add the new column/structure alongside the existing one
- Populate both: old column = old value, new column = new value
- Publish schema with both old and new columns
- Notify all consumers: 'New column available, please migrate. Old column will be removed on {{sunset_date}}'
Phase 2 — Migrate:
- Support consumer teams in migrating their pipelines/reports to the new column
- Track migration progress per consumer team
- Provide a migration deadline: {{deadline}}
Phase 3 — Contract (remove the old):
- Verify all consumers have migrated (query lineage + direct confirmation)
- Remove the old column
- Publish final schema version
3. Rollback plan:
- At each phase: what is the rollback procedure if a critical consumer cannot migrate in time?
- Rollback requires reverting only Phase 1 changes — no data is lost
4. Communication plan:
- Initial announcement: {{notice_period}} before Phase 1
- Weekly migration status updates to all consumers
- Final warning: 1 week before Phase 3

Return: impact assessment table, phase-by-phase implementation plan, consumer communication templates, and rollback procedure.
Data ContractsIntermediatePrompt
02

Contract Validation Pipeline

Build an automated contract validation pipeline that verifies produced data meets all contract commitments before it is made available to consumers.

Data contract: {{contract_name}}

1. Validation gate architecture:
- Produce data to a staging location (not the production table)
- Run all contract validations against the staging data
- Only promote to production if ALL blocking validations pass
- If any blocking validation fails: halt, alert producer, do not expose data to consumers
2. Schema validation:
- All required columns are present
- All column data types match the contract definition
- No unexpected new columns (flag as warning — possible unplanned schema evolution)
3. Semantic validation:
- Primary key is unique and non-null
- All NOT NULL columns have no nulls
- All categorical columns contain only contract-defined values
- Business rule assertions: {{business_rules}}
4. Freshness validation:
- MAX(event_timestamp) is within the contract-defined freshness window
- Row count is within ±{{tolerance}}% of the expected count for this time period
5. Promotion to production:
- Atomic swap: rename staging table to production (or INSERT OVERWRITE the partition)
- Log promotion: contract_name, run_id, validation_results, promotion_timestamp
- Notify downstream consumers that fresh data is available (via event or polling endpoint)
6. Consumer-facing freshness endpoint:
- GET /contracts/{{contract_name}}/freshness → returns: last_updated, row_count, validation_status
- Consumers can poll this endpoint to know when new data is ready

Return: validation pipeline code, promotion logic, freshness endpoint spec, and consumer notification design.
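The schema validation step (missing or mistyped required columns block promotion; unexpected new columns only warn) might look like the following sketch, with both schemas reduced to simple name-to-type dicts. Nullability and categorical-value checks would layer on top.

```python
def validate_schema(contract_cols: dict, actual_cols: dict):
    """Compare staged data's columns to the contract.

    Returns (blocking_errors, warnings). Both arguments are assumed
    to be {'column_name': 'TYPE'} dicts, a deliberate simplification.
    """
    errors, warnings = [], []
    for name, expected in contract_cols.items():
        if name not in actual_cols:
            errors.append(f"missing required column: {name}")
        elif actual_cols[name] != expected:
            errors.append(f"type mismatch on {name}: "
                          f"expected {expected}, got {actual_cols[name]}")
    for name in actual_cols:
        if name not in contract_cols:
            warnings.append(f"unexpected new column: {name}")
    return errors, warnings
```

The promotion gate would then be: swap staging into production only when `errors` is empty, and route `warnings` to the producer's channel.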
Data ContractsBeginnerPrompt
03

Data Contract Definition

Write a data contract for the dataset: {{dataset_name}} produced by the {{producer_team}} team and consumed by {{consumer_teams}}.

A data contract is a formal agreement between data producers and consumers specifying what data will be delivered, in what format, with what quality guarantees, and on what schedule.

1. Dataset identity:
- Dataset name and version
- Producer: team, contact, and escalation path
- Consumers: teams currently depending on this dataset
2. Schema definition:
- Table or topic name
- For each column/field: name, data type, nullable (Y/N), description, example value, PII classification (Y/N)
- Primary key or unique identifier
- Partitioning columns
3. Semantics and business rules:
- Grain: what does one row represent?
- Business rules: constraints and derived logic (e.g. 'order_total is always the sum of line items')
- Key relationships to other datasets
4. Quality commitments:
- Completeness: which columns are guaranteed non-null?
- Uniqueness: which column combinations are guaranteed unique?
- Freshness: data will be available by {{sla_time}} on each {{frequency}}
- Accuracy: key measures are reconciled to source within {{tolerance}}
5. Change management:
- Breaking change definition: removed column, type change, semantic change
- Notice period: {{notice_period}} days notice required before a breaking change
- Deprecation process: how will consumers be notified and given time to migrate?
6. SLA and support:
- Incident response time: {{response_time}}
- Scheduled maintenance window: {{maintenance_window}}
- Where to report issues: {{issue_channel}}

Return: complete data contract document in YAML format.
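A skeleton of the requested YAML output, covering the six sections above. The dataset, teams, channels, and column values are illustrative placeholders, not values from this catalog.

```yaml
dataset: orders_daily          # placeholder name
version: 1.2.0
producer:
  team: checkout
  contact: "#checkout-data"
  escalation: on-call rotation
consumers: [finance-analytics, growth-ds]
schema:
  table: analytics.orders_daily
  primary_key: [order_id]
  partitioned_by: [order_date]
  columns:
    - name: order_id
      type: STRING
      nullable: false
      pii: false
      description: Unique order identifier
      example: O-10042
    - name: order_total
      type: DECIMAL(12,2)
      nullable: false
      pii: false
      description: Sum of line item amounts
semantics:
  grain: one row per order
  business_rules:
    - order_total is always the sum of line items
quality:
  non_null: [order_id, order_total]
  unique: [[order_id]]
  freshness: available by 06:00 UTC daily
  accuracy_tolerance: "0.1%"
change_management:
  breaking_changes: [removed column, type change, semantic change]
  notice_period_days: 30
sla:
  incident_response: 4h
  maintenance_window: Sunday 02:00-04:00 UTC
  issue_channel: "#checkout-data"
```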
Data ContractsAdvancedPrompt
04

Data Mesh Contract Governance

Design a data contract governance model for a data mesh architecture with {{num_domains}} domain teams.

In a data mesh, domain teams own and publish their own data products. Contracts are the mechanism that makes data products reliable and trustworthy.

1. Contract ownership model:
- Producer team: responsible for defining the contract, meeting the commitments, and handling breaking changes
- Consumer teams: responsible for registering as consumers and migrating when notified
- Data platform team: responsible for tooling, enforcement, and governance process
- No central team should approve every contract — this creates bottlenecks
2. Contract registry:
- Centralized catalog of all published contracts (not a bottleneck — just a registry)
- Each contract: schema, SLA, consumers, version history, compliance status
- Automatic registration when a producer publishes a new dataset
3. Automated enforcement:
- CI/CD check: new data publication must include a valid contract
- Automated compatibility check: new schema version must be compatible with current contract
- Consumer registration: consumers must register in the contract registry to receive change notifications
- SLA monitoring: automated checks run against every published contract
4. Cross-domain standards (things that must be consistent across all domains):
- Common entity IDs (customer_id must mean the same thing everywhere)
- Standard date/time formats and timezone
- PII classification and handling
- Minimum required fields in every contract
5. Dispute resolution:
- Process for when a producer cannot meet a consumer's requirements
- Escalation path for unresolved contract disputes
- SLA breach accountability and remediation
6. Discoverability:
- Data product catalog: searchable, showing all published contracts, quality scores, and consumer counts
- Quality score per data product: based on SLA compliance, test pass rate, consumer satisfaction

Return: governance model document, contract registry schema, enforcement automation design, and cross-domain standards.
Data ContractsIntermediatePrompt
05

Schema Evolution Strategy

Design a schema evolution strategy that allows the {{producer_team}} to evolve data schemas without breaking downstream consumers.

1. Compatible change classification:
BACKWARD COMPATIBLE (consumers with old schema can read new data):
- Adding a new optional column with a default value
- Widening a type (INT → BIGINT, VARCHAR(50) → VARCHAR(255))
- Adding a new enum value to a categorical column
FORWARD COMPATIBLE (consumers with new schema can read old data):
- Removing a column (old data will have the column, new data won't)
- Narrowing a type (consumers must handle both)
BREAKING (requires coordinated migration):
- Removing a required column
- Renaming a column
- Changing a type in a non-widening way (VARCHAR → INT)
- Changing the meaning of an existing column
- Changing the grain of the table
2. Schema registry:
- Register every schema version with its compatibility mode in Confluent Schema Registry or AWS Glue
- Default compatibility mode: BACKWARD (new schema must be able to read old data)
- Enforce compatibility checks on every schema change before deployment
3. Additive-first approach:
- Prefer adding new columns over renaming or replacing existing ones
- Deprecate columns by marking them in the schema comment before removing
- Retain deprecated columns for {{deprecation_period}} before removing
4. Versioned tables:
- For breaking changes that cannot be avoided: publish a new versioned table (orders_v2)
- Run v1 and v2 in parallel for {{parallel_period}} to allow consumers to migrate
- Provide a migration guide and migration deadline
5. Consumer notification workflow:
- Automated notification to all registered consumers when schema changes are registered
- For breaking changes: personal outreach to each consumer team, migration support offered

Return: change classification guide, schema registry setup, deprecation process, and versioned table migration procedure.
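The change classification can be encoded as a small checker that a CI gate might call. This is a deliberate simplification: schemas are plain name-to-type dicts, the widening list is a hand-picked sample, and renames are out of scope (a rename looks like a remove plus an add here).

```python
# Type widenings treated as backward compatible, per the guide above.
WIDENINGS = {("INT", "BIGINT"), ("FLOAT", "DOUBLE"),
             ("VARCHAR(50)", "VARCHAR(255)")}

def classify_change(old: dict, new: dict) -> str:
    """Classify old -> new as BREAKING, FORWARD_COMPATIBLE, or
    BACKWARD_COMPATIBLE. Nullability and defaults are ignored."""
    for col in set(old) & set(new):
        if old[col] != new[col] and (old[col], new[col]) not in WIDENINGS:
            return "BREAKING"              # non-widening type change
    if set(old) - set(new):
        return "FORWARD_COMPATIBLE"        # column removed: old data has it
    return "BACKWARD_COMPATIBLE"           # only additions and widenings
```

A real gate would delegate this to the schema registry's compatibility check rather than reimplement it, but the decision table is the same.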

Infrastructure and Platform

4 prompts
Infrastructure and PlatformIntermediatePrompt
01

Compute Sizing Guide

Determine the right compute configuration for this data engineering workload.

Workload: {{workload_description}}
Data volume: {{data_volume}}
Runtime requirement: {{runtime_sla}}
Budget constraint: {{budget}}

1. Spark cluster sizing:
- Driver: 1 node with 4–8 cores and 16–32GB RAM (driver is a coordinator, not a worker)
- Executor memory rule: executor_memory = (node_memory × 0.75) / executors_per_node
- Executor cores: 4–5 per executor (sweet spot — too many causes context switching, too few underutilizes memory parallelism)
- Number of executors: total_data_size_GB / (executor_memory × compression_ratio) as a starting point
- For shuffle-heavy jobs: more executors with less memory each (shuffle writes to local disk)
- For memory-heavy joins: fewer executors with more memory each
2. Scaling strategy:
- Start with a cluster that fits the data comfortably in memory
- Profile first: identify if the job is CPU-bound, memory-bound, or I/O-bound before scaling
- CPU-bound: add more cores (more executors)
- Memory-bound: add more RAM per executor (increase executor memory)
- I/O-bound: add more storage bandwidth (use instance storage types like i3 on AWS)
3. Spot/preemptible instances:
- Use spot for worker nodes (can tolerate eviction + checkpoint recovery)
- Use on-demand for the driver (eviction kills the entire job)
- Savings: 60–80% cost reduction vs on-demand
4. Autoscaling:
- Enable autoscaling for interactive and variable workloads
- Disable for scheduled batch jobs with predictable volume (autoscaling overhead not worth it)
5. Benchmark procedure:
- Run the job at 1×, 2×, 4× the baseline cluster size
- Plot runtime vs cost: find the point of diminishing returns

Return: sizing recommendation, benchmark procedure, spot instance configuration, and cost estimate.
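The two sizing rules combine into a starting-point calculator. The 0.75 memory fraction is the rule stated above; the 3x compression ratio is an assumed default, and the prompt's own advice applies: profile the real job before committing to a size.

```python
import math

def size_executors(node_memory_gb: float, executors_per_node: int,
                   data_size_gb: float, compression_ratio: float = 3.0):
    """Starting-point Spark sizing from the two rules above.

    Returns (executor_memory_gb, num_executors). The compression
    ratio is the expansion factor of compressed on-disk data once
    loaded into memory; 3.0 is an assumption, not a measurement.
    """
    executor_memory_gb = (node_memory_gb * 0.75) / executors_per_node
    num_executors = math.ceil(
        data_size_gb / (executor_memory_gb * compression_ratio))
    return executor_memory_gb, num_executors
```

For example, 64GB nodes with 2 executors each and 360GB of compressed input yield 24GB executors and a starting count of 5, which the benchmark procedure then refines.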
Infrastructure and PlatformIntermediatePrompt
02

Data Lake File Format Selection

Select the right file format and table format for each layer of this data lake.

Workloads: {{workloads}} (batch analytics, streaming, ML feature engineering, etc.)
Platform: {{compute_engines}} (Spark, Trino, Dremio, BigQuery, etc.)

1. File format comparison:
Parquet:
- Columnar, splittable, highly compressed
- Best for: analytical reads, column-selective queries, broad engine support
- Limitations: no ACID transactions, no efficient row-level updates, schema evolution is limited
- Choose when: read-heavy analytics, stable schemas, no need for row-level changes
ORC:
- Similar to Parquet, marginally better for Hive workloads
- Choose when: primary engine is Hive or Hive-compatible
Avro:
- Row-based, schema embedded in file, excellent schema evolution support
- Best for: streaming ingestion, schema-registry integration, write-heavy workloads
- Choose when: Kafka → data lake ingestion, schema evolution is frequent
Delta Lake / Apache Iceberg / Apache Hudi (table formats):
- ACID transactions, time travel, schema evolution, row-level deletes
- Delta: tightest Spark integration, best for Databricks
- Iceberg: broadest engine support (Spark, Trino, Flink, Dremio, BigQuery), best for multi-engine lakes
- Hudi: streaming-optimized, best for CDC and near-real-time use cases
2. Recommendation by layer:
- Bronze (raw ingest): Parquet or Avro depending on source
- Silver (cleansed): Delta or Iceberg (need row-level updates for SCD)
- Gold (marts): Delta or Iceberg (need ACID for concurrent writes)
3. Compression codec recommendation:
- Snappy: fast compression/decompression, moderate compression ratio (default)
- Zstd: better compression ratio than Snappy at similar speed (preferred for cold storage)
- Gzip: maximum compression, slow decompression (use only for archival)

Return: format selection matrix, recommendation per layer, and compression codec guide.
Infrastructure and PlatformAdvancedChain
03

Platform Evaluation Chain

Step 1: Requirements gathering — document the platform requirements: data volume (current and 3-year projection), workload types (batch ETL, streaming, ad-hoc SQL, ML), latency SLAs, team size and SQL vs code preference, compliance requirements (data residency, SOC2, HIPAA), and budget range.
Step 2: Candidate selection — identify 3 candidate platforms based on the requirements. Typical candidates: Snowflake vs Databricks vs BigQuery, or Airflow vs Prefect vs Dagster. Eliminate options that fail hard requirements immediately.
Step 3: Evaluation criteria scoring — score each candidate on: performance (benchmark on representative workloads), total cost of ownership (compute + storage + egress + seats), developer experience (ease of use for the team), ecosystem (integrations with existing tools), operational burden (managed vs self-hosted), and vendor risk.
Step 4: Proof of concept — run a 2-week PoC for the top 2 candidates. Use a representative subset of actual workloads. Measure: query performance, pipeline development speed, operational effort, and cost.
Step 5: TCO modeling — build a 3-year TCO model for each finalist: compute, storage, licensing, personnel, migration, and training costs. Include the cost of not choosing this platform (opportunity cost).
Step 6: Risk assessment — for each finalist: vendor lock-in risk, migration complexity, scaling limits, support quality, and financial stability of the vendor.
Step 7: Write the platform recommendation document: requirements summary, evaluation matrix, PoC results, TCO comparison, risk assessment, final recommendation with rationale, migration plan, and success metrics.
Infrastructure and PlatformBeginnerPrompt
04

Warehouse Cost Optimization

Analyze and optimize the cost of this cloud data warehouse.

Platform: {{platform}} (Snowflake / BigQuery / Redshift / Databricks)
Current monthly cost: {{current_cost}}
Target reduction: {{target_reduction}}

1. Cost breakdown analysis:
- Identify the top 10 most expensive queries by compute cost
- Identify the top 10 most expensive users/teams by spend
- Break down storage cost: active storage vs time-travel vs fail-safe
- Identify tables that have not been queried in the last 90 days (zombie tables)
2. Compute optimizations:
- Auto-suspend: set warehouse auto-suspend to 1–2 minutes (not the default 10)
- Auto-scale: use multi-cluster warehouses only for concurrent workloads, not sequential ones
- Query optimization: the top 3 most expensive queries — can they be rewritten to scan less data?
- Result caching: are users re-running identical queries? Enable result cache.
- Materialization: for frequently run expensive aggregations, create a pre-aggregated table
3. Storage optimizations:
- Reduce time-travel retention from 90 days to 7 days for non-critical tables (Snowflake)
- Set partition expiration for old data that is no longer needed (BigQuery)
- Compress and archive historical data to cheaper storage tiers
- Delete zombie tables after confirming with owners
4. Governance:
- Set per-user and per-team cost budgets with alerts at 80% and 100% of budget
- Require query cost estimates before running full-table scans over {{threshold_gb}}GB
- Tag queries with cost center for chargeback reporting

Return: cost breakdown analysis queries, top optimizations with estimated savings each, and governance policy.
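The 80%/100% budget alert rule from the governance step reduces to a one-liner; the function name and threshold tuple are illustrative, and a real implementation would live in the warehouse's resource-monitor or billing-alert facility rather than application code.

```python
def crossed_thresholds(spend: float, budget: float,
                       thresholds=(0.8, 1.0)):
    """Return the alert levels (as fractions of budget) that the
    current spend has crossed, in ascending order."""
    return [t for t in thresholds if spend >= t * budget]
```

Emitting one alert per newly crossed level (rather than re-alerting every check) is the usual refinement on top of this.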
