Interview a data engineer by probing how they design reliable ELT pipelines, model data in the warehouse, and guarantee data quality. Move from concrete SQL and dbt decisions to orchestration, cost optimization, and incident handling, so you can assess whether they build trustworthy, observable, and cost-aware data platforms rather than brittle one-off scripts.
Run this interview as a mix of design discussion and hands-on probing rather than algorithm puzzles. Ask candidates to walk through a pipeline they actually shipped, then push on the trade-offs they made around modeling, testing, and cost. Strong candidates think in terms of idempotency, lineage, data contracts, and observability, and can explain failures and what they changed afterward.
Walk me through how you would design an ELT pipeline that ingests data from a transactional database, a third-party API, and event streams into a cloud warehouse.
What to look for: Distinguishes extraction/load from transformation, mentions tools like Fivetran/Airbyte or Kafka for ingestion, dbt for transformation, and reasons about incremental loads, idempotency, and handling schema drift from each source.
How do you decide between a star schema, one-big-table, and a medallion (bronze/silver/gold) layout for a given dataset?
What to look for: Ties the choice to query patterns, BI tool behavior, warehouse cost, and consumer needs rather than dogma; explains where denormalization helps and where conformed dimensions matter.
How do you write a dbt model so that it runs incrementally and stays correct when late-arriving or updated records show up?
What to look for: References incremental materialization, a unique key, a merge/upsert strategy, and a lookback window for late data, plus a dbt uniqueness test on the key and source freshness checks to catch regressions.
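If it helps to calibrate answers, the merge-plus-lookback idea can be sketched outside dbt. The example below is a minimal illustration using Python's built-in sqlite3 rather than a real warehouse; the table names (raw_orders, orders), the columns, and the three-day window are all assumptions made for the sketch, not a recommended configuration.

```python
import sqlite3
from datetime import date, timedelta


def make_db():
    """Create an in-memory database with a hypothetical source and target table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE raw_orders (order_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
    )
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, updated_at TEXT)"
    )
    return conn


def incremental_merge(conn, lookback_days=3):
    """Upsert source rows newer than (target high-water mark - lookback window)."""
    high_water = conn.execute("SELECT MAX(updated_at) FROM orders").fetchone()[0]
    if high_water is None:
        cutoff = "0001-01-01"  # empty target: behave like a full load
    else:
        cutoff = (date.fromisoformat(high_water) - timedelta(days=lookback_days)).isoformat()
    conn.execute(
        """
        INSERT INTO orders (order_id, status, updated_at)
        SELECT order_id, status, updated_at FROM raw_orders
        WHERE updated_at >= ?
        ON CONFLICT(order_id) DO UPDATE SET
            status = excluded.status,
            updated_at = excluded.updated_at
        WHERE excluded.updated_at >= orders.updated_at
        """,
        (cutoff,),
    )
    conn.commit()


def snapshot(conn):
    """Return the target table as a sorted list of tuples."""
    return conn.execute(
        "SELECT order_id, status, updated_at FROM orders ORDER BY order_id"
    ).fetchall()


conn = make_db()
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "placed", "2024-01-10"), (2, "placed", "2024-01-11")],
)
incremental_merge(conn)

# A late record (older than the high-water mark) arrives, and order 1 is updated.
conn.execute("INSERT INTO raw_orders VALUES (3, 'placed', '2024-01-09')")
conn.execute("UPDATE raw_orders SET status = 'shipped', updated_at = '2024-01-12' WHERE order_id = 1")
incremental_merge(conn)
incremental_merge(conn)  # re-running changes nothing
print(snapshot(conn))
```

A candidate who understands the trade-off should be able to say what happens when the window is too short (late rows are silently skipped) and what it costs when the window is too long (more data rescanned on every run).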
A nightly pipeline's warehouse costs have doubled. How do you investigate and bring them down?
What to look for: Profiles expensive queries, looks at clustering/partitioning, warehouse sizing and auto-suspend, materialization choices, scanned bytes, and avoiding full refreshes where incremental works.
How do you instrument a pipeline so you know about bad data before your analysts do?
What to look for: Covers dbt tests, Great Expectations or equivalent, freshness and volume checks, anomaly alerting, and observability tooling like Monte Carlo, plus where alerts route and who owns them.
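For interviewers who want a concrete reference point, freshness and volume checks can be as simple as the sketch below. It is tool-agnostic Python, not how dbt, Great Expectations, or Monte Carlo implement these checks; the function names, the z-score approach, and the threshold of 3 are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev


def is_fresh(latest_loaded_at, now, max_lag):
    """True if the newest loaded record is within the allowed lag."""
    return now - latest_loaded_at <= max_lag


def volume_is_anomalous(today_count, history, z_threshold=3.0):
    """Flag today's row count if it sits far from the recent daily counts."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today_count != mu
    return abs(today_count - mu) / sigma > z_threshold


now = datetime(2024, 1, 15, 8, 0)
print(is_fresh(datetime(2024, 1, 15, 2, 0), now, timedelta(hours=12)))
print(volume_is_anomalous(400, [1000, 1020, 980, 1010, 990]))
```

A good follow-up is to ask where a failed check routes and who is expected to act on it, since a check without an owner only produces noise.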
Explain how you guarantee a pipeline is idempotent and safe to re-run after a mid-run failure.
What to look for: Discusses deterministic transformations, atomic swaps or staging tables, deduplication on natural keys, and orchestration retries that don't double-load data.
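One pattern a candidate might describe is replacing a whole partition inside a single transaction, so a retry overwrites rather than appends. The sketch below illustrates that idea with Python's built-in sqlite3; the events table, its columns, and the fail_midway flag are hypothetical and exist only to simulate a mid-run failure.

```python
import sqlite3


def make_events_db():
    """Create an in-memory database with a hypothetical partitioned table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (day TEXT, event_id INTEGER, amount REAL)")
    return conn


def load_partition(conn, day, rows, fail_midway=False):
    """Replace one day's partition atomically: delete, then insert, in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("DELETE FROM events WHERE day = ?", (day,))
        for i, (event_id, amount) in enumerate(rows):
            if fail_midway and i == 1:
                raise RuntimeError("simulated mid-run failure")
            conn.execute(
                "INSERT INTO events (day, event_id, amount) VALUES (?, ?, ?)",
                (day, event_id, amount),
            )


def partition_rows(conn, day):
    """Return the rows currently stored for one day, sorted by event_id."""
    return conn.execute(
        "SELECT event_id, amount FROM events WHERE day = ? ORDER BY event_id", (day,)
    ).fetchall()


conn = make_events_db()
rows = [(1, 10.0), (2, 20.0)]
load_partition(conn, "2024-01-01", rows)
load_partition(conn, "2024-01-01", rows)  # retry: same result, no duplicates
print(partition_rows(conn, "2024-01-01"))
```

Because the delete and the inserts share a transaction, a failure partway through leaves the previous version of the partition in place, which is the property an orchestrator retry depends on.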
Tell me about a data quality incident you were responsible for. How did you detect it, fix it, and prevent a recurrence?
What to look for: Honest ownership, root-cause analysis, a concrete prevention (test, contract, alert) added afterward, and clear communication to downstream consumers.
Describe onboarding a new data source where the source schema kept changing underneath you.
What to look for: Shows working with source-system owners, building tolerant ingestion, schema-change detection, and contracts or alerting so silent breakages surface early.
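Schema-change detection does not need heavy tooling to be useful. As a rough sketch, a pipeline can compare the columns it expects against what the source currently exposes and decide whether the difference is breaking. The column names, types, and the rule that removals and type changes count as breaking are assumptions made for this example.

```python
def diff_schema(expected, observed):
    """Compare two {column: type} mappings and report what changed."""
    expected_cols = set(expected)
    observed_cols = set(observed)
    return {
        "added": sorted(observed_cols - expected_cols),
        "removed": sorted(expected_cols - observed_cols),
        "retyped": sorted(
            col for col in expected_cols & observed_cols if expected[col] != observed[col]
        ),
    }


def is_breaking(diff):
    """Treat removed or retyped columns as breaking; new columns are tolerated."""
    return bool(diff["removed"] or diff["retyped"])


expected = {"id": "integer", "email": "text", "created_at": "timestamp"}
observed = {"id": "integer", "email": "text", "created_at": "text", "plan": "text"}

changes = diff_schema(expected, observed)
print(changes)
print(is_breaking(changes))
```

Candidates with real experience here usually add that the interesting part is the response: tolerate additive changes, alert loudly on breaking ones, and talk to the source owner.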
Tell me about a time you migrated or re-architected a pipeline or warehouse. What drove it and how did you de-risk the cutover?
What to look for: Describes the motivation (cost, scale, reliability), backfill and parallel-run strategy, validation against the old system, and rollback planning.
Describe a time you partnered with a data scientist to productionize a model. What was your part?
What to look for: Explains building reliable feature/serving pipelines, scheduling, monitoring for drift or staleness, and a clear handoff contract rather than a thrown-over-the-wall script.
Give an example of toil you automated away in your data work.
What to look for: Identifies a repetitive manual task, the automation built, and measurable time saved or errors reduced, showing a bias toward leverage.
An analyst reports that a dashboard's numbers are wrong but the pipeline shows green. How do you triage?
What to look for: Reproduces the number against the source of truth, traces lineage from the dashboard back through the models, checks that the tests actually cover the affected logic, and questions whether 'green' means 'correct.'
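The first triage step, reproducing against the source of truth, often comes down to comparing the same aggregate computed in two places. The sketch below assumes both sides have already been reduced to per-day totals; the function name, the dictionaries, and the tolerance parameter are hypothetical.

```python
def reconcile(source_totals, reported_totals, rel_tolerance=0.0):
    """Return {key: (source, reported)} for keys that are missing or disagree."""
    mismatches = {}
    for key in sorted(set(source_totals) | set(reported_totals)):
        src = source_totals.get(key)
        rep = reported_totals.get(key)
        if src is None or rep is None:
            mismatches[key] = (src, rep)  # present on one side only
            continue
        if abs(src - rep) > abs(src) * rel_tolerance:
            mismatches[key] = (src, rep)
    return mismatches


source = {"2024-01-01": 100.0, "2024-01-02": 250.0, "2024-01-03": 80.0}
dashboard = {"2024-01-01": 100.0, "2024-01-02": 200.0}

print(reconcile(source, dashboard))
```

Narrowing the mismatch to specific keys tells the engineer which models and date ranges to walk back through in the lineage graph.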
You need to backfill two years of history into a new dbt model without breaking nightly runs or blowing the budget. What's your plan?
What to look for: Chunked or partitioned backfill, off-peak scheduling, separate warehouse sizing, validation of row counts and aggregates, and keeping the incremental run untouched during backfill.
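A chunked backfill can be sketched as a loop over month-sized windows that skips work already completed, so an interrupted run resumes instead of restarting. The helper names and the month granularity below are assumptions for illustration; in practice process_chunk would be whatever runs the model for one window.

```python
from datetime import date


def month_chunks(start, end):
    """Yield half-open [chunk_start, chunk_end) windows split on month boundaries."""
    cursor = start
    while cursor < end:
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        chunk_end = min(next_month, end)
        yield cursor, chunk_end
        cursor = chunk_end


def run_backfill(start, end, process_chunk, already_done=frozenset()):
    """Process each pending chunk in order and return the chunks completed this run."""
    completed = []
    for chunk in month_chunks(start, end):
        if chunk in already_done:
            continue  # resume support: skip windows finished earlier
        process_chunk(*chunk)
        completed.append(chunk)
    return completed


processed = []
done = run_backfill(
    date(2022, 1, 1),
    date(2024, 1, 1),
    lambda chunk_start, chunk_end: processed.append((chunk_start, chunk_end)),
)
print(len(done))
```

Small windows keep each run cheap and make validation easier, since row counts and aggregates can be checked per chunk before moving on.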
Stakeholders want a metric updated every five minutes, but the current pipeline is nightly batch. How do you respond?
What to look for: Probes the real business need, weighs streaming/micro-batch against complexity and cost, and proposes the simplest architecture that meets the actual freshness requirement.
How would you design data governance and access controls so analysts get what they need without exposing sensitive PII?
What to look for: Mentions role-based access, masking or row/column-level security, separating raw from curated layers, and least-privilege IAM in the warehouse.
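To make the masking part of the answer concrete, here is a minimal sketch of role-aware column masking. Warehouses typically enforce this with native masking policies rather than application code, so treat this purely as an illustration of the logic; the column set, the role name, and the salted SHA-256 truncation are all assumptions.

```python
import hashlib

PII_COLUMNS = {"email", "phone"}      # hypothetical list of sensitive columns
UNMASKED_ROLES = {"privacy_admin"}    # hypothetical role allowed to see raw values


def mask_value(value, salt):
    """Replace a value with a short, deterministic salted hash."""
    digest = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
    return digest[:12]


def apply_masking(row, role, salt="demo-salt"):
    """Return a copy of the row with PII columns masked unless the role is exempt."""
    if role in UNMASKED_ROLES:
        return dict(row)
    return {
        col: (mask_value(val, salt) if col in PII_COLUMNS and val is not None else val)
        for col, val in row.items()
    }


row = {"user_id": 7, "email": "a@example.com", "country": "DE"}
print(apply_masking(row, role="analyst"))
print(apply_masking(row, role="privacy_admin"))
```

Deterministic masking keeps joins and distinct counts working for analysts, which is worth probing: a strong candidate will note that it is pseudonymization, not anonymization.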
An upstream team plans a breaking schema change next sprint. What do you do?
What to look for: Establishes a data contract or versioning, sets up alerting on the change, plans a compatibility window, and coordinates the migration with consumers rather than reacting after breakage.
How do you document data lineage and model definitions so analysts and scientists can self-serve?
What to look for: Uses dbt docs, a data catalog, clear model descriptions and ownership, and treats documentation as part of delivery rather than an afterthought.
How do you handle a disagreement with an analyst about how a metric should be defined in the warehouse?
What to look for: Seeks a single source of truth, drives toward a documented, tested definition, and involves the right business owner rather than maintaining two conflicting versions.
How do you keep data engineering work visible and prioritized when it's mostly invisible plumbing?
What to look for: Frames work around consumer impact and reliability, communicates SLAs and incidents clearly, and partners with stakeholders on roadmap trade-offs.
What's your approach to code review and standards on a data team?
What to look for: Reviews SQL/dbt changes for correctness, test coverage, and maintainability; values shared style and modeling conventions; and expects CI to run tests before a merge.