20 Interview Questions

Interview Questions for a Data Engineer

Interview a data engineer by probing how they design reliable ELT pipelines, model data in the warehouse, and guarantee data quality. Move from concrete SQL and dbt decisions to orchestration, cost optimization, and incident handling, so you can assess whether they build trustworthy, observable, and cost-aware data platforms rather than brittle one-off scripts.

Run this interview as a mix of design discussion and hands-on probing rather than algorithm puzzles. Ask candidates to walk through a pipeline they actually shipped, then push on the trade-offs they made around modeling, testing, and cost. Strong candidates think in terms of idempotency, lineage, data contracts, and observability, and can explain failures and what they changed afterward.

Technical & Role-Specific

Walk me through how you would design an ELT pipeline that ingests data from a transactional database, a third-party API, and event streams into a cloud warehouse.

What to look for: Distinguishes extraction/load from transformation, mentions tools like Fivetran/Airbyte or Kafka for ingestion, dbt for transformation, and reasons about incremental loads, idempotency, and handling schema drift from each source.

How do you decide between a star schema, one-big-table, and a medallion (bronze/silver/gold) layout for a given dataset?

What to look for: Ties the choice to query patterns, BI tool behavior, warehouse cost, and consumer needs rather than dogma; explains where denormalization helps and where conformed dimensions matter.

How do you write a dbt model so that it runs incrementally and stays correct when late-arriving or updated records show up?

What to look for: References incremental materialization, a unique key, a merge/upsert strategy, lookback windows for late data, and dbt uniqueness tests plus source freshness checks to catch regressions.
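
As a reference point when judging answers, the merge-plus-lookback idea can be sketched roughly as follows. This is a plain Python and SQLite illustration of the pattern, not dbt itself; the table names `source` and `target`, the `order_id` key, and the three-day lookback are all hypothetical.

```python
import sqlite3
from datetime import date, timedelta


def incremental_merge(conn, lookback_days=3):
    """Upsert source rows changed since (latest loaded date - lookback).

    Assumes hypothetical tables source(order_id, amount, updated_at) and
    target(order_id PRIMARY KEY, amount, updated_at), with updated_at stored
    as ISO dates (YYYY-MM-DD). Needs SQLite 3.24+ for ON CONFLICT upserts.
    """
    high_water = conn.execute("SELECT MAX(updated_at) FROM target").fetchone()[0]
    if high_water is None:
        cutoff = "0001-01-01"  # first run: load everything
    else:
        # Re-scan a window behind the high-water mark to pick up late rows.
        cutoff = (date.fromisoformat(high_water) - timedelta(days=lookback_days)).isoformat()

    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO target (order_id, amount, updated_at)
            SELECT order_id, amount, updated_at FROM source WHERE updated_at >= ?
            ON CONFLICT(order_id) DO UPDATE SET
                amount = excluded.amount,
                updated_at = excluded.updated_at
            WHERE excluded.updated_at >= target.updated_at
            """,
            (cutoff,),
        )
```

The unique key makes re-processing the lookback window harmless: rows that were already loaded are updated in place rather than duplicated, and the final `WHERE` keeps an older copy from overwriting a newer one. A candidate describing the dbt equivalent would typically talk about an incremental model with a `unique_key` and a filter relative to the latest loaded timestamp.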

A nightly pipeline's warehouse costs have doubled. How do you investigate and bring them down?

What to look for: Profiles expensive queries, looks at clustering/partitioning, warehouse sizing and auto-suspend, materialization choices, scanned bytes, and avoiding full refreshes where incremental works.

How do you instrument a pipeline so you know about bad data before your analysts do?

What to look for: Covers dbt tests, Great Expectations or equivalent, freshness and volume checks, anomaly alerting, and observability tooling like Monte Carlo, plus where alerts route and who owns them.
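
To make the freshness and volume checks concrete, here is a minimal sketch of what such checks compute. The function names, the 24-hour lag limit, and the 50% volume tolerance are illustrative assumptions; in practice these checks would usually live in dbt, Great Expectations, or an observability tool rather than hand-written code.

```python
from datetime import datetime, timedelta
from statistics import mean


def check_freshness(latest_loaded_at, now, max_lag_hours=24):
    """Pass if the newest loaded record is within the allowed lag."""
    return (now - latest_loaded_at) <= timedelta(hours=max_lag_hours)


def check_volume(todays_rows, recent_daily_rows, tolerance=0.5):
    """Pass if today's row count is within +/- tolerance of the recent average."""
    baseline = mean(recent_daily_rows)
    return abs(todays_rows - baseline) <= tolerance * baseline


def run_checks(latest_loaded_at, now, todays_rows, recent_daily_rows):
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    if not check_freshness(latest_loaded_at, now):
        failures.append("freshness: data is older than the allowed lag")
    if not check_volume(todays_rows, recent_daily_rows):
        failures.append("volume: row count deviates from the recent baseline")
    return failures
```

A strong answer goes one step further than the checks themselves and says where the returned failures are routed and who is on the hook to respond.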

Explain how you guarantee a pipeline is idempotent and safe to re-run after a mid-run failure.

What to look for: Discusses deterministic transformations, atomic swaps or staging tables, deduplication on natural keys, and orchestration retries that don't double-load data.
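
One common way to get this property is to replace a whole partition inside a single transaction, so a re-run overwrites rather than appends. The sketch below assumes a hypothetical `daily_sales(load_date, sku, units)` table in SQLite; it illustrates the delete-then-insert pattern and is not the only valid approach.

```python
import sqlite3


def load_partition(conn, load_date, rows):
    """Replace one day's partition atomically so re-runs never double-load.

    `rows` is an iterable of (sku, units) pairs. If anything fails partway
    through, the transaction rolls back and the previous partition survives.
    """
    with conn:  # commits on success, rolls back on any exception
        conn.execute("DELETE FROM daily_sales WHERE load_date = ?", (load_date,))
        conn.executemany(
            "INSERT INTO daily_sales (load_date, sku, units) VALUES (?, ?, ?)",
            [(load_date, sku, units) for sku, units in rows],
        )
```

Because the delete and the insert commit together, an orchestrator can retry the same `load_date` any number of times and end up with exactly one copy of that day's data.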

Behavioral & Past Experience

Tell me about a data quality incident you were responsible for. How did you detect it, fix it, and prevent a recurrence?

What to look for: Honest ownership, root-cause analysis, a concrete prevention (test, contract, alert) added afterward, and clear communication to downstream consumers.

Describe onboarding a new data source where the source schema kept changing underneath you.

What to look for: Shows working with source-system owners, building tolerant ingestion, schema-change detection, and contracts or alerting so silent breakages surface early.
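
Schema-change detection can be as simple as diffing the columns you expect against the columns you observe on each load. The following is a minimal sketch under that assumption; the function names and the rule that removed or retyped columns count as breaking are illustrative choices, not a standard.

```python
def diff_schema(expected, observed):
    """Compare two {column_name: type_name} mappings and report the drift."""
    expected_cols, observed_cols = set(expected), set(observed)
    return {
        "added": sorted(observed_cols - expected_cols),
        "removed": sorted(expected_cols - observed_cols),
        "retyped": sorted(
            col for col in expected_cols & observed_cols
            if expected[col] != observed[col]
        ),
    }


def is_breaking(diff):
    """Treat removed or retyped columns as breaking; new columns are tolerated."""
    return bool(diff["removed"] or diff["retyped"])


# Example: the source added a column and changed the type of another.
expected = {"id": "int", "email": "text", "created_at": "timestamp"}
observed = {"id": "int", "email": "text", "created_at": "text", "plan": "text"}
drift = diff_schema(expected, observed)
```

Tolerant ingestion would load the new `plan` column (or ignore it) and keep running, while the retyped `created_at` would raise an alert before downstream models silently break.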

Tell me about a time you migrated or re-architected a pipeline or warehouse. What drove it and how did you de-risk the cutover?

What to look for: Describes the motivation (cost, scale, reliability), backfill and parallel-run strategy, validation against the old system, and rollback planning.

Describe a time you partnered with a data scientist to productionize a model. What was your part?

What to look for: Explains building reliable feature/serving pipelines, scheduling, monitoring for drift or staleness, and a clear handoff contract rather than a thrown-over-the-wall script.

Give an example of toil you automated away in your data work.

What to look for: Identifies a repetitive manual task, the automation built, and measurable time saved or errors reduced, showing a bias toward leverage.

Situational & Problem-Solving

An analyst reports that a dashboard's numbers are wrong but the pipeline shows green. How do you triage?

What to look for: Reproduces against source-of-truth, checks lineage from dashboard back through models, validates tests actually cover the affected logic, and questions whether 'green' means 'correct.'

You need to backfill two years of history into a new dbt model without breaking nightly runs or blowing the budget. What's your plan?

What to look for: Chunked or partitioned backfill, off-peak scheduling, separate warehouse sizing, validation of row counts and aggregates, and keeping the incremental run untouched during backfill.
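
A chunked backfill usually starts with splitting the history into bounded windows that can be run, validated, and retried one at a time. The helper below is a minimal sketch that produces monthly windows; `month_chunks` is a hypothetical name, and the step that actually runs each window is left out because it depends on the orchestrator.

```python
from datetime import date


def month_chunks(start, end):
    """Yield (chunk_start, chunk_end) half-open windows covering [start, end).

    Each window ends at the first day of the next month, or at `end`,
    whichever comes first, so consecutive windows never overlap.
    """
    cursor = start
    while cursor < end:
        if cursor.month == 12:
            next_month = date(cursor.year + 1, 1, 1)
        else:
            next_month = date(cursor.year, cursor.month + 1, 1)
        chunk_end = min(next_month, end)
        yield cursor, chunk_end
        cursor = chunk_end


# Two years of history becomes a list of independent monthly windows.
windows = list(month_chunks(date(2022, 1, 1), date(2024, 1, 1)))
```

Each window can then be processed off-peak on a separately sized warehouse, with row counts and aggregates compared against the source before moving on to the next one.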

Stakeholders want a metric updated every five minutes, but the current pipeline is nightly batch. How do you respond?

What to look for: Probes the real business need, weighs streaming/micro-batch against complexity and cost, and proposes the simplest architecture that meets the actual freshness requirement.

How would you design data governance and access controls so analysts get what they need without exposing sensitive PII?

What to look for: Mentions role-based access, masking or row/column-level security, separating raw from curated layers, and least-privilege IAM in the warehouse.
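
The column-masking idea can be illustrated with a small sketch. Real warehouses enforce this with native masking policies and grants rather than application code, so treat the following purely as a picture of the behavior; the `pii_reader` role name and the `PII_COLUMNS` set are hypothetical.

```python
# Hypothetical set of columns classified as PII in the curated layer.
PII_COLUMNS = {"email", "phone"}


def mask_row(row, role):
    """Return the row unchanged for privileged roles, redacted otherwise."""
    if role == "pii_reader":
        return dict(row)
    return {
        col: ("REDACTED" if col in PII_COLUMNS else value)
        for col, value in row.items()
    }


# An analyst sees the non-sensitive fields but not the raw email.
analyst_view = mask_row({"user_id": 1, "email": "a@example.com"}, "analyst")
```

The point to listen for is least privilege by default: unknown roles get the masked view, and access to raw PII is granted explicitly.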

An upstream team plans a breaking schema change next sprint. What do you do?

What to look for: Establishes a data contract or versioning, sets up alerting on the change, plans a compatibility window, and coordinates the migration with consumers rather than reacting after breakage.

Collaboration & Culture

How do you document data lineage and model definitions so analysts and scientists can self-serve?

What to look for: Uses dbt docs, a data catalog, clear model descriptions and ownership, and treats documentation as part of delivery rather than an afterthought.

How do you handle a disagreement with an analyst about how a metric should be defined in the warehouse?

What to look for: Seeks a single source of truth, drives toward a documented, tested definition, and involves the right business owner rather than maintaining two conflicting versions.

How do you keep data engineering work visible and prioritized when it's mostly invisible plumbing?

What to look for: Frames work around consumer impact and reliability, communicates SLAs and incidents clearly, and partners with stakeholders on roadmap trade-offs.

What's your approach to code review and standards on a data team?

What to look for: Reviews SQL/dbt for correctness, test coverage, and maintainability; values shared style and modeling conventions; and relies on CI that runs tests before merge.

Frequently asked questions

What skills should a strong Data Engineer have?

Strong SQL and dbt for transformation and modeling, solid Python for pipeline development and automation, and hands-on experience with a cloud warehouse like Snowflake, BigQuery, or Redshift. They should also know orchestration tools such as Airflow, batch and streaming ingestion patterns, and data quality testing and observability practices.

How many interview rounds does hiring a Data Engineer usually take?

Typically three to four rounds: an initial screen, a SQL and data-modeling exercise or take-home, a pipeline-design or system-design discussion, and a collaboration or stakeholder conversation. Some teams add a short Python/dbt pairing session, depending on seniority and how hands-on the role is.

What is the most important quality to screen for in a Data Engineer?

Reliability-mindedness: a candidate who builds idempotent, tested, observable pipelines and treats data quality and lineage as first-class concerns. Trustworthy data is the product, so the engineers who instrument, alert, and document proactively are far more valuable than those who only ship transformations.