To interview a data scientist, assess statistics, experimentation, and machine learning fundamentals, along with how the candidate translates a business question into a measurable analysis. This set covers hypothesis testing, A/B test design, model selection and validation, bias and overfitting, and how candidates communicate uncertainty to non-technical stakeholders.
Run a data scientist interview with a case that starts from a fuzzy business question and ends with a defensible recommendation, plus targeted statistics and modeling probes. Reward rigor about uncertainty and clear communication as much as algorithmic knowledge.
Explain the difference between a p-value and a confidence interval, and what a p-value does not tell you.
What to look for: Defines both correctly, notes a p-value isn't the probability the hypothesis is true, and prefers intervals for communicating effect size.
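A candidate's answer can be checked against a small worked example. The sketch below is one possible illustration, assuming a two-arm conversion comparison and a normal approximation; the function name `two_proportion_summary` and the counts in the usage lines are hypothetical. It returns both a two-sided p-value and a confidence interval for the difference in rates, which makes the contrast concrete: the p-value only says how surprising the data would be if there were no difference, while the interval shows the plausible size of the effect.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_summary(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Return (p_value, (ci_low, ci_high)) for the difference p_b - p_a.

    Uses the normal approximation, so it assumes reasonably large samples.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a

    # p-value: standard error under the null hypothesis of equal rates (pooled).
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_null = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_null
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval: unpooled standard error around the observed difference.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return p_value, (diff - z_crit * se, diff + z_crit * se)


# Hypothetical counts: 100 of 1000 convert in A, 130 of 1000 convert in B.
p_value, (ci_low, ci_high) = two_proportion_summary(100, 1000, 130, 1000)
print(f"p-value: {p_value:.4f}, 95% CI for lift: ({ci_low:.4f}, {ci_high:.4f})")
```

Note that neither number is the probability that the hypothesis is true; a strong candidate will say so unprompted.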
How would you design an A/B test for a new checkout flow, including how you'd size it?
What to look for: Defines a primary metric, picks a minimum detectable effect, computes sample size and duration, randomizes correctly, and pre-registers the analysis.
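The sizing step can be sketched as follows. This is a minimal sketch, assuming a conversion-rate primary metric, a two-sided test, equal allocation between two arms, and the standard normal-approximation formula for comparing two proportions. The function names, the baseline rate, the minimum detectable effect, and the traffic figure are all hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(baseline, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect an absolute lift of mde_abs."""
    p1 = baseline
    p2 = baseline + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def duration_days(n_per_arm, daily_visitors, arms=2):
    """Days needed if all eligible daily visitors are split across the arms."""
    return ceil(n_per_arm * arms / daily_visitors)


# Hypothetical inputs: 10% baseline conversion, 1 percentage point MDE,
# 4000 eligible checkout visitors per day.
n = sample_size_per_arm(baseline=0.10, mde_abs=0.01)
print(n, "users per arm,", duration_days(n, daily_visitors=4000), "days")
```

In practice the duration is usually rounded up to whole weeks so that weekday and weekend behavior are both represented.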
What is the bias-variance trade-off, and how does it show up in a model that performs well in training but poorly in production?
What to look for: Connects overfitting to high variance, explains regularization and cross-validation, and flags the danger of train-test leakage.
When is correlation enough and when do you need a causal design?
What to look for: Knows when decisions require causality, can describe randomized experiments, difference-in-differences, or instrumental variables, and warns about confounders.
Your classifier has 95% accuracy but the business is unhappy. What do you investigate?
What to look for: Class imbalance, the right metric (precision, recall, F1, AUC, calibration), and aligning the metric with the business cost of errors.
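The imbalance trap is easy to demonstrate. The sketch below uses made-up labels (5% positives) and a hypothetical classifier that always predicts the negative class; the helper name `classification_report` is illustrative and not taken from any library. Accuracy looks excellent while recall shows the model never finds a single positive case.

```python
def classification_report(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Precision, recall and F1 are undefined when their denominators are zero;
    # reporting 0.0 in that case is a convention chosen for this sketch.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up data: 50 positives among 1000 cases, model always predicts negative.
y_true = [1] * 50 + [0] * 950
y_pred = [0] * 1000
report = classification_report(y_true, y_pred)
print(report)
```

Which of these metrics matters most depends on the relative business cost of false positives and false negatives, which is the conversation the question is meant to open.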
How do you handle missing data, and how does your choice affect the result?
What to look for: Distinguishes the missingness mechanisms (missing completely at random, missing at random, missing not at random), weighs deletion vs imputation, and acknowledges that naive imputation can bias estimates.
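One way naive imputation distorts results can be shown in a few lines. This is an illustrative sketch on made-up values, with `None` standing in for a missing entry and a hypothetical helper `mean_impute`. Filling gaps with the observed mean leaves the mean unchanged but shrinks the variance, which in turn makes downstream standard errors look smaller than they should.

```python
from statistics import mean, variance


def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up column with two missing entries.
raw = [1.0, 2.0, 3.0, 4.0, None, None]
observed = [v for v in raw if v is not None]
imputed = mean_impute(raw)

print("variance of observed values:", variance(observed))
print("variance after mean imputation:", variance(imputed))
```

Mean imputation is used here only because it is the simplest case to reason about; it is not a recommendation.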
How would you detect and explain a model whose performance is degrading over time?
What to look for: Monitors for data and concept drift, compares feature distributions, retrains on a schedule, and investigates upstream data changes.
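Comparing feature distributions can be sketched with a population stability index (PSI), one of several possible drift measures. The example below is a minimal sketch, assuming a single numeric feature, bins taken from the reference sample's quantiles, and a small floor `eps` to avoid taking the log of zero. The function name `psi` and its parameters are illustrative.

```python
from bisect import bisect_right
from math import log
from statistics import quantiles


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population stability index of `current` relative to `reference`."""
    # n_bins - 1 cut points taken from the reference (training-time) sample.
    edges = quantiles(reference, n=n_bins)

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bin_shares(reference)
    cur_shares = bin_shares(current)
    return sum((c - r) * log(c / r) for r, c in zip(ref_shares, cur_shares))


# Made-up feature values: the recent sample is shifted upward.
training_sample = [float(x) for x in range(100)]
recent_sample = [float(x) + 30.0 for x in range(100)]
print("PSI:", round(psi(training_sample, recent_sample), 3))
```

Common rules of thumb treat values above roughly 0.1 as worth a look and above roughly 0.25 as a significant shift, but those cutoffs are conventions rather than guarantees, and PSI on inputs detects data drift only; concept drift still requires monitoring outcomes.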
How do you choose between a simple interpretable model and a more accurate black-box one?
What to look for: Weighs the cost of errors, regulatory and explainability needs, and stakeholder trust against marginal accuracy gains.
Tell me about an analysis whose conclusion surprised stakeholders. How did you present it?
What to look for: Communicates clearly to non-technical audiences, stands behind rigorous findings, and frames uncertainty honestly.
Describe a time your initial hypothesis was wrong. What did you do?
What to look for: Intellectual honesty, willingness to follow the data, and resisting the pressure to confirm a preferred narrative.
When have you had to push back on a request to 'find data that supports' a decision?
What to look for: Scientific integrity, distinguishing exploratory from confirmatory work, and avoiding p-hacking or cherry-picking.
Tell me about a model you built that didn't get adopted. Why, and what did you learn?
What to look for: Understands that impact requires stakeholder trust and integration, not just accuracy, and reflects on the gap honestly.
A product manager asks, 'why did revenue drop last week?' How do you approach it?
What to look for: Segments the metric, checks data quality first, forms hypotheses, isolates causes, and separates signal from noise.
An experiment shows a tiny but statistically significant lift. Do you ship it?
What to look for: Weighs practical significance and cost against statistical significance, and considers sample size and multiple-comparison risk.
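One possible way to make the practical-significance argument explicit is to compare the confidence interval for the lift against the smallest lift that would justify the cost of shipping. The rule below is a hypothetical decision sketch, not a standard procedure; `min_worthwhile_lift` is an assumed business input, and the interval in the usage lines is made up.

```python
def ship_decision(ci_low, ci_high, min_worthwhile_lift):
    """Classify an experiment result against a practical-significance threshold.

    ci_low and ci_high bound the estimated lift; min_worthwhile_lift is the
    smallest lift that covers the cost of building and maintaining the change.
    """
    if ci_low >= min_worthwhile_lift:
        return "ship"
    if ci_high < min_worthwhile_lift:
        # The interval may still exclude zero (statistically significant),
        # but even its optimistic end is below what the business needs.
        return "not worth shipping"
    return "inconclusive"


# Made-up result: lift is clearly positive but tiny (0.02 to 0.10 points),
# while the team needs at least 0.5 points to justify the change.
print(ship_decision(ci_low=0.0002, ci_high=0.0010, min_worthwhile_lift=0.005))
```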
You have two weeks and messy data to answer a strategic question. How do you scope it?
What to look for: Prioritizes the decision the analysis informs, time-boxes data cleaning, and delivers a defensible answer with caveats rather than perfection.
Two segments respond oppositely to a treatment that looks neutral overall. How do you interpret it?
What to look for: Recognizes heterogeneous treatment effects that cancel out in aggregate, checks whether segment mix is producing a Simpson's paradox, digs into subgroups, and avoids a misleading aggregate conclusion.
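The scenario can be reproduced with a few made-up counts. In the sketch below, segment names and numbers are hypothetical, and each entry is a `(conversions, users)` pair. The treatment helps one segment, hurts the other by the same amount, and the pooled comparison shows no lift at all.

```python
def rate(conversions, users):
    return conversions / users


def segment_lifts(segments):
    """Absolute lift (treatment rate minus control rate) per segment."""
    return {
        name: rate(*arms["treatment"]) - rate(*arms["control"])
        for name, arms in segments.items()
    }


def overall_lift(segments):
    """Absolute lift after pooling all segments together."""
    totals = {}
    for arm in ("control", "treatment"):
        conversions = sum(arms[arm][0] for arms in segments.values())
        users = sum(arms[arm][1] for arms in segments.values())
        totals[arm] = rate(conversions, users)
    return totals["treatment"] - totals["control"]


# Made-up experiment data: (conversions, users) per arm.
segments = {
    "new_users": {"control": (100, 1000), "treatment": (120, 1000)},
    "returning_users": {"control": (100, 1000), "treatment": (80, 1000)},
}
print("per segment:", segment_lifts(segments))
print("overall:", overall_lift(segments))
```

Subgroup differences found after the fact are themselves subject to multiple-comparison risk, so a careful candidate will propose confirming them in a follow-up test before targeting by segment.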