Question 1

Write a query to find the top 5 customers by revenue this year.

Accepted Answer

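Filter to the current year in a WHERE clause, GROUP BY customer, SUM the revenue column, ORDER BY the total descending, and LIMIT 5. If you need the customer name, JOIN to the customers table. Example (MySQL syntax):

    SELECT c.name, SUM(o.revenue) AS total_revenue
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE YEAR(o.created_at) = YEAR(CURRENT_DATE)
    GROUP BY c.id, c.name
    ORDER BY total_revenue DESC
    LIMIT 5;

Wrapping the column in YEAR() can stop the database from using an index on created_at. On a large table, a range filter (created_at on or after the first day of the current year) usually performs better.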

Question 2

Find users who purchased in March but not in April.

Accepted Answer

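Use a subquery or a LEFT JOIN anti-join. With NOT IN:

    SELECT DISTINCT user_id
    FROM orders
    WHERE MONTH(created_at) = 3
      AND user_id NOT IN (
        SELECT user_id
        FROM orders
        WHERE MONTH(created_at) = 4
          AND user_id IS NOT NULL
      );

With LEFT JOIN (often faster on large tables, depending on the database):

    SELECT DISTINCT m.user_id
    FROM orders m
    LEFT JOIN orders a
      ON a.user_id = m.user_id
     AND MONTH(a.created_at) = 4
    WHERE MONTH(m.created_at) = 3
      AND a.user_id IS NULL;

Two caveats are worth mentioning in the interview. MONTH() alone matches March and April of every year, so add a year condition or a date range if the table spans more than one year. And the subquery filters out NULL user_ids because NOT IN returns no rows at all when its list contains a NULL; NOT EXISTS avoids that problem entirely.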
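
As a runnable illustration of the anti-join idea above, here is a minimal sketch using Python's built-in sqlite3 module. The table contents, the year 2024, and the date-range filters are assumptions made for the example (SQLite has no MONTH() function, so ISO date strings are compared directly). It uses NOT EXISTS, and also runs a NOT IN query without a NULL filter to show how a single NULL user_id empties the result.

```python
import sqlite3

# Hypothetical sample data; the schema mirrors the orders table in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (user_id INTEGER, created_at TEXT);
    INSERT INTO orders (user_id, created_at) VALUES
        (1, '2024-03-05'),     -- March only: expected in the result
        (2, '2024-03-10'),     -- March and April: excluded
        (2, '2024-04-02'),
        (3, '2024-04-15'),     -- April only: excluded
        (4, '2023-03-20'),     -- March of another year: excluded
        (NULL, '2024-04-20');  -- NULL user_id in April
""")

# Anti-join with NOT EXISTS and half-open date ranges (year is explicit).
NOT_EXISTS_SQL = """
    SELECT DISTINCT m.user_id
    FROM orders m
    WHERE m.created_at >= '2024-03-01' AND m.created_at < '2024-04-01'
      AND NOT EXISTS (
          SELECT 1 FROM orders a
          WHERE a.user_id = m.user_id
            AND a.created_at >= '2024-04-01' AND a.created_at < '2024-05-01'
      )
    ORDER BY m.user_id
"""

# Same intent with NOT IN and no NULL filter in the subquery.
NOT_IN_SQL = """
    SELECT DISTINCT user_id
    FROM orders
    WHERE created_at >= '2024-03-01' AND created_at < '2024-04-01'
      AND user_id NOT IN (
          SELECT user_id FROM orders
          WHERE created_at >= '2024-04-01' AND created_at < '2024-05-01'
      )
"""

march_not_april = [row[0] for row in conn.execute(NOT_EXISTS_SQL)]
not_in_result = [row[0] for row in conn.execute(NOT_IN_SQL)]

print(march_not_april)  # [1]
print(not_in_result)    # [] because the subquery returns a NULL
```

The half-open ranges (>= first day, < first day of the next month) also sidestep end-of-month edge cases when created_at carries a time component.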

Question 3

What is the difference between INNER JOIN and LEFT JOIN?

Accepted Answer

INNER JOIN returns only rows where the join condition matches in both tables — unmatched rows are dropped from both sides. LEFT JOIN returns all rows from the left table plus matching rows from the right; when there is no match, the right-side columns come back as NULL. Use LEFT JOIN when you want to keep records even if the related table has no entry — for example, listing all customers including those with zero orders.
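
The difference is easiest to see on a tiny dataset. The sketch below uses Python's built-in sqlite3 module; the customers and orders rows are made up for illustration, and the column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    -- Linus (id 3) has no orders.
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# INNER JOIN: only customers with at least one matching order appear.
inner_rows = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# LEFT JOIN: every customer appears; missing order columns come back as NULL.
left_rows = conn.execute("""
    SELECT c.name, o.id
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    ORDER BY c.id, o.id
""").fetchall()

# Counting the right-side column (not *) turns the NULL into a zero count.
order_counts = conn.execute("""
    SELECT c.name, COUNT(o.id)
    FROM customers c
    LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

print(inner_rows)    # [('Ada', 10), ('Ada', 11), ('Grace', 12)]
print(left_rows)     # [('Ada', 10), ('Ada', 11), ('Grace', 12), ('Linus', None)]
print(order_counts)  # [('Ada', 2), ('Grace', 1), ('Linus', 0)]
```

Note that COUNT(o.id) is used rather than COUNT(*): COUNT(*) would report 1 for Linus because the LEFT JOIN still produces one row for him.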

Question 4

How would you calculate 30-day retention?

Accepted Answer

Define Day 0 as the cohort date (e.g., first purchase or sign-up). Retention = users who performed the target action at least once between Day 1 and Day 30, divided by total users in the cohort. In SQL: count DISTINCT user_ids in the activity table where the activity_date falls 1 to 30 days after their first_event_date, then divide by the cohort size. Express as a percentage. Only include users whose cohort date is at least 30 days in the past; otherwise users who have not yet had a full window will drag the rate down. Be prepared to clarify whether the question means any-day retention (did they come back at all?) or Day-30 retention (were they active on that exact day?).
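
A minimal sketch of the any-day version of this calculation, run through Python's built-in sqlite3 module. The activity table, its rows, and the choice to count days as the offset from each user's first event (so the window is offsets 1 through 30) are assumptions for illustration. It also assumes every user's 30-day window has fully elapsed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (user_id INTEGER, activity_date TEXT);
    INSERT INTO activity VALUES
        (1, '2024-01-01'), (1, '2024-01-10'),  -- back after 9 days: retained
        (2, '2024-01-01'),                     -- never came back
        (3, '2024-01-05'), (3, '2024-02-20'),  -- back after 46 days: too late
        (4, '2024-01-03'), (4, '2024-02-02');  -- back after exactly 30 days: retained
""")

RETENTION_SQL = """
    WITH cohort AS (
        -- Each user's first event defines their cohort date.
        SELECT user_id, MIN(activity_date) AS first_event_date
        FROM activity
        GROUP BY user_id
    ),
    retained AS (
        -- Users with any activity 1 to 30 days after their first event.
        SELECT DISTINCT a.user_id
        FROM activity a
        JOIN cohort c ON c.user_id = a.user_id
        WHERE julianday(a.activity_date) - julianday(c.first_event_date)
              BETWEEN 1 AND 30
    )
    SELECT (SELECT COUNT(*) FROM cohort),
           (SELECT COUNT(*) FROM retained)
"""

cohort_size, retained_users = conn.execute(RETENTION_SQL).fetchone()
retention_pct = 100.0 * retained_users / cohort_size

print(cohort_size, retained_users, retention_pct)  # 4 2 50.0
```

For strict Day-30 retention, the window condition would become an equality on the day offset instead of a range.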

Question 5

What is a window function? Give an example.

Accepted Answer

A window function computes a value across a set of rows related to the current row without collapsing them into a single output row like GROUP BY does. Common examples: RANK() OVER (PARTITION BY category ORDER BY revenue DESC) ranks products within each category. ROW_NUMBER() assigns a unique sequential number. LAG(revenue, 1) OVER (ORDER BY month) gives you the previous month's revenue in the same row, making month-over-month comparisons easy without a self-join.
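
The two examples above can be tried end to end with Python's built-in sqlite3 module. This is a sketch on made-up data; the table and column names are assumptions, and it requires the bundled SQLite to be version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# Window functions need SQLite 3.25+ (an assumption about the local build).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE monthly_revenue (month TEXT, revenue INTEGER);
    INSERT INTO monthly_revenue VALUES
        ('2024-01', 100), ('2024-02', 120), ('2024-03', 90);

    CREATE TABLE products (category TEXT, name TEXT, revenue INTEGER);
    INSERT INTO products VALUES
        ('books', 'A', 50), ('books', 'B', 70),
        ('toys',  'C', 30), ('toys',  'D', 30);
""")

# LAG pulls the previous row's revenue into the current row.
# Every input row is kept; nothing is collapsed as with GROUP BY.
month_over_month = conn.execute("""
    SELECT month,
           revenue,
           LAG(revenue, 1) OVER (ORDER BY month) AS prev_revenue,
           revenue - LAG(revenue, 1) OVER (ORDER BY month) AS change
    FROM monthly_revenue
    ORDER BY month
""").fetchall()

# RANK restarts within each category; tied revenues share a rank.
ranked = conn.execute("""
    SELECT category,
           name,
           RANK() OVER (PARTITION BY category ORDER BY revenue DESC) AS rnk
    FROM products
    ORDER BY category, rnk, name
""").fetchall()

print(month_over_month)
# [('2024-01', 100, None, None), ('2024-02', 120, 100, 20), ('2024-03', 90, 120, -30)]
print(ranked)
# [('books', 'B', 1), ('books', 'A', 2), ('toys', 'C', 1), ('toys', 'D', 1)]
```

The first month has no predecessor, so LAG returns NULL there. Swapping RANK() for ROW_NUMBER() would break the tie between the two toys arbitrarily instead of giving both rank 1.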

Question 6

Our DAU dropped 20% last Tuesday. How would you investigate?

Accepted Answer

Start by ruling out data problems: check whether the tracking pipeline had an outage or whether a dashboard query changed. Then check external context: was there a holiday, a major news event, or a marketing pause? Next, look at whether any product change shipped that day. Finally, break the drop into segments — platform (iOS vs Android vs web), geography, user cohort, acquisition channel. Narrow down where the drop is concentrated. If it's one segment, that points you toward a specific cause. Bring a structured hypothesis list to stakeholders, not just 'DAU dropped.'
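
The segmentation step can be sketched in a few lines of Python. The function name, the platform figures, and the choice of "share of the total drop" as the ranking measure are all assumptions for illustration, not a standard method. The idea is to see which segment accounts for most of the missing users.

```python
def drop_by_segment(baseline, current):
    """Rank segments by how much of the overall drop each one explains.

    baseline and current map segment name -> DAU. Every baseline value is
    assumed to be non-zero.
    """
    total_drop = sum(baseline.values()) - sum(current.values())
    rows = []
    for segment, before in baseline.items():
        delta = before - current.get(segment, 0)
        rows.append({
            "segment": segment,
            # Relative change within the segment (negative means a decline).
            "pct_change": -100.0 * delta / before,
            # Fraction of the overall drop attributable to this segment.
            "share_of_drop": delta / total_drop if total_drop else 0.0,
        })
    return sorted(rows, key=lambda r: r["share_of_drop"], reverse=True)


# Hypothetical numbers: a typical Tuesday versus the Tuesday in question.
baseline = {"ios": 4000, "android": 4000, "web": 2000}
current = {"ios": 3900, "android": 2200, "web": 1900}

segments = drop_by_segment(baseline, current)
for row in segments:
    print(row["segment"], round(row["pct_change"], 1), round(row["share_of_drop"], 2))
# android -45.0 0.9
# ios -2.5 0.05
# web -5.0 0.05
```

In this made-up case Android explains 90% of the drop, which would point the investigation toward an Android release or an Android-specific tracking issue.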

Question 7

How would you measure the success of a new feature?

Accepted Answer

First, clarify what the feature is supposed to do — that defines your primary success metric (e.g., increased activation rate, more sessions per user). Then pick guardrail metrics that should not get worse (revenue, core retention). Define a time window long enough to see the behavior you care about. Ideally, run an A/B test so you have a clean comparison group. If A/B is not possible, use a pre/post analysis with controls. Summarize with: did the primary metric move? Did guardrails hold? Is the effect statistically significant and practically meaningful?

Question 8

We have two versions of the checkout page. How do you decide which is better?

Accepted Answer

Run an A/B test. Randomly assign users to Control (version A) or Variant (version B); random assignment removes selection bias. Decide the primary metric upfront (conversion rate), the significance threshold, and how long to run the test (long enough to reach the required statistical power, typically at least one full business cycle). Measure secondary metrics too: average order value, return rate, support contacts. At the end, check for statistical significance against the threshold you set (commonly p < 0.05) and practical significance (is the lift large enough to matter?). Avoid stopping early because an interim result looks significant; repeated peeking inflates the false-positive rate.
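
For a conversion-rate metric, one common way to run the significance check is a two-proportion z-test. The sketch below uses only the Python standard library; the function name and the visitor and conversion counts are hypothetical, and the pooled z-test is an assumed choice of test, reasonable only for large samples.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates.

    Returns (absolute lift of B over A, z statistic, p-value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided tail probability of a standard normal: erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_b - rate_a, z, p_value


# Hypothetical results: 10,000 users per arm.
lift, z, p_value = two_proportion_z_test(conv_a=500, n_a=10_000,
                                         conv_b=570, n_b=10_000)
print(f"lift={lift:.4f}  z={z:.2f}  p={p_value:.4f}")

ALPHA = 0.05  # threshold chosen before the test started
print("significant" if p_value < ALPHA else "not significant")
```

A significant p-value answers only the statistical question. Whether a 0.7 percentage point lift justifies shipping is the practical-significance call described above.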

Question 9

How do you handle missing data?

Accepted Answer

First, understand why data is missing. Missing completely at random (MCAR) is the easiest case: excluding those rows does not bias the results, though it does shrink the sample. Missing at random (MAR), where missingness depends on other fields you did observe, can usually be handled by imputation that conditions on those fields. Missing not at random (MNAR) is dangerous: missingness depends on the unobserved value itself, so excluding those rows biases your results. For MNAR, try to recover data from another source or flag it as a known limitation. For imputation: use median for skewed numerical fields, mode for categoricals, or model-based imputation for complex cases. Always document what you did and why. Stakeholders need to know the analysis has known gaps so they can weigh the conclusions appropriately.
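
A minimal sketch of the simple imputation strategies mentioned above, using only the Python standard library. The function name, the use of None to mark missing values, and the sample data are assumptions for illustration. The function also returns a was-missing flag for each value, which is one way to keep the imputation documented in the data itself.

```python
from statistics import median, mode


def impute(values, strategy):
    """Fill None entries with the median or mode of the observed values.

    Returns (imputed values, was_missing flags). The flags let downstream
    users see exactly which entries were filled in.
    """
    observed = [v for v in values if v is not None]
    if strategy == "median":
        fill = median(observed)
    elif strategy == "mode":
        fill = mode(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    imputed = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return imputed, was_missing


# Skewed numeric field: the median is not pulled up by the 500 outlier.
incomes, income_flags = impute([30, 35, None, 40, 500], "median")
print(incomes)       # [30, 35, 37.5, 40, 500]
print(income_flags)  # [False, False, True, False, False]

# Categorical field: fill with the most common value.
plans, plan_flags = impute(["basic", "pro", None, "basic"], "mode")
print(plans)         # ['basic', 'pro', 'basic', 'basic']
```

The mean of the observed incomes here would be 151.25, which is why the median is the safer default for skewed fields.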

Question 10

You found a surprising insight in the data. How do you present it to stakeholders?

Accepted Answer

Lead with the business implication, not the methodology. Open with a one-sentence headline: 'Customers who use Feature X in their first week are 2x more likely to still be active at 90 days.' Then show the evidence clearly — a chart is usually better than a table. Anticipate objections: Is this correlation or causation? Is the sample large enough? What confounds might explain it? Close with a concrete recommendation or a next step for validation. Keep the method slides in an appendix for anyone who wants to dig in. The goal is a decision, not a stats lecture.

Question 11

Tell me about a time you used data to change a decision.

Accepted Answer

Use the STAR format. Situation: set the business context briefly. Task: what was the decision on the table? Action: walk through the analysis you ran — what data you pulled, how you structured it, what you found. Result: what decision changed, and what was the measurable outcome? Strong answers are specific ('the team was about to cut the loyalty program; I showed it drove 40% of repeat purchases; they kept it and restructured it instead') rather than vague ('I used data to help the team make a better choice').

Question 12

Describe a situation where your analysis was wrong.

Accepted Answer

This question is a filter for self-awareness. Interviewers are not looking for perfection — they are checking whether you have a growth mindset and a process for catching errors. A strong answer: describe the mistake honestly (wrong assumption, insufficient sample size, a confound you missed), then explain how you discovered it, how you communicated it to stakeholders, and what you changed in your process to prevent it. Avoid saying 'I cannot think of a time' — that reads as defensive. Everyone has gotten an analysis wrong.

Question 13

How do you manage multiple requests from different stakeholders?

Accepted Answer

Acknowledge all requests quickly so stakeholders feel heard. Then assess each by business impact (what decision does this enable? how large is the affected surface?) and effort (hours required). Stack-rank and share your prioritization with the requesters so there are no surprises. When two requests have equal priority, escalate the tie to your manager or a shared stakeholder meeting rather than making a unilateral call. Set clear ETAs and update proactively if something changes. The goal is to be predictable — analysts lose trust when they go quiet.

Data Analyst Interview Questions
(With Example Answers)
