Skip to main content

Interview prep

Data Analyst Interview Questions
(With Example Answers)

25 real questions across SQL, analytical thinking, and behavioral rounds — each with a structured answer framework so you know exactly what interviewers are looking for.

How data analyst interviews work

Most DA interview processes follow four rounds. Knowing what each round tests lets you prepare the right material for the right stage.

Round 1

Recruiter screen

Background, motivation, and salary expectations. The recruiter is checking whether you're a plausible fit and whether your comp range matches the role. Have a one-minute summary of your background ready and a clear number for salary.

Round 2

Technical screen — SQL or Python

Live coding or a take-home task. You will write SQL queries, often against a real-looking schema. Some companies add Python (pandas) for data manipulation. Expect GROUP BY, JOINs, window functions, and simple aggregations. Practice on a real SQL interface — not just on paper.

Round 3

Case study

Given a business problem and a dataset (or a description of one), what would you do? Interviewers are testing whether you can translate data into decisions. Structure your answer: clarify the goal, define your metrics, describe the analysis approach, and state what you would recommend.

Round 4

Panel or culture round

Stakeholder communication and problem-solving approach. You will meet the team — often including a product manager, a data engineer, or a business partner. They are checking whether you communicate clearly, handle ambiguity well, and would be good to work with.

Technical questions — SQL

Q1–5

Click any question to see the example answer framework. SQL questions are almost always about GROUP BY, JOINs, subqueries, and window functions — master those four and you can answer 90% of what gets asked.

1

Write a query to find the top 5 customers by revenue this year.

GROUP BY + ORDER BY + LIMIT

Example answer framework

Group by customer_id, SUM the revenue column, filter to the current year in a WHERE clause, ORDER BY revenue DESC, and LIMIT 5. If you need the customer name, JOIN to the customers table. Example:

SELECT c.name, SUM(o.revenue) AS total_revenue FROM orders o JOIN customers c ON c.id = o.customer_id WHERE YEAR(o.created_at) = YEAR(CURRENT_DATE) GROUP BY c.id, c.name ORDER BY total_revenue DESC LIMIT 5;

2

Find users who purchased in March but not in April.

LEFT JOIN or NOT IN with subquery

Example answer framework

Use a subquery or LEFT JOIN approach. With NOT IN:

SELECT DISTINCT user_id FROM orders WHERE MONTH(created_at) = 3 AND user_id NOT IN ( SELECT user_id FROM orders WHERE MONTH(created_at) = 4 );

With LEFT JOIN (often faster on large tables):

SELECT DISTINCT m.user_id FROM orders m LEFT JOIN orders a ON a.user_id = m.user_id AND MONTH(a.created_at) = 4 WHERE MONTH(m.created_at) = 3 AND a.user_id IS NULL;

3

What is the difference between INNER JOIN and LEFT JOIN?

INNER: only matching rows; LEFT: all from left table

Example answer framework

INNER JOIN returns only rows where the join condition matches in both tables — unmatched rows are dropped from both sides. LEFT JOIN returns all rows from the left table plus matching rows from the right; when there is no match, the right-side columns come back as NULL. Use LEFT JOIN when you want to keep records even if the related table has no entry — for example, listing all customers including those with zero orders.

4

How would you calculate 30-day retention?

COUNT users active Days 2–30 / COUNT users active Day 1

Example answer framework

Define Day 1 as the cohort date (e.g., first purchase or sign-up). Retention = users who performed the target action at least once between Day 2 and Day 30, divided by total users in the cohort. In SQL: count DISTINCT user_ids in the activity table where the activity_date falls within 1–30 days of their first_event_date, then divide by the cohort size. Express as a percentage. Be prepared to clarify whether the question means any-day retention (did they come back at all?) or Day-30 retention (were they active on that exact day?).

5

What is a window function? Give an example.

Calculates across rows without grouping — RANK(), ROW_NUMBER(), LAG()

Example answer framework

A window function computes a value across a set of rows related to the current row without collapsing them into a single output row like GROUP BY does. Common examples: RANK() OVER (PARTITION BY category ORDER BY revenue DESC) ranks products within each category. ROW_NUMBER() assigns a unique sequential number. LAG(revenue, 1) OVER (ORDER BY month) gives you the previous month's revenue in the same row, making month-over-month comparisons easy without a self-join.

Sharpen your SQL before the interview

Reading about SQL is not the same as writing it. Practice in our interactive SQL environment with real DA-style challenges.

SQL practice

Business and analytical thinking questions

Q6–10

These questions test whether you can move from data to decision. The best answers are structured and speak to business impact — not just analytical technique.

6

Our DAU dropped 20% last Tuesday. How would you investigate?

External factors → product changes → data pipeline issues → segment breakdown

Example answer framework

Start by ruling out data problems: check whether the tracking pipeline had an outage or whether a dashboard query changed. Then check external context: was there a holiday, a major news event, or a marketing pause? Next, look at whether any product change shipped that day. Finally, break the drop into segments — platform (iOS vs Android vs web), geography, user cohort, acquisition channel. Narrow down where the drop is concentrated. If it's one segment, that points you toward a specific cause. Bring a structured hypothesis list to stakeholders, not just 'DAU dropped.'

7

How would you measure the success of a new feature?

Define goal metric, guardrail metrics, time window, comparison group

Example answer framework

First, clarify what the feature is supposed to do — that defines your primary success metric (e.g., increased activation rate, more sessions per user). Then pick guardrail metrics that should not get worse (revenue, core retention). Define a time window long enough to see the behavior you care about. Ideally, run an A/B test so you have a clean comparison group. If A/B is not possible, use a pre/post analysis with controls. Summarize with: did the primary metric move? Did guardrails hold? Is the effect statistically significant and practically meaningful?

8

We have two versions of the checkout page. How do you decide which is better?

A/B test: assign randomly, run for statistical significance, measure primary + secondary metrics

Example answer framework

Run an A/B test. Randomly assign users to Control (version A) or Variant (version B) — random assignment removes selection bias. Decide the primary metric upfront (conversion rate) and how long to run it (enough to reach statistical power, typically at least one full business cycle). Measure secondary metrics too: average order value, return rate, support contacts. At the end, check for statistical significance (p < 0.05) and practical significance (is the lift large enough to matter?). Avoid peeking at results early — that inflates false-positive rates.

9

How do you handle missing data?

Check if missing is random or systematic; impute or exclude; document the decision

Example answer framework

First, understand why data is missing. Missing completely at random (MCAR) is the easiest case — you can safely exclude those rows. Missing not at random (MNAR) is dangerous — excluding those rows biases your results. For MNAR, try to recover data from another source or flag it as a known limitation. For imputation: use median for skewed numerical fields, mode for categoricals, or model-based imputation for complex cases. Always document what you did and why — stakeholders need to know the analysis has known gaps so they can weigh the conclusions appropriately.

10

You found a surprising insight in the data. How do you present it to stakeholders?

Lead with the implication, not the method; one-sentence headline; show evidence; recommend action

Example answer framework

Lead with the business implication, not the methodology. Open with a one-sentence headline: 'Customers who use Feature X in their first week are 2x more likely to still be active at 90 days.' Then show the evidence clearly — a chart is usually better than a table. Anticipate objections: Is this correlation or causation? Is the sample large enough? What confounds might explain it? Close with a concrete recommendation or a next step for validation. Keep the method slides in an appendix for anyone who wants to dig in. The goal is a decision, not a stats lecture.

Behavioral questions

Q11–13

Behavioral questions are answered best with the STAR method: Situation, Task, Action, Result. Prepare two or three stories from your own experience that can be adapted to different prompts.

S

Situation

Set the scene briefly

T

Task

Your specific responsibility

A

Action

Exactly what you did

R

Result

Measurable outcome

11

Tell me about a time you used data to change a decision.

STAR: Situation, Task, Action, Result

Example answer framework

Use the STAR format. Situation: set the business context briefly. Task: what was the decision on the table? Action: walk through the analysis you ran — what data you pulled, how you structured it, what you found. Result: what decision changed, and what was the measurable outcome? Strong answers are specific ('the team was about to cut the loyalty program; I showed it drove 40% of repeat purchases; they kept it and restructured it instead') rather than vague ('I used data to help the team make a better choice').

12

Describe a situation where your analysis was wrong.

Shows humility and process — this question is designed to filter out overconfident candidates

Example answer framework

This question is a filter for self-awareness. Interviewers are not looking for perfection — they are checking whether you have a growth mindset and a process for catching errors. A strong answer: describe the mistake honestly (wrong assumption, insufficient sample size, a confound you missed), then explain how you discovered it, how you communicated it to stakeholders, and what you changed in your process to prevent it. Avoid saying 'I cannot think of a time' — that reads as defensive. Everyone has gotten an analysis wrong.

13

How do you manage multiple requests from different stakeholders?

Prioritization framework: impact vs effort, escalate ties

Example answer framework

Acknowledge all requests quickly so stakeholders feel heard. Then assess each by business impact (what decision does this enable? how large is the affected surface?) and effort (hours required). Stack-rank and share your prioritization with the requesters so there are no surprises. When two requests have equal priority, escalate the tie to your manager or a shared stakeholder meeting rather than making a unilateral call. Set clear ETAs and update proactively if something changes. The goal is to be predictable — analysts lose trust when they go quiet.

Questions to ask the interviewer

Always come with questions. Asking nothing signals low interest. These three questions are specific enough to show you are serious and vague enough to work at almost any company.

What does a great analyst look like on this team at the 6-month mark?

What data tools and stack does the team use?

What is the biggest data challenge the team is working through right now?

Practice your SQL skills first

The technical screen is where most DA candidates get filtered out. Get comfortable writing queries before the interview, not during it.

Go to SQL practice