Data engineering for non-engineers
You do not need to be a data engineer to work with one effectively. Learn what data engineers build, what the modern data stack looks like, and how to collaborate without speaking fluent Python.
Why analysts and PMs need to understand data engineering
If you have ever asked "why is this dashboard wrong?" or "why is there no data for last Tuesday?", you have already run into a data engineering problem.
Understanding what data engineers build helps you ask better questions, escalate to the right person, and avoid committing to data deliverables that cannot be met. The analysts and PMs who navigate data problems well are the ones who understand what is happening one layer below their work.
What data engineers actually build
Data engineering is the infrastructure work that makes analytics possible. Here are the four main things your data engineering team is responsible for.
Data pipelines
Automated processes that move data from source systems (apps, databases, APIs) to storage (data warehouses), either on a schedule or in real time.
Data warehouses
Centralized storage optimized for analytics queries. Examples: Snowflake, BigQuery, Redshift. Not the same as a production database: a warehouse is built to scan and aggregate large volumes of data quickly, not to handle the many small, fast writes an application makes.
Data transformations (dbt)
Code, often written with a tool such as dbt, that cleans, joins, and reshapes raw data into analytics-ready tables. The tables your dashboards and reports query are the output of transformation work.
Data quality monitoring
Checks that data is arriving, complete, and accurate, with alerts that fire when something breaks, ideally before a stakeholder notices a number that looks wrong.
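As a rough illustration of how a transformation and a quality check fit together, here is a minimal Python sketch. The sample rows, the column names, and the functions transform and quality_check are all hypothetical, invented for this example. A real team would more likely express this logic in SQL, dbt, or a dedicated monitoring tool.

```python
from datetime import date

# Hypothetical raw rows, roughly as an ingestion job might land them.
RAW_ORDERS = [
    {"order_id": "A1", "amount": "19.99", "order_date": "2024-03-04"},
    {"order_id": "A2", "amount": "5.00", "order_date": "2024-03-04"},
    {"order_id": "A3", "amount": None, "order_date": "2024-03-05"},
]


def transform(rows):
    """Clean raw rows into an analytics-ready shape (typed values, no nulls)."""
    clean = []
    for row in rows:
        if row["amount"] is None:
            continue  # skip rows that cannot be used in revenue totals
        clean.append({
            "order_id": row["order_id"],
            "amount": float(row["amount"]),
            "order_date": date.fromisoformat(row["order_date"]),
        })
    return clean


def quality_check(raw_rows, clean_rows, expected_date):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    # Arrival check: did any usable data show up for the day we expect?
    if not any(r["order_date"] == expected_date for r in clean_rows):
        problems.append(f"no usable rows for {expected_date}")
    # Completeness check: how many rows were lost during cleaning?
    dropped = len(raw_rows) - len(clean_rows)
    if dropped:
        problems.append(f"{dropped} row(s) dropped for missing amount")
    return problems


clean = transform(RAW_ORDERS)
print(quality_check(RAW_ORDERS, clean, date(2024, 3, 5)))
```

In this sketch the only row for March 5 has no amount, so both checks report a problem. This is the kind of signal a monitoring alert would surface before the gap shows up on a dashboard.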
The modern data stack (simplified)
Most data teams use a version of this five-layer architecture. Each layer has specialists. Data engineers typically own ingestion, storage, and transformation. Analysts work at the visualization layer, with increasing involvement in transformation.
Data flows left to right: Sources → Ingestion → Storage → Transform → Visualize. Your dashboards sit at the far right. Every layer to the left is infrastructure you depend on but may not directly control.
Common things that go wrong (and why)
Most "why is the data wrong?" conversations trace back to one of these four root causes. Knowing them helps you ask the right question instead of assuming the analyst made an error.
Data pipeline failures
The connection to the source broke, or the source changed its schema. Data stops flowing.
Schema changes
An engineering team renamed or removed a database column without telling the data team. Tables and dashboards that depend on that column break.
Late-arriving data
Transactions from the night before take hours to appear, so time-sensitive dashboards look incomplete until the data catches up.
Duplicate records
The same event was counted twice because a pipeline retry loaded it again, which inflates totals.
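Two of these causes, schema changes and duplicate records, are simple enough to show in a few lines. The Python sketch below is illustrative only. The expected column set, the event rows, and the helpers missing_columns and deduplicate are hypothetical names made up for this example, and deduplicating on an event ID assumes the source supplies a stable ID for each event.

```python
# Hypothetical "contract": the columns downstream tables expect to receive.
EXPECTED_COLUMNS = {"event_id", "user_id", "event_time"}


def missing_columns(row, expected=EXPECTED_COLUMNS):
    """Return the expected columns that are absent from a row, sorted by name."""
    return sorted(expected - set(row))


def deduplicate(rows, key="event_id"):
    """Keep the first row seen for each key value, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


events = [
    {"event_id": "e1", "user_id": 7, "event_time": "2024-03-04T23:58:00"},
    # Same event loaded a second time by a pipeline retry.
    {"event_id": "e1", "user_id": 7, "event_time": "2024-03-04T23:58:00"},
    {"event_id": "e2", "user_id": 9, "event_time": "2024-03-05T00:03:00"},
]

# A row produced after the source renamed user_id to uid.
renamed = {"event_id": "e3", "uid": 4, "event_time": "2024-03-05T01:00:00"}

print(len(events), "rows loaded,", len(deduplicate(events)), "unique events")
print("missing columns:", missing_columns(renamed))
```

A check like missing_columns is why a renamed column can be caught at load time instead of surfacing later as a broken dashboard.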
How to collaborate effectively
The most productive data teams are not the ones where everyone knows everything — they are the ones where each role knows enough about adjacent roles to communicate clearly. Here is what that looks like in practice.
When you need new data
"I need X data by Y date to answer Z question" is a better request than "can you add this to the warehouse?"
When data looks wrong
Bring a specific example (this user ID, this timestamp, this number), not "the numbers look weird".
When scoping analytics work
Always ask "what data do we actually have?" before committing to an analysis.