
Data engineering for non-engineers

You do not need to be a data engineer to work with one effectively. Learn what data engineers build, what the modern data stack looks like, and how to collaborate without speaking fluent Python.

Why analysts and PMs need to understand data engineering

If you have ever asked "why is this dashboard wrong?" or "why is there no data for last Tuesday?", you have already run into a data engineering problem.

Understanding what data engineers build helps you ask better questions, escalate to the right person, and avoid committing to data deliverables that cannot be met. The analysts and PMs who navigate data problems well are the ones who understand what is happening one layer below their work.

What data engineers actually build

Data engineering is the infrastructure work that makes analytics possible. Here are the four main things your data engineering team is responsible for.

Data pipelines

Automated processes that move data from source systems (apps, databases, APIs) to storage (data warehouses), either on a schedule or in real time.

Data warehouses

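Centralized storage optimized for analytics queries. Examples: Snowflake, BigQuery, Redshift. A warehouse is not the same as a production database: it is built to scan and aggregate large volumes of data quickly, not to handle the many small, frequent writes an application makes.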

Data transformations (dbt)

Transformation logic, often written as SQL and managed with a tool such as dbt, that cleans, joins, and reshapes raw data into analytics-ready tables. The tables your dashboards and reports query are the output of this work.

Data quality monitoring

Checks that data is arriving on time, complete, and accurate, plus alerts that fire when something breaks, ideally before a stakeholder notices a number that looks wrong.
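If you are curious what a data quality check looks like in code, here is a minimal sketch in Python. It is an illustration, not how any particular monitoring tool works: the function name check_table, the field names (user_id, amount, loaded_at), and the thresholds are all made up for this example.

```python
from datetime import datetime, timedelta, timezone


def check_table(rows, now, max_age_hours=24, min_rows=1, required=("user_id", "amount")):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []

    # Arriving: is the newest row recent enough?
    if rows:
        newest = max(row["loaded_at"] for row in rows)
        if now - newest > timedelta(hours=max_age_hours):
            problems.append(f"stale: newest row loaded at {newest.isoformat()}")

    # Complete: did we receive at least the expected number of rows?
    if len(rows) < min_rows:
        problems.append(f"too few rows: expected at least {min_rows}, got {len(rows)}")

    # Accurate (a very basic proxy): required fields must be filled in.
    for field in required:
        missing = sum(1 for row in rows if row.get(field) is None)
        if missing:
            problems.append(f"{missing} row(s) missing {field}")

    return problems


# Hypothetical sample data: two orders loaded more than a day ago, one without a user.
now = datetime(2024, 1, 10, 9, 0, tzinfo=timezone.utc)
orders = [
    {"user_id": 1, "amount": 20.0, "loaded_at": now - timedelta(hours=30)},
    {"user_id": None, "amount": 5.0, "loaded_at": now - timedelta(hours=31)},
]

problems = check_table(orders, now, min_rows=2)
for problem in problems:
    print("ALERT:", problem)
```

With this sample data the check reports two problems: the table is stale, and one row has no user_id. A real setup would send these alerts to a channel the data team watches instead of printing them.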

The modern data stack (simplified)

Most data teams use a version of this five-layer architecture. Each layer has specialists. Data engineers typically own ingestion, storage, and transformation. Analysts work at the visualization layer, with increasing involvement in transformation.

Sources

Your app databases, third-party APIs, and event tracking. Examples: Postgres, Salesforce, Segment.

Ingestion

Pull data from sources and land it in storage. Examples: Fivetran, Airbyte.

Storage

Centralized warehouse optimized for analytics queries. Examples: Snowflake, BigQuery, Redshift.

Transform

Clean, join, and reshape raw data into analytics-ready tables. Example: dbt.

Visualize

Dashboards and reports your stakeholders actually see. Examples: Looker, Tableau, Mode.

Data flows in one direction: Sources → Ingestion → Storage → Transform → Visualize. Your dashboards sit at the very end of that chain. Every layer before them is infrastructure you depend on but may not directly control.
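To make the flow concrete, here is a toy version of all five layers in one short Python script. It is a sketch under heavy assumptions: an in-memory SQLite database stands in for the warehouse, and the table names (raw_orders, daily_revenue) and the order data are invented for illustration. Real stacks use the dedicated tools listed above for each step.

```python
import sqlite3

# Sources: pretend these rows came from an app database (made-up data).
source_orders = [
    ("o1", "alice", "2024-01-08", 2000),
    ("o2", "bob", "2024-01-08", 1550),
    ("o3", "alice", "2024-01-09", 700),
]

# Storage: an in-memory SQLite database stands in for the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE raw_orders "
    "(order_id TEXT, customer TEXT, order_date TEXT, amount_cents INTEGER)"
)

# Ingestion: copy the source rows into a raw table, unchanged.
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", source_orders)

# Transform: reshape raw rows into an analytics-ready table.
warehouse.execute(
    """
    CREATE TABLE daily_revenue AS
    SELECT order_date,
           COUNT(*) AS orders,
           SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    GROUP BY order_date
    """
)

# Visualize: a dashboard would run a query like this and draw a chart from it.
daily_revenue = warehouse.execute(
    "SELECT order_date, orders, revenue FROM daily_revenue ORDER BY order_date"
).fetchall()

for order_date, orders, revenue in daily_revenue:
    print(order_date, orders, revenue)
```

Notice that the final query only ever touches daily_revenue. If the ingestion or transform step fails or runs late, the dashboard query still succeeds and simply shows old or incomplete numbers, which is why problems in earlier layers often surface as "the dashboard looks wrong".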

Common things that go wrong (and why)

Most "why is the data wrong?" conversations trace back to one of these four root causes. Knowing them helps you ask the right question instead of assuming the analyst made an error.

Data pipeline failures

The connection to the source broke, or the source changed its schema. Data stops flowing.

Schema changes

Engineering renamed or removed a database column without telling the data team. Queries that still reference the old column fail, and dashboards break.

Late-arriving data

Transactions from the night before take hours to appear. Time-sensitive dashboards lag.

Duplicate records

The same event was counted twice because a pipeline retried a load that had already succeeded. Totals come out too high.
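The duplicate-records case is the easiest of the four to show in a few lines. The sketch below compares a naive load that appends every batch it receives with a load keyed on a unique event ID, so a retried batch overwrites itself instead of being added again. The function load_events, the event_id field, and the sample events are assumptions made for this example, not a description of any specific pipeline tool.

```python
def load_events(table, batch):
    """Load a batch into `table`, a dict keyed by event_id.

    Loading the same batch twice leaves the table unchanged, because each
    event overwrites its own earlier copy instead of adding a new row.
    """
    for event in batch:
        table[event["event_id"]] = event
    return table


# Hypothetical batch of purchase events.
batch = [
    {"event_id": "e1", "user_id": 42, "amount": 10},
    {"event_id": "e2", "user_id": 7, "amount": 25},
]

# Naive append-only load: a retry sends the same batch again and double counts it.
appended = []
appended.extend(batch)
appended.extend(batch)  # retry after a timeout
naive_total = sum(event["amount"] for event in appended)

# Keyed load: the retry is harmless.
deduped = {}
load_events(deduped, batch)
load_events(deduped, batch)  # same retry
safe_total = sum(event["amount"] for event in deduped.values())

print(naive_total, safe_total)  # 70 35
```

This is also why a specific example helps when you report a suspicious number: a single event ID that appears twice points the data engineer straight at a retry problem.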

How to collaborate effectively

The most productive data teams are not the ones where everyone knows everything — they are the ones where each role knows enough about adjacent roles to communicate clearly. Here is what that looks like in practice.

When you need new data

'I need X data by Y date to answer Z question' is a better request than 'can you add this to the warehouse?'

When data looks wrong

Bring a specific example (this user ID, this timestamp, this number) — not 'the numbers look weird'.

When scoping analytics work

Always ask 'what data do we actually have?' before committing to an analysis.
