Start Here
Interview prep for Senior Data Analytics Engineer and Data Engineer roles — especially at modern AI / GPU / inference platform companies.
The role, in plain English
This guide is for engineers preparing for roles that sit in the modern data stack — usually titled Senior Data Analytics Engineer, Analytics Engineer, Senior Data Engineer, or some hybrid. These titles overlap heavily; the work is recognizable:
- Own the transformation layer — raw operational data lands in the warehouse; you turn it into analytics-ready, well-tested, documented models that the rest of the company queries (a minimal model sketch follows this list).
- Live in SQL + dbt — the core craft. Everything else is plumbing around these two.
- Bridge engineering and analytics — you understand data engineering (pipelines, infra) and you understand analyst needs (metrics, dimensions, reports). You translate.
- Build the metrics layer — ARR, retention, GPU utilization, inference cost-per-token, whatever the business runs on. You own the definitions and the freshness.
- Drive data quality — tests, lineage, ownership, contracts. When dashboards break, you're the one they call.
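To make the transformation layer concrete, here's a minimal sketch of what one dbt model looks like — hypothetical source and column names, dialect-neutral casts:

```sql
-- models/staging/stg_orders.sql — a minimal dbt staging model (hypothetical
-- source and column names). It renames, casts, and dedupes raw rows so every
-- downstream model queries a clean relation instead of the raw table.
with source as (
    select * from {{ source('app_db', 'orders') }}
),

deduped as (
    select
        *,
        row_number() over (
            partition by id
            order by updated_at desc
        ) as rn
    from source
)

select
    id                                  as order_id,
    customer_id,
    cast(amount_cents as numeric) / 100 as amount,
    cast(created_at as timestamp)       as ordered_at
from deduped
where rn = 1
```

A real project is many files shaped roughly like this, layered from staging to marts — small, named, tested, documented.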
At an AI infrastructure company (GPU marketplace, inference cloud, training compute platform) the data is unusually rich: GPU telemetry, inference request logs, billing events, multi-tenant usage, model performance. The shape of the work is the same; the domain is specific. See 17-ai-compute-domain.
What the rounds typically test
A typical loop for these roles includes:
- SQL live coding — almost always. Window functions, CTEs, complex joins, dedup, gaps-and-islands (sketched at the end of this section). This is the highest-signal screen.
- Data modeling discussion — given a business, design the warehouse model. Star schema, fact/dimension tables, SCDs, grain.
- System design / data architecture — design a pipeline end-to-end. Source → land → transform → serve. Cover freshness, idempotency, schema evolution, observability.
- dbt / transformation layer — how you organize, test, version, and document models.
- Domain conversation — what would you build for their business. How would you measure their key metrics.
- Behavioral — communication with stakeholders, prioritization, debugging tough data issues, dealing with bad data.
Pure DSA / LeetCode is less common for data analytics engineering than for software engineering — but Python for data manipulation is fair game, and one tricky-algorithm round isn't unusual at infrastructure-heavy companies.
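As a taste of the pattern recognition the SQL round rewards, here's the classic gaps-and-islands shape — a sketch assuming a hypothetical logins(user_id, login_date) table and Postgres-style date arithmetic:

```sql
-- Find consecutive-day login streaks per user.
-- Trick: within a streak, login_date and row_number() both increase by 1,
-- so (login_date - row_number) is constant — a stable anchor to group by.
with ranked as (
    select
        user_id,
        login_date,
        login_date - cast(row_number() over (
            partition by user_id
            order by login_date
        ) as int) as streak_anchor
    from logins
)

select
    user_id,
    min(login_date) as streak_start,
    max(login_date) as streak_end,
    count(*)        as streak_days
from ranked
group by user_id, streak_anchor
order by user_id, streak_start
```

Recognizing that shape fast — "subtract the row number, group by the anchor" — is exactly what the screen is testing.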
The folder, in reading order
Section A — Orient (read first)
| File | Why |
|---|---|
| 01-the-role | What the role actually involves, the data engineer vs analytics engineer vs analyst triangle, what to ask them |
| 02-positioning-from-scratch | How to interview honestly when your prior stack doesn't perfectly match theirs |
Section B — Core technical
| File | Why |
|---|---|
| 03-modern-data-stack | How warehouse + dbt + orchestrator + BI fit together |
| 04-sql-deep-dive | SQL fluency — the single most-tested skill in this loop |
| 05-dbt-deep-dive | The defining tool of analytics engineering — models, sources, tests, snapshots, macros |
| 06-data-modeling | Kimball star schema, slowly-changing dimensions, OBT, grain |
| 07-data-pipelines | ETL vs ELT, batch vs streaming, idempotency, backfills |
| 08-data-quality | Tests, freshness, anomaly detection, contracts |
| 09-warehouses | Snowflake, BigQuery, Redshift, Databricks, Iceberg/Delta — pricing, partitioning, performance |
Section C — Coding
| File | Why |
|---|---|
| 10-sql-patterns | The 12 SQL patterns that cover 80% of interview questions |
| 11-sql-problems | 11 SQL problems worked out, drill mode |
| 12-python-for-data | pandas, polars, when to leave SQL |
Section D — Production / cloud
| File | Why |
|---|---|
| 13-orchestration | Airflow, Dagster, Prefect — DAGs, retries, idempotency |
| 14-observability | Lineage, freshness, volume monitoring, anomaly detection |
Section E — Domain & execution
| File | Why |
|---|---|
| 17-ai-compute-domain | GPU telemetry, inference logs, billing events, unit economics — what data looks like at an AI infra company |
| 15-interview-questions | ~30 practice Q&As. Drill these out loud |
| 16-day-of | Tactics, traps, questions to ask them. Re-read morning of |
Suggested study schedule
If you have 7+ days
- Day 1: 01, 02 (orient) → 03 (modern data stack).
- Day 2: 04 (SQL deep dive) + 10 (SQL patterns).
- Day 3: 11 (SQL problems — drill on a timer).
- Day 4: 05 (dbt) + 06 (data modeling).
- Day 5: 07 (pipelines) + 08 (quality) + 09 (warehouses).
- Day 6: 12 (Python) + 13 (orchestration) + 14 (observability) + 17 (AI compute domain).
- Day 7: Drill 15. Read 16. Sleep.
If you have 2-3 days
01, 02, 04, 05, 06, 11 (drill SQL), 15, 16. Skim everything else.
If you have < 24 hours
01, 02, 11 (the SQL problems — most likely to come up), 15, 16. Skim 04, 05, 06 headings only.
Two practical things to do before interview day
Reading is cheaper than building, but building sticks. If you can find an evening or two:
- Spin up dbt against a real warehouse. Free options: dbt Cloud's free tier, BigQuery sandbox, Snowflake's trial. Clone `jaffle_shop` (dbt's reference project) or one of the dbt sample projects. Run `dbt build`. Inspect the lineage graph. Add a test that fails on purpose (a sketch follows this list). The whole stack clicks once you've watched it run.
- Solve 10 SQL window-function problems on a real warehouse. Pick problems from 11-sql-problems or LeetCode SQL. Solve them in the warehouse UI, not in a code editor. Watching the query plan execute is where intuition comes from.
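For the failing test, a dbt singular test is just a SQL file under tests/ that returns the rows violating an assumption; dbt fails the test if any rows come back. A minimal sketch, assuming a jaffle_shop-style stg_payments model:

```sql
-- tests/assert_no_negative_payments.sql (hypothetical singular test).
-- dbt runs this during `dbt build` / `dbt test` and fails if it returns rows.
-- Flip the comparison to `amount >= 0` to make it fail on purpose and watch
-- how dbt surfaces a broken assumption.
select
    payment_id,
    amount
from {{ ref('stg_payments') }}
where amount < 0
```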
These two evenings would close more of your gap than the same time spent rereading.
What "winning" looks like in these rounds
For data analytics engineering, winning is:
- SQL fluency under pressure — you can write CTEs and window functions without flinching. Pattern recognition: "this is a gaps-and-islands problem" in 10 seconds.
- Modeling instinct — given a business, you reach for grain, dimensions, facts, SCDs in the right order. You ask "what's the grain of this table?" first (see the sketch after this list).
- Failure-mode thinking — late-arriving data, duplicate events, schema drift, time-zone confusion, NULL handling. Mention these unprompted.
- Stakeholder fluency — you talk about metrics in business terms (cohort retention, gross margin, unit economics) and translate to SQL.
- Operational maturity — tests, alerts, lineage, ownership. Data is software; treat it accordingly.
- Honesty at the edge — "I haven't used Iceberg in production but I understand the table-format shape and the operational story" beats faking.
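To anchor that grain question, a hypothetical fact-table sketch — illustrative names, generic DDL — where the grain is one row per order line:

```sql
-- fct_order_lines: grain = one row per order line (hypothetical names).
-- Measures (quantity, amounts) are additive at this grain; each _key column
-- points at a dimension table that carries the descriptive attributes.
create table fct_order_lines (
    order_line_id   bigint,         -- the grain: one row per order line
    order_id        bigint,         -- degenerate dimension kept on the fact
    customer_key    bigint,         -- -> dim_customers
    product_key     bigint,         -- -> dim_products
    order_date_key  int,            -- -> dim_dates (yyyymmdd)
    quantity        int,
    unit_price      numeric(12, 2),
    line_amount     numeric(12, 2)  -- quantity * unit_price, stored at grain
);
```

Everything in the table has to be true at exactly that grain; a column that isn't belongs in a dimension or in a different fact table.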