Start Here
Interview prep for Senior Data Analytics Engineer and Data Engineer roles — especially at modern AI / GPU / inference platform companies.
The role, in plain English
This guide is for engineers preparing for roles that sit in the modern data stack — usually titled Senior Data Analytics Engineer, Analytics Engineer, Senior Data Engineer, or some hybrid. These titles overlap heavily; the work is recognizable:
- Own the transformation layer — raw operational data lands in the warehouse; you turn it into analytics-ready, well-tested, documented models that the rest of the company queries (a minimal model sketch follows this list).
- Live in SQL + dbt — the core craft. Everything else is plumbing around these two.
- Bridge engineering and analytics — you understand data engineering (pipelines, infra) and you understand analyst needs (metrics, dimensions, reports). You translate.
- Build the metrics layer — ARR, retention, GPU utilization, inference cost-per-token, whatever the business runs on. You own the definitions and the freshness.
- Drive data quality — tests, lineage, ownership, contracts. When dashboards break, you're the one they call.
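To make the transformation layer concrete, here's a minimal sketch of what one dbt model looks like — hypothetical source and column names, dialect-neutral casts:

```sql
-- models/staging/stg_orders.sql — a minimal dbt staging model (hypothetical
-- source and column names). It renames, casts, and dedupes raw rows so every
-- downstream model queries a clean relation instead of the raw table.
with source as (
    select * from {{ source('app_db', 'orders') }}
),

deduped as (
    select
        *,
        row_number() over (
            partition by id
            order by updated_at desc
        ) as rn
    from source
)

select
    id                                  as order_id,
    customer_id,
    cast(amount_cents as numeric) / 100 as amount,
    cast(created_at as timestamp)       as ordered_at
from deduped
where rn = 1
```

A real project is many files shaped roughly like this, layered from staging to marts — small, named, tested, documented.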
At an AI infrastructure company (GPU marketplace, inference cloud, training compute platform) the data is unusually rich: GPU telemetry, inference request logs, billing events, multi-tenant usage, model performance. The shape of the work is the same; the domain is specific. See 17-ai-compute-domain.
What the rounds typically test
A typical loop for these roles includes:
- SQL live coding — almost always. Window functions, CTEs, complex joins, dedup, gaps-and-islands (sketched at the end of this section). This is the highest-signal screen.
- Data modeling discussion — given a business, design the warehouse model. Star schema, fact/dimension tables, SCDs, grain.
- System design / data architecture — design a pipeline end-to-end. Source → land → transform → serve. Cover freshness, idempotency, schema evolution, observability.
- dbt / transformation layer — how you organize, test, version, and document models.
- Domain conversation — what would you build for their business. How would you measure their key metrics.
- Behavioral — communication with stakeholders, prioritization, debugging tough data issues, dealing with bad data.
Pure DSA / LeetCode is less common for data analytics engineering than for software engineering — but Python for data manipulation is fair game, and one tricky-algorithm round isn't unusual at infrastructure-heavy companies.
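As a taste of the pattern recognition the SQL round rewards, here's the classic gaps-and-islands shape — a sketch assuming a hypothetical logins(user_id, login_date) table and Postgres-style date arithmetic:

```sql
-- Find consecutive-day login streaks per user.
-- Trick: within a streak, login_date and row_number() both increase by 1,
-- so (login_date - row_number) is constant — a stable anchor to group by.
with ranked as (
    select
        user_id,
        login_date,
        login_date - cast(row_number() over (
            partition by user_id
            order by login_date
        ) as int) as streak_anchor
    from logins
)

select
    user_id,
    min(login_date) as streak_start,
    max(login_date) as streak_end,
    count(*)        as streak_days
from ranked
group by user_id, streak_anchor
order by user_id, streak_start
```

Recognizing that shape fast — "subtract the row number, group by the anchor" — is exactly what the screen is testing.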
The folder, in reading order
Section A — Orient (read first)
| File | Why |
|---|---|
| 01-the-role | What the role actually involves, the data engineer vs analytics engineer vs analyst triangle, what to ask them |
| 02-positioning-from-scratch | How to interview honestly when your prior stack doesn't perfectly match theirs |
Section B — Core technical
| File | Why |
|---|---|
| 03-modern-data-stack | How warehouse + dbt + orchestrator + BI fit together |
| 04-sql-deep-dive | SQL fluency — the single most-tested skill in this loop |
| 05-dbt-deep-dive | The defining tool of analytics engineering — models, sources, tests, snapshots, macros |
| 06-data-modeling | Kimball star schema, slowly-changing dimensions, OBT, grain |
| 07-data-pipelines | ETL vs ELT, batch vs streaming, idempotency, backfills |
| 08-data-quality | Tests, freshness, anomaly detection, contracts |
| 09-warehouses | Snowflake, BigQuery, Redshift, Databricks, Iceberg/Delta — pricing, partitioning, performance |
Section C — Coding
| File | Why |
|---|---|
| 10-sql-patterns | The 12 SQL patterns that cover 80% of interview questions |
| 11-sql-problems | 11 SQL problems worked out, drill mode |
| 12-python-for-data | pandas, polars, when to leave SQL |
Section D — Production / cloud
| File | Why |
|---|---|
| 13-orchestration | Airflow, Dagster, Prefect — DAGs, retries, idempotency |
| 14-observability | Lineage, freshness, volume monitoring, anomaly detection |
Section E — Domain & execution
| File | Why |
|---|---|
| 17-ai-compute-domain | GPU telemetry, inference logs, billing events, unit economics — what data looks like at an AI infra company |
| 15-interview-questions | ~30 practice Q&As. Drill these out loud |
| 16-day-of | Tactics, traps, questions to ask them. Re-read morning of |
Suggested study schedule
If you have 7+ days
- Day 1: 01, 02 (orient) → 03 (modern data stack).
- Day 2: 04 (SQL deep dive) + 10 (SQL patterns).
- Day 3: 11 (SQL problems — drill on a timer).
- Day 4: 05 (dbt) + 06 (data modeling).
- Day 5: 07 (pipelines) + 08 (quality) + 09 (warehouses).
- Day 6: 12 (Python) + 13 (orchestration) + 14 (observability) + 17 (AI compute domain).
- Day 7: Drill 15. Read 16. Sleep.
If you have 2-3 days
01, 02, 04, 05, 06, 11 (drill SQL), 15, 16. Skim everything else.
If you have < 24 hours
01, 02, 11 (the SQL problems — most likely to come up), 15, 16. Skim 04, 05, 06 headings only.
Two practical things to do before interview day
Reading is cheaper than building, but building sticks. If you can find an evening or two:
- Spin up dbt against a real warehouse. Free options: dbt Cloud's free tier, BigQuery sandbox, Snowflake's trial. Clone `jaffle_shop` (dbt's reference project) or one of the dbt sample projects. Run `dbt build`. Inspect the lineage graph. Add a test that fails on purpose (a sketch follows this list). The whole stack clicks once you've watched it run.
- Solve 10 SQL window-function problems on a real warehouse. Pick problems from 11-sql-problems or LeetCode SQL. Solve them in the warehouse UI, not in a code editor. Watching the query plan execute is where intuition comes from.
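For the failing test, a dbt singular test is just a SQL file under tests/ that returns the rows violating an assumption; dbt fails the test if any rows come back. A minimal sketch, assuming a jaffle_shop-style stg_payments model:

```sql
-- tests/assert_no_negative_payments.sql (hypothetical singular test).
-- dbt runs this during `dbt build` / `dbt test` and fails if it returns rows.
-- Flip the comparison to `amount >= 0` to make it fail on purpose and watch
-- how dbt surfaces a broken assumption.
select
    payment_id,
    amount
from {{ ref('stg_payments') }}
where amount < 0
```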
These two evenings would close more of your gap than the same time spent rereading.
What "winning" looks like in these rounds
For data analytics engineering, winning is:
- SQL fluency under pressure — you can write CTEs and window functions without flinching. Pattern recognition: "this is a gaps-and-islands problem" in 10 seconds.
- Modeling instinct — given a business, you reach for grain, dimensions, facts, SCDs in the right order. You ask "what's the grain of this table?" first (see the sketch after this list).
- Failure-mode thinking — late-arriving data, duplicate events, schema drift, time-zone confusion, NULL handling. Mention these unprompted.
- Stakeholder fluency — you talk about metrics in business terms (cohort retention, gross margin, unit economics) and translate to SQL.
- Operational maturity — tests, alerts, lineage, ownership. Data is software; treat it accordingly.
- Honesty at the edge — "I haven't used Iceberg in production but I understand the table-format shape and the operational story" beats faking.
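To anchor that grain question, a hypothetical fact-table sketch — illustrative names, generic DDL — where the grain is one row per order line:

```sql
-- fct_order_lines: grain = one row per order line (hypothetical names).
-- Measures (quantity, amounts) are additive at this grain; each _key column
-- points at a dimension table that carries the descriptive attributes.
create table fct_order_lines (
    order_line_id   bigint,         -- the grain: one row per order line
    order_id        bigint,         -- degenerate dimension kept on the fact
    customer_key    bigint,         -- -> dim_customers
    product_key     bigint,         -- -> dim_products
    order_date_key  int,            -- -> dim_dates (yyyymmdd)
    quantity        int,
    unit_price      numeric(12, 2),
    line_amount     numeric(12, 2)  -- quantity * unit_price, stored at grain
);
```

Everything in the table has to be true at exactly that grain; a column that isn't belongs in a dimension or in a different fact table.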