Section A · Orient · Read first

Start Here

Interview prep for Senior Data Analytics Engineer and Data Engineer roles — especially at modern AI / GPU / inference platform companies.

The role, in plain English

This guide is for engineers preparing for roles that sit in the modern data stack — usually titled Senior Data Analytics Engineer, Analytics Engineer, Senior Data Engineer, or some hybrid. These titles overlap heavily; the work is recognizable:

  • Own the transformation layer — raw operational data lands in the warehouse; you turn it into analytics-ready, well-tested, documented models that the rest of the company queries.
  • Live in SQL + dbt — the core craft. Everything else is plumbing around these two.
  • Bridge engineering and analytics — you understand data engineering (pipelines, infra) and you understand analyst needs (metrics, dimensions, reports). You translate.
  • Build the metrics layer — ARR, retention, GPU utilization, inference cost-per-token, whatever the business runs on. You own the definitions and the freshness.
  • Drive data quality — tests, lineage, ownership, contracts. When dashboards break, you're the one they call.

At an AI infrastructure company (GPU marketplace, inference cloud, training compute platform) the data is unusually rich: GPU telemetry, inference request logs, billing events, multi-tenant usage, model performance. The shape of the work is the same; the domain is specific. See 17-ai-compute-domain.
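To give the metrics-ownership bullet above some texture, here is what a unit-economics calculation for an inference platform might look like. All numbers, names, and the formula itself are invented for illustration; real cost allocation is messier (shared capacity, idle time, multi-tenancy):

```python
# Illustrative cost-per-token metric for an inference platform.
# Every number below is made up for demonstration purposes.
gpu_hours = 12.0            # GPU-hours consumed serving a model
hourly_rate = 2.50          # $/GPU-hour (hypothetical list price)
tokens_served = 40_000_000  # tokens generated in the same window

# Cost per 1K tokens: total GPU spend divided by thousands of tokens.
cost_per_1k_tokens = gpu_hours * hourly_rate / (tokens_served / 1_000)
print(f"${cost_per_1k_tokens:.5f} per 1K tokens")  # prints $0.00075 per 1K tokens
```

In an interview, the interesting part isn't the arithmetic — it's defending the inputs: which GPU-hours count, whether idle capacity is amortized in, and at what grain the metric is computed.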

What the rounds typically test

A typical loop for these roles includes:

  1. SQL live coding — almost always. Window functions, CTEs, complex joins, dedup, gaps-and-islands. This is the highest-signal screen.
  2. Data modeling discussion — given a business, design the warehouse model. Star schema, fact/dimension tables, SCDs, grain.
  3. System design / data architecture — design a pipeline end-to-end. Source → land → transform → serve. Cover freshness, idempotency, schema evolution, observability.
  4. dbt / transformation layer — how you organize, test, version, and document models.
  5. Domain conversation — what you'd build for their business; how you'd measure their key metrics.
  6. Behavioral — communication with stakeholders, prioritization, debugging tough data issues, dealing with bad data.

Pure DSA / LeetCode is less common for data analytics engineering than for software engineering — but Python for data manipulation is fair game, and one tricky-algorithm round isn't unusual at infrastructure-heavy companies.
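Gaps-and-islands comes up often enough in the SQL round that it's worth seeing end-to-end. Here is a minimal sketch run through SQLite from Python (table and data are invented for illustration): consecutive days collapse into one "island" because day minus row number is constant within a streak.

```python
import sqlite3

# Hypothetical table of user activity days; all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (user_id TEXT, day TEXT);
    INSERT INTO activity VALUES
        ('a', '2024-01-01'), ('a', '2024-01-02'), ('a', '2024-01-03'),
        ('a', '2024-01-05'),                      -- gap: a new island starts
        ('b', '2024-01-02'), ('b', '2024-01-03');
""")

# Classic gaps-and-islands: within a run of consecutive days,
# day minus the per-user row number is constant, so it is a group key.
sql = """
WITH numbered AS (
    SELECT user_id, day,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day) AS rn
    FROM activity
),
islands AS (
    SELECT user_id, day,
           date(day, '-' || rn || ' days') AS grp
    FROM numbered
)
SELECT user_id, MIN(day) AS streak_start, MAX(day) AS streak_end,
       COUNT(*) AS days
FROM islands
GROUP BY user_id, grp
ORDER BY user_id, streak_start;
"""
streaks = conn.execute(sql).fetchall()
for row in streaks:
    print(row)
# ('a', '2024-01-01', '2024-01-03', 3)
# ('a', '2024-01-05', '2024-01-05', 1)
# ('b', '2024-01-02', '2024-01-03', 2)
```

The same shape works in Snowflake or BigQuery with `DATE_SUB`/`DATEADD` in place of SQLite's `date()` modifier; the row-number-difference trick is the part interviewers want to see.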

The folder, in reading order

Section A — Orient (read first)

  • 01-the-role — What the role actually involves, the data engineer vs analytics engineer vs analyst triangle, what to ask them
  • 02-positioning-from-scratch — How to interview honestly when your prior stack doesn't perfectly match theirs

Section B — Core technical

  • 03-modern-data-stack — How warehouse + dbt + orchestrator + BI fit together
  • 04-sql-deep-dive — SQL fluency, the single most-tested skill in this loop
  • 05-dbt-deep-dive — The defining tool of analytics engineering: models, sources, tests, snapshots, macros
  • 06-data-modeling — Kimball star schema, slowly-changing dimensions, OBT, grain
  • 07-data-pipelines — ETL vs ELT, batch vs streaming, idempotency, backfills
  • 08-data-quality — Tests, freshness, anomaly detection, contracts
  • 09-warehouses — Snowflake, BigQuery, Redshift, Databricks, Iceberg/Delta: pricing, partitioning, performance

Section C — Coding

  • 10-sql-patterns — The 12 SQL patterns that cover 80% of interview questions
  • 11-sql-problems — 11 SQL problems worked out, drill mode
  • 12-python-for-data — pandas, polars, when to leave SQL

Section D — Production / cloud

  • 13-orchestration — Airflow, Dagster, Prefect: DAGs, retries, idempotency
  • 14-observability — Lineage, freshness, volume monitoring, anomaly detection

Section E — Domain & execution

  • 17-ai-compute-domain — GPU telemetry, inference logs, billing events, unit economics: what data looks like at an AI infra company
  • 15-interview-questions — ~30 practice Q&As; drill these out loud
  • 16-day-of — Tactics, traps, questions to ask them; re-read the morning of

Suggested study schedule

If you have 7+ days

  • Day 1: 01, 02 (orient) → 03 (modern data stack).
  • Day 2: 04 (SQL deep dive) + 10 (SQL patterns).
  • Day 3: 11 (SQL problems — drill on a timer).
  • Day 4: 05 (dbt) + 06 (data modeling).
  • Day 5: 07 (pipelines) + 08 (quality) + 09 (warehouses).
  • Day 6: 12 (Python) + 13 (orchestration) + 14 (observability) + 17 (AI compute domain).
  • Day 7: Drill 15. Read 16. Sleep.

If you have 2-3 days

01, 02, 04, 05, 06, 11 (drill SQL), 15, 16. Skim everything else.

If you have < 24 hours

01, 02, 11 (the SQL problems — most likely to come up), 15, 16. Skim 04, 05, 06 headings only.

Two practical things to do before interview day

Reading is cheaper than building, but building sticks. If you can find an evening or two:

  1. Spin up dbt against a real warehouse. Free options: dbt Cloud's free tier, BigQuery sandbox, Snowflake's trial. Clone jaffle_shop (dbt's reference project) or one of the dbt sample projects. Run dbt build. Inspect the lineage graph. Add a test that fails on purpose. The whole stack clicks once you've watched it run.
  2. Solve 10 SQL window-function problems on a real warehouse. Pick problems from 11-sql-problems or LeetCode SQL. Solve them in the warehouse UI, not in a code editor. Reading the query plan as your query runs is where intuition comes from.
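For step 1, "add a test that fails on purpose" might look like this in a model's schema.yml. This is a sketch: the model and column names match jaffle_shop's staging layer but treat them as placeholders, and note that newer dbt versions also accept `data_tests:` where older ones use `tests:`.

```yaml
version: 2

models:
  - name: stg_orders          # placeholder: any model in your project
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          # Deliberately omit some real status values so `dbt test`
          # fails and you can watch how a failure surfaces.
          - accepted_values:
              values: ['placed', 'shipped']
```

Run `dbt test`, read the failure output, then query the failing rows — that loop is the core of day-to-day data-quality work.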

These two evenings would close more of your gap than the same time spent rereading.

What "winning" looks like in these rounds

For data analytics engineering, winning is:

  • SQL fluency under pressure — you can write CTEs and window functions without flinching. Pattern recognition: "this is a gaps-and-islands problem" in 10 seconds.
  • Modeling instinct — given a business, you reach for grain, dimensions, facts, SCDs in the right order. You ask "what's the grain of this table?" first.
  • Failure-mode thinking — late-arriving data, duplicate events, schema drift, time-zone confusion, NULL handling. Mention these unprompted.
  • Stakeholder fluency — you talk about metrics in business terms (cohort retention, gross margin, unit economics) and translate to SQL.
  • Operational maturity — tests, alerts, lineage, ownership. Data is software; treat it accordingly.
  • Honesty at the edge — "I haven't used Iceberg in production but I understand the table-format shape and the operational story" beats faking.
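One of the failure modes above — duplicate events — has a stock SQL answer worth having at your fingertips: rank duplicates with ROW_NUMBER and keep the latest. A minimal SQLite sketch (table and column names are invented for illustration):

```python
import sqlite3

# Hypothetical raw event feed where the same event_id can arrive twice
# (e.g. at-least-once delivery); all names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (event_id TEXT, payload TEXT, loaded_at TEXT);
    INSERT INTO raw_events VALUES
        ('e1', 'v1', '2024-01-01 00:00:00'),
        ('e1', 'v2', '2024-01-01 00:05:00'),   -- duplicate: later load wins
        ('e2', 'x',  '2024-01-01 00:01:00');
""")

# Standard dedup pattern: one row per event_id, latest loaded_at wins.
sql = """
WITH ranked AS (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY event_id
               ORDER BY loaded_at DESC
           ) AS rn
    FROM raw_events
)
SELECT event_id, payload, loaded_at
FROM ranked
WHERE rn = 1
ORDER BY event_id;
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
# ('e1', 'v2', '2024-01-01 00:05:00')
# ('e2', 'x', '2024-01-01 00:01:00')
```

Saying this pattern out loud — and noting you'd want a deterministic tiebreaker if `loaded_at` can collide — is exactly the unprompted failure-mode thinking interviewers reward.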