SQL Pattern Menu
The 12 patterns that cover ~80% of analytics-engineer SQL interview questions. Memorize the cues. Drill the implementations in 11-sql-problems.
Recognition cues — the senior tell
The fastest path to looking senior in SQL interviews is naming the pattern out loud before writing code. The interviewer relaxes; they know you'll get there.
"Most recent" / "latest"
→ ROW_NUMBER pattern, latest-per-group. Always specify a tiebreaker.
"Each customer" / "per group"
→ PARTITION BY. Window function if you want all rows; GROUP BY if you want aggregates.
"Cumulative" / "running" / "rolling"
→ Window aggregate with explicit ROWS BETWEEN frame.
"vs previous" / "MoM" / "WoW"
→ LAG. Don't forget NULLIF on the denominator for percentages.
"Continuous" / "streak" / "consecutive"
→ Gaps-and-islands. Subtract row number from date.
"Sessions" / "events within X minutes"
→ Sessionization: LAG + cumulative-sum island.
"Unique by X" / "dedupe"
→ ROW_NUMBER with multi-key ORDER BY for tiebreaking.
"Cohort retention"
→ Cohort assignment → activity join → optional pivot to wide.
"Funnel" / "conversion rate"
→ MIN of timestamp per event-type per user, then ordering checks.
"Wide format" / "one row per X with columns"
→ Pivot. CASE WHEN sums are portable; PIVOT operator is shorter on supported warehouses.
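The gaps-and-islands cue ("subtract row number from date") is easier to remember with code in front of you. A minimal sketch, assuming a hypothetical logins(user_id, login_date) table with one row per user per active day; date arithmetic syntax varies by warehouse:

```sql
-- Longest login streak per user via gaps-and-islands.
WITH numbered AS (
  SELECT user_id,
         login_date,
         -- date minus row number is constant within a consecutive run
         login_date - CAST(ROW_NUMBER() OVER (
           PARTITION BY user_id ORDER BY login_date
         ) AS INT) AS island
  FROM logins
)
SELECT user_id, MAX(streak_len) AS longest_streak
FROM (
  SELECT user_id, island, COUNT(*) AS streak_len
  FROM numbered
  GROUP BY user_id, island
) runs
GROUP BY user_id;
```

The same island trick powers sessionization: replace the date subtraction with a LAG-based "new session" flag and a cumulative sum of that flag.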
Per-group calculations — when window vs GROUP BY
- GROUP BY: collapse to one row per group. You lose the original rows.
- Window function (OVER PARTITION BY): compute per group but keep all original rows.
Common pattern: compute in a window, filter, then aggregate. Example: "average revenue per customer's top 3 orders" — window-rank, filter to rank ≤ 3, then group + avg.
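That window-then-filter-then-aggregate pattern, sketched against a hypothetical orders(customer_id, order_id, revenue) table:

```sql
-- "Average revenue per customer's top 3 orders":
-- rank in a window, filter to the top 3, then aggregate.
WITH ranked AS (
  SELECT customer_id,
         revenue,
         ROW_NUMBER() OVER (
           PARTITION BY customer_id
           ORDER BY revenue DESC, order_id   -- deterministic tiebreaker
         ) AS rn
  FROM orders
)
SELECT customer_id, AVG(revenue) AS avg_top3_revenue
FROM ranked
WHERE rn <= 3
GROUP BY customer_id;
```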
Time-based analysis
- Period bucketing: DATE_TRUNC('month', ts) for monthly aggregation.
- Filling gaps: LEFT JOIN to dim_date to ensure every day has a row.
- Period-over-period: LAG with the right ordering.
- Year-over-year: self-join on year = year - 1, or LAG with offset 12 if monthly.
- Window over time: cumulative, rolling, and same-period-last-year are all just frame variations.
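The period-over-period bullet in code, assuming a hypothetical monthly_revenue(month, revenue) table with one row per month and no gaps:

```sql
-- Month-over-month growth. NULLIF guards the divide-by-zero case
-- when the previous month's revenue is 0.
SELECT month,
       revenue,
       LAG(revenue) OVER (ORDER BY month) AS prev_revenue,
       (revenue - LAG(revenue) OVER (ORDER BY month))
         / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) AS mom_growth
FROM monthly_revenue;
```

If months can be missing, fill gaps against dim_date first; otherwise LAG silently compares across the gap.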
Deduplication
The pattern: ROW_NUMBER over the candidate key with a tiebreaker that picks the "right" duplicate.
WITH ranked AS (
  SELECT *,
         ROW_NUMBER() OVER (
           PARTITION BY canonical_key
           ORDER BY
             prefer_this DESC,        -- primary preference (e.g. updated_at DESC)
             (col IS NOT NULL) DESC,  -- prefer non-null
             id ASC                   -- final tiebreaker (deterministic)
         ) AS rn
  FROM source
)
SELECT * FROM ranked WHERE rn = 1;
The final deterministic tiebreaker (a unique column) matters — without it, ties are non-deterministic and re-runs give different "winners."
Sequential / funnel analysis
The "what % of users went from step A to step B to step C" pattern.
Canonical approach: one row per user, MIN of timestamp for each event type, then comparisons:
WITH user_events AS (
  SELECT user_id,
         MIN(CASE WHEN event = 'A' THEN ts END) AS first_a,
         MIN(CASE WHEN event = 'B' THEN ts END) AS first_b,
         MIN(CASE WHEN event = 'C' THEN ts END) AS first_c
  FROM events
  GROUP BY user_id
)
SELECT
  COUNT(*) AS step_a,
  COUNT(*) FILTER (WHERE first_b >= first_a) AS step_b,  -- reached B after A
  COUNT(*) FILTER (WHERE first_b >= first_a
                     AND first_c >= first_b) AS step_c   -- reached C after B (via A)
FROM user_events
WHERE first_a IS NOT NULL;
Note the step_c condition re-checks first_b >= first_a; without it you'd count users who hit C after B but B before A, inflating the bottom of the funnel. The FILTER clause isn't supported everywhere; COUNT(CASE WHEN ... THEN 1 END) is the portable equivalent.
Reshape — pivot & unpivot
CASE WHEN pivot works everywhere; PIVOT/UNPIVOT operators are shorter where supported.
For a dynamic pivot (unknown column values at write time), either generate SQL in app code, use the dbt_utils.pivot macro, or just return long format and pivot in the BI tool.
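A sketch of the portable CASE WHEN pivot, assuming a hypothetical subscriptions(month, plan, revenue) table where the plan values are known at write time:

```sql
-- One row per month, one column per plan.
SELECT month,
       SUM(CASE WHEN plan = 'basic' THEN revenue ELSE 0 END) AS basic_revenue,
       SUM(CASE WHEN plan = 'pro'   THEN revenue ELSE 0 END) AS pro_revenue,
       SUM(CASE WHEN plan = 'team'  THEN revenue ELSE 0 END) AS team_revenue
FROM subscriptions
GROUP BY month;
```

Use ELSE 0 when you want missing combinations to read as zero; drop it (leaving NULL) when "no data" and "zero revenue" must stay distinguishable.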
Pattern recognition is the high-value drill. 11-sql-problems has worked examples of each pattern with multiple approaches. Solve them on a real warehouse, not in your head.