Product Metrics
How to talk about metrics the way a great PM does — north star, leading vs lagging, funnels, retention shapes, guardrails, and the protocols for "this metric dropped, what do we do."
Why metric fluency matters
Product DS loops always test product sense — the ability to reason about a product through its metrics. The questions sound conversational:
- "How would you measure success for [feature]?"
- "What's the right north star metric for [product]?"
- "Metric X dropped 5% — what do you do?"
- "We're launching [thing]; what's the metric set you'd track?"
These are easy to answer badly and hard to answer well. Bad answers list metrics generically. Good answers commit to a hierarchy, defend the primary, anticipate failure modes, and tell the interviewer what you'd ignore on purpose.
North star metrics
A north star is the one metric that, if it moves consistently up, means the company is winning. It has to be:
- Tied to user value — moving it should require giving users something they want, not gaming.
- Movable — the team can affect it within a meaningful time horizon. "Lifetime customer happiness" is a poor north star; "weekly active creators publishing > 1 video" might work.
- Sensitive — small product changes should produce visible movement in a reasonable window.
- Resistant to gaming — chosen so that the easy ways to boost it are also good for users.
Canonical shapes
| Product | North star |
|---|---|
| Video creation (HeyGen-like) | Weekly videos published per active creator |
| Enterprise AI API (Cohere-like) | Weekly active enterprise accounts with > N API calls |
| Social / messaging | Weekly sends per active user |
| Marketplace | Successful transactions per session |
| Subscription SaaS | Net revenue retention (cohort-based) |
The "compound" trap
"DAU × sessions per DAU × actions per session" feels comprehensive but obscures which lever is moving. Pick one metric, accept the loss of granularity, and rely on a secondary set to triangulate.
Leading vs lagging indicators
Lagging indicators reflect the final business outcome — revenue, retention, NPS. They're authoritative but slow. Leading indicators predict the lagging outcome and move faster. Good teams instrument both and use leading indicators to make decisions within a quarter without waiting for lagging confirmation.
When asked about a metric set, a senior DS pairs leading and lagging: "the lagging metric is paid conversion within 30 days. The leading metric I'd watch weekly is users who hit the upgrade modal at least once." Naming the pair shows you've thought about decision velocity, not just measurement completeness.
Patterns
- Activation events (completed setup, first publish, first API call) lead retention.
- Engagement frequency (sessions/week) leads churn.
- Quality signals (rating, abandonment) lead refund / cancellation.
- Support volume on a specific surface leads cancellation tied to that surface.
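A quick way to sanity-check a leading/lagging pair before committing to it: split the lagging outcome by the leading event and see whether the gap is real. A minimal pandas sketch with invented data and assumed column names:

```python
import pandas as pd

# Hypothetical per-user table; the column names are assumptions, not a real schema.
# activated_d7: completed the activation event within 7 days of signup
# retained_d30: still active at day 30
users = pd.DataFrame({
    "user_id":      [1, 2, 3, 4, 5, 6, 7, 8],
    "activated_d7": [1, 1, 0, 1, 0, 0, 1, 0],
    "retained_d30": [1, 1, 0, 0, 0, 1, 1, 0],
})

# Retention rate split by the leading indicator: if the gap is large and stable
# across cohorts, the leading metric is worth watching weekly.
print(users.groupby("activated_d7")["retained_d30"].mean())
```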
Funnels and conversion
A funnel is the canonical product diagnostic. Define every step that gates the next, measure conversion step-by-step, find the biggest drop. Two things to get right:
Step definition
Each step must be unambiguous and time-ordered. "Signed up" → "uploaded a video" → "published a video" is clean. "Engaged with the product" is not — define it.
Denominator discipline
Conversion at step N can be measured against: (a) the step N-1 base (step conversion), or (b) the funnel-top base (overall conversion). The two diverge and answer different questions. Stakeholders often want overall; product teams often want step conversion. Always say which one you're showing.
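A minimal sketch of the two denominators (step names and counts are invented):

```python
import pandas as pd

# Hypothetical funnel: number of users reaching each step.
funnel = pd.DataFrame({
    "step":  ["signed_up", "uploaded_video", "published_video"],
    "users": [10_000, 4_000, 3_000],
})

# Step conversion: users at step N / users at step N-1.
funnel["step_conversion"] = funnel["users"] / funnel["users"].shift(1)

# Overall conversion: users at step N / users at the top of the funnel.
funnel["overall_conversion"] = funnel["users"] / funnel["users"].iloc[0]

print(funnel)
# published_video: step_conversion = 0.75, overall_conversion = 0.30,
# two different numbers answering two different questions.
```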
Retention shapes
Retention curves come in three families. Knowing which family you're in shapes everything downstream:
- Decaying to zero — like a leaky bucket. Most consumer apps. Optimization target: shift the curve up at every horizon.
- Decaying to a floor — a stable "habit" cohort survives. Like Spotify or Slack. Optimization target: raise the floor.
- "Smile curve" — retention dips then rises (re-engagement). Common in transactional products. Means there's a re-engagement loop worth investing in.
Cohort vs aggregate
Always look at cohort retention, not aggregate "active users this week / active users last week." The latter conflates cohort effects with growth and can hide deterioration when new users mask the loss of old ones.
If MAU grows 20% per month and W4 retention is dropping 5% per month, aggregate metrics look great. The cohort analysis shows the product is rotting. This is one of the most common product-DS blind spots, and one interviewers love to test.
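A compact pandas sketch of the cohort view versus the aggregate view. The activity log, column names, and week labels are assumptions for illustration:

```python
import pandas as pd

# Hypothetical activity log: one row per (user, week active).
activity = pd.DataFrame({
    "user_id":     [1, 1, 2, 2, 2, 3, 4, 4, 5],
    "signup_week": ["W1", "W1", "W1", "W1", "W1", "W2", "W2", "W2", "W3"],
    "active_week": ["W1", "W2", "W1", "W2", "W3", "W2", "W2", "W3", "W3"],
})

# Cohort retention: for each signup cohort, what share of its users is active
# in each subsequent week. This is the view that exposes per-cohort decay.
cohort_sizes = activity.groupby("signup_week")["user_id"].nunique()
retention = (
    activity.groupby(["signup_week", "active_week"])["user_id"].nunique()
    .unstack(fill_value=0)
    .div(cohort_sizes, axis=0)
)
print(retention)

# Aggregate "active this week / active last week" mixes cohorts together and
# can look flat or up even while every individual cohort decays faster.
weekly_active = activity.groupby("active_week")["user_id"].nunique()
print(weekly_active.pct_change())
```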
Guardrails
Guardrails are metrics you'd halt an experiment or roll back a launch on, even if the primary moved positively. The classic set:
- Latency / performance — p50, p95, p99. A primary that lifts conversion 5% by adding 2 seconds to time-to-interactive is usually not a ship.
- Crash / error rate — non-negotiable.
- Customer support tickets — a leading indicator of user pain that doesn't always show up in retention quickly.
- Revenue per user — even if conversion lifts, watch ARPU.
- Adverse-event metrics specific to the product — for AI: refusal rate, hallucination rate; for marketplaces: dispute rate; for fintech: chargeback rate.
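A sketch of what a pre-ship guardrail check can look like; the metric names, thresholds, and observed values are all invented:

```python
# Guardrail bounds: any violation blocks the ship, even if the primary lifted.
GUARDRAILS = {
    "latency_p95_ms":               {"max": 1200},
    "crash_rate":                   {"max": 0.002},
    "support_tickets_per_1k_users": {"max": 5.0},
    "arpu_delta_pct":               {"min": -1.0},  # don't ship if ARPU falls >1%
}

# Observed values from the experiment readout (hypothetical).
observed = {
    "latency_p95_ms": 1350,
    "crash_rate": 0.0015,
    "support_tickets_per_1k_users": 4.2,
    "arpu_delta_pct": 0.3,
}

violations = []
for name, bounds in GUARDRAILS.items():
    value = observed[name]
    if "max" in bounds and value > bounds["max"]:
        violations.append(f"{name}={value} exceeds max {bounds['max']}")
    if "min" in bounds and value < bounds["min"]:
        violations.append(f"{name}={value} below min {bounds['min']}")

print(violations or "all guardrails pass")
```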
Defining a metric from scratch
Common loop prompt: "We're launching X. What's the metric?" The script:
- What's the decision the metric is going to inform? Ship/no-ship a feature? Allocate marketing budget? Set product strategy?
- What's the unit of analysis? User, session, account, request.
- Pick the metric definition: numerator, denominator, time window, inclusion rule.
- State what it doesn't measure: every metric blinds you to something. Name what.
- Name the secondary metrics that triangulate: the leading and lagging pair.
- Name the guardrails: what would make you stop, even if the primary is positive.
"For an AI video creation product, the north star I'd propose is weekly active creators who published at least one video. Decision: where to invest in onboarding vs creation tools. Unit: user. Inclusion: any user who created an account ever. What it misses: video quality and downstream use of the video. Leading: weekly users who uploaded any source asset. Lagging: paid conversion within 30 days. Guardrails: render time p95, refund rate, support tickets about exports."
Diagnosing a metric drop
"Conversion dropped 5% last week — what do you do?" The protocol:
- Is it real? Check the data pipeline first. Did event volume drop? Did the schema change? Did a new release suppress an event? Half of "metric drops" are instrumentation breakage.
- Is it the same denominator? If the funnel-top grew (a marketing push pulled in a wider audience), conversion-rate of that wider audience might be lower without anything getting worse for the original audience.
- Slice it: platform, country, browser, traffic source, plan, new vs returning. Find where the drop concentrates.
- Correlate with releases: what shipped that week? Both your team's releases and dependencies (upstream API changes, marketing campaigns).
- Form a hypothesis: based on slices + releases, what's the most likely cause?
- Validate or kill the hypothesis: confirm with a focused query, or design an experiment to test.
- Communicate: a one-paragraph "what we know, what we suspect, what we're doing next" update to stakeholders. Update daily.
When you reach for slicing, name which slices first and why. "I'd start with platform because mobile releases are independent and a regression there wouldn't show up in web. Then traffic source — if a campaign pulled in lower-intent users, the drop is composition not regression." Specificity here is the difference between "looks like a DS" and "is a DS."
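A minimal sketch of the slicing step, using hypothetical session-level data with assumed column names. The idea is to rank slices by week-over-week delta and dig where the drop concentrates:

```python
import pandas as pd

# Hypothetical sessions for the weeks before and after the drop.
sessions = pd.DataFrame({
    "week":      ["prev"] * 6 + ["curr"] * 6,
    "platform":  ["web", "web", "web", "ios", "ios", "ios"] * 2,
    "source":    ["organic", "paid", "organic", "organic", "paid", "organic"] * 2,
    "converted": [1, 1, 0, 1, 0, 1,   1, 0, 0, 1, 0, 1],
})

# Conversion by slice and week; the slice with the largest negative delta is
# where to dig first (repeat for source, country, plan, new vs returning, ...).
by_slice = (
    sessions.groupby(["platform", "week"])["converted"].mean()
    .unstack("week")
)
by_slice["delta"] = by_slice["curr"] - by_slice["prev"]
print(by_slice.sort_values("delta"))
```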
Interview probes
Probe 1: "Pick a north star metric for HeyGen and defend it."
"Weekly active creators who published ≥ 1 video." Defense: ties to user value (a video is the user's primary output), movable on a weekly horizon, sensitive (a friction in the create→publish flow shows up immediately), and hard to game (you can't fake a publish without producing a video). The lagging pair is paid conversion within 30 days; the leading pair is weekly users who uploaded any source asset. The thing it doesn't measure is video quality — handled via a guardrail on refund rate and a quality metric tracked separately.
Probe 2: "What's the difference between funnel-step conversion and overall conversion?"
Step conversion measures users who completed step N out of those who reached step N-1 (so it's a per-step efficiency). Overall conversion measures users who completed step N out of all users who entered the funnel. Step conversion isolates the per-step friction; overall conversion is what the business actually cares about. Always say which one you're showing, because they can move in opposite directions when the funnel-top mix changes.
Probe 3: "Walk me through what you'd do if D7 retention dropped 3 points."
"First, is it real? Check instrumentation — did the events that define 'active' change recently? Then is it composition or quality? Slice by acquisition source, new-vs-returning, platform, country. If a marketing campaign pulled in lower-intent users, the drop is composition. If retention dropped across all existing cohorts uniformly, it's a quality issue with the product. Correlate with releases that week. Form a hypothesis, validate with a focused query. Communicate findings within 24 hours, even if incomplete, with a 'what we know / suspect / plan' framing."
Probe 4: "How do you measure success for an AI assistant feature?"
"Layered. Adoption: weekly users who invoked the assistant at least once. Engagement: invocations per active user. Quality: task-completion rate (did the user accept the assistant's output?) and refusal/hallucination rate as inverse signals. Business impact: lift in the downstream product metric the assistant is supposed to help — fewer support tickets, more successful publishes, whatever the assistant was scoped to improve. The guardrail is latency p95 and refusal rate."
Probe 5: "When would you not optimize the north star?"
When optimizing it would harm users or undermine durability of the business. If your north star is 'weekly videos published' and the easiest lift is auto-generating clickbait, you've moved the metric but degraded user trust — that shows up in retention and NPS over months. Senior DS work is in spotting these tradeoffs before launching, not after.