Section A · Orient

The Roles, Decoded

HeyGen founding-DS and Cohere Lead-DS side by side. What each team actually does, what each line of the JD really means, and where the two roles overlap vs diverge.

HeyGen — Data Scientist

Posting at a glance

Role: Data Scientist · Comp: $170k–$220k · Locations: SF, Palo Alto, LA · Seniority: 3+ years

What HeyGen is

HeyGen is an AI-powered video creation SaaS — avatars, voice cloning, language dubbing, scripted video at scale. The product space is "AI tools that make professional video accessible." They're growing fast (the JD reads "rapidly growing startup," "highly dynamic, fast-paced environment") and the DS function is being established, not inherited.

What this role actually is

Read literally, this is a Data Scientist req. Read carefully, it's a founding product DS req. The first three responsibilities are infrastructure-coded:

  • "Define and establish data/experimentation stack"
  • "Partner with engineering on data quality, scalability, and deployment constraints"
  • "Build best practices for data-driven decision-making"

That phrasing means there is no robust experimentation platform yet. There may not be a clean event-tracking schema. There almost certainly isn't a metric layer. You're hired to build the foundation, not run experiments on a mature stack.

The rest of the responsibilities are classical product DS: analyze datasets, define metrics, run A/B tests, build dashboards, translate analyses for stakeholders. The order matters — they listed infrastructure first.

What "real-world deployment constraints" implies

This phrase in the qualifications section is doing work. Read it as: "you'll need to discuss your analyses with engineers in a way that respects how the product is actually built." Concretely: if you propose a metric that requires logging an event the system doesn't emit, you need to be able to negotiate that with engineering, not just file a ticket.

The signal in "champion best practices"

This is cultural-leadership language. At a "first DS" role, you don't just do the analysis — you teach the rest of the company how to think about data. PMs will ask you "is this lift significant?" and you have to answer in a way that improves their next instinct, not just resolves the current question.

Cohere — Lead Data Scientist, Analytics & Data Insights

Posting at a glance

Role: Lead Data Scientist · Team: Analytics & Data Insights, in the Agentic Platform org · Locations: US/Canada remote (offices in Toronto, NYC, SF, London, Paris) · Seniority: Senior IC + team lead

What Cohere is

Cohere trains and deploys frontier LLMs for enterprises. Their products include the Command model family, embeddings (Embed), and rerankers (Rerank). The customer base is enterprises building RAG, agents, and search on top of Cohere models — banks, telcos, healthcare, governments. They compete with OpenAI/Anthropic on the enterprise side specifically.

What this role actually is

The job title is "Lead Data Scientist" but the team is "Analytics & Data Insights." That's a deliberate framing. This isn't a model-training role on the research team. It's a business analytics leadership role sitting in the Agentic Platform org — close to product, sales, and finance, helping the company make GTM decisions about technology that's still being invented.

The responsibilities cluster into three buckets:

  1. Experimentation programs: "A/B tests, multi-armed bandits, causal inference studies that directly map to product and go-to-market decisions." The MAB + causal call-out is unusual — they're telling you the bar is "rigorous quasi-experimental design," not just two-sample t-tests.
  2. Predictive models for the business: "forecasting, segmentation, propensity scoring, and opportunity sizing across Cohere's core business lines." These are sales/finance/marketing models — predicting which enterprise accounts will convert, sizing the API revenue from a customer segment, forecasting model usage.
  3. Team leadership: "manage a team of analysts and data scientists. Set the technical bar, mentor aggressively, and create an environment where exceptional people do their best work."
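The multi-armed-bandit call-out in bucket 1 is concrete enough to rehearse. Below is a minimal Thompson-sampling loop for a two-arm Bernoulli bandit, stdlib only; the conversion rates are invented numbers for illustration, not anything from the JD.

```python
import random

def thompson_bandit(true_rates, n_rounds, seed=0):
    """Two-arm (or k-arm) Bernoulli bandit with Thompson sampling.

    true_rates are hypothetical conversion rates; in a real GTM test
    they would be unknown quantities being learned online.
    """
    rng = random.Random(seed)
    wins = [1] * len(true_rates)     # Beta(1, 1) uniform priors
    losses = [1] * len(true_rates)
    pulls = [0] * len(true_rates)
    for _ in range(n_rounds):
        # Sample a plausible rate for each arm from its Beta posterior,
        # then play the arm whose sampled rate is highest.
        samples = [rng.betavariate(wins[i], losses[i])
                   for i in range(len(true_rates))]
        arm = samples.index(max(samples))
        pulls[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls

# The better arm (8% conversion) should absorb most of the traffic.
pulls = thompson_bandit([0.05, 0.08], n_rounds=5000)
```

The interview-relevant contrast with a fixed A/B split: the bandit shifts traffic toward the winner during the test, trading clean inference for lower regret.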

What "act like an owner" implies

The JD says "no waiting around. You'll define analytical priorities, allocate resources, and push initiatives from question to production." Translation: you're not waiting for a roadmap. You're identifying which analytical bets matter most for the business and pushing your team toward them — including killing work that doesn't.

The signal in "shape strategy"

The line "your team's work will get built into products and implemented into strategy" is the punchline. The role sits close enough to leadership that good analytical work becomes the strategy, not just a report deck. That implies a stakeholder altitude well up the org chart: expect to present to VPs and the C-suite.

The stacks, decoded

Both JDs list explicit tools. Here's what each one signals.

HeyGen stack

  • "Expert SQL" → The SQL screen is the gatekeeper. Window functions, CTEs, self-joins. You will be asked one or two timed problems.
  • "R or Python" → Python in practice. R fluency is a nice-to-have for someone with stats heritage; nobody at an AI startup is greenfielding in R.
  • "A/B testing frameworks (unspecified)" → Tells you they don't have one yet. Be ready to discuss Optimizely vs Statsig vs GrowthBook vs roll-your-own, with tradeoffs.
  • "BI tools (unspecified)" → Same. Looker vs Mode vs Hex vs Metabase. Know what each is good at.
  • "Data experimentation stack (to be defined)" → The literal text. They want you to define it.

Cohere stack

  • "Strong command of SQL, Python, and Git" → Standard. Git is called out, which signals they want production-grade contributions, not notebook hand-offs.
  • "Statistical inference, experimental design, predictive modeling" → The three pillars they'll probe in interviews. Expect a stats round, an experimental design round, and a modeling round.
  • "BigQuery, dbt, Looker, or Airflow (nice to have, not essential)" → Soft signal that the stack is modern. They explicitly say it's not essential, but mentioning your experience with these scores points.
  • "Genuine excitement about AI - you follow the research" → Be ready to talk about a recent paper or product you've found compelling. Specific, not generic.

The shared spine

Strip away the level and the company and both roles converge on the same five-pillar bar:

  1. SQL fluency at speed — window functions, cohorts, funnels, retention, gotchas. Both loops will test this.
  2. Rigorous experimentation — A/B design from a fuzzy product question, sample size, peeking. Cohere extends this into MABs and causal inference.
  3. Predictive modeling for business decisions — forecasting, segmentation, propensity. Practical, not exotic.
  4. Product metrics fluency — north star, leading vs lagging, guardrails. Both expect you to talk like a PM does.
  5. Communication to non-technical stakeholders — both JDs say it explicitly. The work is only as good as the recommendation it produces.
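Pillar 2's "sample size" line is easy to rehearse concretely. A back-of-envelope per-arm sample size for a two-proportion test, stdlib only (`statistics.NormalDist`); the 10% baseline and 1-point lift are hypothetical numbers:

```python
import math
from statistics import NormalDist

def samples_per_arm(p_base, mde_abs, alpha=0.05, power=0.80):
    """Per-arm n for a two-proportion z-test (normal approximation).

    p_base: baseline conversion rate; mde_abs: absolute lift to detect.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = z.inv_cdf(power)            # desired power
    p_alt = p_base + mde_abs
    var = p_base * (1 - p_base) + p_alt * (1 - p_alt)
    return math.ceil((z_alpha + z_beta) ** 2 * var / mde_abs ** 2)

# Detecting a 1-point absolute lift on a 10% baseline:
n = samples_per_arm(0.10, 0.01)   # ~14,700 users per arm
```

The "peeking" gotcha follows directly: checking significance repeatedly before reaching this n inflates the false-positive rate well above the nominal 5%.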

Sections B, C, and the drill set of this guide all target this spine.

The seniority delta

HeyGen is mid-level (3+ years). Cohere is senior IC + manager. The delta shows up in three places:

  • Scope of ambiguity. HeyGen says "highly dynamic." Cohere says "tackle problems that don't have textbook answers yet." Both want comfort with ambiguity, but Cohere is hiring someone who will operate at a level where the question itself is unclear, not just the answer.
  • Team leadership. HeyGen wants someone who collaborates and champions best practices. Cohere wants someone who manages a team, sets the technical bar, and mentors. This is the biggest single difference between the two loops — Cohere will have at least one "tell me about leading a team" round; HeyGen probably won't.
  • Methodological depth. HeyGen lists A/B testing generically. Cohere names "multi-armed bandits, causal inference studies" specifically. Expect a methods round at Cohere that goes deeper than two-sample tests.
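To make "deeper than two-sample tests" concrete, here is the smallest quasi-experimental estimator worth having in your head: a difference-in-differences on hypothetical pre/post group means (the numbers are invented; the estimate is only valid under the parallel-trends assumption).

```python
# Hypothetical mean weekly usage for a treated segment (saw the change)
# and a comparable control segment, before and after rollout.
treated_pre, treated_post = 10.0, 14.0
control_pre, control_post = 9.0, 10.5

# DiD nets out the shared time trend: the effect estimate is the
# treated group's change minus the control group's change.
did = (treated_post - treated_pre) - (control_post - control_pre)
print(did)  # 2.5, assuming both groups would have trended in parallel
```

Being able to say when you'd reach for DiD over a randomized test (rollouts you can't randomize, e.g. a pricing change by region) is exactly the methods depth the Cohere loop is signaling.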

Soft signals to expect

Both JDs lean into the same cultural patterns. Reading these in advance helps you not be caught off guard:

  • "Real ownership" / "act like an owner." Both companies say this. They mean it. In interviews, frame your past work as decisions you owned, not tasks you were assigned.
  • "Fast-paced" / "high-velocity." Both. The implicit question is: can you produce a defensible recommendation in days, not months?
  • "Ambiguity" / "no textbook answers." Both. They're testing whether you reach for a textbook (bad) or for a framing of the problem (good) when stuck.
  • "Lay the foundation" (HeyGen) / "build the foundational infrastructure" (Cohere). Both companies are signaling that what exists today is incomplete. You're not optimizing a working system — you're building parts of it.

What to ask them

The questions you ask are part of the signal you send. A handful that land well in either loop:

  • "What does the current experimentation infrastructure look like, and what's the biggest gap?" — Forces specificity. Lets them tell you what they actually need help with.
  • "Walk me through a recent analysis that changed a product or GTM decision." — Gauges whether their analytical work has teeth or is decorative.
  • "What's the one metric the team is most uncertain about, and why?" — Signals you think in terms of measurement uncertainty, not just lift.
  • (Cohere) "How does the analytics team partition work between embedded DS and central platform?" — Senior-level org question. Shows you've thought about structure.
  • (HeyGen) "Who owns event-tracking schema decisions today? Where do those decisions live?" — Lets you find out if you'd be greenfielding or backfilling.
  • "What does failure look like in this role at 90 days? At a year?" — Best generic question in the bank. Forces them to describe the role concretely.