Domain Context
The vocabulary and stakes for both target domains — fraud & identity verification (SentiLink) and multimodal sensor AI (Archetype). Enough to sound credible in interviews and ask good questions back.
A · Fraud & identity (SentiLink-flavored)
SentiLink sits between credit applications and lenders, scoring identity authenticity. Knowing the domain language is part of the loop.
Vocabulary
| Term | Meaning |
|---|---|
| Synthetic identity | An identity manufactured from real and fake components — e.g., a real SSN with a fabricated name and DOB. |
| First-party fraud | A real person applies in their own name, intending to default. "Bust-out fraud" is the canonical pattern — pay down small balances, then max out and abscond. |
| Third-party fraud | Identity theft — someone else's data used to obtain credit. |
| Account takeover (ATO) | Adversary gains access to a real customer's account. |
| KYC | Know Your Customer: regulated identity verification. In the US it is driven by the USA PATRIOT Act and the CIP rules issued under it. |
| AML | Anti-Money Laundering — broader regulatory framework targeting transaction patterns indicating laundering. |
| SAR | Suspicious Activity Report: a filing to FinCEN that a financial institution must make when it detects known or suspected illegal activity above regulatory thresholds. |
| Bureau | Credit bureau (Experian, Equifax, TransUnion) — historical credit data. |
| eCBSV | Electronic Consent-Based SSN Verification: a Social Security Administration (SSA) service that confirms whether an SSN, name, and DOB match SSA records. SentiLink was first to go live. |
| OFAC | Office of Foreign Assets Control — US sanctions enforcement; institutions must screen counterparties against the SDN list. |
| Chargeback | A reversal of a card transaction, initiated when the cardholder disputes a charge with their card issuer. In fraud modeling it serves as a delayed fraud label. |
| Bust-out | Fraud pattern: build credit history, max out, vanish. Often a synthetic identity executed over months. |
| CIP | Customer Identification Program — required US bank policy specifying how new accounts verify identity. |
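Because a chargeback can arrive weeks or months after the application, a training set built from "no chargeback yet = good" mislabels recent fraud as clean. A common remedy is to label only applications older than a maturation window. The sketch below assumes a 120-day window and hypothetical record fields (`app_id`, `applied_on`, `chargeback`). Real windows depend on the product and on card-network dispute timelines.

```python
from datetime import date, timedelta

# Illustrative assumption: most chargebacks arrive within 120 days.
MATURATION_DAYS = 120


def mature_labels(applications, as_of, maturation_days=MATURATION_DAYS):
    """Return (app_id, label) pairs only for applications old enough to trust.

    applications: iterable of dicts with hypothetical keys
      'app_id', 'applied_on' (datetime.date), 'chargeback' (bool).
    Younger applications are dropped rather than labeled 'good', because a
    missing chargeback may simply not have arrived yet.
    """
    cutoff = as_of - timedelta(days=maturation_days)
    labeled = []
    for app in applications:
        if app["applied_on"] <= cutoff:
            labeled.append((app["app_id"], int(app["chargeback"])))
    return labeled


apps = [
    {"app_id": "a1", "applied_on": date(2024, 1, 10), "chargeback": True},
    {"app_id": "a2", "applied_on": date(2024, 5, 20), "chargeback": False},
]
print(mature_labels(apps, as_of=date(2024, 6, 1)))  # a2 is too recent to label
```

The cost of this choice is a training set that always lags the present by the maturation window, which is one reason fraud models can be slow to react to new attack patterns.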
Types of fraud relevant to SentiLink's products
Synthetic identity fraud
Modeled by SentiLink at the application layer. The hard problem is that a synthetic identity can have real bureau data: the SSN was genuinely issued and may carry a thin but real credit history. The signals are usually structural: implausible age vs SSN issuance window, address-history sparsity, device or behavioral anomalies during application. The issuance-window signal applies only to SSNs issued before June 2011, when the SSA randomized SSN assignment.
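The structural signals above can be sketched as simple boolean flags. This is a minimal illustration, not SentiLink's method. The `IdentityRecord` fields are hypothetical, the SSN issuance window is assumed to be precomputed upstream, and the thresholds (25 years, age 30, one address) are made-up values chosen for readability.

```python
from dataclasses import dataclass


@dataclass
class IdentityRecord:
    # All fields are hypothetical inputs for illustration.
    birth_year: int
    ssn_issued_earliest: int  # estimated SSN issuance window (assumed precomputed)
    ssn_issued_latest: int
    distinct_addresses: int   # distinct addresses seen in bureau history


def structural_flags(rec: IdentityRecord, current_year: int) -> dict:
    """Return illustrative structural red flags for a single identity."""
    age = current_year - rec.birth_year
    return {
        # The issuance window closes before the claimed birth year: DOB is implausible.
        "ssn_before_birth": rec.ssn_issued_latest < rec.birth_year,
        # Claimed adult received an SSN very late in life (threshold assumed).
        "late_ssn_issuance": rec.ssn_issued_earliest - rec.birth_year >= 25,
        # Older applicant with almost no address history (threshold assumed).
        "sparse_address_history": age >= 30 and rec.distinct_addresses <= 1,
    }


print(structural_flags(IdentityRecord(1990, 1985, 1988, 3), current_year=2024))
```

Each flag is weak on its own (recent immigrants legitimately receive SSNs as adults), so in practice such features feed a model rather than acting as hard rules.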
First-party fraud (intent)
Hardest to model because the applicant uses real, accurate data — only their intent is fraudulent. Signals are behavioral: bust-out patterns, geographic anomalies in transaction history, application-to-credit-pull timing.
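The bust-out pattern (build trust, max out, stop paying) can be written as a rule over monthly account data. The sketch below is a hand-written heuristic with assumed thresholds (`build_months=6`, 30% and 90% utilization). A production model would learn these patterns from labeled data rather than hard-coding them.

```python
def looks_like_bust_out(utilization, on_time, build_months=6,
                        build_max_util=0.3, spike_util=0.9):
    """Flag a build-then-max-out-then-default pattern.

    utilization: monthly balance/limit ratios, oldest first.
    on_time: parallel list of bools, True if that month's payment was on time.
    All thresholds are illustrative assumptions, not calibrated values.
    """
    if len(utilization) <= build_months or len(on_time) != len(utilization):
        return False
    build_util, later_util = utilization[:build_months], utilization[build_months:]
    build_paid, later_paid = on_time[:build_months], on_time[build_months:]
    # Phase 1: low utilization and every payment on time (building trust).
    built_trust = max(build_util) <= build_max_util and all(build_paid)
    # Phase 2: balance driven close to the credit limit...
    maxed_out = max(later_util) >= spike_util
    # ...followed by missed payments (the "vanish" step).
    stopped_paying = not all(later_paid)
    return built_trust and maxed_out and stopped_paying


util = [0.1, 0.2, 0.1, 0.15, 0.2, 0.1, 0.95, 1.0]
paid = [True] * 7 + [False]
print(looks_like_bust_out(util, paid))
```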
Identity theft
The victim's real data is used. This is easier to detect in some ways: the legitimate person and their patterns exist on record, and the fraudster's behavior often diverges from them (different device, geography, application velocity).
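Application velocity is usually computed as a count of recent applications sharing a key such as a device ID or address. The sketch below uses a trailing window over time-sorted events. The event tuple layout and the 24-hour default window are assumptions for illustration.

```python
from collections import defaultdict, deque


def velocity_counts(events, window_seconds=86_400):
    """Count applications sharing a key within a trailing time window.

    events: iterable of (timestamp_seconds, key, app_id), assumed sorted by time.
    key: e.g. a device ID or normalized address (hypothetical).
    Returns {app_id: count}, where the count includes the current application.
    """
    recent = defaultdict(deque)  # key -> timestamps still inside the window
    counts = {}
    for ts, key, app_id in events:
        window = recent[key]
        # Evict timestamps that have fallen out of the trailing window.
        while window and ts - window[0] >= window_seconds:
            window.popleft()
        window.append(ts)
        counts[app_id] = len(window)
    return counts


events = [(0, "dev1", "a"), (100, "dev1", "b"), (200, "dev2", "c"), (90_000, "dev1", "d")]
print(velocity_counts(events))
```

Each event is appended and evicted at most once, so the pass is linear in the number of events, which matters when scoring at application time.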
Identity verification
The pipeline at a typical lender:
1. Applicant fills out an application.
2. Identity verification: name + SSN + DOB + address validated via bureau and eCBSV.
3. Fraud scoring: SentiLink-style score on the identity itself.
4. Credit decisioning: combine identity confidence with credit history → approve / decline / manual review.
5. If reviewed: a human investigator looks at the application, decides, often files documentation.
SentiLink's score lives in step 3; its output determines how much step 4 can trust the identity data it receives.
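A minimal sketch of the decisioning step, assuming an identity-risk score in [0, 1] and a conventional credit score. All thresholds are hypothetical. The point it illustrates is the ordering: identity risk is checked first, because credit history attached to a doubtful identity cannot be trusted.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    DECLINE = "decline"
    REVIEW = "manual_review"


def decide(identity_risk: float, credit_score: int,
           risk_review: float = 0.5, risk_decline: float = 0.9,
           min_credit: int = 620) -> Decision:
    """Combine identity risk (0 = clean, 1 = likely fraud) with credit history.

    Thresholds are illustrative assumptions, not any lender's policy.
    """
    if identity_risk >= risk_decline:
        return Decision.DECLINE   # identity not trusted, so credit data is moot
    if identity_risk >= risk_review:
        return Decision.REVIEW    # route to a human investigator
    return Decision.APPROVE if credit_score >= min_credit else Decision.DECLINE


print(decide(0.6, 780))  # Decision.REVIEW
```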
Regulatory context
Knowing these names lets you discuss the stakes credibly:
- Bank Secrecy Act (BSA): foundational US AML law. Requires SARs, CTRs (Currency Transaction Reports), and CIP.
- FFIEC: Federal Financial Institutions Examination Council — issues guidance banks follow.
- FinCEN: Financial Crimes Enforcement Network — collects SARs and CTRs.
- OCC, FDIC, Federal Reserve: bank regulators. Examine compliance programs periodically.
- CFPB: Consumer Financial Protection Bureau. Enforces consumer protection in financial services, including fair-lending rules that apply to AI models.
- FCRA: Fair Credit Reporting Act — governs use of consumer reports for credit decisions. Models that use bureau data must comply.
- ECOA / Reg B: Equal Credit Opportunity Act — prohibits discrimination in credit. Fair-lending model validation is a big deal.
A staff DS at SentiLink doesn't need to be a compliance expert, but should know these names exist and how they shape model design (specifically: adverse-action notices, fair-lending considerations, model explainability for regulator review).
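Adverse-action notices need the top reasons behind a decision. One common convention for linear scorecards is to rank each feature's contribution, weight × (value − baseline value), and report the largest positive ones. The sketch below assumes a linear score where higher means riskier. The feature names and the baseline applicant are hypothetical.

```python
def reason_codes(weights, features, baseline, top_k=2):
    """Return the features that pushed a linear risk score up the most.

    weights, features, baseline: dicts keyed by feature name.
    Contribution = weight * (value - baseline value); only positive
    contributions (toward 'risky') are eligible as reasons.
    """
    contributions = {
        name: w * (features[name] - baseline[name]) for name, w in weights.items()
    }
    ranked = sorted(
        (item for item in contributions.items() if item[1] > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


weights = {"ssn_dob_mismatch": 2.0, "address_count": -0.5, "app_velocity": 1.0}
features = {"ssn_dob_mismatch": 1, "address_count": 1, "app_velocity": 4}
baseline = {"ssn_dob_mismatch": 0, "address_count": 3, "app_velocity": 1}
print(reason_codes(weights, features, baseline))
```

For nonlinear models the same idea is usually implemented with attribution methods rather than raw weights, but the reporting step (rank, keep the top few, map to plain-language reasons) looks the same.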
B · Multimodal sensor AI (Archetype-flavored)
Archetype builds Newton, a multimodal LLM for physical-world AI. The customer base spans industrial monitoring, mobility, retail analytics, and IoT applications.
Vocabulary
| Term | Meaning |
|---|---|
| Multimodal | A model that processes multiple input types (image + text + sensor) jointly. |
| Sensor fusion | Combining inputs from multiple sensors to produce a more reliable estimate than any single sensor. |
| Lens | (Archetype-specific) a configurable analytical operation on Newton — e.g., "count people," "detect anomalies in equipment vibration." |
| Edge inference | Model runs on a device (camera, sensor, gateway) rather than the cloud. |
| Inertial sensors / IMU | Inertial Measurement Unit — accelerometer + gyroscope + sometimes magnetometer. |
| Computer vision tasks | Detection, segmentation, tracking, action recognition, depth estimation. |
| Time-series classification | Predict a label from a window of time-series data (activity recognition, equipment failure). |
| Anomaly detection | Flag observations that don't fit the normal pattern — common ask in industrial monitoring. |
| Synchronization | Aligning multiple sensor streams to a common time base. |
| Foundation model (multimodal) | Large pretrained model handling multiple modalities — CLIP, GPT-4V, Gemini, etc. Newton is positioned in this category. |
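The sensor-fusion row claims the combined estimate beats any single sensor. The simplest concrete case is inverse-variance weighting of independent, unbiased measurements of the same quantity. It is shown here as a textbook illustration, not as how Newton fuses modalities. The inputs are assumed to come with known variances, and the Kalman filter generalizes the same idea over time.

```python
def fuse(estimates):
    """Inverse-variance weighted fusion of independent, unbiased estimates.

    estimates: list of (value, variance) pairs, one per sensor.
    Returns (fused_value, fused_variance). The fused variance is never larger
    than the smallest input variance.
    """
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * x for w, (x, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


# A precise sensor (variance 1) and a noisy one (variance 9) measuring the same quantity.
print(fuse([(10.0, 1.0), (20.0, 9.0)]))  # fused value near 11.0, variance 0.9
```

The noisy sensor still helps: its reading pulls the estimate only slightly, yet the fused variance (0.9) is below the better sensor's variance (1.0).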
Use cases
Common applications for multimodal sensor AI platforms:
- Industrial monitoring: predictive maintenance from vibration, temperature, acoustic sensors on machines.
- Retail analytics: foot traffic, dwell time, queue lengths from cameras.
- Mobility: driver behavior, fleet management, traffic flow analysis.
- Safety: PPE compliance, fall detection, perimeter monitoring.
- Process optimization: throughput analysis on assembly lines, anomalies in operations.
- Healthcare: patient monitoring, gait analysis, sleep tracking.
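For the industrial-monitoring case, a common first baseline is a rolling z-score over a scalar sensor summary such as vibration RMS. The sketch assumes a 20-sample trailing window and a 3-standard-deviation threshold. Both are illustrative choices, and real deployments typically also handle seasonality, drift, and sensor dropouts.

```python
import statistics
from collections import deque


def rolling_zscore_anomalies(readings, window=20, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations
    from the mean of the preceding `window` readings.

    readings: sequence of scalar values (e.g. vibration RMS per minute).
    """
    history = deque(maxlen=window)
    flagged = []
    for i, x in enumerate(readings):
        if len(history) == window:
            mean = statistics.fmean(history)
            std = statistics.pstdev(history)
            if std > 0 and abs(x - mean) / std > threshold:
                flagged.append(i)
        history.append(x)  # the deque drops the oldest value automatically
    return flagged


readings = [1.0, 1.1, 0.9] * 7 + [5.0]  # steady signal, then a spike
print(rolling_zscore_anomalies(readings))
```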
What makes these hard
- Customer data is heterogeneous — every customer's sensors and conditions are different.
- Labels are scarce and expensive — manual annotation of video / sensor traces.
- Edge cases are operationally critical (the rare event you must catch) but rare in data.
- Customer expectations vary — some want point estimates, some want intervals, some want narratives.
Interview probes
Probe 1: "What's synthetic identity fraud, and why is it hard to detect?"
An identity manufactured by combining real elements (often a real SSN, sometimes belonging to a child) with fabricated elements (name, DOB, contact info). Hard because synthetic identities can have legitimate bureau records — the fraudster builds credit slowly first. Detection signals are structural: implausible age vs SSN issuance window, address-history sparsity, application velocity at the same device or address, behavioral anomalies during application (typing patterns, copy-paste of fields).
Probe 2: "What is eCBSV?"
Electronic Consent-Based SSN Verification: an SSA service banks can use, with the applicant's consent, to confirm that an SSN matches the name and DOB. It extends SSA's older CBSV service, which required a signed paper consent form, to electronic consent. SentiLink was the first private company to go live with it.
Probe 3: "Why is fair-lending validation a big deal for fraud models?"
Even though fraud models aren't credit-decision models, they affect who gets credit. If a fraud model declines or flags applicants from a protected class at disproportionate rates, that's a fair-lending issue (ECOA / Reg B). Staff DS work includes disparate-impact analysis, often comparing flag and decline rates across protected groups (usually estimated with proxies such as BISG, since lenders generally cannot collect race for non-mortgage credit), and identifying features that may proxy for protected attributes (ZIP code is the classic example).
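A first-pass disparate-impact check compares each group's pass rate (not flagged) to a reference group's. The sketch assumes group labels are already available per applicant; in practice they are often proxy-estimated. It uses the "four-fifths" ratio of 0.8 as a screening heuristic. That threshold comes from US employment-selection guidelines and is a rule of thumb, not the legal test under ECOA.

```python
from collections import Counter


def adverse_impact_ratios(groups, flagged, reference):
    """Ratio of each group's pass rate to the reference group's pass rate.

    groups: group label per applicant (assumed given or proxy-estimated).
    flagged: parallel list of bools, True if the fraud model flagged the applicant.
    Ratios below ~0.8 are commonly treated as a signal to investigate further.
    """
    total = Counter(groups)
    passed = Counter(g for g, f in zip(groups, flagged) if not f)
    rate = {g: passed[g] / total[g] for g in total}
    return {g: rate[g] / rate[reference] for g in rate}


groups = ["A"] * 10 + ["B"] * 10
flagged = [False] * 9 + [True] + [False] * 6 + [True] * 4
print(adverse_impact_ratios(groups, flagged, reference="A"))  # B is about 0.67
```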
Probe 4: "What's a 'lens' at Archetype, in your understanding?"
A configurable analytical operation on Newton: basically a templated combination of prompts and parameters that performs a specific task ("count people in a region," "detect machinery anomalies," "classify activities"). The lens is what Solutions Engineers and customers actually invoke. The DS role configures lens parameters and the prompts driving them, per proof of concept (POC).
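To make "templated combination of prompts and parameters" concrete, here is a toy model of that description. The `Lens` class, its fields, and the example prompt are all hypothetical. This is not Archetype's API, only an illustration of defaults that a per-POC configuration can override.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Lens:
    """Hypothetical lens: a prompt template plus default parameters.

    Illustrates the concept described in the text; not Archetype's API.
    """
    name: str
    prompt_template: str
    defaults: dict = field(default_factory=dict)

    def render(self, **params) -> str:
        # Per-POC parameters override the lens defaults.
        merged = {**self.defaults, **params}
        return Template(self.prompt_template).substitute(merged)


count_people = Lens(
    name="count_people",
    prompt_template="Count the people inside region $region in the last $window.",
    defaults={"window": "5 minutes"},
)
print(count_people.render(region="entrance"))
```

`Template.substitute` raises `KeyError` if a placeholder is left unfilled, which surfaces misconfigured lenses early instead of sending an incomplete prompt.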
Probe 5: "What makes multimodal sensor AI hard compared to a single-modality model?"
Three sources of difficulty. (1) Synchronization — aligning streams from sensors with different rates, clocks, and reliabilities. (2) Modality fusion strategy — how to combine an image with a vibration trace meaningfully. (3) Data heterogeneity — every deployment has different sensors, conditions, and labels, making transfer between deployments hard. The strongest practitioners reduce these to common scaffolding: standardized preprocessing, modality-agnostic feature extraction where possible, and prompt-based composition over a flexible foundation model.
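Synchronization often reduces to resampling each stream onto a shared time base. The sketch below linearly interpolates one stream at target timestamps. It assumes the clocks already share a reference (clock-offset estimation is a separate step), and it clamps targets outside the observed range to the nearest endpoint, which is one choice among several.

```python
from bisect import bisect_left


def resample_linear(timestamps, values, target_times):
    """Linearly interpolate a sensor stream onto target timestamps.

    timestamps: sorted ascending sample times for this stream.
    values: parallel list of readings.
    Targets before the first or after the last sample are clamped.
    """
    out = []
    for t in target_times:
        i = bisect_left(timestamps, t)  # first index with timestamps[i] >= t
        if i == 0:
            out.append(values[0])
        elif i == len(timestamps):
            out.append(values[-1])
        else:
            t0, t1 = timestamps[i - 1], timestamps[i]
            v0, v1 = values[i - 1], values[i]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


# A slow sensor resampled onto a faster common time base.
print(resample_linear([0.0, 1.0, 2.0], [0.0, 10.0, 20.0], [0.5, 1.25]))
```

Linear interpolation suits smooth signals such as temperature. For event-like streams (detections, door sensors), a "last value" hold is usually the better choice.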