Section B · Quality

Data Quality & Testing

Tests, freshness, anomaly detection, contracts. How data teams build trust — and lose it.

Three layers of data quality

  1. Schema / structural — does the data have the right shape? (column existence, types, nullability)
  2. Logical / business rules — does the data satisfy expected invariants? (primary keys unique, foreign keys resolve, values in expected set, business arithmetic holds)
  3. Statistical / distributional — does the data look right? (row counts in expected range, column distributions stable, no anomalous spikes/drops)

Different tools handle each. dbt tests cover layers 1 and 2 well. Layer 3 needs observability tooling — Monte Carlo, Anomalo, Elementary, or custom anomaly checks.

Tests — dbt and beyond

The dbt built-ins

  • unique — column has no duplicates.
  • not_null — column has no nulls.
  • accepted_values — column values are within a list.
  • relationships — every value in column A exists in column B of another table (foreign key check).
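The four built-ins are declared in the model's YAML. A minimal sketch, reusing the fct_orders model from later in these notes (the status values and dim_customers reference are assumptions for illustration):

```yaml
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']  # assumed value set
      - name: customer_id
        tests:
          - relationships:           # foreign key check
              to: ref('dim_customers')
              field: customer_id
```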

dbt-utils additions

  • unique_combination_of_columns — composite unique key.
  • not_null_proportion — at least X% of rows are non-null.
  • recency — most recent row is within X hours.
  • at_least_one — table has at least one row.
  • expression_is_true — arbitrary boolean expression.
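The dbt-utils tests follow the same YAML pattern; composite-key and expression tests sit at the model level, proportion tests at the column level. A sketch against fct_order_items (the line_number column and the 0.99 threshold are assumptions):

```yaml
models:
  - name: fct_order_items
    tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns: [order_id, line_number]  # composite PK
      - dbt_utils.expression_is_true:
          expression: "line_revenue >= 0"    # arbitrary boolean invariant
    columns:
      - name: sku
        tests:
          - dbt_utils.not_null_proportion:
              at_least: 0.99                 # assumed tolerance
```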

dbt-expectations (Great Expectations port)

Distribution tests, regex tests, statistical bounds. Useful for layer 3 checks within dbt.
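A sketch of what layer-3-in-dbt looks like with dbt-expectations. The dim_customers model, columns, and every bound here are assumptions to be tuned to your data:

```yaml
models:
  - name: dim_customers
    tests:
      - dbt_expectations.expect_table_row_count_to_be_between:
          min_value: 1000                    # assumed floor
    columns:
      - name: email
        tests:
          - dbt_expectations.expect_column_values_to_match_regex:
              regex: "@"                     # crude shape check
      - name: lifetime_value
        tests:
          - dbt_expectations.expect_column_mean_to_be_between:
              min_value: 0
              max_value: 10000               # assumed statistical bound
```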

Singular tests

Custom SQL in tests/ that returns failing rows. Empty result = pass. Use for business-logic checks no built-in covers:

tests/order_revenue_matches_line_items.sql
SELECT
  o.order_id,
  o.revenue AS order_revenue,
  SUM(oi.line_revenue) AS line_revenue_sum
FROM {{ ref('fct_orders') }} o
JOIN {{ ref('fct_order_items') }} oi USING (order_id)
GROUP BY o.order_id, o.revenue
HAVING ABS(o.revenue - SUM(oi.line_revenue)) > 0.01

Freshness

Data that's stale lies. Worse, it lies silently — the dashboard looks the same, the numbers just stop moving.

dbt source freshness

_sources.yml
sources:
  - name: raw_app
    loaded_at_field: _ingested_at
    freshness:
      warn_after:  {count: 2, period: hour}
      error_after: {count: 6, period: hour}
    tables:
      - name: orders

Run dbt source freshness in CI or a scheduled job. Page or alert on errors.

Model freshness

Test that downstream models have recent data:

recency test (dbt-utils)
models:
  - name: fct_orders
    tests:
      - dbt_utils.recency:
          datepart: hour
          field: ordered_at
          interval: 2

Anomaly detection

Tests catch boolean invariants. They don't catch "revenue dropped 60% today and no one noticed." That's anomaly territory.

Built-in patterns

  • Row count anomalies — today's row count is N standard deviations from the trailing average.
  • Schema anomalies — column added / removed / type changed unexpectedly.
  • Distribution anomalies — column's distribution shifted (mean, p50, p95).
  • NULL anomalies — null proportion changed.
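A rolled-your-own row-count anomaly check can be a dbt singular test: returning rows means failure. A sketch flagging yesterday's volume when it sits more than 3 standard deviations from the trailing 28-day mean (filename, window, and threshold are assumptions; date arithmetic syntax varies by warehouse):

```sql
-- tests/orders_daily_volume_anomaly.sql (hypothetical name)
WITH daily AS (
    SELECT DATE(ordered_at) AS d, COUNT(*) AS n
    FROM {{ ref('fct_orders') }}
    GROUP BY 1
),

stats AS (
    -- trailing 28 full days, excluding yesterday itself
    SELECT AVG(n) AS mean_n, STDDEV(n) AS sd_n
    FROM daily
    WHERE d BETWEEN CURRENT_DATE - 29 AND CURRENT_DATE - 2
)

SELECT daily.d, daily.n, stats.mean_n, stats.sd_n
FROM daily
CROSS JOIN stats
WHERE daily.d = CURRENT_DATE - 1
  AND ABS(daily.n - stats.mean_n) > 3 * COALESCE(stats.sd_n, 0)
```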

Tools

  • Elementary — open-source, dbt-native. Stores test results + anomaly detection in your warehouse.
  • Monte Carlo, Anomalo, Bigeye — commercial, ML-driven, less config.
  • Roll your own — a few dbt-utils tests + scheduled queries can do a lot.

Data contracts

An explicit, versioned agreement between data producer and consumer. The producer commits to: these columns, these types, this freshness, these uniqueness guarantees. The consumer commits to: only relying on what's documented.

Implementation:

  • dbt model contracts — declare expected columns + types in YAML; CI fails if the model output doesn't match.
  • Schema registry for streaming events (Confluent, Glue Schema Registry).
  • Backstage / data catalogs for org-wide visibility.
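A dbt model contract is declared in YAML (dbt 1.5+): with contract enforcement on, the build fails if the model's output columns or types drift from what's declared. A sketch for fct_orders (the exact types are assumptions):

```yaml
models:
  - name: fct_orders
    config:
      contract:
        enforced: true        # CI fails on column/type drift
    columns:
      - name: order_id
        data_type: bigint
        constraints:
          - type: not_null
      - name: revenue
        data_type: numeric
```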

The cultural piece matters as much as the technical piece. Contracts shift "data team gets blindsided when engineer drops a column" to "producer can't ship without breaking the build."

Severity strategy

If every test is "error severity, page on-call," tests become noise and get ignored. Smart severity:

  • Severity error — true contracts. Page if violated. Examples: PK uniqueness, freshness on source data, business rules where wrong data = wrong decisions.
  • Severity warn — signal, not gate. Investigate when convenient. Examples: distribution shifts, low-importance accepted-values lists, soft thresholds.
  • Threshold-based errors — allow some failures. Example: error_if: >100 means warn at 1-100 failing rows, error at 101+.
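All three tiers map to dbt test configs. A sketch (status values are assumptions):

```yaml
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique:
              config:
                severity: error          # true contract: page on failure
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']
              config:
                severity: warn           # signal, not gate
      - name: revenue
        tests:
          - not_null:
              config:
                warn_if: ">0"
                error_if: ">100"         # 1-100 failing rows warn; 101+ error
```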

The trap

Adding tests without owners. A test that fails and no one fixes is noise. Every test should have a clear owner in the model's YAML and a runbook for what to do when it fires.
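One lightweight way to record ownership is the model-level meta block, which observability tools and catalogs can surface. The team name and runbook path here are placeholders:

```yaml
models:
  - name: fct_orders
    meta:
      owner: data-platform-team            # hypothetical team handle
      runbook: docs/runbooks/fct_orders.md # hypothetical path
```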

Incident response

When a quality alert fires, the discipline is:

  1. Triage — is it real, or a flaky test? Look at the failing rows.
  2. Communicate — if downstream dashboards are affected, post in the data channel. Don't wait until stakeholders find it.
  3. Fix the data — the immediate priority. Backfill, re-run, mark records, whatever.
  4. Root-cause — what change introduced this? Was it a schema change, a logic bug, a data drift?
  5. Prevent — add a test that would have caught this earlier. Document the gotcha in the model's YAML.
  6. Postmortem for the painful ones. Mature data teams run them like SREs run incident reviews.

Talking points

"How would you onboard data quality?"

"Three layers. First, dbt tests on every model — unique, not_null, relationships, accepted_values at minimum. Custom singular tests for business rules. Second, source freshness checks that alert when upstream stops loading. Third, an observability layer — Elementary or Monte Carlo — for anomalies tests can't catch: distribution shifts, volume drops, schema drift. Critical operational piece: severity strategy. Severity-error for true contracts (PK uniqueness, freshness), severity-warn for signal-not-gate. And every test needs an owner — a test that fails and nobody fixes is just noise."