Built-in Checks

Tier 1: Declarative Rules (Built-in Types)

The simplest and most common. You declare a type and configuration; the platform generates the SQL and wraps it in a Dagster asset check.

Completeness

Ensures a column has a minimum fraction of non-null values.

checks:
  - name: revenue_not_null
    type: completeness
    config:
      column: total_revenue
      threshold: 0.99           # 99% of values must be non-null
    severity: error

The threshold is a fraction between 0.0 and 1.0. Set to 1.0 for columns that must never be null (like primary keys).

Freshness

Ensures the most recent value in a timestamp column is within a maximum age of now.

checks:
  - name: data_freshness
    type: freshness
    config:
      column: updated_at
      max_age: 6h
    severity: error

Duration formats: 30m, 6h, 1d, 7d.

Volume

Ensures the materialized dataset has a row count within expected bounds.

checks:
  - name: row_volume
    type: volume
    config:
      min_rows: 50
      max_rows: 100000          # optional upper bound
    severity: warn

Volume checks catch both empty datasets (pipeline broke) and explosions (duplicate join or missing filter).

Uniqueness

Ensures no duplicate values for a given column combination.

checks:
  - name: unique_outlet_day
    type: uniqueness
    config:
      columns:
        - outlet_id
        - sale_date
    severity: error

Use a list of columns for composite uniqueness constraints.

Range

Ensures all values fall within a specified range.

checks:
  - name: revenue_positive
    type: range
    config:
      column: total_revenue
      min: 0
      # max: 1000000            # optional upper bound
    severity: error

Accepted Values

Ensures all values in a column belong to an allowed set.

checks:
  - name: valid_status
    type: accepted_values
    config:
      column: order_status
      values:
        - pending
        - confirmed
        - shipped
        - delivered
        - cancelled
    severity: error

Referential Integrity

Ensures all values in a column exist in a referenced product’s column.

checks:
  - name: valid_outlet_ref
    type: referential
    config:
      column: outlet_id
      reference_product: cleaned-outlets
      reference_column: outlet_id
    severity: error

Custom Expression

Evaluates a SQL boolean expression against each row. The threshold specifies the minimum fraction of rows that must satisfy the expression.

checks:
  - name: revenue_matches_count
    type: custom_expression
    config:
      expression: "total_revenue >= 0 AND transaction_count >= 0"
      threshold: 1.0            # 100% of rows
    severity: error

Regex

Ensures all non-null values in a column match a regular expression pattern.

checks:
  - name: valid_email_format
    type: regex
    config:
      column: email
      pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    severity: warn

Statistical

Ensures an aggregate metric (mean, stddev, median) falls within expected bounds.

checks:
  - name: avg_basket_reasonable
    type: statistical
    config:
      column: avg_basket_size
      metric: mean              # mean | stddev | median
      min: 10.0
      max: 500.0
    severity: warn

Quick Reference Table

Type	Config Fields	What It Checks
`completeness`	`column`, `threshold` (0.0-1.0)	Fraction of non-null values >= threshold
`freshness`	`column`, `max_age`	Most recent value within `max_age` of now
`volume`	`min_rows`, `max_rows`	Row count within bounds
`uniqueness`	`columns` (list)	No duplicate values for column combination
`range`	`column`, `min`, `max`	All values within [min, max]
`accepted_values`	`column`, `values`	All values in allowed set
`referential`	`column`, `reference_product`, `reference_column`	All values exist in referenced product
`custom_expression`	`expression`, `threshold`	Fraction of rows satisfying SQL boolean expression
`regex`	`column`, `pattern`	All non-null values match regex
`statistical`	`column`, `metric`, `min`, `max`	Aggregate metric (mean/stddev/median) within bounds

Severity Levels

Severity	On Failure
`error`	Block downstream serving writes. Enter retry/DLQ flow.
`warn`	Log warning. Continue serving writes.

Threshold Configuration

Setting Appropriate Thresholds

Rule Type	Loose Start	Tighten Over Time
Completeness	0.95	0.99+ for critical columns
Volume	+/- 50% of expected	+/- 20% once stable
Freshness	2x schedule interval	1.5x once reliable
Statistical	3 standard deviations	2 standard deviations

Tip: Start with severity: warn for new checks. Once you are confident in the thresholds, promote to severity: error. This avoids blocking production on false positives while you calibrate.

Severity Guide

Severity	Use When	Impact on Failure
`error`	Data correctness is critical. Wrong data is worse than no data.	Blocks downstream. Enters retry/DLQ.
`warn`	Anomaly worth investigating, but not worth blocking production.	Logs warning. Serving continues.

Rule of thumb: Primary keys, referential integrity, and freshness checks should almost always be error. Volume and statistical checks often start as warn.