
Built-in Checks

Tier 1: Declarative Rules (Built-in Types)


These are the simplest and most common checks. You declare a type and configuration; the platform generates the SQL and wraps it in a Dagster asset check.

Completeness

Ensures a column has a minimum fraction of non-null values.

```yaml
checks:
  - name: revenue_not_null
    type: completeness
    config:
      column: total_revenue
      threshold: 0.99 # 99% of values must be non-null
    severity: error
```

The threshold is a fraction between 0.0 and 1.0. Set to 1.0 for columns that must never be null (like primary keys).
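Conceptually, the generated check computes the non-null fraction and compares it to the threshold. A minimal Python sketch of that semantics (the helper name is hypothetical, not the platform's API):

```python
def completeness_passes(values, threshold):
    """Return True if the fraction of non-null values meets the threshold."""
    if not values:
        return False  # an empty column cannot satisfy a completeness check
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values) >= threshold
```

With threshold 0.75, a column with three non-null values out of four passes exactly at the boundary.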

Freshness

Ensures the most recent value in a timestamp column is within a maximum age of now.

```yaml
checks:
  - name: data_freshness
    type: freshness
    config:
      column: updated_at
      max_age: 6h
    severity: error
```

Duration formats: 30m, 6h, 1d, 7d.
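The documented duration formats map naturally onto suffix-plus-integer parsing. A sketch of how such a parser might work (this is illustrative, not the platform's actual implementation):

```python
from datetime import timedelta

# Suffix-to-unit mapping for the documented duration formats (30m, 6h, 1d, 7d).
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def parse_max_age(spec: str) -> timedelta:
    """Parse a duration string like '6h' into a timedelta."""
    unit = spec[-1]
    if unit not in _UNITS:
        raise ValueError(f"unsupported duration unit: {spec!r}")
    return timedelta(**{_UNITS[unit]: int(spec[:-1])})
```

The check then compares `now - max(updated_at)` against the parsed timedelta.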

Volume

Ensures the materialized dataset has a row count within expected bounds.

```yaml
checks:
  - name: row_volume
    type: volume
    config:
      min_rows: 50
      max_rows: 100000 # optional upper bound
    severity: warn
```

Volume checks catch both empty datasets (a broken pipeline) and row-count explosions (a duplicated join or missing filter).

Uniqueness

Ensures no duplicate values for a given column combination.

```yaml
checks:
  - name: unique_outlet_day
    type: uniqueness
    config:
      columns:
        - outlet_id
        - sale_date
    severity: error
```

Use a list of columns for composite uniqueness constraints.
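A composite uniqueness check effectively counts occurrences of each column-value tuple and flags any that appear more than once. A minimal sketch (hypothetical helper, operating on rows as dicts):

```python
from collections import Counter

def uniqueness_violations(rows, columns):
    """Return the set of column-value combinations that appear more than once."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return {combo for combo, n in counts.items() if n > 1}
```

The check passes when the returned set is empty.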

Range

Ensures all values fall within a specified range.

```yaml
checks:
  - name: revenue_positive
    type: range
    config:
      column: total_revenue
      min: 0
      # max: 1000000 # optional upper bound
    severity: error
```

Accepted Values

Ensures all values in a column belong to an allowed set.

```yaml
checks:
  - name: valid_status
    type: accepted_values
    config:
      column: order_status
      values:
        - pending
        - confirmed
        - shipped
        - delivered
        - cancelled
    severity: error
```

Referential

Ensures all values in a column exist in a referenced product’s column.

```yaml
checks:
  - name: valid_outlet_ref
    type: referential
    config:
      column: outlet_id
      reference_product: cleaned-outlets
      reference_column: outlet_id
    severity: error
```
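A referential check is a set-membership test against the referenced product's column, analogous to an anti-join in SQL. Sketched in Python (hypothetical helper, not the generated SQL itself):

```python
def referential_violations(values, reference_values):
    """Return values that do not exist in the referenced product's column."""
    ref = set(reference_values)  # build the lookup set once
    return [v for v in values if v not in ref]
```

An empty result means every foreign key resolves and the check passes.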

Custom Expression

Evaluates a SQL boolean expression against each row. The threshold specifies the minimum fraction of rows that must satisfy the expression.

```yaml
checks:
  - name: revenue_matches_count
    type: custom_expression
    config:
      expression: "total_revenue >= 0 AND transaction_count >= 0"
      threshold: 1.0 # 100% of rows
    severity: error
```
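The semantics reduce to "what fraction of rows satisfy the predicate, and is that at least the threshold". A sketch with the SQL expression stood in by a Python callable (names are illustrative):

```python
def expression_pass_fraction(rows, predicate):
    """Fraction of rows for which the boolean predicate holds."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if predicate(r)) / len(rows)
```

The check passes when `expression_pass_fraction(rows, pred) >= threshold`; with a threshold of 1.0, a single failing row fails the check.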

Regex

Ensures all non-null values in a column match a regular expression pattern.

```yaml
checks:
  - name: valid_email_format
    type: regex
    config:
      column: email
      pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    severity: warn
```
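Note that null values are skipped; only non-null values must match. A minimal sketch of that behavior (hypothetical helper):

```python
import re

def regex_check_passes(values, pattern):
    """True if every non-null value fully matches the pattern."""
    compiled = re.compile(pattern)
    return all(compiled.fullmatch(v) for v in values if v is not None)
```

Pair this with a completeness check if nulls themselves should also fail.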

Statistical

Ensures an aggregate metric (mean, stddev, median) falls within expected bounds.

```yaml
checks:
  - name: avg_basket_reasonable
    type: statistical
    config:
      column: avg_basket_size
      metric: mean # mean | stddev | median
      min: 10.0
      max: 500.0
    severity: warn
```
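The check computes one aggregate over the column and tests it against the bounds. A sketch using Python's standard `statistics` module to mirror the three documented metrics (the dispatch table is an assumption, not the platform's code):

```python
from statistics import mean, median, stdev

# Hypothetical dispatch table mirroring the documented metric names.
_METRICS = {"mean": mean, "stddev": stdev, "median": median}

def statistical_check_passes(values, metric, lo, hi):
    """True if the aggregate metric of the values falls within [lo, hi]."""
    result = _METRICS[metric](values)
    return lo <= result <= hi
```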

| Type | Config Fields | What It Checks |
| --- | --- | --- |
| completeness | column, threshold (0.0-1.0) | Fraction of non-null values >= threshold |
| freshness | column, max_age | Most recent value within max_age of now |
| volume | min_rows, max_rows | Row count within bounds |
| uniqueness | columns (list) | No duplicate values for column combination |
| range | column, min, max | All values within [min, max] |
| accepted_values | column, values | All values in allowed set |
| referential | column, reference_product, reference_column | All values exist in referenced product |
| custom_expression | expression, threshold | Fraction of rows satisfying SQL boolean expression |
| regex | column, pattern | All non-null values match regex |
| statistical | column, metric, min, max | Aggregate metric (mean/stddev/median) within bounds |

| Severity | On Failure |
| --- | --- |
| error | Block downstream serving writes. Enter retry/DLQ flow. |
| warn | Log warning. Continue serving writes. |

| Rule Type | Loose Start | Tighten Over Time |
| --- | --- | --- |
| Completeness | 0.95 | 0.99+ for critical columns |
| Volume | +/- 50% of expected | +/- 20% once stable |
| Freshness | 2x schedule interval | 1.5x once reliable |
| Statistical | 3 standard deviations | 2 standard deviations |

Tip: Start with severity: warn for new checks. Once you are confident in the thresholds, promote to severity: error. This avoids blocking production on false positives while you calibrate.

| Severity | Use When | Impact on Failure |
| --- | --- | --- |
| error | Data correctness is critical. Wrong data is worse than no data. | Blocks downstream. Enters retry/DLQ. |
| warn | Anomaly worth investigating, but not worth blocking production. | Logs warning. Serving continues. |

Rule of thumb: Primary keys, referential integrity, and freshness checks should almost always be error. Volume and statistical checks often start as warn.