Built-in Checks
Tier 1: Declarative Rules (Built-in Types)
Declarative rules are the simplest and most common kind of check. You declare a type and its configuration; the platform generates the SQL and wraps it in a Dagster asset check.
Completeness
Ensures a column has a minimum fraction of non-null values.

```yaml
checks:
  - name: revenue_not_null
    type: completeness
    config:
      column: total_revenue
      threshold: 0.99  # 99% of values must be non-null
    severity: error
```

The threshold is a fraction between 0.0 and 1.0. Set it to 1.0 for columns that must never be null, such as primary keys.
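
The threshold comparison can be illustrated with a minimal Python sketch. This is not the SQL the platform generates; the helper names `completeness_fraction` and `passes_completeness` are hypothetical, and treating an empty column as passing is an assumption.

```python
from typing import Optional, Sequence


def completeness_fraction(values: Sequence[Optional[object]]) -> float:
    """Return the fraction of values that are not None."""
    if not values:
        return 1.0  # assumption: an empty column trivially passes
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)


def passes_completeness(values: Sequence[Optional[object]], threshold: float) -> bool:
    """Pass when the non-null fraction is at least the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return completeness_fraction(values) >= threshold


# 100 rows with exactly one null value.
total_revenue = [120.0, 95.5, None, 80.0] + [100.0] * 96
print(passes_completeness(total_revenue, 0.99))  # True: 99 of 100 are non-null
print(passes_completeness(total_revenue, 1.0))   # False: one null is present
```

With `threshold: 1.0`, a single null value is enough to fail the check, which is why that setting suits key columns.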
Freshness
Ensures the most recent value in a timestamp column is within a maximum age of now.

```yaml
checks:
  - name: data_freshness
    type: freshness
    config:
      column: updated_at
      max_age: 6h
    severity: error
```

Duration formats: `30m`, `6h`, `1d`, `7d`.
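
As a rough illustration of the freshness rule, the sketch below parses a duration string and compares the age of the latest timestamp against it. The functions `parse_duration` and `is_fresh` are hypothetical; the unit set (`m`, `h`, `d`) is inferred only from the examples above, and treating the boundary as inclusive is an assumption.

```python
import re
from datetime import datetime, timedelta, timezone

# Units inferred from the documented examples (30m, 6h, 1d, 7d).
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Convert a string such as '6h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_fresh(latest: datetime, max_age: str, now: datetime) -> bool:
    """Pass when the most recent value is no older than max_age."""
    return now - latest <= parse_duration(max_age)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh(now - timedelta(hours=5), "6h", now))  # True: 5h old, limit 6h
print(is_fresh(now - timedelta(hours=7), "6h", now))  # False: 7h old, limit 6h
```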
Volume
Ensures the materialized dataset has a row count within expected bounds.

```yaml
checks:
  - name: row_volume
    type: volume
    config:
      min_rows: 50
      max_rows: 100000  # optional upper bound
    severity: warn
```

Volume checks catch both empty datasets (a broken pipeline) and row-count explosions (a duplicating join or a missing filter).
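
The bounds logic, including the optional upper bound, might look like the following sketch. The function `volume_ok` is hypothetical, and inclusive bounds are an assumption.

```python
from typing import Optional


def volume_ok(row_count: int, min_rows: int, max_rows: Optional[int] = None) -> bool:
    """Pass when row_count is within [min_rows, max_rows]; max_rows may be omitted."""
    if row_count < min_rows:
        return False  # too few rows, e.g. an empty or truncated dataset
    if max_rows is not None and row_count > max_rows:
        return False  # too many rows, e.g. a join that duplicated records
    return True


print(volume_ok(0, min_rows=50, max_rows=100_000))        # False
print(volume_ok(12_000, min_rows=50, max_rows=100_000))   # True
print(volume_ok(250_000, min_rows=50, max_rows=100_000))  # False
print(volume_ok(250_000, min_rows=50))                    # True: no upper bound
```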
Uniqueness
Ensures no duplicate values for a given column combination.

```yaml
checks:
  - name: unique_outlet_day
    type: uniqueness
    config:
      columns:
        - outlet_id
        - sale_date
    severity: error
```

Use a list of columns for composite uniqueness constraints.
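
To make the composite-key idea concrete, this sketch counts each combination of column values and reports the ones that occur more than once. The helper `duplicate_keys` and the sample rows are illustrative assumptions, not platform code.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def duplicate_keys(rows: Sequence[Dict[str, object]], columns: List[str]) -> List[Tuple]:
    """Return every column-value combination that appears more than once."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


rows = [
    {"outlet_id": 1, "sale_date": "2024-01-01"},
    {"outlet_id": 1, "sale_date": "2024-01-02"},
    {"outlet_id": 2, "sale_date": "2024-01-01"},
    {"outlet_id": 1, "sale_date": "2024-01-01"},  # repeats the first row's key
]
print(duplicate_keys(rows, ["outlet_id", "sale_date"]))      # [(1, '2024-01-01')]
print(duplicate_keys(rows[:3], ["outlet_id", "sale_date"]))  # []
```

Note that `outlet_id` alone repeats in the first three rows, yet the check passes for them because only the combination of both columns must be unique.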
Range
Ensures all values fall within a specified range.

```yaml
checks:
  - name: revenue_positive
    type: range
    config:
      column: total_revenue
      min: 0
      # max: 1000000  # optional upper bound
    severity: error
```

Accepted Values
Ensures all values in a column belong to an allowed set.

```yaml
checks:
  - name: valid_status
    type: accepted_values
    config:
      column: order_status
      values:
        - pending
        - confirmed
        - shipped
        - delivered
        - cancelled
    severity: error
```

Referential Integrity
Ensures all values in a column exist in a referenced product's column.

```yaml
checks:
  - name: valid_outlet_ref
    type: referential
    config:
      column: outlet_id
      reference_product: cleaned-outlets
      reference_column: outlet_id
    severity: error
```

Custom Expression
Evaluates a SQL boolean expression against each row. The threshold specifies the minimum fraction of rows that must satisfy the expression.
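
One plausible way to compute the fraction of rows that satisfy an expression is sketched below with an in-memory SQLite database. The SQL shape is an assumption and may differ from what the platform generates; the `sales` table and the helper `expression_pass_fraction` are hypothetical.

```python
import sqlite3


def expression_pass_fraction(conn: sqlite3.Connection, table: str, expression: str) -> float:
    """Return the fraction of rows in `table` for which `expression` is true."""
    # The table name and expression are interpolated directly: acceptable for
    # a demonstration, unsafe for untrusted input.
    sql = f"SELECT AVG(CASE WHEN {expression} THEN 1.0 ELSE 0.0 END) FROM {table}"
    (fraction,) = conn.execute(sql).fetchone()
    return 1.0 if fraction is None else fraction  # assumption: empty table passes


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (total_revenue REAL, transaction_count INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(100.0, 4), (250.0, 9), (-5.0, 1), (0.0, 0)],
)

fraction = expression_pass_fraction(
    conn, "sales", "total_revenue >= 0 AND transaction_count >= 0"
)
print(fraction)         # 0.75: three of the four rows satisfy the expression
print(fraction >= 1.0)  # False: a threshold of 1.0 would fail this dataset
```

Rows where the expression evaluates to NULL fall into the `ELSE` branch here, so they count as failures in this sketch.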
```yaml
checks:
  - name: revenue_matches_count
    type: custom_expression
    config:
      expression: "total_revenue >= 0 AND transaction_count >= 0"
      threshold: 1.0  # 100% of rows
    severity: error
```

Regex
Ensures all non-null values in a column match a regular expression pattern.
```yaml
checks:
  - name: valid_email_format
    type: regex
    config:
      column: email
      pattern: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    severity: warn
```

Statistical
Ensures an aggregate metric (mean, stddev, or median) falls within expected bounds.
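
The aggregate-within-bounds rule can be sketched with Python's standard `statistics` module. The function `statistical_ok` is hypothetical, and mapping `stddev` to the sample standard deviation is an assumption; the source does not say whether the sample or population form is used.

```python
import statistics
from typing import Sequence

METRICS = {
    "mean": statistics.fmean,
    "stddev": statistics.stdev,  # assumption: sample standard deviation
    "median": statistics.median,
}


def statistical_ok(values: Sequence[float], metric: str,
                   min_value: float, max_value: float) -> bool:
    """Pass when the chosen aggregate lies within [min_value, max_value]."""
    observed = METRICS[metric](values)
    return min_value <= observed <= max_value


avg_basket_size = [42.0, 55.5, 38.0, 61.0, 47.5]
print(statistical_ok(avg_basket_size, "mean", 10.0, 500.0))    # True: mean is 48.8
print(statistical_ok(avg_basket_size, "median", 50.0, 500.0))  # False: median is 47.5
```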
```yaml
checks:
  - name: avg_basket_reasonable
    type: statistical
    config:
      column: avg_basket_size
      metric: mean  # mean | stddev | median
      min: 10.0
      max: 500.0
    severity: warn
```

Quick Reference Table
| Type | Config Fields | What It Checks |
|---|---|---|
| `completeness` | `column`, `threshold` (0.0-1.0) | Fraction of non-null values >= threshold |
| `freshness` | `column`, `max_age` | Most recent value within `max_age` of now |
| `volume` | `min_rows`, `max_rows` | Row count within bounds |
| `uniqueness` | `columns` (list) | No duplicate values for column combination |
| `range` | `column`, `min`, `max` | All values within [min, max] |
| `accepted_values` | `column`, `values` | All values in allowed set |
| `referential` | `column`, `reference_product`, `reference_column` | All values exist in referenced product |
| `custom_expression` | `expression`, `threshold` | Fraction of rows satisfying SQL boolean expression |
| `regex` | `column`, `pattern` | All non-null values match regex |
| `statistical` | `column`, `metric`, `min`, `max` | Aggregate metric (mean/stddev/median) within bounds |
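
The table above also lends itself to a small lint step that catches misspelled config keys before a check is deployed. The sketch below is a hypothetical helper, not the platform's own validation; it only encodes the field names listed in the table.

```python
from typing import Dict, Set

# Field names copied from the quick reference table.
CONFIG_FIELDS: Dict[str, Set[str]] = {
    "completeness": {"column", "threshold"},
    "freshness": {"column", "max_age"},
    "volume": {"min_rows", "max_rows"},
    "uniqueness": {"columns"},
    "range": {"column", "min", "max"},
    "accepted_values": {"column", "values"},
    "referential": {"column", "reference_product", "reference_column"},
    "custom_expression": {"expression", "threshold"},
    "regex": {"column", "pattern"},
    "statistical": {"column", "metric", "min", "max"},
}


def unknown_config_fields(check: dict) -> Set[str]:
    """Return config keys that the table does not list for this check type."""
    check_type = check["type"]
    if check_type not in CONFIG_FIELDS:
        raise ValueError(f"unknown check type: {check_type!r}")
    return set(check.get("config", {})) - CONFIG_FIELDS[check_type]


check = {
    "name": "row_volume",
    "type": "volume",
    "config": {"min_rows": 50, "max_row": 100000},  # typo: max_row
    "severity": "warn",
}
print(unknown_config_fields(check))  # {'max_row'}
```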
Severity Levels
| Severity | On Failure |
|---|---|
| `error` | Block downstream serving writes. Enter retry/DLQ flow. |
| `warn` | Log warning. Continue serving writes. |
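
The difference between the two severities can be sketched as a simple gate over check results. `CheckResult` and `should_block_serving` are hypothetical names, and the retry/DLQ flow is deliberately not modeled here.

```python
import logging
from dataclasses import dataclass
from typing import Iterable

logger = logging.getLogger("checks")


@dataclass
class CheckResult:
    name: str
    passed: bool
    severity: str  # "error" or "warn"


def should_block_serving(results: Iterable[CheckResult]) -> bool:
    """Return True when any failed check has severity 'error'."""
    block = False
    for result in results:
        if result.passed:
            continue
        if result.severity == "error":
            block = True  # downstream serving writes must not proceed
        else:
            logger.warning("check %s failed (warn); serving continues", result.name)
    return block


results = [
    CheckResult("row_volume", passed=False, severity="warn"),
    CheckResult("revenue_not_null", passed=True, severity="error"),
]
print(should_block_serving(results))  # False: only a warn-level check failed
```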
Threshold Configuration
Setting Appropriate Thresholds

| Rule Type | Loose Start | Tighten Over Time |
|---|---|---|
| Completeness | 0.95 | 0.99+ for critical columns |
| Volume | +/- 50% of expected | +/- 20% once stable |
| Freshness | 2x schedule interval | 1.5x once reliable |
| Statistical | 3 standard deviations | 2 standard deviations |
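
As a worked example of the volume and freshness rows, the sketch below derives concrete bounds from an expected row count and a schedule interval. The helper names, the 10,000-row expectation, and the 3-hour schedule are illustrative assumptions.

```python
from datetime import timedelta
from typing import Tuple


def volume_bounds(expected_rows: int, tolerance: float) -> Tuple[int, int]:
    """Return (min_rows, max_rows) for +/- tolerance around the expected count."""
    return (
        round(expected_rows * (1 - tolerance)),
        round(expected_rows * (1 + tolerance)),
    )


def freshness_max_age(schedule_interval: timedelta, multiplier: float) -> timedelta:
    """Return a max_age expressed as a multiple of the schedule interval."""
    return schedule_interval * multiplier


print(volume_bounds(10_000, 0.50))  # (5000, 15000): loose start
print(volume_bounds(10_000, 0.20))  # (8000, 12000): once stable

print(freshness_max_age(timedelta(hours=3), 2))    # 6:00:00
print(freshness_max_age(timedelta(hours=3), 1.5))  # 4:30:00
```

For a dataset refreshed every 3 hours, this suggests starting with `max_age: 6h` and tightening toward roughly 4.5 hours once the schedule has proven reliable.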
Tip: Start with `severity: warn` for new checks. Once you are confident in the thresholds, promote to `severity: error`. This avoids blocking production on false positives while you calibrate.
Severity Guide
| Severity | Use When | Impact on Failure |
|---|---|---|
| `error` | Data correctness is critical. Wrong data is worse than no data. | Blocks downstream. Enters retry/DLQ. |
| `warn` | Anomaly worth investigating, but not worth blocking production. | Logs warning. Serving continues. |
Rule of thumb: Checks on primary keys, referential integrity, and freshness should almost always be `error`. Volume and statistical checks often start as `warn`.