quality.yaml

Declares the quality checks that run after every materialization. If a check with `severity: error` fails, downstream propagation is blocked.

```yaml
apiVersion: akili/v1
kind: Quality
checks:
  # Tier 1: Declarative (built-in check types)
  - name: revenue_not_null
    type: completeness
    config:
      column: total_revenue
      threshold: 0.99
    severity: error
  - name: data_freshness
    type: freshness
    config:
      column: updated_at
      max_age: 6h
    severity: error
  - name: row_volume
    type: volume
    config:
      min_rows: 50
      max_rows: 100000
    severity: warn
  - name: revenue_positive
    type: custom_expression
    config:
      expression: "total_revenue >= 0"
      threshold: 1.0
    severity: error
  # Tier 2: Custom SQL
  - name: no_duplicate_outlets_per_day
    type: custom_sql
    sql: |
      SELECT COUNT(*) AS failures
      FROM {output}
      WHERE (outlet_id, sale_date) IN (
        SELECT outlet_id, sale_date
        FROM {output}
        GROUP BY outlet_id, sale_date
        HAVING COUNT(*) > 1
      )
    severity: error
    expect: "failures = 0"
  # Tier 3: Custom Python
  - name: revenue_distribution_stable
    type: custom_python
    entrypoint: checks/revenue_drift.py
    severity: warn
```
| Tier | Defined In | Translated To | Use Case |
| --- | --- | --- | --- |
| Declarative | YAML `type` + `config` | Platform-generated SQL as `@asset_check` | Standard checks: completeness, freshness, volume, uniqueness, range |
| Custom SQL | Inline `sql` block | Platform-wrapped SQL as `@asset_check` | Complex business logic, multi-table assertions |
| Custom Python | External `.py` file | Platform-wrapped function as `@asset_check` | Statistical analysis, ML drift detection, custom algorithms |
| Type | Config Fields | What It Checks |
| --- | --- | --- |
| `completeness` | `column`, `threshold` (0.0-1.0) | Fraction of non-null values >= `threshold` |
| `freshness` | `column` (timestamp), `max_age` (duration) | Most recent value within `max_age` of now |
| `volume` | `min_rows`, `max_rows` (optional) | Row count within bounds |
| `uniqueness` | `columns` (string[]) | No duplicate values for the given column combination |
| `range` | `column`, `min`, `max` | All values within [min, max] |
| `accepted_values` | `column`, `values` (string[]) | All values in the allowed set |
| `referential` | `column`, `reference_product`, `reference_column` | All values exist in the referenced product |
| `custom_expression` | `expression` (SQL boolean), `threshold` | Fraction of rows satisfying the expression >= `threshold` |
| `regex` | `column`, `pattern` | All non-null values match the regex |
| `statistical` | `column`, `metric` (mean/stddev/median), `min`, `max` | Aggregate metric within bounds |
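For instance, a `uniqueness` and a `referential` check from the table above could be declared as follows (the product and column names here are illustrative, not part of the spec):

```yaml
checks:
  - name: unique_outlet_day
    type: uniqueness
    config:
      columns: [outlet_id, sale_date]
    severity: error
  - name: outlet_exists
    type: referential
    config:
      column: outlet_id
      reference_product: outlet_master   # hypothetical upstream product
      reference_column: outlet_id
    severity: error
```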

The `sql` block can use `{output}` as a placeholder for the materialized table reference. The platform injects the correct tenant-scoped, partition-scoped table name at execution time.

The `expect` field is a simple assertion of the form `"column_name operator value"`. Supported operators: `=`, `<`, `>`, `<=`, `>=`, `!=`.
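A minimal sketch of how such an assertion could be evaluated against the check query's result row. This is an illustration of the grammar, not the platform's actual parser; the function name `evaluate_expect` is an assumption:

```python
import operator
import re

# Map operator tokens to comparison functions. In the regex alternation,
# two-character operators come first so ">=" is not consumed as ">".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

def evaluate_expect(expect: str, row: dict) -> bool:
    """Evaluate an assertion like 'failures = 0' against a result row."""
    m = re.match(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(\S+)\s*$", expect)
    if not m:
        raise ValueError(f"malformed expect clause: {expect!r}")
    column, op, value = m.groups()
    return _OPS[op](float(row[column]), float(value))
```

Under this reading, the `no_duplicate_outlets_per_day` check passes only when the query's `failures` column comes back as `0`.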

The `entrypoint` file must expose a function with this signature:

```python
def check(df: pd.DataFrame, context: dict) -> CheckResult:
    """
    Args:
        df: The materialized output as a DataFrame.
        context: Dict with keys: tenant_id, product_name, partition_key,
            previous_df (last successful materialization, or None).
    Returns:
        CheckResult with passed (bool), message (str), metadata (dict).
    """
```

`context["previous_df"]` enables drift detection by comparing the current materialization against the previous one.
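A sketch of what a drift check like `checks/revenue_drift.py` might look like. The 20% threshold, the mean-shift metric, and the `CheckResult` dataclass below are assumptions for illustration (the platform supplies its own `CheckResult` type):

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class CheckResult:
    # Stand-in for the platform's CheckResult type.
    passed: bool
    message: str
    metadata: dict = field(default_factory=dict)


def check(df: pd.DataFrame, context: dict) -> CheckResult:
    """Flag a warning when mean total_revenue shifts more than 20%
    relative to the previous successful materialization."""
    prev = context.get("previous_df")
    if prev is None or prev.empty:
        # First run (or no prior success): nothing to compare against.
        return CheckResult(True, "no previous materialization; skipping drift check")

    cur_mean = df["total_revenue"].mean()
    prev_mean = prev["total_revenue"].mean()
    if prev_mean == 0:
        return CheckResult(True, "previous mean is zero; drift ratio undefined")

    drift = abs(cur_mean - prev_mean) / abs(prev_mean)
    return CheckResult(
        passed=drift <= 0.20,
        message=f"mean revenue drift {drift:.1%} (threshold 20%)",
        metadata={"current_mean": cur_mean, "previous_mean": prev_mean},
    )
```

Because the example check above is declared with `severity: warn`, a failure here would emit a warning event without blocking serving writes.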

| Severity | On Failure |
| --- | --- |
| `error` | Block downstream serving writes. Emit failure event. Enter retry/DLQ. |
| `warn` | Log warning. Continue serving writes. Emit warning event. |

Products can also declare tests in the `tests` block:

```yaml
tests:
  - name: customer_transform_basic
    type: unit
    description: "Validates basic customer transform with standard inputs"
    fixtures:
      inputs:
        raw_customers: tests/fixtures/raw_customers.csv
        region_lookup: tests/fixtures/region_lookup.csv
    assertions:
      - column: customer_id
        condition: not_null
      - column: lifetime_value
        condition: ">= 0"
```
| Test Type | Purpose | Execution |
| --- | --- | --- |
| `unit` | Validates transform logic with fixture data | `akili test run`; runs in CI before deploy |
| `contract` | Validates interface agreement with upstream | Runs on deploy; fails deployment if violated |
| `canary` | Validates production output against expectations | Runs after each execution; alerts only |