quality.yaml

Declares the quality checks that run after every materialization. If a check with `severity: error` fails, downstream propagation is blocked.

```yaml
apiVersion: akili/v1
kind: Quality
checks:
  # Tier 1: Declarative (built-in check types)
  - name: revenue_not_null
    type: completeness
    config:
      column: total_revenue
      threshold: 0.99
    severity: error
  - name: data_freshness
    type: freshness
    config:
      column: updated_at
      max_age: 6h
    severity: error
  - name: row_volume
    type: volume
    config:
      min_rows: 50
      max_rows: 100000
    severity: warn
  - name: revenue_positive
    type: custom_expression
    config:
      expression: "total_revenue >= 0"
      threshold: 1.0
    severity: error
  # Tier 2: Custom SQL
  - name: no_duplicate_outlets_per_day
    type: custom_sql
    sql: |
      SELECT COUNT(*) AS failures
      FROM {output}
      WHERE (outlet_id, sale_date) IN (
        SELECT outlet_id, sale_date
        FROM {output}
        GROUP BY outlet_id, sale_date
        HAVING COUNT(*) > 1
      )
    severity: error
    expect: "failures = 0"
  # Tier 3: Custom Python
  - name: revenue_distribution_stable
    type: custom_python
    entrypoint: checks/revenue_drift.py
    severity: warn
```
| Tier | Defined In | Translated To | Use Case |
| --- | --- | --- | --- |
| Declarative | YAML `type` + `config` | Platform-generated SQL as `@asset_check` | Standard checks: completeness, freshness, volume, uniqueness, range |
| Custom SQL | Inline `sql` block | Platform-wrapped SQL as `@asset_check` | Complex business logic, multi-table assertions |
| Custom Python | External `.py` file | Platform-wrapped function as `@asset_check` | Statistical analysis, ML drift detection, custom algorithms |
| Type | Config Fields | What It Checks |
| --- | --- | --- |
| `completeness` | `column`, `threshold` (0.0-1.0) | Fraction of non-null values >= `threshold` |
| `freshness` | `column` (timestamp), `max_age` (duration) | Most recent value within `max_age` of now |
| `volume` | `min_rows`, `max_rows` (optional) | Row count within bounds |
| `uniqueness` | `columns` (string[]) | No duplicate values for the given column combination |
| `range` | `column`, `min`, `max` | All values within [min, max] |
| `accepted_values` | `column`, `values` (string[]) | All values in the allowed set |
| `referential` | `column`, `reference_product`, `reference_column` | All values exist in the referenced product |
| `custom_expression` | `expression` (SQL boolean), `threshold` | Fraction of rows satisfying the expression >= `threshold` |
| `regex` | `column`, `pattern` | All non-null values match the regex |
| `statistical` | `column`, `metric` (mean/stddev/median), `min`, `max` | Aggregate metric within bounds |
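For instance, a `uniqueness` and a `referential` check from the table above could be declared as follows (the product and column names here are illustrative, not part of the spec):

```yaml
checks:
  - name: unique_outlet_day
    type: uniqueness
    config:
      columns: [outlet_id, sale_date]
    severity: error
  - name: outlet_exists
    type: referential
    config:
      column: outlet_id
      reference_product: outlet_master   # hypothetical upstream product
      reference_column: outlet_id
    severity: error
```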

The `sql` block can use `{output}` as a placeholder for the materialized table reference. The platform injects the correct tenant-scoped, partition-scoped table name at execution time.

The `expect` field is a simple assertion of the form `"column_name operator value"`. Supported operators: `=`, `<`, `>`, `<=`, `>=`, `!=`.
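A minimal sketch of how such an assertion could be evaluated against the check query's result row. This is an illustration of the grammar, not the platform's actual parser; the function name `evaluate_expect` is an assumption:

```python
import operator
import re

# Map operator tokens to comparison functions. In the regex alternation,
# two-character operators come first so ">=" is not consumed as ">".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

def evaluate_expect(expect: str, row: dict) -> bool:
    """Evaluate an assertion like 'failures = 0' against a result row."""
    m = re.match(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*(\S+)\s*$", expect)
    if not m:
        raise ValueError(f"malformed expect clause: {expect!r}")
    column, op, value = m.groups()
    return _OPS[op](float(row[column]), float(value))
```

Under this reading, the `no_duplicate_outlets_per_day` check passes only when the query's `failures` column comes back as `0`.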

The `entrypoint` file must expose a function with this signature:

```python
def check(df: pd.DataFrame, context: dict) -> CheckResult:
    """
    Args:
        df: The materialized output as a DataFrame.
        context: Dict with keys: tenant_id, product_name, partition_key,
            previous_df (last successful materialization, or None).
    Returns:
        CheckResult with passed (bool), message (str), metadata (dict).
    """
```

`context["previous_df"]` enables drift detection by comparing the current materialization against the previous one.
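A sketch of what a drift check like `checks/revenue_drift.py` might look like. The 20% threshold, the mean-shift metric, and the `CheckResult` dataclass below are assumptions for illustration (the platform supplies its own `CheckResult` type):

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class CheckResult:
    # Stand-in for the platform's CheckResult type.
    passed: bool
    message: str
    metadata: dict = field(default_factory=dict)


def check(df: pd.DataFrame, context: dict) -> CheckResult:
    """Flag a warning when mean total_revenue shifts more than 20%
    relative to the previous successful materialization."""
    prev = context.get("previous_df")
    if prev is None or prev.empty:
        # First run (or no prior success): nothing to compare against.
        return CheckResult(True, "no previous materialization; skipping drift check")

    cur_mean = df["total_revenue"].mean()
    prev_mean = prev["total_revenue"].mean()
    if prev_mean == 0:
        return CheckResult(True, "previous mean is zero; drift ratio undefined")

    drift = abs(cur_mean - prev_mean) / abs(prev_mean)
    return CheckResult(
        passed=drift <= 0.20,
        message=f"mean revenue drift {drift:.1%} (threshold 20%)",
        metadata={"current_mean": cur_mean, "previous_mean": prev_mean},
    )
```

Because the example check above is declared with `severity: warn`, a failure here would emit a warning event without blocking serving writes.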

| Severity | On Failure |
| --- | --- |
| `error` | Block downstream serving writes. Emit failure event. Enter retry/DLQ. |
| `warn` | Log warning. Continue serving writes. Emit warning event. |

Products can also declare tests in the `tests` block:

```yaml
tests:
  - name: customer_transform_basic
    type: unit
    description: "Validates basic customer transform with standard inputs"
    fixtures:
      inputs:
        raw_customers: tests/fixtures/raw_customers.csv
        region_lookup: tests/fixtures/region_lookup.csv
    assertions:
      - column: customer_id
        condition: not_null
      - column: lifetime_value
        condition: ">= 0"
```
| Test Type | Purpose | Execution |
| --- | --- | --- |
| `unit` | Validates transform logic with fixture data | `akili test run`; runs in CI before deploy |
| `contract` | Validates interface agreement with upstream | Runs on deploy; fails deployment if violated |
| `canary` | Validates production output against expectations | Runs after each execution; alerts only |