
Quality & Compute

Declares quality checks that run after every materialization. Checks with `severity: error` block downstream propagation.

```yaml
apiVersion: akili/v1
kind: Quality
checks:
  # Tier 1: Declarative (built-in types)
  - name: event_id_not_null
    type: completeness
    config:
      column: event_id
      threshold: 1.0  # 100% non-null
    severity: error
  - name: data_freshness
    type: freshness
    config:
      column: event_timestamp
      max_age: 6h
    severity: error
  - name: event_volume
    type: volume
    config:
      min_rows: 100
      max_rows: 10000000
    severity: warn
  - name: valid_event_types
    type: accepted_values
    config:
      column: event_type
      values: [click, view, scroll, submit, navigate]
    severity: error
  # Tier 2: Custom SQL
  - name: no_duplicate_events
    type: custom_sql
    sql: |
      SELECT COUNT(*) AS failures
      FROM {output}
      WHERE event_id IN (
        SELECT event_id FROM {output}
        GROUP BY event_id HAVING COUNT(*) > 1
      )
    severity: error
    expect: "failures = 0"
  # Tier 3: Custom Python
  - name: event_distribution_stable
    type: custom_python
    entrypoint: checks/event_drift.py
    severity: warn
```
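The spec above only names the Tier 3 entrypoint; the contract the platform uses to invoke `checks/event_drift.py` is not documented here. As an illustration only, a drift check might compute a Population Stability Index over `event_type` — the `check` function, its `(rows, baseline_rows)` signature, and the 0.2 pass threshold are all assumptions of this sketch, not platform API:

```python
# Hypothetical shape for checks/event_drift.py -- the real entrypoint
# contract is defined by the platform, not by this sketch.
from collections import Counter
import math

def psi(baseline_counts, current_counts, eps=1e-6):
    """Population Stability Index between two categorical distributions."""
    categories = set(baseline_counts) | set(current_counts)
    b_total = sum(baseline_counts.values()) or 1
    c_total = sum(current_counts.values()) or 1
    score = 0.0
    for cat in categories:
        b = baseline_counts.get(cat, 0) / b_total + eps
        c = current_counts.get(cat, 0) / c_total + eps
        score += (c - b) * math.log(c / b)
    return score

def check(rows, baseline_rows):
    """Return (passed, score): drift of event_type vs. a baseline window."""
    current = Counter(r["event_type"] for r in rows)
    baseline = Counter(r["event_type"] for r in baseline_rows)
    score = psi(baseline, current)
    # Common rule of thumb: PSI < 0.2 means no significant drift.
    return score < 0.2, score
```

Because the check is declared with `severity: warn`, a drift detection would log but not block downstream writes.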

Three tiers of expressiveness:

| Tier | Defined In | Use Case |
|------|-----------|----------|
| Declarative | YAML `type` + `config` | Standard checks: completeness, freshness, volume, uniqueness, range |
| Custom SQL | Inline `sql` block | Complex business logic, multi-table assertions |
| Custom Python | External `.py` file | Statistical analysis, ML drift detection |

Built-in check types:

| Type | Config Fields | What It Checks |
|------|---------------|----------------|
| `completeness` | `column`, `threshold` (0.0-1.0) | Fraction of non-null values >= threshold |
| `freshness` | `column`, `max_age` | Most recent value within `max_age` of now |
| `volume` | `min_rows`, `max_rows` | Row count within bounds |
| `uniqueness` | `columns` (list) | No duplicate values for the column combination |
| `range` | `column`, `min`, `max` | All values within [min, max] |
| `accepted_values` | `column`, `values` | All values in the allowed set |
| `referential` | `column`, `reference_product`, `reference_column` | All values exist in the referenced product |
| `custom_expression` | `expression`, `threshold` | Fraction of rows satisfying a SQL boolean expression |
| `regex` | `column`, `pattern` | All non-null values match the regex |
| `statistical` | `column`, `metric`, `min`, `max` | Aggregate metric (mean/stddev/median) within bounds |
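To make the declarative semantics concrete, here is a minimal Python sketch of three of these checks evaluated over rows as dicts. The function names are ours and the platform evaluates these in SQL, not Python; this only illustrates the pass/fail rules described in the table:

```python
# Illustrative semantics for three built-in check types.
# Each function returns True (pass) or False (fail).

def completeness(rows, column, threshold):
    """Pass if the fraction of non-null values in `column` >= threshold."""
    if not rows:
        return False
    non_null = sum(1 for r in rows if r.get(column) is not None)
    return non_null / len(rows) >= threshold

def volume(rows, min_rows, max_rows):
    """Pass if the row count lies within [min_rows, max_rows]."""
    return min_rows <= len(rows) <= max_rows

def accepted_values(rows, column, values):
    """Pass if every value of `column` is in the allowed set."""
    allowed = set(values)
    return all(r[column] in allowed for r in rows)
```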

Severity levels:

| Severity | On Failure |
|----------|------------|
| `error` | Block downstream serving writes. Enter the retry/DLQ flow. |
| `warn` | Log a warning. Continue serving writes. |
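One way to picture the gate: given the results of all checks for a materialization, any failed `error`-severity check blocks downstream writes, while failed `warn` checks only accumulate warnings. A sketch — the result tuple and dict shapes here are hypothetical, not the platform's actual data model:

```python
def evaluate(results):
    """Decide the downstream action from check results.

    results: list of (name, severity, passed) tuples.
    Returns a dict with the blocking decision and warning names.
    """
    failed = [(name, sev) for name, sev, passed in results if not passed]
    return {
        # Any error-severity failure blocks serving writes.
        "block_downstream": any(sev == "error" for _, sev in failed),
        # Warn-severity failures are logged but do not block.
        "warnings": [name for name, sev in failed if sev == "warn"],
    }
```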

Declares how the product executes. Most fields have sensible defaults.

```yaml
apiVersion: akili/v1
kind: Compute
runtime: sql      # sql | python
mode: transform   # transform | train | inference
engine: auto      # auto | duckdb | spark
schedule:
  type: event     # event | cron | manual
  # expression: "0 6 * * *"  # only for type: cron
  # timezone: Africa/Nairobi
resources:
  cpu: "500m"
  memory: "1Gi"
timeout: 30m
entrypoint: logic/transform.sql
retry:
  max_attempts: 3
  backoff: exponential
  initial_delay: 30s
```
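The `retry` block implies a delay schedule between attempts. Assuming `exponential` doubles the delay on each retry starting from `initial_delay` (the exact growth factor and any jitter are not specified here), the schedule can be sketched as:

```python
def backoff_delays(max_attempts, initial_delay_s, strategy="exponential"):
    """Delay in seconds before each retry; attempt 1 runs immediately,
    so max_attempts attempts have max_attempts - 1 delays between them."""
    delays = []
    for retry in range(max_attempts - 1):
        if strategy == "exponential":
            # Assumed growth factor of 2 per retry.
            delays.append(initial_delay_s * 2 ** retry)
        else:  # assumed "fixed" strategy: constant delay
            delays.append(initial_delay_s)
    return delays
```

With the config above (`max_attempts: 3`, `initial_delay: 30s`), this gives delays of 30s and 60s before the second and third attempts.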

Trigger modes:

| `schedule.type` | When It Runs |
|-----------------|--------------|
| `event` (default) | When all required inputs are materialized |
| `cron` | At the scheduled time |
| `manual` | Only via `akili run` or the API |
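Event mode's readiness condition reduces to set containment: the run fires once every required input has been materialized. A sketch (function and parameter names are ours):

```python
def event_trigger_ready(required_inputs, materialized_inputs):
    """Event mode: ready once every required input has a materialization.
    Extra materialized products that are not required are ignored."""
    return set(required_inputs) <= set(materialized_inputs)
```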

Engine selection:

| Engine | Behavior |
|--------|----------|
| `auto` (default) | Platform picks DuckDB (under 10GB input) or Spark (10GB+) |
| `duckdb` | Force DuckDB |
| `spark` | Force Spark |
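The `auto` rule reduces to a size threshold. A sketch of the selection logic, assuming the total input size in bytes is known before the run (the function name is ours):

```python
GB = 1024 ** 3

def pick_engine(engine, input_bytes):
    """Resolve the engine setting: auto picks DuckDB under 10GB of input,
    Spark at 10GB or more; explicit settings are passed through."""
    if engine != "auto":
        return engine
    return "duckdb" if input_bytes < 10 * GB else "spark"
```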