Quality & Compute
quality.yaml — Quality Gates
Section titled “quality.yaml — Quality Gates”Declares quality checks that run after every materialization. Checks with severity: error block downstream propagation.
apiVersion: akili/v1kind: Quality
checks: # Tier 1: Declarative (built-in types) - name: event_id_not_null type: completeness config: column: event_id threshold: 1.0 # 100% non-null severity: error
- name: data_freshness type: freshness config: column: event_timestamp max_age: 6h severity: error
- name: event_volume type: volume config: min_rows: 100 max_rows: 10000000 severity: warn
- name: valid_event_types type: accepted_values config: column: event_type values: [click, view, scroll, submit, navigate] severity: error
# Tier 2: Custom SQL - name: no_duplicate_events type: custom_sql sql: | SELECT COUNT(*) as failures FROM {output} WHERE event_id IN ( SELECT event_id FROM {output} GROUP BY event_id HAVING COUNT(*) > 1 ) severity: error expect: "failures = 0"
# Tier 3: Custom Python - name: event_distribution_stable type: custom_python entrypoint: checks/event_drift.py severity: warnThree tiers of expressiveness:
| Tier | Defined In | Use Case |
|---|---|---|
| Declarative | YAML type + config | Standard checks: completeness, freshness, volume, uniqueness, range |
| Custom SQL | Inline sql block | Complex business logic, multi-table assertions |
| Custom Python | External .py file | Statistical analysis, ML drift detection |
Built-in check types:
| Type | Config Fields | What It Checks |
|---|---|---|
completeness | column, threshold (0.0-1.0) | Fraction of non-null values >= threshold |
freshness | column, max_age | Most recent value within max_age of now |
volume | min_rows, max_rows | Row count within bounds |
uniqueness | columns (list) | No duplicate values for column combination |
range | column, min, max | All values within [min, max] |
accepted_values | column, values | All values in allowed set |
referential | column, reference_product, reference_column | All values exist in referenced product |
custom_expression | expression, threshold | Fraction of rows satisfying SQL boolean expression |
regex | column, pattern | All non-null values match regex |
statistical | column, metric, min, max | Aggregate metric (mean/stddev/median) within bounds |
Severity levels:
| Severity | On Failure |
|---|---|
error | Block downstream serving writes. Enter retry/DLQ flow. |
warn | Log warning. Continue serving writes. |
compute.yaml — Runtime
Section titled “compute.yaml — Runtime”Declares how the product executes. Most fields have sensible defaults.
apiVersion: akili/v1kind: Compute
runtime: sql # sql | pythonmode: transform # transform | train | inferenceengine: auto # auto | duckdb | spark
schedule: type: event # event | cron | manual # expression: "0 6 * * *" # only for type: cron # timezone: Africa/Nairobi
resources: cpu: "500m" memory: "1Gi" timeout: 30m
entrypoint: logic/transform.sql
retry: max_attempts: 3 backoff: exponential initial_delay: 30sTrigger modes:
schedule.type | When It Runs |
|---|---|
event (default) | When all required inputs are materialized |
cron | At the scheduled time |
manual | Only via akili run or API |
Engine selection:
| Engine | Behavior |
|---|---|
auto (default) | Platform picks DuckDB (under 10GB input) or Spark (10GB+) |
duckdb | Force DuckDB |
spark | Force Spark |