# Orchestration
Akili’s execution engine translates six YAML manifest files into assets, sensors, store writers, and quality checks automatically via `akili codegen`. You write your business logic in SQL or Python — the platform generates all orchestration code.
## Manifest Translation

Each manifest file maps to a platform concept:
| Manifest | Platform Concept | Purpose |
|---|---|---|
| `source.yaml` | Asset (one per source) | Ingestion layer |
| `pipeline.yaml` | Asset / multi-asset | Transformations |
| `quality.yaml` | Quality check | Blocking on `severity: error` |
| `serving.yaml` | Store writer | Intent-based store routing |
| `schedule.yaml` | Deadline timer + retry config | Scheduling and fallback |
| `governance.yaml` | Asset metadata + tags | Retention, ownership |
| `contract.ext.yaml` | Registry subscriptions | Event routing |
Asset key convention: `AssetKey(["<tenant>", "<product>", "<asset>"])`
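As a rough illustration of the convention, the three-part key can be modeled in plain Python. The `AssetKey` class and `asset_key` helper here are simplified stand-ins, not the platform's actual API:

```python
from dataclasses import dataclass

# Simplified stand-in for the platform's asset key type (illustrative only).
@dataclass(frozen=True)
class AssetKey:
    path: tuple  # ("<tenant>", "<product>", "<asset>")

def asset_key(tenant: str, product: str, asset: str) -> AssetKey:
    """Build the three-part asset key used throughout the platform."""
    return AssetKey((tenant, product, asset))

key = asset_key("acme", "orders", "daily_revenue")
```

Because the key is a frozen tuple of tenant, product, and asset, it can serve directly as a dictionary key or correlation identifier.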
## Execution Model

The orchestration layer is organized into independent sub-concerns:
### Correlation Engine

A data product often depends on multiple inputs. The correlation engine detects when all inputs are ready within a time window before triggering execution.
```mermaid
%%{init: {'sequence': {'mirrorActors': false}}}%%
sequenceDiagram
    participant EB as Event Bus
    participant CE as Correlation Engine
    participant D as Dispatcher
    EB->>CE: data.available (input A, window W1)
    CE->>CE: Store: A ready for W1
    CE->>CE: Check: all inputs for W1 ready? No
    EB->>CE: data.available (input B, window W1)
    CE->>CE: Store: B ready for W1
    CE->>CE: Check: all inputs for W1 ready? Yes
    CE->>D: All inputs correlated — trigger execution
```
Correlation key: `(tenant_id, product_id, window_id)`. Each input event is matched to its correlation window. Once all declared inputs have a `data.available` event in the same window, the dispatcher is triggered.

Timeout: If not all inputs arrive within the configured deadline, the dispatcher enters the TIMEOUT state and applies the fallback policy.
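The correlation logic above can be sketched in a few lines of Python. The class and method names here are illustrative assumptions, not the platform's actual API:

```python
from collections import defaultdict

class CorrelationEngine:
    """Collect data.available events per correlation key and report completeness."""

    def __init__(self, declared_inputs):
        # declared_inputs: {(tenant_id, product_id): {"input_a", "input_b", ...}}
        self.declared = declared_inputs
        self.ready = defaultdict(set)  # (tenant, product, window) -> inputs seen

    def on_data_available(self, tenant_id, product_id, window_id, input_name):
        """Record one input event; return True once all declared inputs
        for this window have arrived (i.e. the dispatcher should trigger)."""
        key = (tenant_id, product_id, window_id)
        self.ready[key].add(input_name)
        return self.ready[key] >= self.declared[(tenant_id, product_id)]

engine = CorrelationEngine({("t1", "p1"): {"A", "B"}})
engine.on_data_available("t1", "p1", "W1", "A")              # not yet complete
complete = engine.on_data_available("t1", "p1", "W1", "B")   # complete
```

Keying the readiness set by the full `(tenant_id, product_id, window_id)` tuple is what lets events for different windows accumulate independently without interfering.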
### Dispatcher State Machine

The dispatcher manages the execution lifecycle through five states:
```mermaid
stateDiagram-v2
    [*] --> WAIT: Inputs declared
    WAIT --> EXECUTE: All inputs correlated
    WAIT --> TIMEOUT: Deadline exceeded
    TIMEOUT --> EXECUTE: Fallback policy = run_partial
    TIMEOUT --> FAILSAFE: Fallback policy = skip
    EXECUTE --> [*]: Success
    EXECUTE --> DLQ: Execution failed
    DLQ --> EXECUTE: Retry (within policy)
    DLQ --> FAILSAFE: Retries exhausted
    FAILSAFE --> [*]: Alert + skip
```
| State | Behavior |
|---|---|
| WAIT | Correlation engine is collecting input events |
| TIMEOUT | Deadline exceeded — apply fallback policy from `schedule.yaml` |
| EXECUTE | Compute job launched, transform running |
| DLQ | Execution failed — entry queued for retry per policy |
| FAILSAFE | All retries exhausted — alert, serve last-known-good data |
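The lifecycle in the diagram above can be expressed as a simple transition table. This is a sketch mirroring the documented states; the event names are illustrative assumptions:

```python
from enum import Enum, auto

class State(Enum):
    WAIT = auto()
    EXECUTE = auto()
    TIMEOUT = auto()
    DLQ = auto()
    FAILSAFE = auto()
    DONE = auto()  # terminal success state ([*] in the diagram)

# (current state, event) -> next state, mirroring the state diagram.
TRANSITIONS = {
    (State.WAIT, "inputs_correlated"): State.EXECUTE,
    (State.WAIT, "deadline_exceeded"): State.TIMEOUT,
    (State.TIMEOUT, "run_partial"): State.EXECUTE,
    (State.TIMEOUT, "skip"): State.FAILSAFE,
    (State.EXECUTE, "success"): State.DONE,
    (State.EXECUTE, "failed"): State.DLQ,
    (State.DLQ, "retry"): State.EXECUTE,
    (State.DLQ, "retries_exhausted"): State.FAILSAFE,
}

def step(state: State, event: str) -> State:
    """Advance the dispatcher by one event; KeyError means an illegal transition."""
    return TRANSITIONS[(state, event)]

# A timed-out product whose fallback policy is run_partial still executes:
s = step(step(State.WAIT, "deadline_exceeded"), "run_partial")  # State.EXECUTE
```

Encoding transitions as a dictionary makes illegal moves (e.g. retrying from WAIT) fail loudly rather than silently.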
### DLQ and Retry

When execution fails, the entry routes to the Dead Letter Queue (DLQ) with a retry policy:
| Policy | Behavior |
|---|---|
| `exponential_backoff` | 1s, 2s, 4s, 8s… up to max retries |
| `fixed_interval` | Fixed delay between retries |
| `none` | No retry — straight to DLQ for manual inspection |
Failed executions in the DLQ can be inspected via `GET /api/v1/dlq` and replayed via `POST /api/v1/dlq/{id}/replay`. A circuit breaker prevents cascading failures — if a product fails 3 consecutive times, further executions are suspended until manual intervention.
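The retry policies and circuit breaker described above can be sketched as follows. Delays and the failure threshold match the documented defaults (1s doubling, 3 consecutive failures); the function and class names are assumptions for illustration:

```python
from typing import Optional

def retry_delay(policy: str, attempt: int, base: float = 1.0) -> Optional[float]:
    """Delay in seconds before retry number `attempt` (0-based),
    or None when the policy forbids retrying."""
    if policy == "exponential_backoff":
        return base * (2 ** attempt)  # 1s, 2s, 4s, 8s, ...
    if policy == "fixed_interval":
        return base                   # constant delay between retries
    return None                       # "none": straight to DLQ, no retry

class CircuitBreaker:
    """Suspend executions after N consecutive failures."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.consecutive_failures = 0

    def record(self, success: bool) -> None:
        """Record one execution outcome; any success resets the count."""
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

    @property
    def open(self) -> bool:
        """True once the threshold is hit — further executions are suspended."""
        return self.consecutive_failures >= self.threshold
```

Resetting the failure count on any success is what distinguishes "3 consecutive failures" from "3 failures total".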
## The Pipeline

```mermaid
%%{init: {'flowchart': {'curve': 'basis'}}}%%
flowchart LR
    A[YAML Validation] --> B[Code Generation]
    B --> C[Lint & Check]
    C --> D[Unit Tests]
    D --> E[Integration Tests]
    E --> F[Pre-push Hook]
    F --> G[CI Pipeline]
    G --> H[Deploy & Sync]
```
## Asset Materialization
```mermaid
%%{init: {'sequence': {'mirrorActors': false, 'messageMargin': 40}}}%%
sequenceDiagram
    participant API as Control Plane
    participant Engine as Execution Engine
    participant Sensor as Sensor
    participant Job as Compute Job
    participant QC as Quality Checks
    participant SW as Store Writers
    participant Stores as Serving Stores
    participant EB as Event Bus
    API->>Engine: Deploy product
    Engine->>Sensor: Create sensor for inputs
    EB->>Sensor: data.available events
    Sensor->>Sensor: Correlate — all inputs ready?
    Sensor->>Engine: Yield run request
    Engine->>Job: Launch compute job
    Job->>Job: Execute SQL/Python transform
    Job->>QC: Run quality checks
    QC-->>QC: Blocking checks must pass
    QC->>SW: Quality gate passed
    SW->>Stores: Write to data lake
    SW->>Stores: Write to serving stores
    SW->>EB: Publish data.available
    EB->>Sensor: Trigger downstream products
```
When all inputs for a data product are available:
- Sensor detection — A platform sensor detects all required `data.available` events
- Run request — The sensor yields a run request for the product’s asset
- Compute job — The execution engine launches a job with resources from `compute.yaml`
- Transform execution — SQL or Python runs in an isolated compute container
- Quality gate — Quality checks execute; blocking checks must pass
- Store routing — Store writers write to the data lake (always) plus serving stores per `serving.yaml`
- Event publication — `data.available` is published to the event bus, triggering downstream sensors
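The steps above can be condensed into one control-flow sketch. The transform, check, writer, and publish callables here are placeholders standing in for generated code, not the engine's real interfaces:

```python
def materialize(transform, quality_checks, store_writers, publish):
    """Run one materialization: transform -> quality gate -> stores -> event."""
    result = transform()                       # SQL/Python transform output
    for check in quality_checks:
        if not check(result):                  # blocking checks must pass
            raise RuntimeError("quality gate failed: blocking check did not pass")
    for write in store_writers:                # data lake (always) + serving stores
        write(result)
    publish("data.available")                  # triggers downstream sensors
    return result
```

Note the ordering guarantee this encodes: no store writer runs — and no `data.available` event fires — unless every blocking check has passed.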
## Scheduling

Three scheduling modes are supported:
| Mode | Trigger | Use Case |
|---|---|---|
| `cron` | Time-based schedule | Regular batch processing (daily, hourly) |
| `event` | Upstream `data.available` event | Reactive pipelines triggered by data readiness |
| `adaptive` | Event-driven with debouncing and fallback cron | Near-real-time freshness SLAs (sub-5-minute) |
The `adaptive` mode (ADR-034) deploys a per-product trigger agent that consumes event bus messages, debounces them, and fires materializations when freshness would breach the SLA.
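The debounce behavior of the adaptive trigger agent can be sketched as below: coalesce bursts of events, firing only when a quiet period elapses or holding off any longer would risk the freshness SLA. The class name and timing parameters are illustrative assumptions:

```python
class DebouncedTrigger:
    """Coalesce bursts of events; fire on quiet period or freshness-SLA risk."""

    def __init__(self, quiet_period_s: float, freshness_sla_s: float):
        self.quiet_period_s = quiet_period_s
        self.freshness_sla_s = freshness_sla_s
        self.first_pending = None   # time of oldest un-fired event
        self.last_event = None      # time of most recent event

    def on_event(self, now: float) -> bool:
        """Record an event; return True if a materialization should fire now."""
        if self.first_pending is None:
            self.first_pending = now
        self.last_event = now
        return self.should_fire(now)

    def should_fire(self, now: float) -> bool:
        """Poll: fire when the burst has gone quiet, or when waiting longer
        would let the oldest pending event breach the freshness SLA."""
        if self.first_pending is None:
            return False
        quiet = now - self.last_event >= self.quiet_period_s
        sla_risk = now - self.first_pending >= self.freshness_sla_s
        if quiet or sla_risk:
            self.first_pending = None  # reset for the next burst
            return True
        return False
```

The two conditions pull in opposite directions: the quiet period avoids re-materializing on every event in a burst, while the SLA bound guarantees a burst that never goes quiet still gets materialized in time.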
## Code Generation

`akili codegen <product>` is a deterministic transformation: the same YAML input always produces byte-for-byte identical Python output. The pipeline:
- Parse — Read all six manifest files
- Resolve dependencies — Build the dependency graph
- Resolve templates — Substitute `{{ intent() }}` calls with concrete values from upstream semantic intents
- Compile — Generate execution engine code (assets, checks, store writers, sensors)
Template resolution enables semantic intent contracts — downstream products reference upstream meaning (e.g., “high-value customer”) rather than specific column values, so upstream changes propagate automatically at codegen time.
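A toy sketch of codegen-time template resolution: each `{{ intent("...") }}` reference in the SQL is replaced with the concrete expression the upstream contract currently binds to that intent. The regex and the contract's shape are assumptions for illustration, not the real codegen internals:

```python
import re

# Matches {{ intent("some_name") }} and captures the intent name.
INTENT_CALL = re.compile(r'\{\{\s*intent\(\s*"([^"]+)"\s*\)\s*\}\}')

def resolve_templates(sql: str, contract: dict) -> str:
    """Replace each intent reference with the upstream's concrete expression."""
    return INTENT_CALL.sub(lambda m: contract[m.group(1)], sql)

sql = 'SELECT * FROM orders WHERE {{ intent("high_value_customer") }}'
resolved = resolve_templates(sql, {"high_value_customer": "lifetime_spend > 10000"})
# resolved == 'SELECT * FROM orders WHERE lifetime_spend > 10000'
```

Because substitution happens at codegen time, an upstream redefinition of "high-value customer" only requires re-running `akili codegen` downstream — no downstream SQL edits.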
## Related

- System Architecture — Platform layers and components
- Serving Layer — Where materialized data gets routed
- Quality and Governance — How quality checks work