Data Lifecycle
Every data product on the Akili platform follows a deterministic lifecycle defined entirely by six declarative manifest files. Developers declare what they want — the platform handles how it executes.
Lifecycle Stages
```mermaid
%%{init: {'flowchart': {'curve': 'basis'}}}%%
flowchart LR
    A[Declare] --> B[Ingest]
    B --> C[Transform]
    C --> D[Quality Gate]
    D --> E[Serve]
    E --> F[Monitor]
    F --> G[Archive]
    D -- "fail (critical)" --> H[DLQ]
    H -- "replay" --> C
```
Stage 1: Declare
The data product starts as six manifest files that fully describe its behavior:
| Manifest | What It Defines |
|---|---|
| `product.yaml` | Identity, ownership, archetype, classification, schedule |
| `inputs.yaml` | Source connections and upstream product dependencies |
| `transform.sql` / `transform.py` | Business logic (SQL or Python) |
| `output.yaml` | Output schema, partitioning, retention |
| `quality.yaml` | Quality checks and SLA definitions |
| `serving.yaml` | Serving intents (analytics, lookup, streaming) |
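A minimal `product.yaml` might look like the following. This is an illustrative sketch only — the key names (`name`, `namespace`, `owner`, `archetype`, `classification`, `schedule`) are assumptions mapped from the table above, not a confirmed schema:

```yaml
# Hypothetical product.yaml sketch; exact keys may differ.
name: daily-orders          # identity
namespace: sales
owner: sales-data-team      # ownership
archetype: aggregate        # archetype
classification: internal    # classification
schedule: "0 6 * * *"       # schedule (cron)
```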
Manifests are validated locally with `akili validate` and registered with the platform via `akili product create`.
```sh
# Validate manifests
akili validate .akili/

# Register the product
akili product create --name daily-orders --namespace sales

# Deploy to the execution engine
akili product deploy daily-orders
```

Stage 2: Ingest
Input ports pull data from external sources or upstream data products. The ingestion strategy is declared in `inputs.yaml`:
| Strategy | Behavior | Use Case |
|---|---|---|
| `cdc` | Continuous change data capture from the database WAL | Real-time from relational databases |
| `incremental` | Cursor-based extraction (`WHERE updated_at > last_run`) | Periodic sync from any SQL source |
| `full_refresh` | Complete dataset extraction on every run | Small reference tables |
| `snapshot_diff` | Full extract with row-level diff against the previous snapshot | Legacy sources without change tracking |
Each ingestion strategy tracks its own state (WAL position, cursor high-water mark, or snapshot hash) to enable exactly-once semantics across failures.
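The cursor mechanics behind the `incremental` strategy can be sketched as follows. This is a simplified illustration of high-water-mark tracking, not the platform's implementation:

```python
def incremental_extract(rows, last_cursor):
    """Return rows newer than the stored high-water mark, plus the new mark.

    Mirrors `WHERE updated_at > last_run`: only rows strictly after the
    cursor are extracted, and the cursor advances to the max value seen,
    so a re-run after a failure never re-emits already-committed rows.
    """
    new_rows = [r for r in rows if r["updated_at"] > last_cursor]
    new_cursor = max((r["updated_at"] for r in new_rows), default=last_cursor)
    return new_rows, new_cursor
```

Persisting `new_cursor` only after the downstream write commits is what makes the strategy safe to retry.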
Stage 3: Transform
The execution engine runs the declared transformation logic. Transforms reference upstream data using the `{{ ref() }}` macro:
```sql
SELECT
    DATE_TRUNC('day', o.order_date) AS date,
    o.region,
    SUM(o.total_amount) AS total_revenue
FROM {{ ref('orders') }} o
GROUP BY 1, 2
```

The platform generates all orchestration code automatically via `akili codegen`. Developers write only business logic — no Dagster code, no boilerplate.
Transforms execute in isolated compute containers with resource limits from `compute.yaml`:
```yaml
compute:
  engine: dagster
  schedule: "0 6 * * *"
  resources:
    cpu: "1"
    memory: 2Gi
  timeout: 1800
  retries: 2
```

Stage 4: Quality Gate
After transformation, quality checks from `quality.yaml` execute. This is a blocking gate — data does not proceed to serving stores until quality is verified.
```mermaid
%%{init: {'flowchart': {'curve': 'basis'}}}%%
flowchart TB
    TRANSFORM[Transform Complete] --> QC[Run Quality Checks]
    QC --> CRITICAL{Critical checks pass?}
    CRITICAL -- yes --> WARNING[Evaluate warnings]
    WARNING --> PROMOTE[Promote to serving]
    CRITICAL -- no --> BLOCK[Block promotion]
    BLOCK --> DLQ[DLQ after retries]
    BLOCK --> ALERT[SLA breach alert]
```
| Severity | Behavior |
|---|---|
| `critical` | Blocks promotion. Consumers continue to see the last good data. |
| `warning` | Logged but does not block. Data is promoted and the issue is flagged. |
Quality scores are recorded per product and per run, enabling trend analysis and regression detection.
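A `quality.yaml` declaring one check at each severity might look like the sketch below. The check types (`not_null`, `row_count_delta`) and key names are illustrative assumptions, not a documented schema:

```yaml
# Hypothetical quality.yaml sketch; check types and keys are illustrative.
checks:
  - name: order_id_not_null
    type: not_null
    column: order_id
    severity: critical       # failure blocks promotion to serving
  - name: row_count_drift
    type: row_count_delta
    threshold: 0.2
    severity: warning        # failure is logged and flagged, data still promoted
sla:
  freshness: "2h"
```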
Stage 5: Serve
Once quality gates pass, data is routed to serving stores based on `serving.yaml`. The platform supports intent-based routing — each serving mode targets a different access pattern:
| Intent | Backing Store | Access Pattern |
|---|---|---|
| `analytics` | StarRocks (via Iceberg federation) | OLAP queries, dashboards, aggregations |
| `lookup` | PostgreSQL | Point lookups by key, low-latency reads |
| `streaming` | Redpanda | Real-time event stream for downstream consumers |
Data is always written to the data lake (Iceberg on Ceph RGW) first, then materialized to serving stores. The lake is the single source of truth; serving stores are derived views.
```yaml
serving:
  modes:
    - intent: analytics
      config:
        engine: starrocks
        materialized: true
        refresh_interval: "1h"
    - intent: lookup
      config:
        engine: postgresql
        cache_ttl: "5m"
```

Stage 6: Monitor
The platform continuously monitors every deployed data product:
- Freshness SLA — is the latest materialization within the configured threshold?
- Quality score — rolling average of quality check pass rates
- Execution health — failure rate, duration trends, retry frequency
- Serving availability — are serving endpoints responsive?
When an SLA is breached, the platform emits an `sla.breach` event and notifies configured channels (email, webhook, PagerDuty).
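The freshness check reduces to a simple comparison. A minimal sketch (illustrative only, not the platform's monitoring code):

```python
from datetime import datetime, timedelta

def freshness_breached(last_materialized: datetime,
                       threshold: timedelta,
                       now: datetime) -> bool:
    """True when the latest materialization is older than the freshness SLA."""
    return now - last_materialized > threshold
```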
```sh
# Check current SLA status
akili governance sla daily-orders

# View execution history
akili run list daily-orders

# Check quality scores
akili governance quality daily-orders
```

Stage 7: Archive
Data products have configurable retention policies. When data exceeds the retention period:
- A `retention.expired` governance event is emitted
- The product owner is notified
- The owner explicitly triggers deletion or extends retention
- Deletion uses position delete files — non-destructive until compaction
The platform never auto-deletes data. This is a deliberate safety mechanism to prevent accidental data loss from misconfigured retention.
Scheduling Modes
Three scheduling modes control when a data product’s pipeline executes:
| Mode | Trigger | Use Case |
|---|---|---|
| `cron` | Time-based schedule (e.g., `"0 6 * * *"`) | Regular batch processing |
| `event` | Upstream `data.available` event | Reactive, data-driven pipelines |
| `adaptive` | Event-driven with debounce and fallback cron | Near-real-time freshness SLAs |
The adaptive mode deploys a per-product trigger agent that consumes event bus messages, debounces them, and fires materializations when freshness would breach the SLA.
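The trigger agent's decision logic can be sketched as below. This is a toy model under stated assumptions (monotonic-time floats, single product), not the agent's actual implementation:

```python
class AdaptiveTrigger:
    """Toy debounce logic for the adaptive scheduling mode (illustrative).

    Upstream events are absorbed while the debounce window is open; a
    materialization fires once the window goes quiet, or as a fallback
    when the freshness SLA would otherwise be breached.
    """

    def __init__(self, debounce_s: float, freshness_sla_s: float):
        self.debounce_s = debounce_s
        self.freshness_sla_s = freshness_sla_s
        self.last_event = None   # timestamp of most recent upstream event
        self.last_run = 0.0      # timestamp of last materialization

    def on_event(self, now: float) -> None:
        """Record an upstream data.available event; resets the debounce window."""
        self.last_event = now

    def should_fire(self, now: float) -> bool:
        if self.last_event is None:
            # No pending events: fire only if freshness would breach the SLA.
            return now - self.last_run >= self.freshness_sla_s
        # Pending events: fire once the window is quiet, or on SLA pressure.
        if now - self.last_event >= self.debounce_s:
            return True
        return now - self.last_run >= self.freshness_sla_s

    def fire(self, now: float) -> None:
        """Mark a materialization as started and clear pending events."""
        self.last_run = now
        self.last_event = None
```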
Failure Handling
At every stage, failures are handled gracefully:
| Failure Point | Behavior |
|---|---|
| Ingestion failure | Retry with exponential backoff, then DLQ |
| Transform error | Retry up to the configured `retries` count, then DLQ |
| Quality gate failure | Block promotion, retry, then DLQ |
| Serving write failure | Retry, circuit breaker opens after threshold |
The dead letter queue (DLQ) captures all failed events with full context for manual inspection and replay. See DLQ Management for details.
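The retry-then-DLQ pattern rests on an exponential backoff schedule. A minimal sketch (the base, factor, and jitter parameters are illustrative assumptions, not platform defaults):

```python
import random

def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   max_retries: int = 5, jitter: float = 0.0):
    """Yield the wait before each retry; after max_retries the event goes to the DLQ.

    delay_n = base * factor**n, plus optional uniform jitter to avoid
    synchronized retry storms across many failing events.
    """
    for attempt in range(max_retries):
        yield base * (factor ** attempt) + random.uniform(0, jitter)
```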
Related
- Orchestration — execution engine mechanics
- Serving Layer — intent-based store routing
- Quality and Governance — quality check details
- Governance Model — classification and retention policies