No Bad Data Served
Consumers never see data that failed quality checks. They may see stale-but-correct data during failures, but never fresh-but-wrong data. Quality gates are structural prerequisites for data promotion, not opt-in.
Akili is a Data Product Platform-as-a-Service. Developers define a data product as six YAML manifests plus transformation logic (SQL or Python); the platform handles everything else: ingestion, orchestration, execution, quality enforcement, store routing, access control, lineage, and observability.
Contract-driven. Event-sourced. Platform-orchestrated.
```mermaid
%%{init: {'flowchart': {'curve': 'basis', 'rankSpacing': 80}}}%%
flowchart TD
    DEV[Developer] --> CLI[CLI / REST API / CI Pipeline]

    subgraph Contract["Contract Layer"]
        CLI --> MANIFESTS[6 YAML Manifests + logic]
        MANIFESTS --> VALIDATE[Validate + Register]
    end

    subgraph Registry["Manifest Registry"]
        VALIDATE --> REG[Store Manifests]
        REG --> SCHEMA[Enforce Schema + Classification]
        SCHEMA --> EVENT[Publish product.registered]
    end

    subgraph Orchestration["Execution Engine"]
        EVENT --> CODEGEN[Manifest Translation]
        CODEGEN --> SENSORS[Input Correlation]
        SENSORS --> IOMGR[Store Routing]
        CODEGEN --> CHECKS[Quality Gates]
    end

    subgraph Execution["Execution Layer"]
        SENSORS --> K8S[Compute Jobs]
        K8S --> TRANSFORM[SQL / Python]
    end

    subgraph Serving["Serving Layer"]
        TRANSFORM --> LAKE[Data Lake -- always]
        TRANSFORM --> STRUCTURED[Structured Store -- lookup]
        TRANSFORM --> TIMESERIES[Time-Series Store -- timeseries]
        TRANSFORM --> ANALYTICS[Analytics Engine -- analytics]
        TRANSFORM --> CACHE[Real-Time Cache -- realtime]
    end

    subgraph Observability
        K8S -.-> ENGINEUI[Execution Dashboard]
        K8S -.-> PROM[Metrics + Dashboards]
        REG -.-> CATALOG[Data Catalog]
    end
```
| Layer | Capability | What It Does |
|---|---|---|
| Orchestration | Execution Engine | Translates manifests into pipelines, correlates inputs, routes outputs |
| Compute | Compute Cluster | Runs transformations as isolated jobs |
| Platform | Control Plane | Registry, gateway, serving — manages the data product lifecycle |
| Developer | SQL / Python | Write transformations in familiar languages |
| Structured | Structured Store | Current-state lookups with row-level security |
| Time Series | Time-Series Store | Pre-computed aggregations over time windows |
| Analytics | Analytics Engine | Interactive OLAP queries across the data lake |
| Cache | Real-Time Cache | Sub-10ms key-value lookups |
| Storage | Data Lake | S3-compatible durable storage (source of truth) |
| Events | Event Bus | Streaming, CDC, inter-product coordination |
| Governance | Data Catalog | Lineage, discovery, quality scores |
| Visualization | Dashboard Engine | Embedded dashboards and visualizations |
| Auth | Identity Provider | Single sign-on with role-based access |
| Deployment | Automated Deployment | Git-driven deployments |
| Monitoring | Metrics + Dashboards | Platform observability and alerting |
Ingestion — External sources flow through the connector registry into raw storage on the Object Store, auto-registered as `raw.{connector_id}.{stream}`, with a `data.available` event published to the Event Bus.
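The two naming conventions (raw dataset ids here, tenant-prefixed topics in the event model below) can be captured as small helpers; the function names are illustrative, not platform APIs:

```python
def raw_dataset_id(connector_id: str, stream: str) -> str:
    """Auto-registered name for a raw ingested stream."""
    return f"raw.{connector_id}.{stream}"

def event_topic(tenant_id: str, event_type: str) -> str:
    """Tenant-prefixed Event Bus topic."""
    return f"tenant.{tenant_id}.{event_type}"

print(raw_dataset_id("salesforce", "accounts"))  # raw.salesforce.accounts
print(event_topic("acme", "data.available"))     # tenant.acme.data.available
```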
Execution — The execution engine detects all inputs are ready, dispatches a compute job, and runs your transformation. Output is written to the Data Lake (always) plus serving stores per `serving.yaml`. Quality checks execute; if all pass, `data.available` is published for downstream consumers.
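A `serving.yaml` that fans one output out to several stores might look like this; the field names are illustrative sketches, since the manifest schema isn't shown here, and only the store kinds come from the platform description:

```yaml
# serving.yaml -- hypothetical sketch; field names are illustrative.
product: customer_orders
targets:
  - store: structured     # current-state lookups, row-level security
    mode: upsert
  - store: timeseries     # pre-computed aggregations over time windows
    window: 1h
  - store: realtime       # sub-10ms key-value lookups
    key: customer_id
# The Data Lake copy is implicit: every output is written there first.
```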
Query — Users and agents send queries via the API. The Gateway validates JWT claims (tenant, teams, classification clearance) and routes to the appropriate serving endpoint: Structured Store for lookups, Time-Series Store for time series, Analytics Engine for analytics, Real-Time Cache for realtime, or Data Lake for history/time travel.
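In sketch form, the Gateway's routing decision above might look like this (Python for illustration; the actual service is Rust, and the claim and store names here are assumptions):

```python
# Hypothetical sketch of the Gateway's auth-then-route step. Claim
# names (tenant, teams, clearance) follow the prose; store names are
# illustrative labels, not real endpoints.
STORE_FOR_QUERY = {
    "lookup": "structured-store",
    "timeseries": "timeseries-store",
    "analytics": "analytics-engine",
    "realtime": "realtime-cache",
    "history": "data-lake",
}

def route(claims: dict, query_kind: str, classification: str) -> str:
    # Classification clearance is checked before any routing happens.
    if classification not in claims.get("clearance", []):
        raise PermissionError(f"no clearance for {classification}")
    if query_kind not in STORE_FOR_QUERY:
        raise ValueError(f"unknown query kind: {query_kind}")
    return STORE_FOR_QUERY[query_kind]

claims = {"tenant": "acme", "teams": ["growth"], "clearance": ["internal"]}
print(route(claims, "realtime", "internal"))  # realtime-cache
```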
```mermaid
%%{init: {'flowchart': {'curve': 'basis', 'rankSpacing': 80}}}%%
flowchart TD
    CF[Cloudflare WAF + DNS] --> TRAEFIK[Traefik Ingress]
    TRAEFIK --> GW[Gateway]
    GW --> |JWT Validation| AUTH{Auth Checks}
    AUTH --> |Classification + Team| ROUTE[Route Request]
    ROUTE --> REGISTRY[Registry -- Rust]
    ROUTE --> ENGINE[Execution Engine API]
    ROUTE --> SERVING[Serving -- Rust]
    REGISTRY --> DATA
    ENGINE --> DATA
    SERVING --> DATA

    subgraph DATA["Data Layer"]
        STRUCTURED[Structured Store]
        OBJSTORE[Object Store]
        RTCACHE[Real-Time Cache]
        EBUS[Event Bus]
        TSSTORE[Time-Series Store]
        ANALYTICSENG[Analytics Engine]
    end
```
All platform events flow through the Event Bus with tenant-prefixed topics: `tenant.{tenant_id}.{event_type}`.
Key platform events:
| Event | Trigger |
|---|---|
| `product.registered` | New data product added |
| `execution.triggered` | Correlation complete, job dispatched |
| `execution.completed` | Job succeeded, output available |
| `quality.passed` / `quality.failed` | Quality gate result |
| `data.available` | New output ready for downstream consumers |
Events follow the CloudEvents spec with `tenant_id`, `timestamp`, and `correlation_id` on every payload.
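An illustrative `data.available` payload in a CloudEvents 1.0 envelope might look like this; the `tenant_id` / `correlation_id` keys follow the prose above, and the `source` path and `data` fields are assumptions:

```python
import json
import uuid
from datetime import datetime, timezone

# Sketch of a platform event. Note: strictly conformant CloudEvents
# extension attributes are lowercase alphanumeric (no underscores); the
# attribute names here follow this document's wording instead.
event = {
    "specversion": "1.0",
    "id": str(uuid.uuid4()),
    "type": "data.available",
    "source": "/products/customer_orders",  # illustrative product path
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "tenant_id": "acme",
    "correlation_id": str(uuid.uuid4()),
    "data": {"dataset": "customer_orders", "partition": "2024-06-01"},
}
print(json.dumps(event, indent=2))
```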
Entities are things that exist and have identity — a customer, a store, a product. An entity has a stable identifier that survives all attribute changes.
Events are things that happened — an order placed, a payment processed. Events are immutable from birth.
Column roles in `output.yaml` encode this distinction:

| Role | Modeling Concept | Meaning |
|---|---|---|
| `identity` | Entity anchor | Stable identifier that survives all attribute changes |
| `attribute` | Entity property | Descriptive properties that can evolve over time |
| `measure` | Event value | Numeric facts produced by events, for aggregation |
| `event_key` | Event-entity link | Foreign key connecting an event to the entity it involves |
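In manifest form, the four roles might be declared like this; the `role` values come from the table above, while the surrounding field names and columns are illustrative:

```yaml
# output.yaml -- hypothetical sketch of column role declarations.
columns:
  - name: customer_id
    type: string
    role: identity      # entity anchor: survives all attribute changes
  - name: customer_tier
    type: string
    role: attribute     # entity property: may evolve over time
  - name: order_total
    type: decimal
    role: measure       # event value: numeric fact, safe to aggregate
  - name: order_customer_id
    type: string
    role: event_key     # links the order event to its customer entity
```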
The same underlying entity/event data can simultaneously produce a star schema in the Structured Store, a flat table in the Analytics Engine, a key-value projection in the Real-Time Cache, and a time-series in the Time-Series Store. One transformation. One quality gate. Multiple serving projections.
Quality-First Execution
The execution pipeline follows strict ordering: transform, quality check, promote. Data is validated before any consumer can see it. There is no state where data has been written to a consumer-facing store but has not passed quality checks.
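A minimal sketch of that ordering, with illustrative names (the real engine dispatches isolated compute jobs rather than in-process functions):

```python
# Sketch of transform -> quality check -> promote. The point is purely
# structural: promotion to serving stores is unreachable unless every
# quality check passes, so consumers never see unvalidated data.
def run_pipeline(rows, checks, promote):
    staged = [transform(r) for r in rows]      # 1. transform to staging
    failures = [c.__name__ for c in checks
                if not all(c(r) for r in staged)]
    if failures:                               # 2. gate before serving
        return {"status": "quality.failed", "checks": failures}
    promote(staged)                            # 3. promote only on pass
    return {"status": "data.available", "rows": len(staged)}

def transform(row):
    return {**row, "total": row["qty"] * row["price"]}

def non_negative_total(row):
    return row["total"] >= 0

served = []
result = run_pipeline(
    [{"qty": 2, "price": 5.0}], [non_negative_total], served.extend
)
print(result)  # {'status': 'data.available', 'rows': 1}
```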