Architecture Overview
What Akili Is
Akili is a Data Product Platform-as-a-Service. Developers declare data products in six YAML manifests plus business logic (SQL or Python). The platform handles everything else: ingestion, orchestration, execution, quality enforcement, multi-tier serving, governance, and lineage tracking, all multi-tenant and all automated.
One deployment serves all tenants. No orchestration code is written by developers. No infrastructure is managed by developers. The platform generates all pipeline code deterministically from manifests.
The core loop: Declare (YAML) -> Validate (schema + cross-manifest) -> Generate (deterministic codegen) -> Execute (isolated compute) -> Serve (intent-routed stores) -> Govern (automated quality, lineage, SLAs).
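To make the Validate and Generate steps concrete, here is a minimal sketch in Python. It is not Akili's actual code generator: the manifest kinds (`output`, `quality`), their fields, and the function names are hypothetical, chosen only to illustrate a schema check, a cross-manifest check, and output that depends solely on manifest content.

```python
import hashlib
import json


def validate(manifests):
    """Return validation errors; an empty list means the manifests are consistent."""
    errors = []
    # Schema-level check (hypothetical rule): every manifest needs a name.
    for kind, body in manifests.items():
        if "name" not in body:
            errors.append(f"{kind}: missing 'name'")
    # Cross-manifest check (hypothetical rule): quality rules may only
    # reference columns declared in the output manifest.
    columns = set(manifests.get("output", {}).get("columns", []))
    for column in manifests.get("quality", {}).get("not_null", []):
        if column not in columns:
            errors.append(f"quality: unknown column '{column}'")
    return errors


def generate(manifests):
    """Render pipeline code deterministically: same manifests, same bytes."""
    errors = validate(manifests)
    if errors:
        raise ValueError("; ".join(errors))
    # sort_keys removes any dependence on dict insertion order.
    canonical = json.dumps(manifests, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"# generated from manifest set {digest[:12]}\nMANIFESTS = {canonical}\n"
```

Because the rendering goes through a canonical serialization, two manifest sets that differ only in key order produce byte-identical output, which is the property "deterministic codegen" implies.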
Key Goals
| Goal | Meaning |
|---|---|
| Declarative | Developers write YAML + SQL/Python. The platform does the rest. |
| Multi-tenant | Every table, every query, every topic is tenant-scoped. No cross-tenant data access. |
| Self-hosted | Runs on bare-metal Hetzner K3s. No cloud vendor lock-in. |
| Observable | Every execution has a trace. Every quality check is recorded. Every lineage edge is tracked. |
| Enterprise-ready | Bootstrap from bare metal, license enforcement, air-gap support, IP protection. |
Quality Requirements
Reliability (Residuality Theory)
The platform degrades gracefully where that is safe and fails closed where it is not:
- Security governance is fail-closed: masking, classification, and auth are never degraded
- Quality governance is fail-open: the platform degrades gracefully and shows partial data
- Circuit breakers on all downstream calls (serving stores, notifications, external APIs)
- Atomic writes for all cached/persisted state (tempfile, fsync, persist)
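Two of the reliability rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the platform's implementation (the platform services are written in Rust): `atomic_write` follows the tempfile, fsync, persist sequence, and `CircuitBreaker` is a generic breaker whose threshold and cool-down values are hypothetical defaults.

```python
import os
import tempfile
import time


def atomic_write(path, data: bytes):
    """Write data so readers see either the old file or the new one, never a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force bytes to disk before the rename
        os.replace(tmp_path, path)  # atomic rename: the "persist" step
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


class CircuitOpenError(RuntimeError):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # hypothetical default
        self.reset_after = reset_after              # cool-down in seconds (hypothetical)
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("downstream call skipped: circuit is open")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

Wrapping each serving-store, notification, or external API call in `breaker.call(...)` means that a failing dependency is skipped quickly during the cool-down instead of being retried on every request.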
Scalability
- All queries filtered by `tenant_id` at every layer
- Cursor-based pagination (never offset-based)
- Per-tenant resource limits enforced at the service layer
- Split database pools: 90% routine traffic + 10% critical operations
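The first two points can be illustrated together with a small keyset-pagination sketch. It uses an in-memory SQLite database purely for demonstration; the `products` table, its columns, and both function names are hypothetical, and the real registry runs on PostgreSQL.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table holding rows for two tenants (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, tenant_id TEXT, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO products (id, tenant_id, name) VALUES (?, ?, ?)",
        [
            (1, "acme", "orders"),
            (2, "globex", "shipments"),
            (3, "acme", "customers"),
            (4, "globex", "returns"),
            (5, "acme", "invoices"),
        ],
    )
    return conn


def fetch_page(conn, tenant_id, after_id=None, limit=2):
    """Return (rows, next_cursor) for one tenant using keyset pagination."""
    # ids are positive integers in this sketch, so 0 means "start from the beginning".
    start = after_id if after_id is not None else 0
    rows = conn.execute(
        "SELECT id, name FROM products "
        "WHERE tenant_id = ? AND id > ? "  # tenant filter is always present
        "ORDER BY id LIMIT ?",
        (tenant_id, start, limit + 1),  # fetch one extra row to detect a next page
    ).fetchall()
    has_more = len(rows) > limit
    page = rows[:limit]
    next_cursor = page[-1][0] if has_more else None
    return page, next_cursor
```

Unlike `OFFSET`, the `id > ?` predicate lets the database seek directly to the cursor position, so the cost of a page does not grow with how deep the caller has paged, and rows inserted or deleted between requests do not cause skipped or repeated results.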
Maintainability
- Service layer pattern: handlers are thin adapters, business logic in services
- Repository pattern: services never touch the database directly
- Crate dependency rules enforced by CI
- Generated code (catalog packages) never hand-edited
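The handler, service, and repository layering can be sketched as follows. The platform services themselves are written in Rust with Axum; this Python version only illustrates the dependency direction, and every class, function, and field name in it is hypothetical.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DataProduct:
    tenant_id: str
    name: str


class ProductRepository(Protocol):
    """The only interface through which services reach storage."""

    def list_for_tenant(self, tenant_id: str) -> list[DataProduct]: ...


class InMemoryProductRepository:
    """Stand-in for a database-backed repository."""

    def __init__(self, products):
        self._products = list(products)

    def list_for_tenant(self, tenant_id):
        return [p for p in self._products if p.tenant_id == tenant_id]


class ProductService:
    """Business logic lives here; it depends on the repository interface only."""

    def __init__(self, repo: ProductRepository):
        self._repo = repo

    def product_names(self, tenant_id):
        if not tenant_id:
            raise ValueError("tenant_id is required")
        return sorted(p.name for p in self._repo.list_for_tenant(tenant_id))


def list_products_handler(service, request):
    """Thin adapter: unpack the request, call the service, shape the response."""
    names = service.product_names(request["tenant_id"])
    return {"status": 200, "body": {"products": names}}
```

Because the service sees only the repository interface, the in-memory implementation can replace the database in unit tests without touching handler or service code.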
Stakeholders
| Stakeholder | Concern |
|---|---|
| Data product developers | Simple manifest authoring, fast feedback, reliable execution |
| Platform operators | Observability, alerting, disaster recovery, capacity planning |
| Enterprise customers | Air-gap deployment, license compliance, IP protection |
| Security team | Tenant isolation, classification propagation, audit trail |
Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Platform services | Rust + Axum | Control-plane API, CLI, CRD operators |
| Orchestration | Dagster | Asset graph, sensors, IO managers, quality checks |
| Portal | Next.js 15 + React 19 | Developer and operator UI |
| OLTP | PostgreSQL (CNPG) | Registry, state, RLS-based tenant isolation |
| Analytics | StarRocks | OLAP via Iceberg federation |
| Time series | TimescaleDB | KPIs, continuous aggregates |
| Cache | Redis | Real-time serving, session cache |
| Object storage | Ceph RGW | S3-compatible storage for data lake |
| Streaming | Redpanda | Domain events, per-tenant topics |
| Identity | Authentik | OIDC, SSO, RBAC |
| GitOps | ArgoCD | App-of-Apps deployment pattern |
| Compute | K3s on Hetzner | 3 masters (HA) + 8 workers + 2 Spark nodes |