
# System Architecture

Akili is a Data Product Platform-as-a-Service. Developers define data products as 6 YAML files plus logic (SQL or Python). The platform handles everything else: ingestion, orchestration, execution, quality enforcement, store routing, access control, lineage, and observability.

Contract-driven. Event-sourced. Platform-orchestrated.
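The manifest schema itself isn't reproduced in this section, but as a rough sketch of the contract a developer writes (every field name below is an illustrative assumption, not the platform's documented format), a serving manifest might look like:

```yaml
# serving.yaml -- hypothetical sketch; field names are illustrative only
product: customer_orders
targets:
  - store: structured      # current-state lookups
  - store: timeseries      # windowed aggregations over time
  - store: realtime        # sub-10ms key-value reads
# the Data Lake copy is implicit: output always lands there first
```

The other manifests (input, output, quality, and so on) follow the same pattern: declarative YAML that the platform translates into pipelines, leaving only the SQL or Python transformation to the developer.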

```mermaid
%%{init: {'flowchart': {'curve': 'basis', 'rankSpacing': 80}}}%%
flowchart TD
    DEV[Developer] --> CLI[CLI / REST API / CI Pipeline]

    subgraph Contract["Contract Layer"]
        CLI --> MANIFESTS[6 YAML Manifests + logic]
        MANIFESTS --> VALIDATE[Validate + Register]
    end

    subgraph Registry["Manifest Registry"]
        VALIDATE --> REG[Store Manifests]
        REG --> SCHEMA[Enforce Schema + Classification]
        SCHEMA --> EVENT[Publish product.registered]
    end

    subgraph Orchestration["Execution Engine"]
        EVENT --> CODEGEN[Manifest Translation]
        CODEGEN --> SENSORS[Input Correlation]
        SENSORS --> IOMGR[Store Routing]
        CODEGEN --> CHECKS[Quality Gates]
    end

    subgraph Execution["Execution Layer"]
        SENSORS --> K8S[Compute Jobs]
        K8S --> TRANSFORM[SQL / Python]
    end

    subgraph Serving["Serving Layer"]
        TRANSFORM --> LAKE[Data Lake -- always]
        TRANSFORM --> STRUCTURED[Structured Store -- lookup]
        TRANSFORM --> TIMESERIES[Time-Series Store -- timeseries]
        TRANSFORM --> ANALYTICS[Analytics Engine -- analytics]
        TRANSFORM --> CACHE[Real-Time Cache -- realtime]
    end

    subgraph Observability
        K8S -.-> ENGINEUI[Execution Dashboard]
        K8S -.-> PROM[Metrics + Dashboards]
        REG -.-> CATALOG[Data Catalog]
    end
```

| Layer | Capability | What It Does |
|---|---|---|
| Orchestration | Execution Engine | Translates manifests into pipelines, correlates inputs, routes outputs |
| Compute | Compute Cluster | Runs transformations as isolated jobs |
| Platform | Control Plane | Registry, gateway, serving — manages the data product lifecycle |
| Developer | SQL / Python | Write transformations in familiar languages |
| Structured | Structured Store | Current-state lookups with row-level security |
| Time Series | Time-Series Store | Pre-computed aggregations over time windows |
| Analytics | Analytics Engine | Interactive OLAP queries across the data lake |
| Cache | Real-Time Cache | Sub-10ms key-value lookups |
| Storage | Data Lake | S3-compatible durable storage (source of truth) |
| Events | Event Bus | Streaming, CDC, inter-product coordination |
| Governance | Data Catalog | Lineage, discovery, quality scores |
| Visualization | Dashboard Engine | Embedded dashboards and visualizations |
| Auth | Identity Provider | Single sign-on with role-based access |
| Deployment | Automated Deployment | Git-driven deployments |
| Monitoring | Metrics + Dashboards | Platform observability and alerting |
  1. Ingestion — External sources flow through the connector registry into raw storage on the Object Store, auto-registered as `raw.{connector_id}.{stream}`, with a `data.available` event published to the Event Bus.

  2. Execution — The execution engine detects that all inputs are ready, dispatches a compute job, and runs your transformation. Output is written to the Data Lake (always) plus serving stores per `serving.yaml`. Quality checks execute; if all pass, `data.available` is published for downstream consumers.

  3. Query — Users and agents send queries via the API. The Gateway validates JWT claims (tenant, teams, classification clearance) and routes to the appropriate serving endpoint: Structured Store for lookups, Time-Series Store for time series, Analytics Engine for analytics, Real-Time Cache for realtime, or Data Lake for history/time travel.
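The routing step in (3) can be sketched as a claims check followed by a table lookup. Everything below is illustrative — the claim names, serving-mode strings, and endpoint names are assumptions, not the platform's actual API:

```python
# Hypothetical sketch of the Gateway's routing decision.
# Claim names and endpoint identifiers are illustrative assumptions.

SERVING_ROUTES = {
    "lookup": "structured-store",
    "timeseries": "timeseries-store",
    "analytics": "analytics-engine",
    "realtime": "realtime-cache",
    "history": "data-lake",
}

def route_query(claims: dict, serving_mode: str, required_clearance: int) -> str:
    """Validate classification clearance from JWT claims, then pick an endpoint."""
    if claims.get("classification_clearance", 0) < required_clearance:
        raise PermissionError("insufficient classification clearance")
    try:
        return SERVING_ROUTES[serving_mode]
    except KeyError:
        raise ValueError(f"unknown serving mode: {serving_mode}")
```

The real Gateway also scopes the query by tenant and team before it ever reaches a store; the sketch above shows only the clearance-then-route ordering.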

```mermaid
%%{init: {'flowchart': {'curve': 'basis', 'rankSpacing': 80}}}%%
flowchart TD
    CF[Cloudflare WAF + DNS] --> TRAEFIK[Traefik Ingress]
    TRAEFIK --> GW[Gateway]

    GW --> |JWT Validation| AUTH{Auth Checks}
    AUTH --> |Classification + Team| ROUTE[Route Request]

    ROUTE --> REGISTRY[Registry -- Rust]
    ROUTE --> ENGINE[Execution Engine API]
    ROUTE --> SERVING[Serving -- Rust]

    REGISTRY --> DATA
    ENGINE --> DATA
    SERVING --> DATA

    subgraph DATA["Data Layer"]
        STRUCTURED[Structured Store]
        OBJSTORE[Object Store]
        RTCACHE[Real-Time Cache]
        EBUS[Event Bus]
        TSSTORE[Time-Series Store]
        ANALYTICSENG[Analytics Engine]
    end
```

All platform events flow through the Event Bus with tenant-prefixed topics: `tenant.{tenant_id}.{event_type}`.
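The topic convention is mechanical enough to express as a one-liner (a sketch, not the platform's actual helper function):

```python
def topic_for(tenant_id: str, event_type: str) -> str:
    """Build a tenant-prefixed Event Bus topic: tenant.{tenant_id}.{event_type}."""
    return f"tenant.{tenant_id}.{event_type}"

# e.g. topic_for("acme", "data.available") -> "tenant.acme.data.available"
```

Prefixing every topic with the tenant keeps event streams isolated per tenant at the naming level, before any broker-side ACLs apply.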

Key platform events:

| Event | Trigger |
|---|---|
| `product.registered` | New data product added |
| `execution.triggered` | Correlation complete, job dispatched |
| `execution.completed` | Job succeeded, output available |
| `quality.passed` / `quality.failed` | Quality gate result |
| `data.available` | New output ready for downstream consumers |

Events follow the CloudEvents spec with `tenant_id`, `timestamp`, and `correlation_id` on every payload.

Entities are things that exist and have identity — a customer, a store, a product. An entity has a stable identifier that survives all attribute changes.

Events are things that happened — an order placed, a payment processed. Events are immutable from birth.

Column roles in `output.yaml` encode this distinction:

| Role | Modeling Concept | Meaning |
|---|---|---|
| `identity` | Entity anchor | Stable identifier that survives all attribute changes |
| `attribute` | Entity property | Descriptive properties that can evolve over time |
| `measure` | Event value | Numeric facts produced by events, for aggregation |
| `event_key` | Event-entity link | Foreign key connecting an event to the entity it involves |
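To make the roles concrete, a hypothetical `output.yaml` fragment for an orders product might assign them like this (column names and layout are illustrative assumptions, not the platform's documented schema):

```yaml
# output.yaml fragment -- hypothetical; column names are illustrative only
columns:
  - name: customer_id
    role: identity       # stable entity anchor for the customer
  - name: segment
    role: attribute      # customer property that can change over time
  - name: order_total
    role: measure        # numeric event fact, safe to aggregate
  - name: order_customer_id
    role: event_key      # links each order event back to its customer
```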

The same underlying entity/event data can simultaneously produce a star schema in the Structured Store, a flat table in the Analytics Engine, a key-value projection in the Real-Time Cache, and a time-series in the Time-Series Store. One transformation. One quality gate. Multiple serving projections.

## No Bad Data Served

Consumers never see data that failed quality checks. They may see stale-but-correct data during failures, but never fresh-but-wrong data. Quality gates are structural prerequisites for data promotion, not opt-in.

## Quality-First Execution

The execution pipeline follows strict ordering: transform, quality check, promote. Data is validated before any consumer can see it. There is no state where data has been written to a consumer-facing store but has not passed quality checks.
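The ordering guarantee can be sketched in a few lines. The function names are assumptions (the real engine runs these stages as separate jobs), but the control flow captures the invariant: promotion is unreachable unless every quality check passes.

```python
# Sketch of the strict transform -> quality check -> promote ordering.
# Names are illustrative; only the ordering matters.

def run_pipeline(transform, quality_checks, promote, inputs):
    staged = transform(inputs)          # 1. write to staging, never to serving stores
    for check in quality_checks:
        if not check(staged):           # 2. any failure halts promotion entirely;
            return None                 #    consumers keep stale-but-correct data
    promote(staged)                     # 3. only validated output reaches serving stores
    return staged
```

Because `promote` is only reachable after the check loop completes, there is no intermediate state where unvalidated data is visible to a consumer.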