
Runtime View

The primary flow: a developer declares a data product, triggers execution, and consumers query the results.

sequenceDiagram
    participant Dev as Developer
    participant CLI as akili CLI
    participant API as Control-Plane API
    participant PG as PostgreSQL
    participant RP as Redpanda
    participant DAG as Dagster
    participant Ceph as Ceph RGW
    participant SR as StarRocks

    Dev->>CLI: akili init + akili validate
    Dev->>CLI: akili product deploy my-product
    CLI->>API: POST /products/my-product/deploy
    API->>PG: Store manifest + deployment record
    API->>RP: Publish product.deployed event
    RP->>DAG: Sensor detects event
    DAG->>DAG: Generate asset graph from manifest
    DAG->>PG: Read source data (via IO manager)
    DAG->>DAG: Execute SQL transform
    DAG->>DAG: Run quality checks
    DAG->>Ceph: Write Iceberg table (always)
    DAG->>SR: Materialize serving view (if analytical intent)
    DAG->>RP: Publish execution.completed event
    RP->>API: Consumer processes event
    API->>PG: Update execution status
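The deploy leg of the flow above (CLI → API → PostgreSQL → Redpanda) can be sketched as a single handler. This is a minimal illustration, not the actual control-plane code: `FakeStore`, `FakeBus`, `deploy_product`, and the row shapes are all hypothetical names; only the topic pattern `tenant.{id}.products` and the `product.deployed` event type come from this document.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeStore:
    """Stands in for PostgreSQL: records manifest and deployment rows."""
    rows: list = field(default_factory=list)

    def insert(self, table, row):
        self.rows.append((table, row))


@dataclass
class FakeBus:
    """Stands in for Redpanda: records published events."""
    events: list = field(default_factory=list)

    def publish(self, topic, payload):
        self.events.append((topic, payload))


def deploy_product(store, bus, tenant_id, manifest):
    """POST /products/{name}/deploy: persist the manifest and a deployment
    record, then emit product.deployed for the Dagster sensor to pick up."""
    name = manifest["name"]
    store.insert("manifests", {"tenant_id": tenant_id, "name": name,
                               "manifest": json.dumps(manifest)})
    store.insert("deployments", {"tenant_id": tenant_id, "product": name,
                                 "status": "PENDING"})
    bus.publish(f"tenant.{tenant_id}.products",
                {"type": "product.deployed", "product": name})
    return name
```

The key property the sketch preserves: the API only writes state and publishes an event; orchestration is Dagster's job, triggered by the sensor.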

When a quality check fails, data is not promoted to serving stores. The last-known-good data remains available.

sequenceDiagram
    participant DAG as Dagster
    participant QC as Quality Checks
    participant DLQ as Dead Letter Queue
    participant API as Control-Plane
    participant Consumer as Data Consumer

    DAG->>DAG: Execute SQL transform
    DAG->>QC: Run quality checks
    QC-->>QC: Critical check FAILS (e.g., null dates)
    QC->>DLQ: Route failed execution to DLQ
    QC->>API: Publish quality.failed event
    API->>API: Execution marked FAILED
    Note over Consumer: Last-known-good data<br/>remains available
    Consumer->>API: GET /products/my-product/data
    API->>API: Serve last successful version

Key principle: consumers never see fresh-but-wrong data. Quality checks are the gate between execution and serving.
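The gate can be expressed as one pure decision function. A minimal sketch, assuming hypothetical shapes (`CheckResult`, severity labels, version strings) not taken from the codebase: critical failures route to the DLQ and leave the serving version untouched; warnings alone do not block promotion.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    severity: str  # "critical" or "warn" (illustrative labels)
    passed: bool


def gate(results, current_serving_version, candidate_version):
    """Decide what consumers see after quality checks run.

    Any failed critical check => execution FAILED, failed check names go
    to the DLQ, and the last-known-good version keeps serving.
    """
    failed = [r.name for r in results
              if r.severity == "critical" and not r.passed]
    if failed:
        return {"status": "FAILED",
                "serve": current_serving_version,  # unchanged
                "dlq": failed}
    return {"status": "SUCCEEDED",
            "serve": candidate_version,
            "dlq": []}
```

Keeping the decision pure (no I/O) makes the gate trivially testable, which matters for the one component that must never wave bad data through.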

Every request is scoped to a tenant. The tenant boundary is enforced at multiple layers.

sequenceDiagram
    participant User as User (Browser)
    participant Portal as Portal (BFF)
    participant API as Control-Plane
    participant PG as PostgreSQL

    User->>Portal: GET /products
    Portal->>Portal: Extract tenant from JWT
    Portal->>API: GET /api/v1/products (+ JWT)
    API->>API: Validate JWT, extract tenant_id
    API->>PG: SET LOCAL app.tenant_id = $1
    PG->>PG: RLS policy filters by tenant_id
    PG->>API: Return tenant-scoped rows only
    API->>Portal: JSON response
    Portal->>User: Render products

Tenant isolation is enforced at four layers:

  1. JWT — tenant_id extracted from auth token
  2. Service layer — all service methods receive tenant_id
  3. Database — PostgreSQL RLS policies filter by tenant_id
  4. Object storage — Ceph paths prefixed by tenant_id
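The database layer (`SET LOCAL app.tenant_id`) can be sketched as a request-scoped context manager. The fakes and names below are illustrative, not the real service code; the one concrete detail is that PostgreSQL's `SET` does not take bind parameters, so the parameterized form is `set_config(..., true)`, which is also transaction-local and therefore safe on pooled connections.

```python
from contextlib import contextmanager


class FakeCursor:
    """Stands in for a psycopg cursor: records executed statements."""
    def __init__(self):
        self.stmts = []

    def execute(self, sql, params=None):
        self.stmts.append((sql, params))


class FakeConn:
    def __init__(self):
        self._cur = FakeCursor()

    def cursor(self):
        return self._cur


@contextmanager
def tenant_scope(conn, tenant_id):
    """Run one transaction with app.tenant_id bound so RLS policies apply.

    set_config(..., is_local=true) resets at COMMIT/ROLLBACK, so the
    setting cannot leak into another tenant's request. A matching policy
    would look like:
      CREATE POLICY tenant_isolation ON products
        USING (tenant_id = current_setting('app.tenant_id')::uuid);
    """
    cur = conn.cursor()
    try:
        cur.execute("BEGIN")
        # Parameterized: tenant_id comes from the validated JWT, not the URL.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
        yield cur
        cur.execute("COMMIT")
    except Exception:
        cur.execute("ROLLBACK")
        raise
```

With this in place, even a query that forgets a `WHERE tenant_id = ...` clause returns only the calling tenant's rows, because RLS filters at the database layer.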

Domain events flow through Redpanda, enabling loose coupling between services.

flowchart LR
    subgraph Producers
        API[Control-Plane API]
        DAG[Dagster]
    end
    subgraph Redpanda
        T1["tenant.{id}.products"]
        T2["tenant.{id}.executions"]
        T3["tenant.{id}.quality"]
    end
    subgraph Consumers
        NOTIFY[Notification Service]
        LINEAGE[Lineage Tracker]
        SENSORS[Dagster Sensors]
    end

    API --> T1
    API --> T2
    DAG --> T2
    DAG --> T3
    T1 --> SENSORS
    T2 --> NOTIFY
    T2 --> LINEAGE
    T3 --> NOTIFY

Topics are per-tenant (tenant.{id}.{domain}), ensuring complete tenant isolation at the event layer.
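A small sketch of the topic convention and the consumer wiring from the flowchart. The three domain names and the `tenant.{id}.{domain}` pattern are from this document; the helper names and consumer identifiers are illustrative.

```python
# Domains mirror the topics in the flowchart above.
DOMAINS = {"products", "executions", "quality"}


def topic(tenant_id: str, domain: str) -> str:
    """Build the per-tenant topic name: tenant.{id}.{domain}."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain}")
    return f"tenant.{tenant_id}.{domain}"


def parse_topic(name: str):
    """Invert topic(): recover (tenant_id, domain) from a topic name."""
    prefix, tenant_id, domain = name.split(".", 2)
    if prefix != "tenant" or domain not in DOMAINS:
        raise ValueError(f"not a tenant topic: {name}")
    return tenant_id, domain


# Consumer-group wiring from the flowchart (consumer names are hypothetical).
SUBSCRIPTIONS = {
    "products": ["dagster-sensors"],
    "executions": ["notifications", "lineage"],
    "quality": ["notifications"],
}
```

Because the tenant id is baked into the topic name, a consumer subscribed with a per-tenant ACL physically cannot read another tenant's events, which is what "complete tenant isolation at the event layer" means here.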