# Serving Layer
The serving layer translates developer intent (declared in serving.yaml) into store-specific infrastructure, exposes data through unified API endpoints, and enforces tenant isolation at the query boundary.
Developers never reference database names, connection strings, or table names. They declare what access pattern they need; the platform routes data to the correct store.
```mermaid
%%{init: {'flowchart': {'curve': 'basis', 'nodeSpacing': 50}}}%%
flowchart TD
    TRANSFORM[Transform Output] --> ICEBERG[Data Lake -- Source of Truth]
    ICEBERG --> PG_IO[Lookup IO Manager]
    ICEBERG --> TS_IO[Timeseries IO Manager]
    ICEBERG --> SR_IO[Analytics IO Manager]
    ICEBERG --> RD_IO[Realtime IO Manager]
    PG_IO --> PG[Structured Store]
    TS_IO --> TS[Time-Series Store]
    SR_IO --> SR[Analytics Engine via Lakehouse Federation]
    RD_IO --> RD[Real-Time Cache]
    PG -->|"Point reads by key, < 10ms"| APP1[Portal / API Clients]
    TS -->|"Time-range queries with rollups"| APP2[Dashboards / Reporting]
    SR -->|"Full OLAP SQL"| APP3[Superset / BI Tools]
    RD -->|"Sub-10ms key-value"| APP4[Live Dashboards / Alerts]
```
## Serving Tiers

| `serving.yaml` Type | Target Store | Data Movement | Access Pattern |
|---|---|---|---|
| `lookup` | Structured store | Upsert by primary key | Point reads by key, < 10ms |
| `timeseries` | Time-series store | Append rows | Time-range queries with rollups |
| `analytics` | Analytics engine (via lakehouse) | No-op (data already in lake) | OLAP, dashboards |
| `realtime` | Real-time cache | SET with TTL | Sub-10ms key-value |
| (none) | Object store only | Already written | Batch, time travel |
The object store (S3-compatible) is always written to, regardless of serving config. It is the source of truth. All other stores are projections — optimized read replicas populated by IO Managers.
A single product can declare multiple serving types simultaneously:
```yaml
# serving.yaml -- a product served in 3 stores
endpoints:
  - type: lookup
    primary_key: [outlet_id]
  - type: timeseries
    time_column: date
    granularity: [daily, weekly, monthly]
    metrics: [total_revenue, outlet_count]
  - type: realtime
    key: [territory]
    ttl: 1h
```

## Store Behaviors

### Lookup (Structured Store)

Low-latency point reads by primary key. Ideal for “give me the current state of entity X” queries.
- **Write:** Upsert by `primary_key` columns, RLS enforced via tenant context
- **Read:** `GET /api/v1/products/{name}/latest?filter=outlet_id:123`
- **Latency:** < 10ms for indexed point lookups
- **History mode:** Supports Type 2 and Type 3 SCD via IO Manager strategies
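As a sketch of how a client might address this tier, the helper below builds the point-read URL shown above. The base URL and the `lookup_url` function are illustrative names, not part of the platform API; only the path shape and `column:value` filter form come from this page.

```python
from urllib.parse import urlencode

def lookup_url(base: str, product: str, key_filter: dict) -> str:
    """Build the point-read URL for a lookup-served product.

    The filter uses the column:value form shown in the docs,
    e.g. filter=outlet_id:123 (the colon is percent-encoded).
    """
    pairs = ",".join(f"{col}:{val}" for col, val in key_filter.items())
    query = urlencode({"filter": pairs})
    return f"{base}/api/v1/products/{product}/latest?{query}"

url = lookup_url("https://akili.example.com", "daily-kpi-report", {"outlet_id": 123})
```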
### Timeseries (Time-Series Store)

Efficient time-range queries with automatic rollups. Ideal for “give me weekly revenue for the last 3 months” queries.

- **Write:** Append rows (time series is additive); auto-creates hypertables and continuous aggregates
- **Read:** `GET /api/v1/products/{name}/history?from=2026-01-01&to=2026-02-07&granularity=weekly`
- **Features:** Auto-compression after 7 days, pre-computed aggregates at all declared granularities
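To make the rollup behavior concrete, here is a pure-Python sketch that buckets daily rows into weeks and sums a metric. In the real store this work is done by pre-computed continuous aggregates, not client code; the function and its argument names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def weekly_rollup(rows, time_col="date", metric="total_revenue"):
    """Group daily rows into week buckets (keyed by that week's Monday)
    and sum the metric, mimicking what a pre-computed weekly
    aggregate would return for a granularity=weekly query."""
    buckets = defaultdict(float)
    for row in rows:
        d: date = row[time_col]
        week_start = d - timedelta(days=d.weekday())  # Monday of that week
        buckets[week_start] += row[metric]
    return dict(sorted(buckets.items()))

rows = [
    {"date": date(2026, 1, 5), "total_revenue": 100.0},   # Monday
    {"date": date(2026, 1, 7), "total_revenue": 50.0},    # same week
    {"date": date(2026, 1, 12), "total_revenue": 25.0},   # next week
]
# weekly_rollup(rows) yields two buckets: 150.0 and 25.0
```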
### Analytics (Analytics Engine via Lakehouse)

Full analytical queries via lakehouse federation. No data movement is required — the analytics engine reads directly from lakehouse tables in the object store.

- **Write:** No-op — data is already in the lakehouse from the standard pipeline
- **Read:** Full SQL via the analytics engine, accessible through Superset dashboards
- **Advantage:** Zero-copy analytics on the same data used for other serving tiers
### Realtime (Real-Time Cache)

Sub-10ms key-value access with automatic TTL expiry.

- **Write:** SET with TTL from `serving.yaml`
- **Read:** `GET /api/v1/products/{name}/realtime?key=territory:nairobi`
- **Use case:** Operational dashboards, real-time alerts, API-driven lookups
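A minimal sketch of the write path for this tier. The `ttl_seconds` and `cache_key` helpers are hypothetical; only the colon-delimited key layout and the `1h`-style TTL format come from this page.

```python
def ttl_seconds(spec: str) -> int:
    """Parse a serving.yaml-style TTL such as '30s', '15m', or '1h'."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    return int(spec[:-1]) * units[spec[-1]]

def cache_key(tenant: str, domain: str, product: str, **key_values) -> str:
    """Assemble the tenant-prefixed cache key, e.g.
    fmcg-ea:finance:daily-kpi-report:nairobi"""
    parts = [tenant, domain, product, *[str(v) for v in key_values.values()]]
    return ":".join(parts)

key = cache_key("fmcg-ea", "finance", "daily-kpi-report", territory="nairobi")
# a real client would then do something like:
#   redis.set(key, payload, ex=ttl_seconds("1h"))
```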
## Namespace Convention

Every piece of data follows a four-level hierarchy aligned with DDD bounded contexts:

`{tenant} / {domain} / {product} / {version}`

This maps to each store:
| Store | Pattern | Example |
|---|---|---|
| Object store (lakehouse) | `s3://akili/{tenant}/{domain}/{product}/data/` | `s3://akili/fmcg-ea/sales/raw-orders/data/` |
| Structured store | `schema={domain}`, `table={product}`, RLS by `tenant_id` | `finance.daily_kpi_report` |
| Time-series store | `schema={domain}`, `hypertable={product}` | `finance.daily_kpi_report` |
| Analytics engine | `catalog=lakehouse_{tenant}`, `database={domain}` | `lakehouse_fmcg_ea.finance.daily_kpi_report` |
| Real-time cache | `{tenant}:{domain}:{product}:{key_values}` | `fmcg-ea:finance:daily-kpi-report:nairobi` |
| Event bus | `{tenant}.{domain}.{product}.events` | `fmcg-ea.finance.daily-kpi-report.events` |
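The mapping above can be sketched as a single derivation function. The hyphen-to-underscore conversion for schema-qualified names is inferred from the examples in the table (e.g. `daily-kpi-report` becoming `daily_kpi_report`), not stated explicitly, so treat it as an assumption.

```python
def store_names(tenant: str, domain: str, product: str) -> dict:
    """Derive each store's namespace from the shared
    {tenant}/{domain}/{product} hierarchy."""
    table = product.replace("-", "_")          # SQL identifiers use snake_case
    catalog = f"lakehouse_{tenant.replace('-', '_')}"
    return {
        "object_store": f"s3://akili/{tenant}/{domain}/{product}/data/",
        "structured": f"{domain}.{table}",
        "timeseries": f"{domain}.{table}",      # same name, hypertable-backed
        "analytics": f"{catalog}.{domain}.{table}",
        "cache_prefix": f"{tenant}:{domain}:{product}:",
        "event_topic": f"{tenant}.{domain}.{product}.events",
    }

names = store_names("fmcg-ea", "finance", "daily-kpi-report")
```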
Domain auto-creation: When a product is registered in a new domain, the control plane atomically creates the domain across all stores (structured store schema, S3 prefix, lakehouse namespace). No separate “create domain” API needed.
Domain immutability: Products cannot move between domains. If ownership changes, deprecate and recreate in the new domain.
## Query Routing

When a consumer requests data, the control plane routes the query to the correct store based on the product’s declared serving types:
```mermaid
flowchart TD
    Q["API Request<br/>GET /api/v1/products/{name}/..."]
    Q --> AUTH["Validate JWT<br/>Extract tenant_id"]
    AUTH --> ROUTE{"Request path?"}
    ROUTE -->|"/latest?filter=..."| LOOKUP["Structured Store<br/>Point read by primary key"]
    ROUTE -->|"/history?from=&to="| TS["Time-Series Store<br/>Time-range query"]
    ROUTE -->|"/analytics"| SR["Analytics Engine<br/>OLAP SQL via lakehouse"]
    ROUTE -->|"/realtime?key="| RT["Real-Time Cache<br/>Key-value lookup"]
    ROUTE -->|"/data"| LAKE["Object Store<br/>Batch or time travel"]
```
If a product has not declared the requested serving type, the API returns `404 Not Found` with a message indicating which types are available.
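A toy version of this dispatch, with hypothetical names throughout. The `/data` path is treated as always available because the object store is always written (see the serving tiers above).

```python
# Path suffix -> serving type; names here are illustrative.
ROUTES = {
    "latest": "lookup",
    "history": "timeseries",
    "analytics": "analytics",
    "realtime": "realtime",
    "data": "lake",
}

def route(path_suffix: str, declared_types: set):
    """Resolve an API path suffix to a serving store, or return the
    404-style error described above when the product never declared
    that serving type."""
    serving_type = ROUTES.get(path_suffix)
    if serving_type is None:
        return ("error", 404, "unknown endpoint")
    # The lake is always populated, so /data never 404s on serving type.
    if serving_type != "lake" and serving_type not in declared_types:
        return ("error", 404, f"not served; available: {sorted(declared_types)}")
    return ("ok", serving_type)

route("latest", {"lookup", "realtime"})   # ('ok', 'lookup')
```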
## Tenant Isolation at the Serving Boundary

Tenant isolation is enforced at five points in the serving path:
| Point | Mechanism | Failure mode |
|---|---|---|
| 1. JWT extraction | `tenant_id` extracted from auth token at API gateway | 401 Unauthorized |
| 2. Service layer | All service methods receive `tenant_id` as a parameter | Compile-time enforced |
| 3. Database RLS | `SET LOCAL app.tenant_id` before every query (PostgreSQL Row-Level Security) | Query returns empty set (not error) |
| 4. Object storage | Ceph paths prefixed by tenant (`s3://akili/{tenant_id}/...`) | 403 Forbidden |
| 5. Cache keys | Redis keys prefixed by tenant (`{tenant_id}:{domain}:{product}:...`) | Key not found |
Even if application code omits a WHERE clause, PostgreSQL RLS ensures no cross-tenant data leaks. This is defense-in-depth — every layer independently enforces isolation.
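A sketch of how layer 3 might be applied from application code. PostgreSQL's `set_config(name, value, true)` is the transaction-scoped, parameterizable equivalent of `SET LOCAL`; the function and its names are illustrative, and the RLS policy itself is assumed to read `current_setting('app.tenant_id')`.

```python
def tenant_scoped_statements(tenant_id: str, query: str, params: tuple):
    """Return (sql, params) pairs to execute inside ONE transaction,
    so RLS policies can see app.tenant_id via current_setting().
    set_config's third argument (is_local=true) limits the setting
    to the current transaction, like SET LOCAL."""
    return [
        ("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,)),
        (query, params),
    ]

stmts = tenant_scoped_statements(
    "fmcg-ea",
    "SELECT * FROM finance.daily_kpi_report WHERE outlet_id = %s",
    (123,),
)
# a real caller would run both statements on one connection, e.g. with psycopg
```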
## Latency Budget

| Store | Target | P99 budget |
|---|---|---|
| Structured store (lookup) | < 10ms | 50ms |
| Real-time cache | < 5ms | 20ms |
| Time-series store | < 100ms | 500ms |
| Analytics engine | < 2s | 10s |
| Object store (batch) | < 30s | 120s |
Latency is measured end-to-end from API request to response. Store-level timeouts trigger circuit breakers — if a serving store is unresponsive, the platform returns the last-known-good result (fail-open for quality, not security).
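The last-known-good fallback can be sketched as a small wrapper. This is a simplification under stated assumptions: it checks the budget only after the call returns, whereas a real circuit breaker would cancel in-flight requests and track error rates; all names here are hypothetical.

```python
import time

def serve_with_fallback(fetch, cache: dict, key: str, timeout_s: float):
    """Call the serving store; on error or budget overrun, return the
    last-known-good value instead (fail-open for availability --
    never applied to auth or isolation decisions)."""
    try:
        start = time.monotonic()
        value = fetch(key)
        if time.monotonic() - start > timeout_s:
            raise TimeoutError(f"{key}: store exceeded {timeout_s}s budget")
        cache[key] = value          # refresh last-known-good
        return value, "fresh"
    except Exception:
        if key in cache:
            return cache[key], "stale"
        raise                       # no fallback available: surface the error
```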
## SCD as a Serving Concern

Slowly Changing Dimension (SCD) logic lives in the serving layer, not in transforms (ADR-027). Developers write stateless transforms that produce the current state. The IO Manager handles temporal bookkeeping based on `history_mode` in `serving.yaml`:
| History Mode | Strategy | IO Manager Behavior |
|---|---|---|
| `current` | Standard upsert | Overwrites on identity columns |
| `type_2` | Insert-then-close | Sets `effective_to` on previous row, inserts new row |
| `type_3` | Update-in-place | Stores previous values in `prev_*` columns |
| `append` | Append-only | Every version is a new row |
The lakehouse always stores the full current state; history mode only affects how data is rendered in the serving stores.
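A minimal in-memory sketch of the `type_2` insert-then-close strategy. The `effective_to` column name comes from the table above; `effective_from` and the function itself are hypothetical illustrations, not the IO Manager's actual implementation.

```python
from datetime import datetime, timezone

def upsert_type_2(table: list, row: dict, identity: tuple):
    """Type 2 SCD: close the currently-open row for this identity by
    stamping effective_to, then insert the new row as open-ended."""
    now = datetime.now(timezone.utc)
    key = tuple(row[c] for c in identity)
    for existing in table:
        open_row = existing["effective_to"] is None
        if open_row and tuple(existing[c] for c in identity) == key:
            existing["effective_to"] = now      # close the old version
    table.append({**row, "effective_from": now, "effective_to": None})

history: list = []
upsert_type_2(history, {"outlet_id": 1, "revenue": 100}, ("outlet_id",))
upsert_type_2(history, {"outlet_id": 1, "revenue": 120}, ("outlet_id",))
# history now holds two rows: the first closed, the second open-ended
```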
## Related

- System Architecture — Platform layers and data flow
- Orchestration — How IO Managers get triggered
- Quality and Governance — Quality gates before data reaches serving stores