# Serving Configuration

The serving layer translates developer intent into consumer-facing data endpoints. You declare what you want in serving.yaml; the platform handles where data goes, how it gets there, and how consumers access it.

Every data product writes to the object store via the lakehouse format. This is the source of truth with full time-travel capability. Serving endpoints are additional projections — optimized read replicas populated by the platform after each materialization.

```
Materialization
      |
      v
Object Store / Data Lake (always written -- source of truth)
      |
      +---> Structured Store  (type: lookup)
      +---> Timeseries Store  (type: timeseries)
      +---> Analytics Engine  (type: analytics -- no data movement)
      +---> Cache Store       (type: realtime)
```

## Lookup

Point reads by primary key. Use when consumers need to fetch individual records by ID.

```yaml
endpoints:
  - type: lookup
    description: Point lookups by outlet_id for the Portal
    config:
      index_columns:
        - outlet_id
        - sale_date
```

Write behavior: The platform upserts by primary key columns. Auto-creates the table, schema, and indexes. Row-level security is enforced on every operation.

Read pattern: GET /api/v1/products/{name}/latest?filter=outlet_id:123

When to use: Portal detail pages, key-value access, any “give me the current state of entity X” pattern.
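A minimal sketch of building the read-pattern URL from a client, assuming an `akili.example` gateway host and a `raw-orders` product (both placeholders; the comma-joined multi-filter form is also an assumption). Note the filter value is URL-encoded, so `:` becomes `%3A`:

```python
from urllib.parse import urlencode

def lookup_url(base: str, product: str, filters: dict) -> str:
    """Build the lookup read-pattern URL: /api/v1/products/{name}/latest."""
    qs = urlencode({"filter": ",".join(f"{k}:{v}" for k, v in filters.items())})
    return f"{base}/api/v1/products/{product}/latest?{qs}"

print(lookup_url("https://akili.example", "raw-orders", {"outlet_id": 123}))
# https://akili.example/api/v1/products/raw-orders/latest?filter=outlet_id%3A123
```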

Config options:

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `index_columns` | no | Primary key columns | Columns to index in the structured store |
| `history_mode` | no | `current` | `current` = latest state only. `type_2` = SCD Type 2 with effective dates. `type_3` = previous-value columns. |
| `tracked_columns` | if `history_mode` is not `current` | | Columns whose changes trigger a new historical record |
The default `history_mode: current` needs no extra configuration:

```yaml
endpoints:
  - type: lookup
    config:
      index_columns: [customer_id]
```

Standard upsert. Only the latest state of each record is kept.
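For contrast, a sketch of an SCD Type 2 configuration (the `customer_tier` and `region` tracked columns are illustrative, not part of any real schema):

```yaml
endpoints:
  - type: lookup
    config:
      index_columns: [customer_id]
      history_mode: type_2
      tracked_columns: [customer_tier, region]
```

With `type_2`, a change in any tracked column closes the current record with an effective-date range and inserts a new one, so consumers can query past states.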

## Timeseries

Optimized for time-range queries and trend analysis. Uses hypertables with automatic chunking.

```yaml
endpoints:
  - type: timeseries
    description: Hourly sales trends for territory dashboards
    config:
      time_column: sale_date
      chunk_interval: 1 day
      continuous_aggregates:
        - interval: 1 week
          columns: [total_revenue, transaction_count]
```

Write behavior: The platform appends rows. Auto-creates the hypertable with the specified time column and chunk interval.

Read pattern: GET /api/v1/products/{name}/timeseries?from=2026-01-01&to=2026-01-31

When to use: Time-range queries, trend analysis, dashboards with time filters.
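A rough way to reason about `chunk_interval`: a time-range query scans only the chunks its range overlaps, so smaller chunks keep short-range queries cheap at the cost of more per-chunk overhead on long scans. A simplified sketch (it assumes chunk boundaries align with the query start; real hypertable chunks align to fixed epochs, which can add one extra chunk):

```python
from datetime import date

def chunks_scanned(start: date, end: date, chunk_days: int = 1) -> int:
    """Approximate number of chunks an inclusive [start, end] range touches."""
    span_days = (end - start).days + 1
    return -(-span_days // chunk_days)  # ceiling division

# The read-pattern example's 31-day January scan with 1-day chunks:
print(chunks_scanned(date(2026, 1, 1), date(2026, 1, 31)))  # 31
```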

Config options:

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `time_column` | no | First date/timestamp column | Column for hypertable partitioning |
| `chunk_interval` | no | `1 day` | Timeseries chunk size |
| `continuous_aggregates` | no | none | Auto-created aggregate views for faster queries |

## Analytics

OLAP queries, aggregations, and dashboard backends. The analytics engine federates directly to the data lake on the object store; no data movement occurs.

```yaml
endpoints:
  - type: analytics
    description: Territory-level aggregations for executive dashboards
```

Write behavior: No-op. The platform registers the data lake table in the analytics engine’s external catalog. The analytics engine queries the object store directly.

Read pattern: Three query paths:

  1. Superset drag-and-drop (no SQL required)
  2. Superset SQL Lab (write queries against the analytics engine)
  3. Any MySQL-wire-protocol client connecting directly to the analytics engine

When to use: OLAP, GROUP BY, bulk scans, dashboards, ad-hoc analytics. This is the most common endpoint for analytical use cases.
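For example, through any MySQL-wire client, an OLAP query against the external catalog might look like the following sketch (catalog and table names follow the storage-hierarchy table below; the `territory` and `total_revenue` columns are illustrative):

```sql
SELECT territory,
       SUM(total_revenue) AS revenue,
       COUNT(*)           AS orders
FROM lakehouse_fmcg_ea.sales.raw_orders
WHERE sale_date >= '2026-01-01'
GROUP BY territory
ORDER BY revenue DESC;
```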

## Realtime

Sub-10ms reads for hot-path data. The platform writes with TTL-based expiry.

```yaml
endpoints:
  - type: realtime
    description: Running totals refreshed every 15 minutes
    config:
      key_template: "outlet:{outlet_id}:daily:{sale_date}"
      ttl: 24h
      include_columns:
        - outlet_id
        - sale_date
        - total_revenue
        - transaction_count
```

Write behavior: The platform creates cache keys using the template and sets TTL. Keys are tenant-prefixed automatically.

Read pattern: GET /api/v1/products/{name}/realtime?key=outlet:123:daily:2026-01-15

When to use: Live dashboards, streaming hot path, any scenario where latency under 10ms is required.

Config options:

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `key_template` | yes | | Cache key pattern. Use `{column_name}` placeholders. |
| `ttl` | no | `24h` | Time-to-live. Format: `1h`, `24h`, `7d`. |
| `include_columns` | no | All columns | Subset of columns to store (reduces cache memory). |
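Putting the write behavior and `key_template` together, key construction can be sketched as placeholder substitution plus the automatic `{tenant}:{domain}:{product}` prefix from the storage-hierarchy table below (the `cache_key` helper is illustrative, not a platform API):

```python
def cache_key(tenant: str, domain: str, product: str,
              template: str, row: dict) -> str:
    """Fill {column_name} placeholders from a row, then add the tenant prefix."""
    return f"{tenant}:{domain}:{product}:{template.format(**row)}"

row = {"outlet_id": 123, "sale_date": "2026-01-15"}
print(cache_key("fmcg-ea", "sales", "raw-orders",
                "outlet:{outlet_id}:daily:{sale_date}", row))
# fmcg-ea:sales:raw-orders:outlet:123:daily:2026-01-15
```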

## Storage Hierarchy

Every piece of data follows a four-level hierarchy: `{tenant}/{domain}/{product}/{version}`. This maps consistently across all stores:

| Store | Pattern | Example |
|-------|---------|---------|
| Object Store (Data Lake) | `s3://akili/{tenant}/{domain}/{product}/data/` | `s3://akili/fmcg-ea/sales/raw-orders/data/` |
| Structured Store | schema=`{domain}`, table=`{product}`, RLS on `tenant_id` | `sales.raw_orders WHERE tenant_id = $1` |
| Timeseries Store | schema=`{domain}`, hypertable=`{product}`, RLS on `tenant_id` | `sales.raw_orders` |
| Analytics Engine | catalog=`lakehouse_{tenant}`, database=`{domain}`, table=`{product}` | `lakehouse_fmcg_ea.sales.raw_orders` |
| Cache Store | key=`{tenant}:{domain}:{product}:{key_values}` | `fmcg-ea:sales:raw-orders:nairobi` |
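The examples above imply a name-normalization rule: hyphenated tenant and product names (`fmcg-ea`, `raw-orders`) become underscored SQL identifiers (`lakehouse_fmcg_ea`, `raw_orders`) in the SQL-facing stores. A minimal sketch of that mapping (function names are illustrative, not platform API):

```python
def sql_name(name: str) -> str:
    """Normalize a hyphenated tenant/domain/product name into a SQL identifier."""
    return name.replace("-", "_")

def analytics_table(tenant: str, domain: str, product: str) -> str:
    """catalog=lakehouse_{tenant}, database={domain}, table={product}."""
    return f"lakehouse_{sql_name(tenant)}.{sql_name(domain)}.{sql_name(product)}"

print(analytics_table("fmcg-ea", "sales", "raw-orders"))
# lakehouse_fmcg_ea.sales.raw_orders
```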

## Multiple Endpoints

A single product can serve data through multiple stores simultaneously:

```yaml
apiVersion: akili/v1
kind: Serving
endpoints:
  # Portal detail pages
  - type: lookup
    description: Point lookups by outlet for detail pages
    config:
      index_columns: [outlet_id]
  # Executive dashboards
  - type: analytics
    description: Territory-level OLAP queries
  # Live dashboard tiles
  - type: realtime
    description: Running daily totals
    config:
      key_template: "outlet:{outlet_id}:daily:{sale_date}"
      ttl: 24h
      include_columns:
        - outlet_id
        - sale_date
        - total_revenue
        - transaction_count
```

After each materialization, the platform writes the object store (data lake) first, since it is the source of truth, then updates all other declared stores in parallel.


## Visualization

Optional Superset integration for dashboard provisioning:

```yaml
visualization:
  enabled: true
  dashboard_template: daily-sales-overview
  refresh_interval: 15m
```

When enabled, the platform:

  1. Creates or updates a Superset dataset pointing to the serving store
  2. Applies RLS filters matching the tenant and classification
  3. Optionally provisions a dashboard from a platform-provided template

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `visualization.enabled` | no | `false` | Enable Superset integration |
| `visualization.dashboard_template` | no | | Platform-provided template name |
| `visualization.refresh_interval` | no | | Superset cache refresh interval |

## Access Control

Access control is enforced at five points in the serving chain:

  1. API Gateway — JWT validation, rate limiting
  2. Control Plane — Classification clearance check
  3. Row-level security — Enforced on tenant_id
  4. Serving API — Column-level masking based on classification
  5. Superset RLS — Dashboard-level tenant isolation

The classification level in product.yaml determines who can query each endpoint:

| Classification | Who Can Query |
|----------------|---------------|
| `public` | Any authenticated user in the tenant |
| `internal` | Any team member in the tenant |
| `confidential` | Explicit team grant required |
| `restricted` | Named individuals only, audit logged |

When output columns declare classifications (see output.yaml column classification field), the serving layer dynamically masks or omits sensitive columns based on consumer clearance:

```yaml
# output.yaml
schema:
  - name: customer_id
    type: string
    role: identity
    classification: pii.identifier
  - name: customer_name
    type: string
    classification: pii.name
  - name: total_orders
    type: integer
    role: measure
    classification: public
```

A consumer with internal clearance querying this product sees total_orders but gets masked values for customer_id and customer_name.
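The masking step can be sketched as a per-column clearance check. Everything here is illustrative: the `mask_row` helper, the `***MASKED***` sentinel, and the assumption about which clearances may read `pii.*` columns are not the platform's actual rules:

```python
# Assumption: only confidential/restricted clearance may read pii.* columns.
SENSITIVE_PREFIXES = ("pii.",)

def mask_row(row: dict, classifications: dict, clearance: str) -> dict:
    """Mask values of columns whose classification exceeds the clearance."""
    cleared_for_pii = clearance in ("confidential", "restricted")
    out = {}
    for col, value in row.items():
        cls = classifications.get(col, "public")
        if cls.startswith(SENSITIVE_PREFIXES) and not cleared_for_pii:
            out[col] = "***MASKED***"
        else:
            out[col] = value
    return out

row = {"customer_id": "C-42", "customer_name": "Asha", "total_orders": 7}
cls = {"customer_id": "pii.identifier", "customer_name": "pii.name",
       "total_orders": "public"}
print(mask_row(row, cls, clearance="internal"))
# {'customer_id': '***MASKED***', 'customer_name': '***MASKED***', 'total_orders': 7}
```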


## Choosing an Endpoint Type

| Access Pattern | Recommended Type | Latency |
|----------------|------------------|---------|
| "Give me entity X" | lookup | ~5ms |
| "Show me last 30 days" | timeseries | ~50ms |
| "Total revenue by territory" | analytics | ~200ms |
| "Current value, sub-10ms" | realtime | ~1ms |
| "Full historical scan" | Data lake (no endpoint) | ~5s |
## Performance Tips

  • Add `index_columns` for your most common query patterns
  • Use `include_columns` in realtime endpoints to reduce cache memory
  • Consider `history_mode: type_2` only if consumers need to query historical states
  • The analytics engine federates to the data lake; query performance depends on the data lake file layout
  • Partition your output by the most common filter column (usually date)
  • The data lake's predicate pushdown automatically skips irrelevant partitions
  • Keep `include_columns` to only what the hot path needs
  • Set `ttl` to match your refresh cycle (no point caching 24h of data if you refresh every 15m)
  • Use specific key templates to enable direct lookups without scans

```yaml
# Product serving a REST API
apiVersion: akili/v1
kind: Serving
endpoints:
  - type: lookup
    description: Customer profiles for the mobile app
    config:
      index_columns: [customer_id]
```