# Serving Configuration

The serving layer translates developer intent into consumer-facing data endpoints. You declare what you want in serving.yaml; the platform handles where data goes, how it gets there, and how consumers access it.

Every data product writes to the object store via the lakehouse format. This is the source of truth with full time-travel capability. Serving endpoints are additional projections — optimized read replicas populated by the platform after each materialization.

```
Materialization
      |
      v
Object Store / Data Lake (always written -- source of truth)
      |
      +---> Structured Store  (type: lookup)
      +---> Timeseries Store  (type: timeseries)
      +---> Analytics Engine  (type: analytics -- no data movement)
      +---> Cache Store       (type: realtime)
```

## Lookup

Point reads by primary key. Use when consumers need to fetch individual records by ID.

```yaml
endpoints:
  - type: lookup
    description: Point lookups by outlet_id for the Portal
    config:
      index_columns:
        - outlet_id
        - sale_date
```

Write behavior: The platform upserts by primary key columns. Auto-creates the table, schema, and indexes. Row-level security is enforced on every operation.

Read pattern: GET /api/v1/products/{name}/latest?filter=outlet_id:123

When to use: Portal detail pages, key-value access, any “give me the current state of entity X” pattern.
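A minimal sketch of building the read-pattern URL from a client, assuming an `akili.example` gateway host and a `raw-orders` product (both placeholders; the comma-joined multi-filter form is also an assumption). Note the filter value is URL-encoded, so `:` becomes `%3A`:

```python
from urllib.parse import urlencode

def lookup_url(base: str, product: str, filters: dict) -> str:
    """Build the lookup read-pattern URL: /api/v1/products/{name}/latest."""
    qs = urlencode({"filter": ",".join(f"{k}:{v}" for k, v in filters.items())})
    return f"{base}/api/v1/products/{product}/latest?{qs}"

print(lookup_url("https://akili.example", "raw-orders", {"outlet_id": 123}))
# https://akili.example/api/v1/products/raw-orders/latest?filter=outlet_id%3A123
```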

Config options:

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `index_columns` | no | Primary key columns | Columns to index in the structured store |
| `history_mode` | no | `current` | `current` = latest state only. `type_2` = SCD Type 2 with effective dates. `type_3` = previous-value columns. |
| `tracked_columns` | if `history_mode` is not `current` | | Columns whose changes trigger a new historical record |
The default `history_mode: current` needs no extra configuration:

```yaml
endpoints:
  - type: lookup
    config:
      index_columns: [customer_id]
```

Standard upsert. Only the latest state of each record is kept.
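For contrast, a sketch of an SCD Type 2 configuration (the `customer_tier` and `region` tracked columns are illustrative, not part of any real schema):

```yaml
endpoints:
  - type: lookup
    config:
      index_columns: [customer_id]
      history_mode: type_2
      tracked_columns: [customer_tier, region]
```

With `type_2`, a change in any tracked column closes the current record with an effective-date range and inserts a new one, so consumers can query past states.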

## Timeseries

Optimized for time-range queries and trend analysis. Uses hypertables with automatic chunking.

```yaml
endpoints:
  - type: timeseries
    description: Hourly sales trends for territory dashboards
    config:
      time_column: sale_date
      chunk_interval: 1 day
      continuous_aggregates:
        - interval: 1 week
          columns: [total_revenue, transaction_count]
```

Write behavior: The platform appends rows. Auto-creates the hypertable with the specified time column and chunk interval.

Read pattern: GET /api/v1/products/{name}/timeseries?from=2026-01-01&to=2026-01-31

When to use: Time-range queries, trend analysis, dashboards with time filters.
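A rough way to reason about `chunk_interval`: a time-range query scans only the chunks its range overlaps, so smaller chunks keep short-range queries cheap at the cost of more per-chunk overhead on long scans. A simplified sketch (it assumes chunk boundaries align with the query start; real hypertable chunks align to fixed epochs, which can add one extra chunk):

```python
from datetime import date

def chunks_scanned(start: date, end: date, chunk_days: int = 1) -> int:
    """Approximate number of chunks an inclusive [start, end] range touches."""
    span_days = (end - start).days + 1
    return -(-span_days // chunk_days)  # ceiling division

# The read-pattern example's 31-day January scan with 1-day chunks:
print(chunks_scanned(date(2026, 1, 1), date(2026, 1, 31)))  # 31
```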

Config options:

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `time_column` | no | First date/timestamp column | Column for hypertable partitioning |
| `chunk_interval` | no | `1 day` | Timeseries chunk size |
| `continuous_aggregates` | no | none | Auto-created aggregate views for faster queries |

## Analytics

OLAP queries, aggregations, and dashboard backends. The analytics engine federates directly to the data lake on the object store; no data movement occurs.

```yaml
endpoints:
  - type: analytics
    description: Territory-level aggregations for executive dashboards
```

Write behavior: No-op. The platform registers the data lake table in the analytics engine’s external catalog. The analytics engine queries the object store directly.

Read pattern: Three query paths:

  1. Superset drag-and-drop (no SQL required)
  2. Superset SQL Lab (write queries against the analytics engine)
  3. Any MySQL-wire-protocol client connecting directly to the analytics engine

When to use: OLAP, GROUP BY, bulk scans, dashboards, ad-hoc analytics. This is the most common endpoint for analytical use cases.
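For example, through any MySQL-wire client, an OLAP query against the external catalog might look like the following sketch (catalog and table names follow the storage-hierarchy table below; the `territory` and `total_revenue` columns are illustrative):

```sql
SELECT territory,
       SUM(total_revenue) AS revenue,
       COUNT(*)           AS orders
FROM lakehouse_fmcg_ea.sales.raw_orders
WHERE sale_date >= '2026-01-01'
GROUP BY territory
ORDER BY revenue DESC;
```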

## Realtime

Sub-10ms reads for hot-path data. The platform writes with TTL-based expiry.

```yaml
endpoints:
  - type: realtime
    description: Running totals refreshed every 15 minutes
    config:
      key_template: "outlet:{outlet_id}:daily:{sale_date}"
      ttl: 24h
      include_columns:
        - outlet_id
        - sale_date
        - total_revenue
        - transaction_count
```

Write behavior: The platform creates cache keys using the template and sets TTL. Keys are tenant-prefixed automatically.

Read pattern: GET /api/v1/products/{name}/realtime?key=outlet:123:daily:2026-01-15

When to use: Live dashboards, streaming hot path, any scenario where latency under 10ms is required.

Config options:

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `key_template` | yes | | Cache key pattern. Use `{column_name}` placeholders. |
| `ttl` | no | `24h` | Time-to-live. Format: `1h`, `24h`, `7d`. |
| `include_columns` | no | All columns | Subset of columns to store (reduces cache memory). |
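Putting the write behavior and `key_template` together, key construction can be sketched as placeholder substitution plus the automatic `{tenant}:{domain}:{product}` prefix from the storage-hierarchy table below (the `cache_key` helper is illustrative, not a platform API):

```python
def cache_key(tenant: str, domain: str, product: str,
              template: str, row: dict) -> str:
    """Fill {column_name} placeholders from a row, then add the tenant prefix."""
    return f"{tenant}:{domain}:{product}:{template.format(**row)}"

row = {"outlet_id": 123, "sale_date": "2026-01-15"}
print(cache_key("fmcg-ea", "sales", "raw-orders",
                "outlet:{outlet_id}:daily:{sale_date}", row))
# fmcg-ea:sales:raw-orders:outlet:123:daily:2026-01-15
```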

## Storage Hierarchy

Every piece of data follows a four-level hierarchy: `{tenant}/{domain}/{product}/{version}`. This maps consistently across all stores:

| Store | Pattern | Example |
|-------|---------|---------|
| Object Store (Data Lake) | `s3://akili/{tenant}/{domain}/{product}/data/` | `s3://akili/fmcg-ea/sales/raw-orders/data/` |
| Structured Store | schema=`{domain}`, table=`{product}`, RLS on `tenant_id` | `sales.raw_orders WHERE tenant_id = $1` |
| Timeseries Store | schema=`{domain}`, hypertable=`{product}`, RLS on `tenant_id` | `sales.raw_orders` |
| Analytics Engine | catalog=`lakehouse_{tenant}`, database=`{domain}`, table=`{product}` | `lakehouse_fmcg_ea.sales.raw_orders` |
| Cache Store | key=`{tenant}:{domain}:{product}:{key_values}` | `fmcg-ea:sales:raw-orders:nairobi` |
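The examples above imply a name-normalization rule: hyphenated tenant and product names (`fmcg-ea`, `raw-orders`) become underscored SQL identifiers (`lakehouse_fmcg_ea`, `raw_orders`) in the SQL-facing stores. A minimal sketch of that mapping (function names are illustrative, not platform API):

```python
def sql_name(name: str) -> str:
    """Normalize a hyphenated tenant/domain/product name into a SQL identifier."""
    return name.replace("-", "_")

def analytics_table(tenant: str, domain: str, product: str) -> str:
    """catalog=lakehouse_{tenant}, database={domain}, table={product}."""
    return f"lakehouse_{sql_name(tenant)}.{sql_name(domain)}.{sql_name(product)}"

print(analytics_table("fmcg-ea", "sales", "raw-orders"))
# lakehouse_fmcg_ea.sales.raw_orders
```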

## Multiple Endpoints

A single product can serve data through multiple stores simultaneously:

```yaml
apiVersion: akili/v1
kind: Serving
endpoints:
  # Portal detail pages
  - type: lookup
    description: Point lookups by outlet for detail pages
    config:
      index_columns: [outlet_id]
  # Executive dashboards
  - type: analytics
    description: Territory-level OLAP queries
  # Live dashboard tiles
  - type: realtime
    description: Running daily totals
    config:
      key_template: "outlet:{outlet_id}:daily:{sale_date}"
      ttl: 24h
      include_columns:
        - outlet_id
        - sale_date
        - total_revenue
        - transaction_count
```

After each materialization, the platform writes the object store (data lake) first, since it is the source of truth, then updates all other declared stores in parallel.


## Visualization

Optional Superset integration for dashboard provisioning:

```yaml
visualization:
  enabled: true
  dashboard_template: daily-sales-overview
  refresh_interval: 15m
```

When enabled, the platform:

  1. Creates or updates a Superset dataset pointing to the serving store
  2. Applies RLS filters matching the tenant and classification
  3. Optionally provisions a dashboard from a platform-provided template

| Field | Required | Default | Description |
|-------|----------|---------|-------------|
| `visualization.enabled` | no | `false` | Enable Superset integration |
| `visualization.dashboard_template` | no | | Platform-provided template name |
| `visualization.refresh_interval` | no | | Superset cache refresh interval |

## Access Control

Access control is enforced at five points in the serving chain:

  1. API Gateway — JWT validation, rate limiting
  2. Control Plane — Classification clearance check
  3. Row-level security — Enforced on tenant_id
  4. Serving API — Column-level masking based on classification
  5. Superset RLS — Dashboard-level tenant isolation

The classification level in product.yaml determines who can query each endpoint:

| Classification | Who Can Query |
|----------------|---------------|
| `public` | Any authenticated user in the tenant |
| `internal` | Any team member in the tenant |
| `confidential` | Explicit team grant required |
| `restricted` | Named individuals only, audit logged |

When output columns declare classifications (see output.yaml column classification field), the serving layer dynamically masks or omits sensitive columns based on consumer clearance:

```yaml
# output.yaml
schema:
  - name: customer_id
    type: string
    role: identity
    classification: pii.identifier
  - name: customer_name
    type: string
    classification: pii.name
  - name: total_orders
    type: integer
    role: measure
    classification: public
```

A consumer with internal clearance querying this product sees total_orders but gets masked values for customer_id and customer_name.
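The masking step can be sketched as a per-column clearance check. Everything here is illustrative: the `mask_row` helper, the `***MASKED***` sentinel, and the assumption about which clearances may read `pii.*` columns are not the platform's actual rules:

```python
# Assumption: only confidential/restricted clearance may read pii.* columns.
SENSITIVE_PREFIXES = ("pii.",)

def mask_row(row: dict, classifications: dict, clearance: str) -> dict:
    """Mask values of columns whose classification exceeds the clearance."""
    cleared_for_pii = clearance in ("confidential", "restricted")
    out = {}
    for col, value in row.items():
        cls = classifications.get(col, "public")
        if cls.startswith(SENSITIVE_PREFIXES) and not cleared_for_pii:
            out[col] = "***MASKED***"
        else:
            out[col] = value
    return out

row = {"customer_id": "C-42", "customer_name": "Asha", "total_orders": 7}
cls = {"customer_id": "pii.identifier", "customer_name": "pii.name",
       "total_orders": "public"}
print(mask_row(row, cls, clearance="internal"))
# {'customer_id': '***MASKED***', 'customer_name': '***MASKED***', 'total_orders': 7}
```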


## Choosing an Endpoint Type

| Access Pattern | Recommended Type | Latency |
|----------------|------------------|---------|
| "Give me entity X" | lookup | ~5ms |
| "Show me last 30 days" | timeseries | ~50ms |
| "Total revenue by territory" | analytics | ~200ms |
| "Current value, sub-10ms" | realtime | ~1ms |
| "Full historical scan" | Data lake (no endpoint) | ~5s |
## Performance Tips

  • Add `index_columns` for your most common query patterns
  • Use `include_columns` in realtime endpoints to reduce cache memory
  • Consider `history_mode: type_2` only if consumers need to query historical states
  • The analytics engine federates to the data lake; query performance depends on the data lake file layout
  • Partition your output by the most common filter column (usually date)
  • The data lake's predicate pushdown automatically skips irrelevant partitions
  • Keep `include_columns` to only what the hot path needs
  • Set `ttl` to match your refresh cycle (no point caching 24h of data if you refresh every 15m)
  • Use specific key templates to enable direct lookups without scans

```yaml
# Product serving a REST API
apiVersion: akili/v1
kind: Serving
endpoints:
  - type: lookup
    description: Customer profiles for the mobile app
    config:
      index_columns: [customer_id]
```