Skip to content
GitLab

Manifest Overview

Every data product in Akili is defined by 6 YAML manifest files plus a logic/ directory containing your SQL or Python. These manifests are the single source of truth — you write business logic, the platform handles orchestration, scheduling, and store routing.

my-data-product/
product.yaml # Identity, domain, ownership, classification
inputs.yaml # What this product consumes
output.yaml # Schema of what this product produces
serving.yaml # Intent-based store routing
quality.yaml # Quality check definitions
compute.yaml # Runtime, engine, schedule, resources
logic/
transform.sql # Business logic (or .py)
tests/
fixtures/ # Test fixture data (CSV/JSON)

Scaffold a new product with:

Terminal window
akili init my-data-product

Validate all files:

Terminal window
akili validate my-data-product/
flowchart TD
    PRODUCT[product.yaml -- Identity + Ownership] --> INPUTS[inputs.yaml -- What It Consumes]
    PRODUCT --> OUTPUT[output.yaml -- What It Produces]
    PRODUCT --> COMPUTE[compute.yaml -- Runtime Config]

    INPUTS --> |Dependencies| OUTPUT
    OUTPUT --> SERVING[serving.yaml -- Where It Is Served]
    OUTPUT --> QUALITY[quality.yaml -- Quality Gates]

    COMPUTE --> |Schedule + Resources| LOGIC[logic/ -- SQL or Python]
    INPUTS --> |Available as tables| LOGIC

    QUALITY --> |Blocking checks gate| SERVING
FileKindPurposeRequired
product.yamlDataProductProduct identity, domain, ownership, classificationYes
inputs.yamlInputsDependencies on other products or external systemsYes
output.yamlOutputSchema contract for produced dataYes
serving.yamlServingWhere and how output is accessibleYes
quality.yamlQualityQuality gates that run after every materializationYes
compute.yamlComputeRuntime configuration, schedule, resourcesYes

All files share a common header:

apiVersion: akili/v1
kind: <ManifestKind>

akili validate <product-dir>/ performs comprehensive validation checks. Validation output is designed for agent consumption: one line per file, grep-friendly errors with ERROR: prefix.

ModeFlagSpeed TargetWhat It Checks
Fast--fast<100msJSON Schema parse only. No cross-refs.
Standard(default)<2sFull schema + all XVAL rules + schema evolution + dependency resolution + codegen dry-run
Deploy--check-deploy<3sStandard + deploy readiness checks (classification, contracts, clearance)
Strict--strict<5sAll checks + style checks (description length, tag conventions)
Rule IDDescriptionFiles Involved
XVAL-012At least one column must have role: identityoutput
XVAL-013Columns with role: measure must have numeric typeoutput
XVAL-014event_key must reference valid upstream identityoutput + inputs
XVAL-015Fixture keys must match declared input column namesquality + inputs
XVAL-016Fixture files must exist in tests/fixtures/quality + filesystem
XVAL-017Contract test producer must reference valid inputquality + inputs
XVAL-018semantic_intents column references must exist in schemaoutput
XVAL-019semantic_contract intents must match upstream declarationsinputs + upstream
XVAL-020semantic_contract tier_refs must exist in upstream tiersinputs + upstream
XVAL-021{{ intent() }} template calls must reference input intentscompute + inputs
XVAL-022Tier values must be valid for column typeoutput
XVAL-023Identity columns cannot have deprecated: trueoutput
XVAL-024Identity columns cannot be removed across versionsoutput (version comparison)
XVAL-025Identity column types cannot change across versionsoutput (version comparison)
XVAL-026Non-deprecated columns cannot be removed without deprecationoutput (version comparison)
XVAL-027Product domain cannot change on updateproduct (version comparison)
XVAL-028_correction_of must be type string, role attributeoutput
XVAL-029_restatement_period requires _correction_ofoutput
XVAL-030Domain _platform is reserved for platform canariesproduct
XVAL-031If any column has classification, all mustoutput
XVAL-032Product classification must be >= max column classificationproduct + output
XVAL-033Classification value must be from taxonomyoutput

Success:

$ akili validate outlet-daily-sales/
product.yaml OK valid
inputs.yaml OK valid (2 inputs, 0 unresolved)
output.yaml OK valid (7 columns, 2 identity, 1 measure, 4 attribute)
serving.yaml OK valid (3 endpoints)
quality.yaml OK valid (6 checks: 4 error, 2 warn)
compute.yaml OK valid (sql runtime, auto engine, cron schedule)
logic/transform.sql OK syntax valid
Column roles OK XVAL-012,013,014,023
Reserved columns OK XVAL-028,029
Cross-references OK all columns resolve
Classification OK propagation valid
Schema evolution OK no breaking changes
RESULT: PASS (7 files, 14 rules checked, 0 errors, 0 warnings)

Failure:

$ akili validate broken-product/
output.yaml FAIL
XVAL-012: no column with role: identity
XVAL-028: reserved column "_correction_of" must have type: string (got: integer)
serving.yaml FAIL
ERROR: key_template references "customer_id" not in output.yaml schema
RESULT: FAIL (7 files, 14 rules checked, 3 errors, 0 warnings)

Data products are scoped to tenants. The tenant context is implicit — it comes from the authenticated user’s JWT, not from the manifest files.

What the platform adds invisibly:

ConcernPlatform Action
Storage isolationS3 paths: s3://akili/{tenant}/{domain}/{product}/data/
Database isolationPostgreSQL RLS: WHERE tenant_id = current_setting('app.tenant_id'), schema per domain
Cache isolationRedis keys: {tenant}:{domain}:{product}:{key_values}
Event isolationRedpanda topics: {tenant}.{domain}.{product}.events
Asset isolationDagster asset keys: ["{tenant}", "{domain}", "{product}"]
Compute isolationK8s jobs tagged with tenant_id, resource quotas per tenant

Developers never write tenant_id in their manifests or logic files.


BumpWhenExamples
MAJORBreaking output schema changeColumn removed, type narrowed, column renamed
MINORAdditive change, backwards compatibleNew nullable column, type widened, new serving endpoint
PATCHLogic-only change, same schemaTransform SQL updated, quality threshold adjusted

When a new major version is deployed, the previous version enters a deprecation period. Both versions run simultaneously during the deprecation_window (default: 90 days, max: 180 days).

StateMeaning
deployedActive, receiving data, serving consumers
deprecatedStill active but scheduled for retirement. Consumers see deprecation warnings.
retiredNo longer active. Data preserved for time-travel queries. Serving returns 410 Gone.

During the deprecation window, the deprecated version’s API responses include a Sunset header (RFC 8594) with the retirement date.


A full outlet-daily-sales data product:

apiVersion: akili/v1
kind: DataProduct
metadata:
name: outlet-daily-sales
domain: retail
version: 1.0.0
owner: retail-analytics
description: >
Aggregates raw POS transactions by outlet and day.
Feeds executive dashboards and territory planning.
tags: [sales, daily, fmcg]
classification: internal
apiVersion: akili/v1
kind: Inputs
inputs:
- id: cleaned-outlets
type: data_product
version: ">=1.0.0"
timeout: 6h
fallback: use_cached
- id: cleaned-transactions
type: data_product
version: ">=2.0.0"
timeout: 4h
fallback: fail
partition_mapping: same_day
apiVersion: akili/v1
kind: Output
schema:
- { name: outlet_id, type: uuid, primary_key: true, role: identity }
- { name: sale_date, type: date, primary_key: true, role: identity }
- { name: total_revenue, type: "decimal(18,2)", nullable: false, role: measure }
- { name: transaction_count, type: integer, nullable: false, role: measure }
- { name: avg_basket_size, type: "decimal(10,2)", role: measure }
- { name: territory_code, type: string, nullable: false, role: attribute }
- { name: updated_at, type: timestamp, nullable: false, role: attribute }
partitioning:
- field: sale_date
granularity: day
apiVersion: akili/v1
kind: Serving
endpoints:
- type: lookup
config:
index_columns: [outlet_id, sale_date]
- type: analytics
- type: realtime
config:
key_template: "outlet:{outlet_id}:daily:{sale_date}"
ttl: 24h
include_columns: [outlet_id, sale_date, total_revenue, transaction_count]
visualization:
enabled: true
dashboard_template: daily-sales-overview
refresh_interval: 15m
apiVersion: akili/v1
kind: Quality
checks:
- { name: revenue_complete, type: completeness, config: { column: total_revenue, threshold: 0.99 }, severity: error }
- { name: data_fresh, type: freshness, config: { column: updated_at, max_age: 6h }, severity: error }
- { name: volume_check, type: volume, config: { min_rows: 50 }, severity: warn }
- { name: revenue_positive, type: custom_expression, config: { expression: "total_revenue >= 0", threshold: 1.0 }, severity: error }
- name: no_duplicate_keys
type: uniqueness
config:
columns: [outlet_id, sale_date]
severity: error
apiVersion: akili/v1
kind: Compute
runtime: sql
engine: auto
schedule:
type: cron
expression: "0 6 * * *"
timezone: Africa/Nairobi
resources:
cpu: "500m"
memory: "1Gi"
timeout: 30m
entrypoint: logic/transform.sql
retry:
max_attempts: 3
backoff: exponential
SELECT
t.outlet_id,
t.transaction_date AS sale_date,
SUM(t.amount) AS total_revenue,
COUNT(*) AS transaction_count,
AVG(t.amount) AS avg_basket_size,
o.territory_code,
NOW() AS updated_at
FROM cleaned_transactions t
JOIN cleaned_outlets o ON t.outlet_id = o.outlet_id
WHERE t.transaction_date = '{{ partition_key }}'
GROUP BY t.outlet_id, t.transaction_date, o.territory_code

TopicDocument
How manifests translate to Dagster primitivesD02: ORCHESTRATION.md
Serving store details, namespace conventionD03: SERVING.md
API endpoints for manifest operationsD04: API-SPEC.md
CLI commands (init, validate, deploy, run)D05: CLI.md
Connector registry and ingestion detailsD06: INPUT-PORTS.md
Governance, lineage, and OpenMetadataD07: GOVERNANCE.md