Manifest Overview
Every data product in Akili is defined by 6 YAML manifest files plus a logic/ directory containing your SQL or Python. These manifests are the single source of truth — you write business logic, the platform handles orchestration, scheduling, and store routing.
Directory Structure

    my-data-product/
      product.yaml        # Identity, domain, ownership, classification
      inputs.yaml         # What this product consumes
      output.yaml         # Schema of what this product produces
      serving.yaml        # Intent-based store routing
      quality.yaml        # Quality check definitions
      compute.yaml        # Runtime, engine, schedule, resources
      logic/
        transform.sql     # Business logic (or .py)
      tests/
        fixtures/         # Test fixture data (CSV/JSON)

Scaffold a new product with:

    akili init my-data-product

Validate all files:

    akili validate my-data-product/

    flowchart TD
        PRODUCT[product.yaml -- Identity + Ownership] --> INPUTS[inputs.yaml -- What It Consumes]
        PRODUCT --> OUTPUT[output.yaml -- What It Produces]
        PRODUCT --> COMPUTE[compute.yaml -- Runtime Config]
        INPUTS --> |Dependencies| OUTPUT
        OUTPUT --> SERVING[serving.yaml -- Where It Is Served]
        OUTPUT --> QUALITY[quality.yaml -- Quality Gates]
        COMPUTE --> |Schedule + Resources| LOGIC[logic/ -- SQL or Python]
        INPUTS --> |Available as tables| LOGIC
        QUALITY --> |Blocking checks gate| SERVING
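The layout above can be sanity-checked before running the CLI. The following is a minimal sketch, not part of Akili itself: the helper name `missing_entries` is hypothetical, and it only tests that the six manifest files and the logic/ directory exist, not that their contents are valid.

```python
from pathlib import Path

# The six manifest files listed in the directory structure above.
REQUIRED_MANIFESTS = (
    "product.yaml",
    "inputs.yaml",
    "output.yaml",
    "serving.yaml",
    "quality.yaml",
    "compute.yaml",
)


def missing_entries(product_dir):
    """Return the manifest files (and logic/) that are absent from product_dir."""
    root = Path(product_dir)
    missing = [name for name in REQUIRED_MANIFESTS if not (root / name).is_file()]
    if not (root / "logic").is_dir():
        missing.append("logic/")
    return missing
```

An empty list means the directory has the expected shape; `akili validate` remains the authoritative check.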
Manifest Files Overview
| File | Kind | Purpose | Required |
|---|---|---|---|
| product.yaml | DataProduct | Product identity, domain, ownership, classification | Yes |
| inputs.yaml | Inputs | Dependencies on other products or external systems | Yes |
| output.yaml | Output | Schema contract for produced data | Yes |
| serving.yaml | Serving | Where and how output is accessible | Yes |
| quality.yaml | Quality | Quality gates that run after every materialization | Yes |
| compute.yaml | Compute | Runtime configuration, schedule, resources | Yes |
All files share a common header:

    apiVersion: akili/v1
    kind: <ManifestKind>

Validation
akili validate <product-dir>/ performs comprehensive validation checks. Validation output is designed for agent consumption: one line per file, grep-friendly errors with ERROR: prefix.
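Because the output is line-oriented, a consuming agent can extract errors with a simple filter. The sketch below is an illustration under assumptions: the function name `summarize` is hypothetical, the RESULT line format is copied from the example output in this document, and lines starting with an `XVAL-NNN:` rule ID are treated as errors alongside `ERROR:` lines. The exact indentation of real CLI output may differ.

```python
import re

# Error lines carry either the ERROR: prefix or a rule ID such as XVAL-012:
ERROR_RE = re.compile(r"\b(ERROR|XVAL-\d{3}):")
# Summary line format taken from the example output in this document.
RESULT_RE = re.compile(
    r"RESULT: (PASS|FAIL) \((\d+) files, (\d+) rules checked, "
    r"(\d+) errors, (\d+) warnings\)"
)

SAMPLE_OUTPUT = """\
output.yaml FAIL
  XVAL-012: no column with role: identity
  XVAL-028: reserved column "_correction_of" must have type: string (got: integer)
serving.yaml FAIL
  ERROR: key_template references "customer_id" not in output.yaml schema
RESULT: FAIL (7 files, 14 rules checked, 3 errors, 0 warnings)
"""


def summarize(output):
    """Collect error lines and the RESULT summary from validation output."""
    errors = [line.strip() for line in output.splitlines() if ERROR_RE.search(line)]
    match = RESULT_RE.search(output)
    return {
        "status": match.group(1) if match else None,
        "reported_errors": int(match.group(4)) if match else None,
        "errors": errors,
    }
```

Comparing `len(errors)` with `reported_errors` is a cheap way to detect that the filter missed a line.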
Validation Modes
| Mode | Flag | Speed Target | What It Checks |
|---|---|---|---|
| Fast | --fast | <100ms | JSON Schema parse only. No cross-refs. |
| Standard | (default) | <2s | Full schema + all XVAL rules + schema evolution + dependency resolution + codegen dry-run |
| Deploy | --check-deploy | <3s | Standard + deploy readiness checks (classification, contracts, clearance) |
| Strict | --strict | <5s | All checks + style checks (description length, tag conventions) |
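When validation is driven from a script, the mode table maps directly to a command line. This is a small sketch under assumptions: the helper `validate_command` is hypothetical, and placing the flag after the product directory is a guess about argument order rather than documented CLI behavior.

```python
# Mode names and flags as listed in the table above; the default mode has no flag.
MODE_FLAGS = {
    "fast": "--fast",
    "standard": None,
    "deploy": "--check-deploy",
    "strict": "--strict",
}


def validate_command(product_dir, mode="standard"):
    """Build the argv list for `akili validate` in the requested mode."""
    if mode not in MODE_FLAGS:
        raise ValueError(f"unknown validation mode: {mode!r}")
    command = ["akili", "validate", product_dir]
    flag = MODE_FLAGS[mode]
    if flag is not None:
        command.append(flag)  # assumed position: after the product directory
    return command
```

The returned list can be passed to `subprocess.run` without shell quoting.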
Cross-File Validation Rules
| Rule ID | Description | Files Involved |
|---|---|---|
| XVAL-012 | At least one column must have role: identity | output |
| XVAL-013 | Columns with role: measure must have numeric type | output |
| XVAL-014 | event_key must reference valid upstream identity | output + inputs |
| XVAL-015 | Fixture keys must match declared input column names | quality + inputs |
| XVAL-016 | Fixture files must exist in tests/fixtures/ | quality + filesystem |
| XVAL-017 | Contract test producer must reference valid input | quality + inputs |
| XVAL-018 | semantic_intents column references must exist in schema | output |
| XVAL-019 | semantic_contract intents must match upstream declarations | inputs + upstream |
| XVAL-020 | semantic_contract tier_refs must exist in upstream tiers | inputs + upstream |
| XVAL-021 | {{ intent() }} template calls must reference input intents | compute + inputs |
| XVAL-022 | Tier values must be valid for column type | output |
| XVAL-023 | Identity columns cannot have deprecated: true | output |
| XVAL-024 | Identity columns cannot be removed across versions | output (version comparison) |
| XVAL-025 | Identity column types cannot change across versions | output (version comparison) |
| XVAL-026 | Non-deprecated columns cannot be removed without deprecation | output (version comparison) |
| XVAL-027 | Product domain cannot change on update | product (version comparison) |
| XVAL-028 | _correction_of must be type string, role attribute | output |
| XVAL-029 | _restatement_period requires _correction_of | output |
| XVAL-030 | Domain _platform is reserved for platform canaries | product |
| XVAL-031 | If any column has classification, all must | output |
| XVAL-032 | Product classification must be >= max column classification | product + output |
| XVAL-033 | Classification value must be from taxonomy | output |
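To make a few of these rules concrete, the sketch below checks XVAL-012, XVAL-013, XVAL-023, and XVAL-031 against an output schema represented as a list of dicts. It is an illustration, not the platform's validator: the function name, the set of type prefixes treated as numeric, and every message except the XVAL-012 text are assumptions.

```python
# Assumed set of numeric type prefixes; the real type taxonomy may differ.
NUMERIC_PREFIXES = ("integer", "bigint", "float", "double", "decimal")


def check_output_schema(columns):
    """Return a list of rule violations for an output.yaml schema."""
    errors = []
    # XVAL-012: at least one identity column.
    if not any(col.get("role") == "identity" for col in columns):
        errors.append("XVAL-012: no column with role: identity")
    for col in columns:
        # XVAL-013: measures must be numeric.
        if col.get("role") == "measure" and not col["type"].startswith(NUMERIC_PREFIXES):
            errors.append(
                f"XVAL-013: measure column {col['name']!r} has non-numeric type {col['type']}"
            )
        # XVAL-023: identity columns cannot be deprecated.
        if col.get("role") == "identity" and col.get("deprecated") is True:
            errors.append(f"XVAL-023: identity column {col['name']!r} is deprecated")
    # XVAL-031: classification is all-or-nothing across columns.
    classified = [col for col in columns if "classification" in col]
    if classified and len(classified) != len(columns):
        errors.append("XVAL-031: classification set on some columns but not all")
    return errors
```

Rules that compare versions or reach into upstream products (XVAL-019 through XVAL-027) need more context than a single schema and are out of scope for this sketch.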
Example Validation Output
Success:

    $ akili validate outlet-daily-sales/

      product.yaml          OK  valid
      inputs.yaml           OK  valid (2 inputs, 0 unresolved)
      output.yaml           OK  valid (7 columns, 2 identity, 1 measure, 4 attribute)
      serving.yaml          OK  valid (3 endpoints)
      quality.yaml          OK  valid (6 checks: 4 error, 2 warn)
      compute.yaml          OK  valid (sql runtime, auto engine, cron schedule)
      logic/transform.sql   OK  syntax valid

      Column roles          OK  XVAL-012,013,014,023
      Reserved columns      OK  XVAL-028,029
      Cross-references      OK  all columns resolve
      Classification        OK  propagation valid
      Schema evolution      OK  no breaking changes

    RESULT: PASS (7 files, 14 rules checked, 0 errors, 0 warnings)

Failure:

    $ akili validate broken-product/

      output.yaml           FAIL
        XVAL-012: no column with role: identity
        XVAL-028: reserved column "_correction_of" must have type: string (got: integer)
      serving.yaml          FAIL
        ERROR: key_template references "customer_id" not in output.yaml schema

    RESULT: FAIL (7 files, 14 rules checked, 3 errors, 0 warnings)

Multi-Tenancy
Data products are scoped to tenants. The tenant context is implicit — it comes from the authenticated user’s JWT, not from the manifest files.
What the platform adds invisibly:
| Concern | Platform Action |
|---|---|
| Storage isolation | S3 paths: s3://akili/{tenant}/{domain}/{product}/data/ |
| Database isolation | PostgreSQL RLS: WHERE tenant_id = current_setting('app.tenant_id'), schema per domain |
| Cache isolation | Redis keys: {tenant}:{domain}:{product}:{key_values} |
| Event isolation | Redpanda topics: {tenant}.{domain}.{product}.events |
| Asset isolation | Dagster asset keys: ["{tenant}", "{domain}", "{product}"] |
| Compute isolation | K8s jobs tagged with tenant_id, resource quotas per tenant |
Developers never write tenant_id in their manifests or logic files.
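The naming templates in the table can be read as simple string substitutions. The sketch below shows what the platform-derived names would look like for one product; the function name and the tenant value `acme` used in the usage note are hypothetical, and joining `key_values` with `:` is an assumption about how the Redis key suffix is built.

```python
def tenant_scoped_names(tenant, domain, product, key_values=()):
    """Expand the isolation templates from the table for one data product."""
    return {
        # s3://akili/{tenant}/{domain}/{product}/data/
        "s3_path": f"s3://akili/{tenant}/{domain}/{product}/data/",
        # {tenant}:{domain}:{product}:{key_values} (key values joined with ':' by assumption)
        "redis_key": ":".join([tenant, domain, product, *key_values]),
        # {tenant}.{domain}.{product}.events
        "topic": f"{tenant}.{domain}.{product}.events",
        # ["{tenant}", "{domain}", "{product}"]
        "asset_key": [tenant, domain, product],
    }
```

For example, `tenant_scoped_names("acme", "retail", "outlet-daily-sales")["topic"]` evaluates to `acme.retail.outlet-daily-sales.events`. None of this appears in manifests or logic files; the platform applies it from the JWT's tenant context.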
Versioning
Semver Rules
| Bump | When | Examples |
|---|---|---|
| MAJOR | Breaking output schema change | Column removed, type narrowed, column renamed |
| MINOR | Additive change, backwards compatible | New nullable column, type widened, new serving endpoint |
| PATCH | Logic-only change, same schema | Transform SQL updated, quality threshold adjusted |
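The table can be approximated as a decision over two versions of an output schema. The following is a rough sketch, not the platform's schema-evolution check: the function name, the `WIDENINGS` pairs, the default of `nullable: true` when the key is absent, and the choice to treat a new non-nullable column as MAJOR are all assumptions. It looks only at columns, so changes to serving endpoints are not covered.

```python
# Assumed examples of safe type widenings as (old_type, new_type) pairs.
WIDENINGS = {("integer", "bigint"), ("float", "double")}


def required_bump(old_schema, new_schema):
    """Classify a schema change; schemas map column name -> {"type", "nullable"}."""
    # A removed column (or a rename, which looks like remove + add) is breaking.
    if old_schema.keys() - new_schema.keys():
        return "MAJOR"
    bump = "PATCH"
    for name in old_schema.keys() & new_schema.keys():
        old_type, new_type = old_schema[name]["type"], new_schema[name]["type"]
        if old_type != new_type:
            if (old_type, new_type) not in WIDENINGS:
                return "MAJOR"  # narrowed or otherwise incompatible type
            bump = "MINOR"  # widened type
    for name in new_schema.keys() - old_schema.keys():
        # Conservative assumption: a new required column breaks consumers.
        if not new_schema[name].get("nullable", True):
            return "MAJOR"
        bump = "MINOR"  # new nullable column
    return bump
```

A PATCH result here only means the schema is unchanged, which matches the "logic-only change, same schema" row.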
Version Coexistence
When a new major version is deployed, the previous version enters a deprecation period. Both versions run simultaneously during the deprecation_window (default: 90 days, max: 180 days).
| State | Meaning |
|---|---|
| deployed | Active, receiving data, serving consumers |
| deprecated | Still active but scheduled for retirement. Consumers see deprecation warnings. |
| retired | No longer active. Data preserved for time-travel queries. Serving returns 410 Gone. |
During the deprecation window, the deprecated version’s API responses include a Sunset header (RFC 8594) with the retirement date.
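As an illustration of how the retirement date and header relate, the sketch below computes a Sunset value from the start of the deprecation period. It assumes the retirement date is the deprecation start plus the window, which the text implies but does not state outright; the function name is hypothetical. RFC 8594 specifies the header value as an HTTP-date, which `email.utils.format_datetime` produces with `usegmt=True`.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

DEFAULT_WINDOW_DAYS = 90   # default deprecation_window from the text above
MAX_WINDOW_DAYS = 180      # maximum deprecation_window from the text above


def sunset_header(deprecated_at, window_days=DEFAULT_WINDOW_DAYS):
    """Return the Sunset header for a version deprecated at `deprecated_at`.

    `deprecated_at` must be a timezone-aware datetime.
    """
    if deprecated_at.tzinfo is None:
        raise ValueError("deprecated_at must be timezone-aware")
    if not 0 < window_days <= MAX_WINDOW_DAYS:
        raise ValueError(f"window_days must be between 1 and {MAX_WINDOW_DAYS}")
    # Assumption: retirement = start of deprecation + deprecation window.
    retire_at = deprecated_at.astimezone(timezone.utc) + timedelta(days=window_days)
    return {"Sunset": format_datetime(retire_at, usegmt=True)}
```

With the default window, a version deprecated at 2025-01-01 00:00 UTC yields `Sunset: Tue, 01 Apr 2025 00:00:00 GMT`.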
Complete Example
A full outlet-daily-sales data product:
product.yaml

    apiVersion: akili/v1
    kind: DataProduct
    metadata:
      name: outlet-daily-sales
      domain: retail
      version: 1.0.0
      owner: retail-analytics
      description: >
        Aggregates raw POS transactions by outlet and day.
        Feeds executive dashboards and territory planning.
      tags: [sales, daily, fmcg]
      classification: internal

inputs.yaml

    apiVersion: akili/v1
    kind: Inputs
    inputs:
      - id: cleaned-outlets
        type: data_product
        version: ">=1.0.0"
        timeout: 6h
        fallback: use_cached
      - id: cleaned-transactions
        type: data_product
        version: ">=2.0.0"
        timeout: 4h
        fallback: fail
        partition_mapping: same_day

output.yaml

    apiVersion: akili/v1
    kind: Output
    schema:
      - { name: outlet_id, type: uuid, primary_key: true, role: identity }
      - { name: sale_date, type: date, primary_key: true, role: identity }
      - { name: total_revenue, type: "decimal(18,2)", nullable: false, role: measure }
      - { name: transaction_count, type: integer, nullable: false, role: measure }
      - { name: avg_basket_size, type: "decimal(10,2)", role: measure }
      - { name: territory_code, type: string, nullable: false, role: attribute }
      - { name: updated_at, type: timestamp, nullable: false, role: attribute }
    partitioning:
      - field: sale_date
        granularity: day

serving.yaml

    apiVersion: akili/v1
    kind: Serving
    endpoints:
      - type: lookup
        config:
          index_columns: [outlet_id, sale_date]
      - type: analytics
      - type: realtime
        config:
          key_template: "outlet:{outlet_id}:daily:{sale_date}"
          ttl: 24h
          include_columns: [outlet_id, sale_date, total_revenue, transaction_count]
    visualization:
      enabled: true
      dashboard_template: daily-sales-overview
      refresh_interval: 15m

quality.yaml

    apiVersion: akili/v1
    kind: Quality
    checks:
      - { name: revenue_complete, type: completeness, config: { column: total_revenue, threshold: 0.99 }, severity: error }
      - { name: data_fresh, type: freshness, config: { column: updated_at, max_age: 6h }, severity: error }
      - { name: volume_check, type: volume, config: { min_rows: 50 }, severity: warn }
      - { name: revenue_positive, type: custom_expression, config: { expression: "total_revenue >= 0", threshold: 1.0 }, severity: error }
      - name: no_duplicate_keys
        type: uniqueness
        config:
          columns: [outlet_id, sale_date]
        severity: error

compute.yaml

    apiVersion: akili/v1
    kind: Compute
    runtime: sql
    engine: auto
    schedule:
      type: cron
      expression: "0 6 * * *"
      timezone: Africa/Nairobi
    resources:
      cpu: "500m"
      memory: "1Gi"
      timeout: 30m
    entrypoint: logic/transform.sql
    retry:
      max_attempts: 3
      backoff: exponential

logic/transform.sql

    SELECT
        t.outlet_id,
        t.transaction_date AS sale_date,
        SUM(t.amount) AS total_revenue,
        COUNT(*) AS transaction_count,
        AVG(t.amount) AS avg_basket_size,
        o.territory_code,
        NOW() AS updated_at
    FROM cleaned_transactions t
    JOIN cleaned_outlets o ON t.outlet_id = o.outlet_id
    WHERE t.transaction_date = '{{ partition_key }}'
    GROUP BY t.outlet_id, t.transaction_date, o.territory_code

Cross-References
| Topic | Document |
|---|---|
| How manifests translate to Dagster primitives | D02: ORCHESTRATION.md |
| Serving store details, namespace convention | D03: SERVING.md |
| API endpoints for manifest operations | D04: API-SPEC.md |
| CLI commands (init, validate, deploy, run) | D05: CLI.md |
| Connector registry and ingestion details | D06: INPUT-PORTS.md |
| Governance, lineage, and OpenMetadata | D07: GOVERNANCE.md |