Step 1: Declare
Step 1: Initialize the Project
Section titled “Step 1: Initialize the Project”Scaffold the product directory with all 6 manifest files:
akili init customer-orders --archetype aggregateThis creates:
customer-orders/ product.yaml inputs.yaml output.yaml serving.yaml quality.yaml compute.yaml logic/ transform.sqlStep 2: Write the Manifests
Section titled “Step 2: Write the Manifests”Replace the scaffolded files with the following content.
product.yaml — Identity and Ownership
Section titled “product.yaml — Identity and Ownership”apiVersion: akili/v1kind: DataProduct
metadata: name: customer-orders domain: commerce version: 1.0.0 owner: analytics-team description: > Aggregates cleaned customer data with order events to produce per-customer order metrics. Powers the customer insights dashboard and the mobile app customer profile. tags: - customers - orders - commerce classification: internal contacts: - name: Amina Wanjiku role: product-owner email: amina.wanjiku@example.com
retention: period: "730d" basis: event_time review_date: "2027-01-01"Key decisions:
domain: commerce— groups this product with other commerce data productsclassification: internal— accessible to all team members (no PII in the output)retention: 730d— two years of data retention
inputs.yaml — Dependencies
Section titled “inputs.yaml — Dependencies”apiVersion: akili/v1kind: Inputs
inputs: - id: cleaned-customers type: data_product version: ">=1.0.0" timeout: 6h fallback: use_cached optional: false
- id: cleaned-orders type: data_product version: ">=1.0.0" timeout: 4h fallback: fail partition_mapping: same_day
defaults: timeout: 4h fallback: failKey decisions:
cleaned-customersusesfallback: use_cached— if the customer data is late, use the last successful version rather than blockingcleaned-ordersusesfallback: fail— order data is critical, we cannot proceed without itpartition_mapping: same_day— orders and the output share the same daily partition
output.yaml — Schema Contract
Section titled “output.yaml — Schema Contract”apiVersion: akili/v1kind: Output
schema: - name: customer_id type: string primary_key: true role: identity nullable: false description: Unique customer identifier
- name: customer_name type: string nullable: false role: attribute description: Customer display name
- name: email type: string nullable: true role: attribute description: Customer email address
- name: order_date type: date primary_key: true role: attribute nullable: false description: Date of aggregated orders
- name: total_orders type: integer nullable: false role: measure description: Number of orders placed on this date
- name: total_revenue type: "decimal(18,2)" nullable: false role: measure description: Sum of order amounts for the day
- name: avg_order_value type: "decimal(10,2)" nullable: true role: measure description: Average order value (null if zero orders)
- name: first_order_at type: timestamp nullable: true role: attribute description: Timestamp of first order on this date
- name: last_order_at type: timestamp nullable: true role: attribute description: Timestamp of last order on this date
- name: updated_at type: timestamp nullable: false description: When this row was last computed
format: parquetpartitioning: - field: order_date granularity: dayKey decisions:
- Composite primary key:
customer_id+order_date(one row per customer per day) customer_idhasrole: identity— it is the entity’s natural key- Numeric aggregation columns use
role: measure— enabling correct aggregation in analytics tools - Partitioned by
order_datefor efficient time-range queries
serving.yaml — Endpoint Configuration
Section titled “serving.yaml — Endpoint Configuration”apiVersion: akili/v1kind: Serving
endpoints: - type: lookup description: Customer order details for Portal profile pages config: index_columns: - customer_id - order_date
- type: analytics description: Customer order analytics for Superset dashboards
- type: realtime description: Today's order totals per customer for live dashboard config: key_template: "customer:{customer_id}:orders:{order_date}" ttl: 24h include_columns: - customer_id - order_date - total_orders - total_revenue
visualization: enabled: true dashboard_template: customer-orders-overview refresh_interval: 15mThree serving endpoints:
- Lookup — Portal fetches individual customer profiles by ID
- Analytics — StarRocks federation for OLAP dashboards (no data movement)
- Realtime — Redis for live dashboard tiles with 24h TTL
quality.yaml — Quality Gates
Section titled “quality.yaml — Quality Gates”apiVersion: akili/v1kind: Quality
checks: # Primary key integrity - name: customer_id_complete type: completeness config: column: customer_id threshold: 1.0 severity: error
- name: unique_customer_day type: uniqueness config: columns: - customer_id - order_date severity: error
# Freshness - name: data_freshness type: freshness config: column: updated_at max_age: 6h severity: error
# Value constraints - name: revenue_non_negative type: range config: column: total_revenue min: 0 severity: error
- name: orders_non_negative type: range config: column: total_orders min: 0 severity: error
# Referential integrity - name: valid_customer_ref type: referential config: column: customer_id reference_product: cleaned-customers reference_column: customer_id severity: error
# Volume sanity - name: row_volume type: volume config: min_rows: 10 max_rows: 10000000 severity: warn
# Business logic - name: avg_matches_total type: custom_expression config: expression: "total_orders = 0 OR avg_order_value IS NOT NULL" threshold: 1.0 severity: error
# Cross-check via SQL - name: revenue_consistent type: custom_sql sql: | SELECT COUNT(*) as inconsistent FROM {output} WHERE total_orders > 0 AND total_revenue <= 0 expect: "inconsistent = 0" severity: errorNine quality checks covering:
- Primary key integrity (completeness + uniqueness)
- Data freshness (within 6 hours)
- Value constraints (non-negative revenue and orders)
- Referential integrity (customers exist in upstream)
- Volume sanity (warning on anomalies)
- Business logic (average order value consistency)
- Cross-check (revenue should be positive when orders exist)
compute.yaml — Runtime Configuration
Section titled “compute.yaml — Runtime Configuration”apiVersion: akili/v1kind: Compute
runtime: sqlmode: transformengine: auto
schedule: type: event
resources: cpu: "1" memory: "2Gi" timeout: 30m
entrypoint: logic/transform.sql
retry: max_attempts: 3 backoff: exponential initial_delay: 30sKey decisions:
schedule.type: event— runs when both inputs are materialized (not on a cron)engine: auto— platform selects DuckDB or Spark based on input sizeresources— adequate for a medium-sized join
logic/transform.sql — Business Logic
Section titled “logic/transform.sql — Business Logic”-- Inputs available as tables matching their ids from inputs.yaml:-- cleaned_customers, cleaned_orders
SELECT c.customer_id, c.customer_name, c.email, o.order_date, COUNT(o.order_id) AS total_orders, COALESCE(SUM(o.amount), 0) AS total_revenue, CASE WHEN COUNT(o.order_id) > 0 THEN SUM(o.amount) / COUNT(o.order_id) ELSE NULL END AS avg_order_value, MIN(o.order_timestamp) AS first_order_at, MAX(o.order_timestamp) AS last_order_at, NOW() AS updated_atFROM cleaned_customers cLEFT JOIN cleaned_orders o ON c.customer_id = o.customer_idGROUP BY c.customer_id, c.customer_name, c.email, o.order_dateNote: Input table names match the
idfields frominputs.yaml, with hyphens converted to underscores.cleaned-customersbecomescleaned_customers.