Product & Inputs
product.yaml — Identity
Section titled “product.yaml — Identity”This file changes least frequently. It declares who owns the product and its sensitivity level.
apiVersion: akili/v1kind: DataProduct
metadata: name: user-events # unique within tenant+domain domain: analytics # DDD bounded context version: 1.0.0 # semver owner: platform-team # registered team identifier description: > Captures user interaction events from the web application. Source of truth for user behavior analytics. tags: - events - user-behavior - clickstream classification: internal # public | internal | confidential | restricted contacts: - name: Alex Ochieng role: product-owner email: alex.ochieng@example.comKey fields:
| Field | Rules |
|---|---|
metadata.name | Lowercase, hyphens allowed. Pattern: [a-z0-9][a-z0-9-]*[a-z0-9]. Max 63 chars. Must be unique within tenant+domain. |
metadata.domain | DDD bounded context. Starts with a letter, [a-z][a-z0-9-] (max 63 chars). Auto-created on first use — no separate “create domain” step. |
metadata.version | Semver MAJOR.MINOR.PATCH. Breaking output schema changes require a major bump. |
metadata.owner | Must match a registered team in the platform. |
metadata.description | Minimum 10 characters. |
metadata.classification | Drives access control. See classification levels below. |
Classification levels:
| Level | Who Can Access | Propagation Rule |
|---|---|---|
public | Any authenticated user in the tenant | — |
internal | Any team member in the tenant | — |
confidential | Explicit team grant required | Output >= max(input classifications) |
restricted | Named individuals only, audit logged | Output >= max(input classifications) |
Caution: Classification follows the high-water mark rule: if any input is
confidential, your output cannot bepublicorinternal. The platform enforces this at deploy time to prevent data laundering through aggregation.
Optional retention policy:
retention: period: "365d" # how long to retain data basis: created_at # which timestamp determines age review_date: "2026-06-01"inputs.yaml — Dependencies
Section titled “inputs.yaml — Dependencies”Declares what this product consumes. Inputs are either other data products (internal) or external systems via connectors.
Source-aligned example:
apiVersion: akili/v1kind: Inputs
inputs: - id: raw-user-events type: connector connector_ref: webapp-postgres # registered by platform admin ingestion_strategy: cdc ingestion_config: primary_key: event_id timeout: 2h fallback: fail
defaults: timeout: 4h fallback: failAggregate example:
apiVersion: akili/v1kind: Inputs
inputs: - id: cleaned-users type: data_product version: ">=1.0.0" # semver range timeout: 6h fallback: skip
- id: cleaned-events type: data_product version: ">=2.0.0" timeout: 4h fallback: fail partition_mapping: same_day # same_day | previous_day | custom
defaults: timeout: 4h fallback: failConsumer-aligned example:
apiVersion: akili/v1kind: Inputs
inputs: - id: user-behavior-aggregate type: data_product version: ">=1.0.0" timeout: 2h fallback: use_cachedInput fields for type: data_product:
| Field | Required | Description |
|---|---|---|
id | yes | Must match a metadata.name in another product’s product.yaml |
version | no | Semver range (>=1.0.0, ~1.2.0, ^2.0.0). Default: latest |
timeout | no | Max wait before fallback. Format: 30m, 4h, 1d |
fallback | no | skip — proceed without. fail — abort. use_cached — use last successful run |
optional | no | If true, product can execute without this input |
partition_mapping | no | same_day, previous_day, or custom |
Input fields for type: connector:
| Field | Required | Description |
|---|---|---|
connector_ref | yes | Name of a connection registered by a platform admin |
ingestion_strategy | yes | full_refresh, incremental, cdc, event, file_upload, api_poll |
ingestion_config | varies | Strategy-specific: cursor_field, primary_key, topic, etc. |
Ingestion strategies:
| Strategy | Use Case | Required Config |
|---|---|---|
full_refresh | Small reference tables | None |
incremental | Append/upsert based on cursor | cursor_field, primary_key |
cdc | Change Data Capture via database log | primary_key |
event | Consume from Redpanda topic | topic |
file_upload | Manual CSV/Parquet upload | format |
api_poll | HTTP API polling | url, method |