output.yaml
Declares the shape of data this product produces. This is the contract with downstream consumers. Breaking changes require a major version bump.
Example

```yaml
apiVersion: akili/v1
kind: Output

schema:
  - name: outlet_id
    type: uuid
    primary_key: true
    role: identity
    description: Unique outlet identifier

  - name: sale_date
    type: date
    primary_key: true
    role: identity
    description: Date of aggregated sales

  - name: total_revenue
    type: "decimal(18,2)"
    nullable: false
    role: measure
    description: Sum of all transaction amounts for the day

  - name: transaction_count
    type: integer
    nullable: false
    role: measure
    description: Number of transactions

  - name: avg_basket_size
    type: "decimal(10,2)"
    nullable: true
    role: measure
    description: Average transaction value (null if zero transactions)

  - name: territory_code
    type: string
    nullable: false
    role: attribute
    description: Territory grouping code for the outlet

  - name: updated_at
    type: timestamp
    nullable: false
    role: attribute
    description: When this row was last computed

format: parquet
partitioning:
  - field: sale_date
    granularity: day
```

Field Reference — Column Definition

| Field | Type | Required | Default | Validation | Description |
|---|---|---|---|---|---|
| `name` | string | Yes | — | snake_case. Must be unique within schema. | Column name |
| `type` | string | Yes | — | Must be a supported type (see below) | Data type |
| `primary_key` | bool | No | `false` | At least one PK column required per schema | Part of the composite primary key |
| `nullable` | bool | No | `true` | PK columns are implicitly non-nullable | Whether the column accepts null values |
| `description` | string | No | — | — | Documentation for Portal display |
| `role` | enum | No | `attribute` | `identity`, `attribute`, `measure`, `event_key` | Semantic role in the entity model |
| `deprecated` | bool | No | `false` | Cannot be `true` on identity columns (XVAL-023) | Marks column for deprecation lifecycle |
| `deprecated_message` | string | No | — | — | Guidance for consumers on migration |
| `classification` | enum | No | — | Must be from the classification taxonomy. All-or-nothing: if any column declares it, all must (XVAL-031). | Column-level sensitivity classification |

Field Reference — Output-Level Fields

| Field | Type | Required | Default | Validation | Description |
|---|---|---|---|---|---|
| `schema` | object[] | Yes | — | At least one column. At least one `primary_key: true`. At least one `role: identity`. | Column definitions |
| `format` | enum | No | `parquet` | `parquet`, `avro`, `json` | Internal storage format on S3 object storage (Ceph RGW) |
| `partitioning` | object[] | No | — | `field` must reference a schema column | Iceberg hidden partitioning |
| `partitioning[].field` | string | Yes | — | Must reference a column in `schema` | Partition field |
| `partitioning[].granularity` | enum | Yes | — | `day`, `month`, `year` | Partition granularity |
| `semantic_intents` | map | No | — | Keys follow pattern `[a-z][a-z0-9_]{0,62}` | Named semantic intent declarations (see below) |
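
As a hedged illustration of the output-level fields, the fragment below overrides the default `format` and partitions by month rather than by day. The column names (`account_id`, `billing_date`, `amount_billed`) are hypothetical and chosen only for this sketch; the field names and enum values are the ones listed in the table above.

```yaml
apiVersion: akili/v1
kind: Output

schema:
  # Hypothetical columns, for illustration only
  - name: account_id
    type: uuid
    primary_key: true
    role: identity
    description: Unique account identifier

  - name: billing_date
    type: date
    primary_key: true
    description: Date the charge was billed

  - name: amount_billed
    type: "decimal(18,2)"
    nullable: false
    role: measure
    description: Total amount billed on this date

format: avro                # overrides the default (parquet)
partitioning:
  - field: billing_date     # must name a column declared in schema
    granularity: month      # one of: day, month, year
```

Omitting `format` and `partitioning` entirely would also be valid according to the table: `format` falls back to `parquet`, and `partitioning` is optional.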

Supported Data Types

| Type | Mapping | Notes |
|---|---|---|
| `string` | VARCHAR | Variable length |
| `integer` | INT64 | 64-bit signed |
| `bigint` | INT64 | Alias for `integer` |
| `smallint` | INT16 | 16-bit signed |
| `float` | FLOAT64 | Double precision |
| `decimal(p,s)` | DECIMAL | Precision and scale required |
| `boolean` | BOOLEAN | |
| `date` | DATE | Calendar date |
| `timestamp` | TIMESTAMP_TZ | Always stored with timezone (UTC) |
| `uuid` | UUID | |
| `json` | JSON | Stored as string in Iceberg, parsed by serving stores |
| `binary` | BINARY | For opaque blobs |
| `array<T>` | LIST | Nested type, e.g. `array<string>` |
| `map<K,V>` | MAP | Key-value, e.g. `map<string,integer>` |

All types map to Apache Iceberg types internally (via Lakekeeper catalog). The platform handles translation to each serving store’s native types.
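
The nested and semi-structured types do not appear in the main example, so here is a small sketch of how they might be declared. All column names are hypothetical. Quoting the parameterized type strings is an assumption modeled on the quoted `"decimal(18,2)"` in the example; this page does not say whether the quotes are required.

```yaml
schema:
  # Hypothetical columns, for illustration only
  - name: order_id
    type: uuid
    primary_key: true
    role: identity
    description: Unique order identifier

  - name: tags
    type: "array<string>"          # nested LIST type
    description: Free-form labels attached to the order

  - name: item_counts
    type: "map<string,integer>"    # MAP from SKU to quantity
    description: Quantity ordered per SKU

  - name: raw_payload
    type: json                     # stored as string in Iceberg
    description: Original event body as received
```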
Column Roles

Every column carries a role that declares its semantic purpose. The platform uses roles to derive entity graphs, validate schemas, and enable downstream features.

| Role | Meaning | Constraints |
|---|---|---|
| `identity` | Natural/business key that uniquely identifies an entity | At least one per schema. String, integer, or UUID types only. Immutable across versions: cannot be removed, renamed, or type-changed. |
| `attribute` | Descriptive property of an entity | No constraints. Default role if omitted. |
| `measure` | Numeric fact intended for aggregation (sum, avg, count) | Must be numeric type: `integer`, `bigint`, `float`, `decimal`. |
| `event_key` | Foreign key referencing another product’s identity column | Must reference a valid identity column in an input product. |

Rules:

- At least one `identity` column required per schema (XVAL-012).
- `measure` columns must be numeric (XVAL-013).
- `event_key` must reference a valid upstream identity (XVAL-014).
- Identity columns cannot be deprecated (XVAL-023).
- Identity columns are immutable across versions (XVAL-024, XVAL-025).
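
A sketch of a schema that uses all four roles may help tie the rules together. The columns are hypothetical, and the fragment assumes an input product exists whose identity column is `outlet_id`; how input products are declared is not covered on this page.

```yaml
schema:
  - name: order_id
    type: uuid
    primary_key: true
    role: identity          # at least one identity column (XVAL-012)
    description: Unique order identifier

  - name: outlet_id
    type: uuid
    role: event_key         # assumed to match an upstream identity column (XVAL-014)
    description: Outlet where the order was placed

  - name: order_total
    type: "decimal(18,2)"
    nullable: false
    role: measure           # numeric type, as XVAL-013 requires
    description: Order value including tax

  - name: channel
    type: string            # role omitted, so it defaults to attribute
    description: Sales channel
```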
Column Classification

Columns can optionally declare their sensitivity level, enabling column-level access control in the serving layer.

| Classification | Category | Description | Example |
|---|---|---|---|
| `pii.identifier` | PII | Stable personal identity | National ID, SSN |
| `pii.name` | PII | Personal name | First name, last name |
| `pii.contact` | PII | Contact information | Email, phone, address |
| `business.confidential` | Business | Business-sensitive | Risk ratings, margins |
| `business.internal` | Business | Internal non-sensitive | Segments, categories |
| `public` | Public | No restriction | Country codes, currencies |

Rules:

- All-or-nothing (XVAL-031): if any column declares classification, all must.
- Product classification must be >= max column classification (XVAL-032).
- Values must be from the taxonomy above (XVAL-033).

Classification to product level mapping:

| Max Column Classification | Minimum Product Classification |
|---|---|
| `pii.identifier` | `restricted` |
| `pii.name` | `restricted` |
| `pii.contact` | `restricted` |
| `business.confidential` | `confidential` |
| `business.internal` | `internal` |
| `public` | `public` |
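
As an illustrative sketch of the all-or-nothing rule, the schema below declares `classification` on every column. The columns are hypothetical. Because the most sensitive column here is `pii.contact`, the mapping above implies the product itself would need to be classified at least `restricted`; where the product-level classification is declared is not shown on this page.

```yaml
schema:
  # Every column declares a classification, as XVAL-031 requires
  - name: outlet_id
    type: uuid
    primary_key: true
    role: identity
    classification: business.internal

  - name: manager_email
    type: string
    classification: pii.contact            # highest sensitivity in this schema

  - name: margin_pct
    type: "decimal(5,2)"
    role: measure
    classification: business.confidential

  - name: country_code
    type: string
    classification: public
```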
Semantic Intents

Upstream products declare named semantic intents that map business concepts to concrete columns. This decouples downstream consumers from specific column values.

```yaml
semantic_intents:
  customer_classification:
    description: "Categorical grouping of customers by business value"
    column: customer_segment
    type: categorical
    tiers:
      high_value: [Engaged]
      medium_value: [At-Risk]
      low_value: [Dormant, New]
    history:
      - values: [Gold, Silver, Bronze]
        methodology: "Revenue-based tiering"
        valid_until: "2026-03-10"
      - values: [Engaged, At-Risk, Dormant, New]
        methodology: "Behavioral clustering model v2"
        valid_from: "2026-03-11"

  transaction_revenue:
    description: "Net revenue per transaction line"
    column: net_revenue
    type: numeric_range
    unit: currency
```

Intent Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `description` | string | Yes | Human-readable description of the business concept |
| `column` | string | Yes | Must reference a column name in the output schema (XVAL-018) |
| `type` | enum | Yes | `categorical`, `numeric_range`, `temporal`, `boolean` |
| `tiers` | map | No | Named groupings. Keys are stable tier names; values are lists of concrete values. |
| `history` | list | No | Ordered log (oldest first) of value domain changes for audit |
| `unit` | string | No | Unit of measurement (e.g., currency, seconds, meters) |

Intent Types

| Type | Use Case | Tier Semantics |
|---|---|---|
| `categorical` | Discrete values with named groupings | Tiers map to specific value sets |
| `numeric_range` | Continuous values with thresholds | Tiers define named ranges (e.g., `high: [">1000"]`) |
| `temporal` | Time-based classifications | Tiers define time periods or recency windows |
| `boolean` | Binary flags | Tiers map to true/false with semantic names |
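
The `numeric_range` row can be sketched by adding tiers to the `transaction_revenue` intent. Only the `">1000"` form is documented in the table above; the `"<=1000"` expression and both tier names are assumptions made for this example, so the actual threshold grammar should be checked before relying on it.

```yaml
semantic_intents:
  transaction_revenue:
    description: "Net revenue per transaction line"
    column: net_revenue
    type: numeric_range
    unit: currency
    tiers:
      high: [">1000"]          # form shown in the Intent Types table
      standard: ["<=1000"]     # assumed syntax, mirroring the documented form
```

Consumers that depend on the tier name `high` rather than on the literal threshold would be unaffected if the producer later moved the cutoff, which is the decoupling that semantic intents are meant to provide.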

History Entry Fields

| Field | Type | Required | Description |
|---|---|---|---|
| `values` | list | Yes | Set of valid values during this period |
| `methodology` | string | Yes | How values are determined |
| `valid_from` | date | No | Start date (ISO 8601) |
| `valid_until` | date | No | End date (ISO 8601) |
Schema Evolution

The platform tracks schema changes across versions:

| Change Type | Version Impact | Platform Action |
|---|---|---|
| Add nullable column | Minor bump | Iceberg schema evolution, no rewrite |
| Add non-nullable column with default | Minor bump | Backfill default value |
| Remove column | Major bump | Downstream consumers notified, blocked until update |
| Rename column | Major bump | Treated as remove + add |
| Change type (widening, e.g. smallint to integer) | Minor bump | Iceberg type promotion |
| Change type (narrowing or incompatible) | Major bump | Requires full rewrite |
| Remove/rename identity column | Forbidden | Cannot be done. Create a new product instead. |
| Change identity column type | Forbidden | Cannot be done. Identity types locked at creation. |
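
To make the first row of the table concrete, here is a sketch of a change that would count as a minor bump: appending a nullable column to the example schema. `discount_total` is a hypothetical column introduced only for this illustration.

```yaml
schema:
  # ...existing columns unchanged...
  - name: total_revenue
    type: "decimal(18,2)"
    nullable: false
    role: measure
    description: Sum of all transaction amounts for the day

  # New in this version: nullable, so no rewrite of existing data is needed
  - name: discount_total          # hypothetical column
    type: "decimal(18,2)"
    nullable: true
    role: measure
    description: Sum of discounts applied for the day
```

Declaring the same column with `nullable: false` would fall under the second row instead, where the platform backfills a default value.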
Attribute Deprecation Lifecycle

```text
active --> deprecated --> removed
       (minor)        (major)
```

To deprecate a column, add `deprecated: true`:

```yaml
schema:
  - name: old_segment_code
    type: string
    role: attribute
    deprecated: true
    deprecated_message: "Use customer_classification semantic intent instead. Removal in v3.0.0."
```

Correction Events

Products that handle corrections declare reserved columns:

```yaml
schema:
  - name: _correction_of
    type: string
    role: attribute
    description: "References the identity value of the record being corrected"
  - name: _restatement_period
    type: string
    role: attribute
    description: "ISO 8601 period being restated (e.g., '2026-01')"
```

Rules:

- `_correction_of` must be type `string` with role `attribute` (XVAL-028).
- `_restatement_period` requires `_correction_of` to also be declared (XVAL-029).