output.yaml

Declares the shape of data this product produces. This is the contract with downstream consumers. Breaking changes require a major version bump.

apiVersion: akili/v1
kind: Output
schema:
  - name: outlet_id
    type: uuid
    primary_key: true
    role: identity
    description: Unique outlet identifier
  - name: sale_date
    type: date
    primary_key: true
    role: identity
    description: Date of aggregated sales
  - name: total_revenue
    type: "decimal(18,2)"
    nullable: false
    role: measure
    description: Sum of all transaction amounts for the day
  - name: transaction_count
    type: integer
    nullable: false
    role: measure
    description: Number of transactions
  - name: avg_basket_size
    type: "decimal(10,2)"
    nullable: true
    role: measure
    description: Average transaction value (null if zero transactions)
  - name: territory_code
    type: string
    nullable: false
    role: attribute
    description: Territory grouping code for the outlet
  - name: updated_at
    type: timestamp
    nullable: false
    role: attribute
    description: When this row was last computed
format: parquet
partitioning:
  - field: sale_date
    granularity: day

Column fields (each entry under schema):

| Field | Type | Required | Default | Validation | Description |
| --- | --- | --- | --- | --- | --- |
| name | string | Yes | | snake_case. Must be unique within schema. | Column name |
| type | string | Yes | | Must be a supported type (see below) | Data type |
| primary_key | bool | No | false | At least one PK column required per schema | Part of the composite primary key |
| nullable | bool | No | true | PK columns are implicitly non-nullable | Whether the column accepts null values |
| description | string | No | | | Documentation for Portal display |
| role | enum | No | attribute | identity, attribute, measure, event_key | Semantic role in the entity model |
| deprecated | bool | No | false | Cannot be true on identity columns (XVAL-023) | Marks column for deprecation lifecycle |
| deprecated_message | string | No | | | Guidance for consumers on migration |
| classification | enum | No | | Must be from the classification taxonomy. All-or-nothing: if any column declares it, all must (XVAL-031). | Column-level sensitivity classification |

Top-level fields:

| Field | Type | Required | Default | Validation | Description |
| --- | --- | --- | --- | --- | --- |
| schema | object[] | Yes | | At least one column. At least one primary_key: true. At least one role: identity. | Column definitions |
| format | enum | No | parquet | parquet, avro, json | Internal storage format on S3 object storage (Ceph RGW) |
| partitioning | object[] | No | | field must reference a schema column | Iceberg hidden partitioning |
| partitioning[].field | string | Yes | | Must reference a column in schema | Partition field |
| partitioning[].granularity | enum | Yes | | day, month, year | Partition granularity |
| semantic_intents | map | No | | Keys follow pattern [a-z][a-z0-9_]{0,62} | Named semantic intent declarations (see below) |
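
The structural rules in the two field tables can be checked mechanically. The following is a minimal sketch in Python, not the platform's validator: it assumes the YAML has already been parsed into a plain dict, the function name validate_output is hypothetical, and the snake_case pattern (including the optional leading underscore) is an assumed reading of the rule.

```python
import re

# Assumed reading of "snake_case"; the optional leading underscore is an assumption.
SNAKE_CASE = re.compile(r"_?[a-z][a-z0-9_]*")
GRANULARITIES = {"day", "month", "year"}


def validate_output(spec: dict) -> list[str]:
    """Return human-readable violations of the structural rules in the field tables."""
    errors = []
    columns = spec.get("schema") or []
    if not columns:
        errors.append("schema must contain at least one column")

    names = [col.get("name", "") for col in columns]
    for name in names:
        if not SNAKE_CASE.fullmatch(name):
            errors.append(f"column name {name!r} is not snake_case")
    if len(set(names)) != len(names):
        errors.append("column names must be unique within schema")

    if not any(col.get("primary_key", False) for col in columns):
        errors.append("at least one column needs primary_key: true")
    # role defaults to attribute when omitted
    if not any(col.get("role", "attribute") == "identity" for col in columns):
        errors.append("at least one column needs role: identity")

    for part in spec.get("partitioning") or []:
        if part.get("field") not in names:
            errors.append(f"partitioning field {part.get('field')!r} is not a schema column")
        if part.get("granularity") not in GRANULARITIES:
            errors.append(f"unsupported granularity {part.get('granularity')!r}")
    return errors


# A trimmed version of the example at the top of this page.
spec = {
    "schema": [
        {"name": "outlet_id", "type": "uuid", "primary_key": True, "role": "identity"},
        {"name": "sale_date", "type": "date", "primary_key": True, "role": "identity"},
        {"name": "total_revenue", "type": "decimal(18,2)", "role": "measure"},
    ],
    "partitioning": [{"field": "sale_date", "granularity": "day"}],
}
print(validate_output(spec))
```

Returning a list of messages rather than raising on the first problem lets an author see every violation in one pass.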

Supported types:

| Type | Mapping | Notes |
| --- | --- | --- |
| string | VARCHAR | Variable length |
| integer | INT64 | 64-bit signed |
| bigint | INT64 | Alias for integer |
| smallint | INT16 | 16-bit signed |
| float | FLOAT64 | Double precision |
| decimal(p,s) | DECIMAL | Precision and scale required |
| boolean | BOOLEAN | |
| date | DATE | Calendar date |
| timestamp | TIMESTAMP_TZ | Always stored with timezone (UTC) |
| uuid | UUID | |
| json | JSON | Stored as string in Iceberg, parsed by serving stores |
| binary | BINARY | For opaque blobs |
| array<T> | LIST | Nested type, e.g. array<string> |
| map<K,V> | MAP | Key-value, e.g. map<string,integer> |

All types map to Apache Iceberg types internally (via Lakekeeper catalog). The platform handles translation to each serving store’s native types.
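
To illustrate how a declared type string relates to the mapping column of the supported types table, here is a small sketch. The scalar mappings are copied from that table; the parsing approach and the names resolve_type and split_top_level are assumptions made for this example and say nothing about how the platform parses types internally.

```python
import re

# Scalar mappings copied from the supported types table.
SCALAR_TYPES = {
    "string": "VARCHAR", "integer": "INT64", "bigint": "INT64", "smallint": "INT16",
    "float": "FLOAT64", "boolean": "BOOLEAN", "date": "DATE",
    "timestamp": "TIMESTAMP_TZ", "uuid": "UUID", "json": "JSON", "binary": "BINARY",
}
# decimal requires both precision and scale.
DECIMAL = re.compile(r"decimal\(\s*\d+\s*,\s*\d+\s*\)")


def split_top_level(text: str) -> list[str]:
    """Split on commas that are not nested inside <...> or (...)."""
    parts, depth, current = [], 0, ""
    for ch in text:
        if ch in "<(":
            depth += 1
        elif ch in ">)":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
        else:
            current += ch
    parts.append(current.strip())
    return parts


def resolve_type(declared: str) -> str:
    """Return the mapped type name, or raise ValueError for an unsupported type."""
    declared = declared.strip()
    if declared in SCALAR_TYPES:
        return SCALAR_TYPES[declared]
    if DECIMAL.fullmatch(declared):
        return "DECIMAL"
    if declared.startswith("array<") and declared.endswith(">"):
        resolve_type(declared[len("array<"):-1])  # element type must itself be supported
        return "LIST"
    if declared.startswith("map<") and declared.endswith(">"):
        parts = split_top_level(declared[len("map<"):-1])
        if len(parts) != 2:
            raise ValueError(f"map needs exactly two type arguments: {declared!r}")
        for part in parts:
            resolve_type(part)
        return "MAP"
    raise ValueError(f"unsupported type: {declared!r}")


for declared in ["decimal(18,2)", "array<string>", "map<string,integer>"]:
    print(declared, "->", resolve_type(declared))
```

Nested declarations are validated recursively, so an unsupported element type inside array<T> or map<K,V> is rejected as well.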

Every column carries a role that declares its semantic purpose. The platform uses roles to derive entity graphs, validate schemas, and enable downstream features.

| Role | Meaning | Constraints |
| --- | --- | --- |
| identity | Natural/business key that uniquely identifies an entity | At least one per schema. String, integer, or UUID types only. Immutable across versions: cannot be removed, renamed, or type-changed. |
| attribute | Descriptive property of an entity | No constraints. Default role if omitted. |
| measure | Numeric fact intended for aggregation (sum, avg, count) | Must be numeric type: integer, bigint, float, decimal. |
| event_key | Foreign key referencing another product’s identity column | Must reference a valid identity column in an input product. |

Rules:

  1. At least one identity column required per schema (XVAL-012).
  2. measure columns must be numeric (XVAL-013).
  3. event_key must reference a valid upstream identity (XVAL-014).
  4. Identity columns cannot be deprecated (XVAL-023).
  5. Identity columns are immutable across versions (XVAL-024, XVAL-025).
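
Rules 1, 2 and 4 depend only on the schema itself, so they can be sketched without any upstream or version context. The example below is an illustration under that assumption; check_roles and base_type are hypothetical names, and rules 3 and 5 are left out because they need input products and a previous version to compare against.

```python
NUMERIC_TYPES = {"integer", "bigint", "float", "decimal"}  # from the measure constraint


def base_type(declared: str) -> str:
    """Strip parameters, so 'decimal(18,2)' becomes 'decimal'."""
    return declared.split("(", 1)[0].strip()


def check_roles(columns: list[dict]) -> list[tuple[str, str]]:
    """Return (rule_code, message) pairs for rules 1, 2 and 4."""
    findings = []
    roles = [col.get("role", "attribute") for col in columns]  # attribute is the default
    if "identity" not in roles:
        findings.append(("XVAL-012", "schema declares no identity column"))
    for col, role in zip(columns, roles):
        if role == "measure" and base_type(col["type"]) not in NUMERIC_TYPES:
            findings.append(("XVAL-013", f"measure {col['name']} has non-numeric type {col['type']}"))
        if role == "identity" and col.get("deprecated", False):
            findings.append(("XVAL-023", f"identity column {col['name']} cannot be deprecated"))
    return findings


columns = [
    {"name": "outlet_id", "type": "uuid", "role": "identity"},
    {"name": "total_revenue", "type": "decimal(18,2)", "role": "measure"},
    {"name": "territory_code", "type": "string", "role": "measure"},  # deliberately wrong
]
for code, message in check_roles(columns):
    print(code, message)
```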

Columns can optionally declare their sensitivity level, enabling column-level access control in the serving layer.

| Classification | Category | Description | Example |
| --- | --- | --- | --- |
| pii.identifier | PII | Stable personal identity | National ID, SSN |
| pii.name | PII | Personal name | First name, last name |
| pii.contact | PII | Contact information | Email, phone, address |
| business.confidential | Business | Business-sensitive | Risk ratings, margins |
| business.internal | Business | Internal non-sensitive | Segments, categories |
| public | Public | No restriction | Country codes, currencies |

Rules:

  • All-or-nothing (XVAL-031): if any column declares classification, all must.
  • Product classification must be >= max column classification (XVAL-032).
  • Values must be from the taxonomy above (XVAL-033).

Classification to product level mapping:

| Max Column Classification | Minimum Product Classification |
| --- | --- |
| pii.identifier | restricted |
| pii.name | restricted |
| pii.contact | restricted |
| business.confidential | confidential |
| business.internal | internal |
| public | public |
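
The three classification rules and the mapping table combine into one check. The sketch below assumes the product levels are ordered public < internal < confidential < restricted, which the mapping table suggests but does not state outright. The function name, the product_level parameter, and the column names in the example are hypothetical.

```python
# Column classification -> minimum product classification, from the mapping table.
MIN_PRODUCT_LEVEL = {
    "pii.identifier": "restricted",
    "pii.name": "restricted",
    "pii.contact": "restricted",
    "business.confidential": "confidential",
    "business.internal": "internal",
    "public": "public",
}
# Assumed ordering of product levels, least to most restrictive.
LEVEL_ORDER = ["public", "internal", "confidential", "restricted"]


def check_classification(columns: list[dict], product_level: str) -> list[tuple[str, str]]:
    """Return (rule_code, message) pairs for XVAL-031, XVAL-032 and XVAL-033."""
    findings = []
    declared = [col.get("classification") for col in columns]
    labelled = [value for value in declared if value is not None]
    if labelled and len(labelled) < len(declared):
        findings.append(("XVAL-031", "classification is declared on some columns but not all"))

    known = [value for value in labelled if value in MIN_PRODUCT_LEVEL]
    for value in sorted(set(labelled) - set(known)):
        findings.append(("XVAL-033", f"{value!r} is not in the classification taxonomy"))

    if known:
        # The strictest column determines the minimum product level.
        required = max((MIN_PRODUCT_LEVEL[v] for v in known), key=LEVEL_ORDER.index)
        if LEVEL_ORDER.index(product_level) < LEVEL_ORDER.index(required):
            findings.append(("XVAL-032", f"product must be at least {required!r}, got {product_level!r}"))
    return findings


columns = [
    {"name": "customer_email", "classification": "pii.contact"},
    {"name": "customer_segment", "classification": "business.internal"},
]
print(check_classification(columns, "internal"))
```

In this example the pii.contact column forces the product to restricted, so declaring the product as internal is reported under XVAL-032.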

Upstream products declare named semantic intents that map business concepts to concrete columns. This decouples downstream consumers from specific column values.

semantic_intents:
  customer_classification:
    description: "Categorical grouping of customers by business value"
    column: customer_segment
    type: categorical
    tiers:
      high_value: [Engaged]
      medium_value: [At-Risk]
      low_value: [Dormant, New]
    history:
      - values: [Gold, Silver, Bronze]
        methodology: "Revenue-based tiering"
        valid_until: "2026-03-10"
      - values: [Engaged, At-Risk, Dormant, New]
        methodology: "Behavioral clustering model v2"
        valid_from: "2026-03-11"
  transaction_revenue:
    description: "Net revenue per transaction line"
    column: net_revenue
    type: numeric_range
    unit: currency

Intent fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| description | string | Yes | Human-readable description of the business concept |
| column | string | Yes | Must reference a column name in the output schema (XVAL-018) |
| type | enum | Yes | categorical, numeric_range, temporal, boolean |
| tiers | map | No | Named groupings. Keys are stable tier names; values are lists of concrete values. |
| history | list | No | Ordered log (oldest first) of value domain changes for audit |
| unit | string | No | Unit of measurement (e.g., currency, seconds, meters) |

Intent types:

| Type | Use Case | Tier Semantics |
| --- | --- | --- |
| categorical | Discrete values with named groupings | Tiers map to specific value sets |
| numeric_range | Continuous values with thresholds | Tiers define named ranges (e.g., high: [">1000"]) |
| temporal | Time-based classifications | Tiers define time periods or recency windows |
| boolean | Binary flags | Tiers map to true/false with semantic names |

History entry fields:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| values | list | Yes | Set of valid values during this period |
| methodology | string | Yes | How values are determined |
| valid_from | date | No | Start date (ISO 8601) |
| valid_until | date | No | End date (ISO 8601) |
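
A consumer that depends on a tier name rather than a concrete value keeps working when the value domain changes, which is the decoupling described above. The sketch below shows that lookup for a categorical intent, plus the key-pattern and XVAL-018 checks. It assumes the intents are already parsed into dicts; tier_of and check_intents are hypothetical helper names, not platform APIs.

```python
import re
from typing import Optional

INTENT_KEY = re.compile(r"[a-z][a-z0-9_]{0,62}")  # key pattern from the top-level field table


def tier_of(intent: dict, value: str) -> Optional[str]:
    """Return the tier whose value list contains `value`, or None if no tier does."""
    for tier, members in intent.get("tiers", {}).items():
        if value in members:
            return tier
    return None


def check_intents(intents: dict, column_names: set[str]) -> list[str]:
    """Check intent keys against the pattern and column references against the schema."""
    errors = []
    for key, intent in intents.items():
        if not INTENT_KEY.fullmatch(key):
            errors.append(f"intent key {key!r} does not match the required pattern")
        if intent.get("column") not in column_names:
            errors.append(f"XVAL-018: intent {key!r} references unknown column {intent.get('column')!r}")
    return errors


# The customer_classification intent from the example above, without its history.
intents = {
    "customer_classification": {
        "description": "Categorical grouping of customers by business value",
        "column": "customer_segment",
        "type": "categorical",
        "tiers": {
            "high_value": ["Engaged"],
            "medium_value": ["At-Risk"],
            "low_value": ["Dormant", "New"],
        },
    },
}
print(tier_of(intents["customer_classification"], "At-Risk"))
print(check_intents(intents, {"customer_segment", "net_revenue"}))
```

Values from an earlier history entry, such as Gold, are not part of the current tiers, so tier_of returns None for them in this sketch.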

The platform tracks schema changes across versions:

| Change Type | Version Impact | Platform Action |
| --- | --- | --- |
| Add nullable column | Minor bump | Iceberg schema evolution, no rewrite |
| Add non-nullable column with default | Minor bump | Backfill default value |
| Remove column | Major bump | Downstream consumers notified, blocked until update |
| Rename column | Major bump | Treated as remove + add |
| Change type (widening, e.g. int to bigint) | Minor bump | Iceberg type promotion |
| Change type (narrowing or incompatible) | Major bump | Requires full rewrite |
| Remove/rename identity column | Forbidden | Cannot be done. Create a new product instead. |
| Change identity column type | Forbidden | Cannot be done. Identity types locked at creation. |

Column deprecation lifecycle:

active --(minor)--> deprecated --(major)--> removed
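
The table and the lifecycle can be read as a function from two schema versions to a version impact. The following sketch compares column lists by name and reports the strongest impact found. It is an illustration, not the platform's diffing logic: classify_change is a hypothetical name, only the integer to bigint widening is taken from the table (the smallint promotions are assumptions), and all additions are reported as minor because the table does not cover a non-nullable addition without a default.

```python
RANK = {"none": 0, "minor": 1, "major": 2, "forbidden": 3}
# Only "int to bigint" is named in the table; the smallint entries are assumptions.
WIDENING = {("integer", "bigint"), ("smallint", "integer"), ("smallint", "bigint")}


def classify_change(old: list[dict], new: list[dict]) -> str:
    """Return the strongest impact ('none', 'minor', 'major', 'forbidden') of a schema diff."""
    old_cols = {col["name"]: col for col in old}
    new_cols = {col["name"]: col for col in new}
    impacts = ["none"]

    for name, before in old_cols.items():
        is_identity = before.get("role", "attribute") == "identity"
        after = new_cols.get(name)
        if after is None:
            # Removal; a rename shows up as a removal plus an addition.
            impacts.append("forbidden" if is_identity else "major")
        elif after["type"] != before["type"]:
            if is_identity:
                impacts.append("forbidden")
            elif (before["type"], after["type"]) in WIDENING:
                impacts.append("minor")
            else:
                impacts.append("major")
        elif after.get("deprecated", False) and not before.get("deprecated", False):
            impacts.append("minor")  # active --> deprecated

    if any(name not in old_cols for name in new_cols):
        # Additions are minor in the table; a non-nullable addition without a
        # default is not covered there and is not distinguished here.
        impacts.append("minor")

    return max(impacts, key=RANK.__getitem__)


v1 = [
    {"name": "outlet_id", "type": "uuid", "role": "identity"},
    {"name": "transaction_count", "type": "integer", "role": "measure"},
    {"name": "territory_code", "type": "string"},
]
v2 = [col for col in v1 if col["name"] != "territory_code"]
print(classify_change(v1, v2))
```

Because a rename is treated as remove plus add, renaming a non-identity column yields a major impact and renaming an identity column yields forbidden.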

To deprecate a column, add deprecated: true:

schema:
  - name: old_segment_code
    type: string
    role: attribute
    deprecated: true
    deprecated_message: "Use customer_classification semantic intent instead. Removal in v3.0.0."

Products that handle corrections declare reserved columns:

schema:
  - name: _correction_of
    type: string
    role: attribute
    description: "References the identity value of the record being corrected"
  - name: _restatement_period
    type: string
    role: attribute
    description: "ISO 8601 period being restated (e.g., '2026-01')"

Rules:

  • _correction_of must be type string with role attribute (XVAL-028).
  • _restatement_period requires _correction_of to also be declared (XVAL-029).
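
Both rules for the reserved columns can be expressed in a few lines. This is a minimal sketch assuming the schema is available as a list of column dicts; check_correction_columns is a hypothetical name.

```python
def check_correction_columns(columns: list[dict]) -> list[tuple[str, str]]:
    """Return (rule_code, message) pairs for XVAL-028 and XVAL-029."""
    by_name = {col["name"]: col for col in columns}
    findings = []

    correction = by_name.get("_correction_of")
    if correction is not None:
        # role defaults to attribute when omitted
        if correction.get("type") != "string" or correction.get("role", "attribute") != "attribute":
            findings.append(("XVAL-028", "_correction_of must be type string with role attribute"))

    if "_restatement_period" in by_name and correction is None:
        findings.append(("XVAL-029", "_restatement_period requires _correction_of"))
    return findings


columns = [
    {"name": "outlet_id", "type": "uuid", "role": "identity"},
    {"name": "_restatement_period", "type": "string", "role": "attribute"},
]
print(check_correction_columns(columns))
```

The example schema declares _restatement_period on its own, so the check reports XVAL-029.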