Product & Inputs

product.yaml is the file that changes least frequently. It declares the product's identity, who owns it, and its sensitivity level.

apiVersion: akili/v1
kind: DataProduct
metadata:
  name: user-events            # unique within tenant+domain
  domain: analytics            # DDD bounded context
  version: 1.0.0               # semver
  owner: platform-team         # registered team identifier
  description: >
    Captures user interaction events from the web application.
    Source of truth for user behavior analytics.
  tags:
    - events
    - user-behavior
    - clickstream
  classification: internal     # public | internal | confidential | restricted
  contacts:
    - name: Alex Ochieng
      role: product-owner
      email: alex.ochieng@example.com

Key fields:

| Field | Rules |
| --- | --- |
| metadata.name | Lowercase, hyphens allowed. Pattern: [a-z0-9][a-z0-9-]*[a-z0-9]. Max 63 chars. Must be unique within tenant+domain. |
| metadata.domain | DDD bounded context. Starts with a letter. Pattern: [a-z][a-z0-9-]*. Max 63 chars. Auto-created on first use; there is no separate "create domain" step. |
| metadata.version | Semver MAJOR.MINOR.PATCH. Breaking output schema changes require a major bump. |
| metadata.owner | Must match a registered team in the platform. |
| metadata.description | Minimum 10 characters. |
| metadata.classification | Drives access control. See classification levels below. |
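
The field rules above can be checked locally before a deploy. The following is a minimal sketch, not the platform's own validator: the function name `validate_metadata` and the error messages are hypothetical, the domain rule is read as "a lowercase letter followed by any number of lowercase letters, digits, or hyphens", and the owner check is left out because it needs the platform's team registry.

```python
import re

# Patterns transcribed from the field rules; helper names are hypothetical.
NAME_RE = re.compile(r"[a-z0-9][a-z0-9-]*[a-z0-9]")
DOMAIN_RE = re.compile(r"[a-z][a-z0-9-]*")  # assumed reading of the domain rule
SEMVER_RE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")
CLASSIFICATIONS = {"public", "internal", "confidential", "restricted"}
MAX_LEN = 63


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of rule violations; an empty list means all checks passed."""
    errors = []

    name = str(metadata.get("name", ""))
    if len(name) > MAX_LEN or not NAME_RE.fullmatch(name):
        errors.append("name: must match [a-z0-9][a-z0-9-]*[a-z0-9], max 63 chars")

    domain = str(metadata.get("domain", ""))
    if len(domain) > MAX_LEN or not DOMAIN_RE.fullmatch(domain):
        errors.append("domain: must start with a letter, max 63 chars")

    if not SEMVER_RE.fullmatch(str(metadata.get("version", ""))):
        errors.append("version: must be MAJOR.MINOR.PATCH")

    if len(str(metadata.get("description", "")).strip()) < 10:
        errors.append("description: minimum 10 characters")

    if metadata.get("classification") not in CLASSIFICATIONS:
        errors.append("classification: must be one of the four levels")

    # metadata.owner is not checked here: it requires the team registry.
    return errors
```

Called with the `metadata` mapping from the example above, the function returns an empty list; a name such as `User_Events` or a version such as `1.0` each produce one error entry.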

Classification levels:

| Level | Who Can Access | Propagation Rule |
| --- | --- | --- |
| public | Any authenticated user in the tenant | |
| internal | Any team member in the tenant | |
| confidential | Explicit team grant required | Output >= max(input classifications) |
| restricted | Named individuals only, audit logged | Output >= max(input classifications) |

Caution: Classification follows the high-water mark rule: if any input is confidential, your output cannot be public or internal. The platform enforces this at deploy time to prevent data laundering through aggregation.
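
The high-water mark rule can be expressed as a small check. This is a sketch under stated assumptions, not the platform's deploy-time implementation: the function name `output_allowed` is hypothetical, the levels are ordered public < internal < confidential < restricted, and the floor is enforced only when the highest input is confidential or restricted, because those are the only levels for which a propagation rule is listed.

```python
# Assumed ordering, lowest to highest sensitivity.
LEVELS = ["public", "internal", "confidential", "restricted"]


def output_allowed(output: str, inputs: list[str]) -> bool:
    """Return True if `output` respects the high-water mark of `inputs`."""
    # Highest classification among the inputs; "public" when there are none.
    floor = max(inputs, key=LEVELS.index, default="public")
    if floor not in ("confidential", "restricted"):
        # No propagation rule is listed for public or internal inputs.
        return True
    return LEVELS.index(output) >= LEVELS.index(floor)
```

For example, `output_allowed("internal", ["public", "confidential"])` is false, while `output_allowed("confidential", ["public", "confidential"])` is true.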

Optional retention policy:

retention:
  period: "365d"             # how long to retain data
  basis: created_at          # which timestamp determines age
  review_date: "2026-06-01"

The Inputs file declares what this product consumes. Each input is either another data product (internal) or an external system reached through a connector.

Source-aligned example:

apiVersion: akili/v1
kind: Inputs
inputs:
  - id: raw-user-events
    type: connector
    connector_ref: webapp-postgres   # registered by platform admin
    ingestion_strategy: cdc
    ingestion_config:
      primary_key: event_id
    timeout: 2h
    fallback: fail
defaults:
  timeout: 4h
  fallback: fail

Aggregate example:

apiVersion: akili/v1
kind: Inputs
inputs:
  - id: cleaned-users
    type: data_product
    version: ">=1.0.0"            # semver range
    timeout: 6h
    fallback: skip
  - id: cleaned-events
    type: data_product
    version: ">=2.0.0"
    timeout: 4h
    fallback: fail
    partition_mapping: same_day   # same_day | previous_day | custom
defaults:
  timeout: 4h
  fallback: fail

Consumer-aligned example:

apiVersion: akili/v1
kind: Inputs
inputs:
  - id: user-behavior-aggregate
    type: data_product
    version: ">=1.0.0"
    timeout: 2h
    fallback: use_cached

Input fields for type: data_product:

| Field | Required | Description |
| --- | --- | --- |
| id | yes | Must match a metadata.name in another product's product.yaml |
| version | no | Semver range (>=1.0.0, ~1.2.0, ^2.0.0). Default: latest |
| timeout | no | Max wait before fallback. Format: 30m, 4h, 1d |
| fallback | no | skip: proceed without. fail: abort. use_cached: use last successful run |
| optional | no | If true, the product can execute without this input |
| partition_mapping | no | same_day, previous_day, or custom |
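
The timeout format (30m, 4h, 1d) is easy to mistype, so a small parser can help catch errors early. The sketch below is illustrative only: `parse_timeout` and `effective_setting` are hypothetical helpers, only the m, h, and d suffixes shown above are accepted, and the idea that a per-input value overrides the `defaults` block is an assumption based on the examples, not a documented rule.

```python
import re
from datetime import timedelta

# Suffixes shown in the documented format: minutes, hours, days.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_timeout(value: str) -> timedelta:
    """Convert a string such as '30m', '4h', or '1d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", value)
    if match is None:
        raise ValueError(f"invalid timeout: {value!r} (expected e.g. 30m, 4h, 1d)")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def effective_setting(input_spec: dict, defaults: dict, key: str):
    """Assumed precedence: the per-input value wins, else the defaults block."""
    return input_spec.get(key, defaults.get(key))
```

With the aggregate example, `effective_setting({"id": "cleaned-users", "timeout": "6h"}, {"timeout": "4h"}, "timeout")` yields `"6h"`, which `parse_timeout` turns into a six-hour `timedelta`.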

Input fields for type: connector:

| Field | Required | Description |
| --- | --- | --- |
| connector_ref | yes | Name of a connection registered by a platform admin |
| ingestion_strategy | yes | full_refresh, incremental, cdc, event, file_upload, api_poll |
| ingestion_config | varies | Strategy-specific: cursor_field, primary_key, topic, etc. |

Ingestion strategies:

| Strategy | Use Case | Required Config |
| --- | --- | --- |
| full_refresh | Small reference tables | None |
| incremental | Append/upsert based on cursor | cursor_field, primary_key |
| cdc | Change Data Capture via database log | primary_key |
| event | Consume from Redpanda topic | topic |
| file_upload | Manual CSV/Parquet upload | format |
| api_poll | HTTP API polling | url, method |
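
The required-config column above maps directly onto a lookup table. As a minimal sketch (the function name `missing_ingestion_config` is hypothetical and this is not the platform's validator), the following reports which required `ingestion_config` keys a connector input is missing for its strategy:

```python
# Required ingestion_config keys per strategy, transcribed from the table.
REQUIRED_CONFIG = {
    "full_refresh": set(),
    "incremental": {"cursor_field", "primary_key"},
    "cdc": {"primary_key"},
    "event": {"topic"},
    "file_upload": {"format"},
    "api_poll": {"url", "method"},
}


def missing_ingestion_config(input_spec: dict) -> list[str]:
    """Return the required config keys that are absent, sorted by name."""
    strategy = input_spec.get("ingestion_strategy")
    if strategy not in REQUIRED_CONFIG:
        raise ValueError(f"unknown ingestion_strategy: {strategy!r}")
    # ingestion_config may be omitted entirely, e.g. for full_refresh.
    provided = input_spec.get("ingestion_config") or {}
    return sorted(REQUIRED_CONFIG[strategy] - set(provided))
```

For the source-aligned example (`cdc` with `primary_key: event_id`) the result is an empty list; an `incremental` input that sets only `primary_key` reports `cursor_field` as missing.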