inputs.yaml
Declares what this data product consumes. Inputs are either other data products (internal) or external systems via the connector registry.
Product Archetypes
The input pattern determines the product archetype:
| Archetype | Input Source | Description | Example |
|---|---|---|---|
| Source-aligned | External system via connector | Captures raw data from operational systems. Minimal transformation. | Raw POS data from Salesforce |
| Aggregate | Other data products | Combines cleaned sources. This is where business logic lives. Organized around business capabilities, not reports. | Combines cleaned orders + cleaned outlets |
| Consumer-aligned | Aggregate products | Shaped for specific use cases. Most volatile and disposable. | Executive KPI dashboard feed |
A single product can mix internal and external inputs.
Example
```yaml
apiVersion: akili/v1
kind: Inputs

inputs:
  # Internal dependency -- another data product
  - id: cleaned-outlets
    type: data_product
    version: ">=1.0.0"
    timeout: 6h
    fallback: skip
    optional: false

  # Internal dependency with partition alignment
  - id: cleaned-transactions
    type: data_product
    version: ">=2.0.0"
    timeout: 4h
    fallback: fail
    partition_mapping: same_day

  # External dependency -- connector registry
  - id: raw-pos-extract
    type: connector
    connector_ref: pg-production
    ingestion_strategy: incremental
    ingestion_config:
      cursor_field: updated_at
      primary_key: transaction_id
    timeout: 2h
    fallback: fail

defaults:
  timeout: 4h
  fallback: fail
```
Field Reference — Internal Inputs (type: data_product)
| Field | Type | Required | Default | Validation | Description |
|---|---|---|---|---|---|
| `id` | string | Yes | — | Must match a `metadata.name` in another product’s `product.yaml` | Upstream product identifier |
| `type` | enum | Yes | — | `data_product` | Input type discriminator |
| `version` | string | No | latest | Semver range: `>=1.0.0`, `~1.2.0`, `^2.0.0` | Version constraint for the upstream product |
| `timeout` | duration | No | From `defaults` | Format: `30m`, `4h`, `1d` | Max wait before fallback triggers |
| `fallback` | enum | No | From `defaults` | `skip`, `fail`, `use_cached` | Action when input is unavailable or times out |
| `optional` | bool | No | `false` | — | If `true`, product can execute without this input |
| `partition_mapping` | enum | No | — | `same_day`, `previous_day`, `custom` | How partitions align between this input and the current product |
| `semantic_contract` | object | No | — | See Semantic Contracts below | Declares semantic intent requirements from upstream |
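As a rough illustration of `partition_mapping`, the sketch below resolves which upstream partition a run would read. It assumes daily, date-keyed partitions and takes `same_day` and `previous_day` at face value. The function name `upstream_partition` is hypothetical, and `custom` is left unimplemented because this reference does not define it.

```python
from datetime import date, timedelta


def upstream_partition(run_date: date, partition_mapping: str) -> date:
    """Return the upstream partition date to read for a given run date.

    Assumes daily partitions keyed by calendar date (not stated in the reference).
    """
    if partition_mapping == "same_day":
        return run_date
    if partition_mapping == "previous_day":
        return run_date - timedelta(days=1)
    # "custom" is not specified in this reference, so the sketch refuses to guess.
    raise ValueError(f"unsupported partition_mapping: {partition_mapping!r}")


print(upstream_partition(date(2024, 3, 1), "same_day"))      # 2024-03-01
print(upstream_partition(date(2024, 3, 1), "previous_day"))  # 2024-02-29
```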
Field Reference — External Inputs (type: connector)
| Field | Type | Required | Default | Validation | Description |
|---|---|---|---|---|---|
| `id` | string | Yes | — | Unique within this file | Input identifier used in logic files |
| `type` | enum | Yes | — | `connector` | Input type discriminator |
| `connector_ref` | string | Yes | — | Must reference a registered connection | Connection name from platform admin |
| `ingestion_strategy` | enum | Yes | — | See Ingestion Strategies below | How data is extracted |
| `ingestion_config` | object | Varies | — | Strategy-specific fields | Configuration for the chosen strategy |
| `timeout` | duration | No | From `defaults` | Format: `30m`, `4h`, `1d` | Max wait before fallback triggers |
| `fallback` | enum | No | From `defaults` | `skip`, `fail`, `use_cached` | Action on failure |
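The `timeout` format (`30m`, `4h`, `1d`) can be read as an integer followed by a unit suffix. A minimal sketch of a parser follows, assuming only the three suffixes shown in this reference (`m`, `h`, `d`) are valid. The helper name `parse_duration` is hypothetical, and the platform's real parser may accept more forms.

```python
import re
from datetime import timedelta

# Only the suffixes documented in this reference; anything else is rejected.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}
_DURATION = re.compile(r"(\d+)([mhd])")


def parse_duration(text: str) -> timedelta:
    """Convert a duration string such as '30m', '4h', or '1d' to a timedelta."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"invalid duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_duration("30m"))  # 0:30:00
print(parse_duration("4h"))   # 4:00:00
print(parse_duration("1d"))   # 1 day, 0:00:00
```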
Field Reference — Defaults
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| `defaults.timeout` | duration | No | `4h` | Default timeout for all inputs |
| `defaults.fallback` | enum | No | `fail` | Default fallback for all inputs |
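Read together, the tables imply a three-level precedence for `timeout` and `fallback`: the value on the input itself, then the file's `defaults` block, then the built-in defaults (`4h`, `fail`). The sketch below shows that lookup over plain dictionaries. The function name `effective_settings` is hypothetical, and the precedence order is inferred from the tables, not a confirmed implementation detail.

```python
# Built-in values taken from the Defaults table above.
BUILT_IN_DEFAULTS = {"timeout": "4h", "fallback": "fail"}


def effective_settings(input_spec: dict, file_defaults: dict = None) -> dict:
    """Resolve timeout and fallback: input value, then file defaults, then built-ins."""
    file_defaults = file_defaults or {}
    return {
        key: input_spec.get(key, file_defaults.get(key, built_in))
        for key, built_in in BUILT_IN_DEFAULTS.items()
    }


spec = {"id": "cleaned-outlets", "type": "data_product", "timeout": "6h"}
print(effective_settings(spec, {"fallback": "skip"}))
# {'timeout': '6h', 'fallback': 'skip'}
print(effective_settings({"id": "raw-pos-extract"}))
# {'timeout': '4h', 'fallback': 'fail'}
```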
Ingestion Strategies
| Strategy | Use Case | Required Config |
|---|---|---|
| `full_refresh` | Small reference tables, complete replacement each run | None |
| `incremental` | Append/upsert based on a cursor column | `cursor_field`, `primary_key` |
| `cdc` | Change Data Capture via database log | `primary_key` (connector handles CDC mechanics) |
| `event` | Consume from Redpanda topic | `topic`, `consumer_group` (optional) |
| `file_upload` | Manual CSV/Parquet upload via Portal or API | `format` (`csv`, `parquet`, `json`) |
| `api_poll` | HTTP API polling on schedule | `url`, `method`, `headers` (optional), `pagination` (optional) |
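The required-config column lends itself to a simple pre-deploy check. The sketch below encodes the table as a lookup and reports which required `ingestion_config` keys are missing for a given input. The function name `missing_config` is hypothetical, and the platform's own validation may behave differently.

```python
# Required ingestion_config keys per strategy, copied from the table above.
# Fields marked optional in the table are deliberately left out.
REQUIRED_CONFIG = {
    "full_refresh": set(),
    "incremental": {"cursor_field", "primary_key"},
    "cdc": {"primary_key"},
    "event": {"topic"},
    "file_upload": {"format"},
    "api_poll": {"url", "method"},
}


def missing_config(input_spec: dict) -> list[str]:
    """Return the required ingestion_config keys that the input does not provide."""
    strategy = input_spec["ingestion_strategy"]
    if strategy not in REQUIRED_CONFIG:
        raise ValueError(f"unknown ingestion_strategy: {strategy!r}")
    provided = set(input_spec.get("ingestion_config") or {})
    return sorted(REQUIRED_CONFIG[strategy] - provided)


spec = {
    "id": "raw-pos-extract",
    "type": "connector",
    "ingestion_strategy": "incremental",
    "ingestion_config": {"cursor_field": "updated_at"},
}
print(missing_config(spec))  # ['primary_key']
```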
Ingestion Fitness Functions
External inputs can declare pre-transformation quality checks that run immediately after ingestion:
```yaml
- id: raw-pos-extract
  type: connector
  connector_ref: pg-production
  ingestion_strategy: incremental
  ingestion_config:
    cursor_field: updated_at
    primary_key: transaction_id
  fitness:
    - type: row_count_min
      threshold: 100
    - type: schema_match
      severity: error  # error = block, warn = log and continue
```
If an `error`-severity fitness function fails, the input is rejected and the fallback logic activates.
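The severity rule can be sketched as a small evaluation loop. This is illustrative only. It assumes a check with no `severity` is treated as `error`, which the reference does not state, and it passes the schema comparison in as a precomputed boolean because the expected schema's source is not described here. The names `run_check` and `input_accepted` are hypothetical.

```python
def run_check(check: dict, row_count: int, schema_matches: bool) -> bool:
    """Evaluate one fitness check against facts about the ingested batch."""
    if check["type"] == "row_count_min":
        return row_count >= check["threshold"]
    if check["type"] == "schema_match":
        return schema_matches
    raise ValueError(f"unknown fitness type: {check['type']!r}")


def input_accepted(fitness: list, row_count: int, schema_matches: bool) -> tuple:
    """Return (accepted, warnings); an error-severity failure rejects the input."""
    warnings = []
    for check in fitness:
        if run_check(check, row_count, schema_matches):
            continue
        # Assumption: a check without an explicit severity blocks like "error".
        if check.get("severity", "error") == "error":
            return False, warnings
        warnings.append(check["type"])  # warn = log and continue
    return True, warnings


fitness = [
    {"type": "row_count_min", "threshold": 100},
    {"type": "schema_match", "severity": "error"},
]
print(input_accepted(fitness, row_count=250, schema_matches=True))  # (True, [])
print(input_accepted(fitness, row_count=50, schema_matches=True))   # (False, [])
```

When the result is `(False, ...)`, the input's configured `fallback` would take over.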
Semantic Contracts
Downstream products declare what semantic intents they require from upstream:
```yaml
inputs:
  - id: store-performance
    type: data_product
    version: ">=2.0.0"
    semantic_contract:
      requires:
        - intent: customer_classification
          freshness: 24h
          tier_refs: [high_value]
        - intent: transaction_revenue
          freshness: 24h
          entity_refs: [customer_id, store_id]
```
Contract Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `requires` | list | Yes | List of intent requirements |
| `requires[].intent` | string | Yes | Must match an upstream product’s `semantic_intents` key |
| `requires[].freshness` | duration | No | SLA override for this intent’s data freshness |
| `requires[].tier_refs` | list | No | Tier names this product references in its transform |
| `requires[].entity_refs` | list | No | Identity columns expected in the upstream product |
At deploy time, the platform validates that the upstream product declares the referenced intent, the tier refs exist, and the entity refs are identity columns.
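The three deploy-time checks can be sketched as follows. The shape of the upstream declaration used here (a `semantic_intents` mapping with a `tiers` list per intent, plus a top-level `identity_columns` list) is a hypothetical stand-in, since this page does not show the upstream file. Only the three validation rules come from the text above.

```python
def validate_contract(contract: dict, upstream: dict) -> list:
    """Return a list of contract violations; an empty list means the contract holds."""
    errors = []
    intents = upstream.get("semantic_intents", {})          # hypothetical layout
    identity = set(upstream.get("identity_columns", []))    # hypothetical layout
    for req in contract["requires"]:
        name = req["intent"]
        declared = intents.get(name)
        if declared is None:
            errors.append(f"intent not declared upstream: {name}")
            continue
        for tier in req.get("tier_refs", []):
            if tier not in declared.get("tiers", []):
                errors.append(f"unknown tier {tier!r} for intent {name}")
        for column in req.get("entity_refs", []):
            if column not in identity:
                errors.append(f"{column!r} is not an identity column upstream")
    return errors


upstream = {
    "semantic_intents": {
        "customer_classification": {"tiers": ["high_value", "standard"]},
        "transaction_revenue": {},
    },
    "identity_columns": ["customer_id", "store_id"],
}
contract = {
    "requires": [
        {"intent": "customer_classification", "freshness": "24h", "tier_refs": ["high_value"]},
        {"intent": "transaction_revenue", "freshness": "24h", "entity_refs": ["customer_id", "store_id"]},
    ]
}
print(validate_contract(contract, upstream))  # []
```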
Fallback Behavior
| Fallback | Action |
|---|---|
| `skip` | Remove input from set, execute if remaining required inputs are ready |
| `fail` | Abort execution, emit failure event |
| `use_cached` | Substitute last successful materialization, re-evaluate |
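A minimal sketch of how the three fallbacks might combine into a run decision. The per-input `ready` and `has_cache` flags are hypothetical inputs to the sketch, not fields of `inputs.yaml`. Treating `use_cached` with no cached materialization as a failure is an assumption, and the "re-evaluate" step is not modelled.

```python
def plan_run(inputs: list) -> tuple:
    """Return ("execute", usable_inputs) or ("fail", []) for one scheduling attempt."""
    usable = []
    for spec in inputs:
        if spec["ready"]:
            usable.append(spec["id"])
        elif spec["fallback"] == "skip":
            continue  # drop the input from the set and keep going
        elif spec["fallback"] == "use_cached" and spec.get("has_cache"):
            usable.append(spec["id"] + "@cached")  # last successful materialization
        else:
            # "fail", or "use_cached" with nothing cached (assumption): abort.
            return "fail", []
    return "execute", usable


inputs = [
    {"id": "cleaned-transactions", "ready": True, "fallback": "fail"},
    {"id": "cleaned-outlets", "ready": False, "fallback": "skip"},
]
print(plan_run(inputs))  # ('execute', ['cleaned-transactions'])
```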
Version Pinning Operators
| Operator | Meaning | Example |
|---|---|---|
| `>=` | Greater than or equal | `>=1.0.0` |
| `<=` | Less than or equal | `<=2.0.0` |
| `>` | Greater than | `>1.0.0` |
| `<` | Less than | `<2.0.0` |
| `=` | Exact match | `=1.5.0` |
| `!=` | Not equal | `!=1.3.0` |
| `~` | Patch range | `~1.2.0` (matches 1.2.x) |
| `^` | Minor range | `^1.0.0` (matches 1.x.x) |

Ranges can be combined: `">=1.0.0,<2.0.0"`.
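To make the operator semantics concrete, here is a small matcher written directly from the table. It is a sketch, not the platform's resolver. It assumes plain `MAJOR.MINOR.PATCH` versions with no pre-release tags, treats comma-separated clauses as a logical AND, and applies `^` as "same major version" exactly as the table states, with no special handling for `0.x` versions. The function name `satisfies` is hypothetical.

```python
import operator
import re

_COMPARATORS = {
    ">=": operator.ge, "<=": operator.le, ">": operator.gt,
    "<": operator.lt, "=": operator.eq, "!=": operator.ne,
}
_CLAUSE = re.compile(r"(>=|<=|!=|>|<|=|~|\^)(\d+)\.(\d+)\.(\d+)")


def _parse(version: str) -> tuple:
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, constraint: str) -> bool:
    """Check a MAJOR.MINOR.PATCH version against a comma-separated constraint."""
    candidate = _parse(version)
    for clause in constraint.split(","):
        match = _CLAUSE.fullmatch(clause.strip())
        if match is None:
            raise ValueError(f"invalid constraint clause: {clause!r}")
        op = match.group(1)
        target = tuple(int(n) for n in match.groups()[1:])
        if op == "~":    # patch range: same major.minor, at or above the target
            ok = candidate[:2] == target[:2] and candidate >= target
        elif op == "^":  # minor range: same major, at or above the target
            ok = candidate[0] == target[0] and candidate >= target
        else:
            ok = _COMPARATORS[op](candidate, target)
        if not ok:
            return False
    return True


print(satisfies("1.4.2", ">=1.0.0,<2.0.0"))  # True
print(satisfies("2.0.0", ">=1.0.0,<2.0.0"))  # False
print(satisfies("1.2.9", "~1.2.0"))          # True
print(satisfies("2.0.0", "^1.0.0"))          # False
```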