inputs.yaml

Declares what this data product consumes. Inputs are either other data products (internal) or external systems via the connector registry.

The input pattern determines the product archetype:

| Archetype | Input Source | Description | Example |
|---|---|---|---|
| Source-aligned | External system via connector | Captures raw data from operational systems. Minimal transformation. | Raw POS data from Salesforce |
| Aggregate | Other data products | Combines cleaned sources. This is where business logic lives. Organized around business capabilities, not reports. | Combines cleaned orders + cleaned outlets |
| Consumer-aligned | Aggregate products | Shaped for specific use cases. Most volatile and disposable. | Executive KPI dashboard feed |

A single product can mix internal and external inputs.

apiVersion: akili/v1
kind: Inputs
inputs:
  # Internal dependency -- another data product
  - id: cleaned-outlets
    type: data_product
    version: ">=1.0.0"
    timeout: 6h
    fallback: skip
    optional: false
  # Internal dependency with partition alignment
  - id: cleaned-transactions
    type: data_product
    version: ">=2.0.0"
    timeout: 4h
    fallback: fail
    partition_mapping: same_day
  # External dependency -- connector registry
  - id: raw-pos-extract
    type: connector
    connector_ref: pg-production
    ingestion_strategy: incremental
    ingestion_config:
      cursor_field: updated_at
      primary_key: transaction_id
    timeout: 2h
    fallback: fail
defaults:
  timeout: 4h
  fallback: fail

Field Reference — Internal Inputs (type: data_product)

| Field | Type | Required | Default | Validation | Description |
|---|---|---|---|---|---|
| id | string | Yes |  | Must match a metadata.name in another product’s product.yaml | Upstream product identifier |
| type | enum | Yes |  | data_product | Input type discriminator |
| version | string | No | latest | Semver range: >=1.0.0, ~1.2.0, ^2.0.0 | Version constraint for the upstream product |
| timeout | duration | No | From defaults | Format: 30m, 4h, 1d | Max wait before fallback triggers |
| fallback | enum | No | From defaults | skip, fail, use_cached | Action when input is unavailable or times out |
| optional | bool | No | false |  | If true, product can execute without this input |
| partition_mapping | enum | No |  | same_day, previous_day, custom | How partitions align between this input and the current product |
| semantic_contract | object | No |  | See Semantic Contracts below | Declares semantic intent requirements from upstream |

Field Reference — External Inputs (type: connector)

| Field | Type | Required | Default | Validation | Description |
|---|---|---|---|---|---|
| id | string | Yes |  | Unique within this file | Input identifier used in logic files |
| type | enum | Yes |  | connector | Input type discriminator |
| connector_ref | string | Yes |  | Must reference a registered connection | Connection name from platform admin |
| ingestion_strategy | enum | Yes |  | See Ingestion Strategies below | How data is extracted |
| ingestion_config | object | Varies |  | Strategy-specific fields | Configuration for the chosen strategy |
| timeout | duration | No | From defaults | Format: 30m, 4h, 1d | Max wait before fallback triggers |
| fallback | enum | No | From defaults | skip, fail, use_cached | Action on failure |

Defaults

| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| defaults.timeout | duration | No | 4h | Default timeout for all inputs |
| defaults.fallback | enum | No | fail | Default fallback for all inputs |
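
To make the inheritance concrete, here is a minimal sketch of how per-input values might interact with the defaults block. It assumes, as the "From defaults" entries in the field tables suggest, that an input which omits timeout or fallback takes the value from defaults. The input ids are reused from the example above.

```yaml
apiVersion: akili/v1
kind: Inputs
inputs:
  # Omits timeout and fallback, so both are expected to come from defaults
  - id: cleaned-outlets
    type: data_product
  # Overrides only timeout; fallback is still expected to come from defaults
  - id: cleaned-transactions
    type: data_product
    timeout: 1d
defaults:
  timeout: 4h
  fallback: fail
```
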

Ingestion Strategies

| Strategy | Use Case | Required Config |
|---|---|---|
| full_refresh | Small reference tables, complete replacement each run | None |
| incremental | Append/upsert based on a cursor column | cursor_field, primary_key |
| cdc | Change Data Capture via database log | primary_key (connector handles CDC mechanics) |
| event | Consume from Redpanda topic | topic, consumer_group (optional) |
| file_upload | Manual CSV/Parquet upload via Portal or API | format (csv, parquet, json) |
| api_poll | HTTP API polling on schedule | url, method, headers (optional), pagination (optional) |
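
The sketch below shows how ingestion_config might look for some of the strategies other than incremental, using only the config keys listed for each strategy. The input ids, the redpanda-main connection name, the topic, the URL, and the column names are hypothetical placeholders. Nesting the keys under ingestion_config follows the incremental example earlier on this page and is an assumption for the other strategies.

```yaml
inputs:
  # full_refresh: no ingestion_config is required
  - id: raw-country-codes            # hypothetical input id
    type: connector
    connector_ref: pg-production
    ingestion_strategy: full_refresh
  # cdc: only primary_key is required; the connector handles CDC mechanics
  - id: raw-orders-cdc               # hypothetical input id
    type: connector
    connector_ref: pg-production
    ingestion_strategy: cdc
    ingestion_config:
      primary_key: order_id          # hypothetical column
  # event: topic is required, consumer_group is optional
  - id: raw-pos-events               # hypothetical input id
    type: connector
    connector_ref: redpanda-main     # hypothetical connection name
    ingestion_strategy: event
    ingestion_config:
      topic: pos.transactions        # hypothetical topic
      consumer_group: pos-product    # hypothetical group name
  # file_upload: format must be one of csv, parquet, json
  - id: raw-store-targets            # hypothetical input id
    type: connector
    connector_ref: portal-uploads    # hypothetical connection name
    ingestion_strategy: file_upload
    ingestion_config:
      format: csv
```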

Fitness Functions

External inputs can declare fitness functions, which are pre-transformation quality checks that run immediately after ingestion:

- id: raw-pos-extract
  type: connector
  connector_ref: pg-production
  ingestion_strategy: incremental
  ingestion_config:
    cursor_field: updated_at
    primary_key: transaction_id
  fitness:
    - type: row_count_min
      threshold: 100
    - type: schema_match
      severity: error # error = block, warn = log and continue

If an error-severity fitness function fails, the input is rejected and the fallback logic activates.
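
As an illustration of how fitness checks and fallback might interact, the sketch below pairs an error-severity check with fallback: use_cached, so that a rejected extract would fall back to the last successful materialization. Setting severity on row_count_min is an assumption; the example on this page only shows it on schema_match.

```yaml
- id: raw-pos-extract
  type: connector
  connector_ref: pg-production
  ingestion_strategy: incremental
  ingestion_config:
    cursor_field: updated_at
    primary_key: transaction_id
  fallback: use_cached
  fitness:
    # Assumed: a low row count is only logged and the run continues
    - type: row_count_min
      threshold: 100
      severity: warn
    # A schema mismatch rejects the input, which activates the fallback
    - type: schema_match
      severity: error
```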

Semantic Contracts

Downstream products declare what semantic intents they require from upstream:

inputs:
  - id: store-performance
    type: data_product
    version: ">=2.0.0"
    semantic_contract:
      requires:
        - intent: customer_classification
          freshness: 24h
          tier_refs: [high_value]
        - intent: transaction_revenue
          freshness: 24h
          entity_refs: [customer_id, store_id]

| Field | Type | Required | Description |
|---|---|---|---|
| requires | list | Yes | List of intent requirements |
| requires[].intent | string | Yes | Must match an upstream product’s semantic_intents key |
| requires[].freshness | duration | No | SLA override for this intent’s data freshness |
| requires[].tier_refs | list | No | Tier names this product references in its transform |
| requires[].entity_refs | list | No | Identity columns expected in the upstream product |

At deploy time, the platform validates that the upstream product declares the referenced intent, the tier refs exist, and the entity refs are identity columns.

Fallback Actions

| Fallback | Action |
|---|---|
| skip | Remove input from set, execute if remaining required inputs are ready |
| fail | Abort execution, emit failure event |
| use_cached | Substitute last successful materialization, re-evaluate |
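
A short sketch of how the fallback actions might be combined in one file. Pairing skip with optional: true is one reasonable reading of the skip action (the product runs on the remaining required inputs), not a documented requirement. The timeouts are illustrative.

```yaml
inputs:
  # Optional enrichment: dropped from the input set if it is not ready in time
  - id: cleaned-outlets
    type: data_product
    timeout: 30m
    fallback: skip
    optional: true
  # Required input: reuse the last successful materialization when it is late
  - id: cleaned-transactions
    type: data_product
    timeout: 4h
    fallback: use_cached
  # Required input with no acceptable substitute: abort and emit a failure event
  - id: raw-pos-extract
    type: connector
    connector_ref: pg-production
    ingestion_strategy: full_refresh
    timeout: 2h
    fallback: fail
```
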
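
Version Operators

| Operator | Meaning | Example |
|---|---|---|
| >= | Greater than or equal | >=1.0.0 |
| <= | Less than or equal | <=2.0.0 |
| > | Greater than | >1.0.0 |
| < | Less than | <2.0.0 |
| = | Exact match | =1.5.0 |
| != | Not equal | !=1.3.0 |
| ~ | Patch range | ~1.2.0 (matches 1.2.x) |
| ^ | Minor range | ^1.0.0 (matches 1.x.x) |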

Ranges can be combined: ">=1.0.0,<2.0.0".
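
For illustration, the version constraints below use the operators from the table on inputs taken from the earlier example. Reading the comma in a combined range as "both conditions must hold" is an assumption based on common semver range syntax.

```yaml
inputs:
  # Any 1.x release, written as an explicit combined range
  - id: cleaned-outlets
    type: data_product
    version: ">=1.0.0,<2.0.0"
  # The caret form, which per the table matches 1.x.x
  - id: cleaned-transactions
    type: data_product
    version: "^1.0.0"
  # The tilde form, which per the table matches 1.2.x only
  - id: store-performance
    type: data_product
    version: "~1.2.0"
```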