Governance Model
Akili implements federated computational governance — domain teams own their data product quality and classification, while the platform enforces global policies computationally. Policies are codified in YAML manifests and enforced automatically at build, deploy, and run time.
Four Pillars
| Pillar | Purpose | Enforcement Point |
|---|---|---|
| Classification | Control data access by sensitivity level | Deploy time + query time |
| Quality | Ensure data correctness and completeness | After every materialization |
| Lineage | Track data provenance end-to-end | Registration + deploy + run time |
| SLA Management | Monitor freshness and availability | Continuous (every 5 minutes) |
Enforcement Points
Governance rules are enforced at four distinct points in the data product lifecycle:
```mermaid
%%{init: {'flowchart': {'curve': 'basis'}}}%%
flowchart TB
    subgraph EP1["1. Ingestion"]
        I1[Schema validation]
        I2[Connection auth verification]
        I3[Input fitness functions]
    end
    subgraph EP2["2. Transformation"]
        T1[Classification propagation check]
        T2[Dependency graph validation]
        T3[Semantic contract verification]
    end
    subgraph EP3["3. Quality Gate"]
        Q1[Blocking checks must pass]
        Q2[SLA threshold evaluation]
        Q3[Quality score update]
    end
    subgraph EP4["4. Serving"]
        S1[Column-level access control]
        S2[Classification-based masking]
        S3[Rate limiting per tenant]
    end
    EP1 --> EP2 --> EP3 --> EP4
```
Enforcement Point 1: Ingestion
When data enters the platform through input ports, the following checks execute:
- Schema validation: incoming data must match the declared schema in `inputs.yaml`
- Connection verification: the source connection must be active and authorized
- Input fitness functions: row count, null ratio, and freshness thresholds
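The input fitness functions can be pictured as a small set of threshold checks. The sketch below is illustrative only: `FitnessThresholds`, its field names, and `check_input_fitness` are hypothetical and do not reflect the actual `inputs.yaml` keys or platform API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FitnessThresholds:
    # Hypothetical threshold names; the real inputs.yaml keys may differ.
    min_rows: int
    max_null_ratio: float
    max_age: timedelta


def check_input_fitness(rows, key_column, latest_ts, thresholds, now=None):
    """Return a list of failure messages; an empty list means the input passed."""
    now = now or datetime.now(timezone.utc)
    failures = []

    # Row count threshold.
    if len(rows) < thresholds.min_rows:
        failures.append(f"row count {len(rows)} < {thresholds.min_rows}")

    # Null ratio of the key column; an empty input counts as fully null.
    nulls = sum(1 for row in rows if row.get(key_column) is None)
    null_ratio = nulls / len(rows) if rows else 1.0
    if null_ratio > thresholds.max_null_ratio:
        failures.append(f"null ratio {null_ratio:.2f} > {thresholds.max_null_ratio}")

    # Freshness threshold, measured from the newest source timestamp.
    if now - latest_ts > thresholds.max_age:
        failures.append("input is stale")

    return failures
```

Returning every failure rather than stopping at the first one is a design choice of this sketch, so that a single report can list everything wrong with an input.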
Enforcement Point 2: Transformation (Deploy Time)
At deploy time, the platform validates governance rules before allowing a data product to execute:
- Classification propagation — the output classification must be greater than or equal to the highest classification of any input (transitive)
- Dependency graph validation — all upstream products must be deployed and accessible
- Semantic contract verification — referenced intents and tiers must still exist upstream
Enforcement Point 3: Quality Gate (Run Time)
After every materialization, quality checks from quality.yaml execute:
- Blocking checks (`severity: critical`): must pass before data is promoted to serving stores
- Warning checks (`severity: warning`): logged but do not block promotion
- SLA evaluation: freshness, completeness, and availability thresholds checked
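The difference between blocking and warning checks comes down to which failures stop promotion. A minimal sketch, assuming check results arrive as simple records (`CheckResult` and `evaluate_quality_gate` are hypothetical names, not platform APIs):

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    severity: str  # "critical" or "warning", matching the severities above
    passed: bool


def evaluate_quality_gate(results):
    """Return (promote, blocking_failures, warnings) for one materialization."""
    blocking = [r.name for r in results if not r.passed and r.severity == "critical"]
    warnings = [r.name for r in results if not r.passed and r.severity == "warning"]
    # Only failed critical checks prevent promotion; warnings are reported alongside.
    return (not blocking, blocking, warnings)
```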
Enforcement Point 4: Serving (Query Time)
When consumers query data products:
- Column-level access control — columns with high-sensitivity classifications are masked or omitted based on the consumer’s clearance level
- Classification-based filtering — consumers can only access products at or below their clearance level
- Rate limiting — per-tenant query rate limits prevent resource exhaustion
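The source does not say which algorithm backs the per-tenant rate limits. One common option is a token bucket per tenant, sketched below purely as an assumption; `TenantRateLimiter`, `rate_per_sec`, and `capacity` are hypothetical names.

```python
import time


class TenantRateLimiter:
    """Token bucket per tenant; rate and capacity are illustrative parameters."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._buckets = {}  # tenant -> (tokens, last_refill_time)

    def allow(self, tenant):
        now = self.clock()
        tokens, last = self._buckets.get(tenant, (self.capacity, now))
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[tenant] = (tokens - 1, now)
            return True
        self._buckets[tenant] = (tokens, now)
        return False
```

Because each tenant has its own bucket, one tenant exhausting its budget does not affect queries from another.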
Data Classification
Every data product declares a sensitivity level. Classification drives access control across the entire platform.
Classification Levels
```text
public --> internal --> confidential --> restricted
```

| Level | Who Can Access | Typical Use |
|---|---|---|
| `public` | Any authenticated user in the tenant | Country codes, product catalogs |
| `internal` | Any team member in the tenant | Sales aggregates, operational metrics |
| `confidential` | Explicit team grant required | Customer segments, financial data |
| `restricted` | Named individuals only, audit logged | PII, salary data, health records |
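Because the levels are ordered, the rule that consumers can only access products at or below their clearance level reduces to a comparison. The sketch below models only that ordering; it deliberately ignores team grants, named-individual lists, and audit logging, and `Level` and `visible_products` are hypothetical names.

```python
from enum import IntEnum


class Level(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def visible_products(products, clearance):
    """Keep only products classified at or below the consumer's clearance."""
    return [name for name, level in products.items() if level <= clearance]
```

For example, a consumer with `Level.INTERNAL` clearance would see `public` and `internal` products but not `confidential` or `restricted` ones.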
High-Water Mark Propagation
The platform enforces a critical rule: the output classification must be greater than or equal to the highest classification of any input.
```text
raw-orders (internal) + raw-payroll (confidential) --> output MUST be >= confidential
```

This prevents classification laundering: creating a “public” product that reads from “confidential” inputs, effectively bypassing access controls through aggregation. Propagation is transitive: if product C depends on B, which depends on A (restricted), then C must be at least restricted.
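The high-water mark rule can be sketched as a walk over the dependency graph that tracks the highest classification seen. This is an illustration under assumed data shapes (`declared` maps product names to levels, `inputs` maps product names to upstream names); the function names are hypothetical.

```python
LEVELS = ["public", "internal", "confidential", "restricted"]
RANK = {name: i for i, name in enumerate(LEVELS)}


def minimum_classification(product, declared, inputs):
    """Highest classification among all transitive inputs of `product`."""
    floor = "public"
    stack = list(inputs.get(product, []))
    seen = set()
    while stack:
        upstream = stack.pop()
        if upstream in seen:
            continue
        seen.add(upstream)
        if RANK[declared[upstream]] > RANK[floor]:
            floor = declared[upstream]
        # Keep walking upstream so the check is transitive.
        stack.extend(inputs.get(upstream, []))
    return floor


def validate_classification(product, declared, inputs):
    """True when the declared level is at or above the high-water mark."""
    floor = minimum_classification(product, declared, inputs)
    return RANK[declared[product]] >= RANK[floor]
```

Walking all the way upstream, rather than trusting each intermediate product's declared level, means a mislabeled product in the middle of a chain cannot hide a restricted source.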
Column-Level Classification
For finer-grained control, individual columns can declare their sensitivity:
```yaml
schema:
  - name: customer_id
    type: string
    classification: pii.identifier

  - name: email
    type: string
    classification: pii.contact

  - name: country_code
    type: string
    classification: public
```

The serving layer dynamically masks or omits columns based on the querying consumer’s clearance level.
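Masking versus omission at query time might look like the sketch below. It rests on an explicit assumption that is not stated in the source: labels under `pii.` are treated as requiring `restricted` clearance. `required_level` and `mask_row` are hypothetical helpers.

```python
RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


def required_level(classification):
    # Assumption: any "pii.*" label needs "restricted" clearance.
    # The real mapping from column labels to levels is not documented here.
    return "restricted" if classification.startswith("pii.") else classification


def mask_row(row, schema, clearance, omit=False):
    """Return a copy of `row` with columns above `clearance` masked or omitted."""
    out = {}
    for column in schema:
        name = column["name"]
        needed = required_level(column.get("classification", "public"))
        if RANK[clearance] >= RANK[needed]:
            out[name] = row[name]
        elif not omit:
            out[name] = "***"  # masked placeholder; omitted entirely when omit=True
    return out
```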
Retention Policies
Products declare how long data should be retained:
```yaml
retention:
  period: "365d"
  basis: created_at
  review_date: "2026-06-01"
```

| Field | Required | Description |
|---|---|---|
| `period` | yes | Duration: `"90d"`, `"365d"`, `"7y"` |
| `basis` | yes | Timestamp column for age calculation |
| `review_date` | no | ISO date for next retention review |
The platform evaluates retention daily. When data exceeds the retention period, a retention.expired governance event is emitted. The platform does not auto-delete — the product owner must explicitly trigger deletion or extend retention. This deliberate design prevents accidental data loss.
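A daily retention evaluation could be sketched as follows. The parser handles only the `d` and `y` suffixes shown in the examples and assumes a year is 365 days; the event payload fields other than the `retention.expired` type are hypothetical. In keeping with the design described above, the function only reports and never deletes.

```python
import re
from datetime import date, timedelta

_UNIT_DAYS = {"d": 1, "y": 365}  # assumption: one year counts as 365 days


def parse_period(period):
    """Convert a period such as "90d" or "7y" into a timedelta."""
    match = re.fullmatch(r"(\d+)([dy])", period)
    if not match:
        raise ValueError(f"unsupported retention period: {period!r}")
    return timedelta(days=int(match.group(1)) * _UNIT_DAYS[match.group(2)])


def evaluate_retention(rows, retention, today):
    """Return a retention.expired event if any row is past retention, else None."""
    cutoff = today - parse_period(retention["period"])
    expired = [row for row in rows if row[retention["basis"]] < cutoff]
    if not expired:
        return None
    # Payload shape is illustrative; nothing is deleted here.
    return {
        "type": "retention.expired",
        "expired_rows": len(expired),
        "cutoff": cutoff.isoformat(),
    }
```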
Deletion Workflow
When data needs to be deleted (retention expiry or regulatory request), the platform uses position delete files in the data lake:
- Identify data files containing records to delete
- Write position delete files listing row positions to logically delete
- Subsequent reads skip deleted positions
- Physical deletion occurs during data lake compaction
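The workflow above can be modeled in memory to show why reads change immediately while physical files do not. This is a toy model, not the platform's storage code: data files are plain lists, the delete file is a set of `(file, position)` pairs, and all function names are hypothetical.

```python
def write_position_deletes(data_files, predicate):
    """Find matching rows and record their (file, position) pairs."""
    return {
        (path, pos)
        for path, rows in data_files.items()
        for pos, row in enumerate(rows)
        if predicate(row)
    }


def read(data_files, deletes):
    """Readers skip any position listed in the delete set."""
    return [
        row
        for path, rows in data_files.items()
        for pos, row in enumerate(rows)
        if (path, pos) not in deletes
    ]


def compact(data_files, deletes):
    """Rewrite files without the deleted rows; only now is data physically gone."""
    return {
        path: [row for pos, row in enumerate(rows) if (path, pos) not in deletes]
        for path, rows in data_files.items()
    }
```

Until `compact` runs, the original files still hold the deleted records, which is why the deletion is described as logical first and physical later.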
SLA Management
Each data product defines freshness and availability SLAs. The platform monitors these continuously and emits alerts when thresholds are breached.
```yaml
sla:
  freshness: 24h
  completeness: 0.99
```

| SLA Type | Description | Alert Mechanism |
|---|---|---|
| Freshness | Maximum age of the latest materialization | sla.breach event when threshold exceeded |
| Completeness | Minimum ratio of non-null values in key columns | Quality check after each run |
| Availability | Uptime of serving endpoints | Health check every 5 minutes |
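Evaluating the freshness and completeness thresholds from the `sla` block might look like the sketch below. The duration suffixes `m`, `h`, and `d` are an assumption (only `24h` appears in the example), and `parse_duration` and `evaluate_sla` are hypothetical names.

```python
import re
from datetime import datetime, timedelta, timezone

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}  # assumed duration suffixes


def parse_duration(text):
    """Convert a duration such as "24h" into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text)
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def evaluate_sla(sla, last_materialized, non_null_ratio, now):
    """Compare the latest materialization against the declared thresholds."""
    age = now - last_materialized
    return {
        # Freshness: breached when the latest materialization is older than allowed.
        "freshness_breached": age > parse_duration(sla["freshness"]),
        # Completeness: met when the non-null ratio reaches the declared minimum.
        "completeness_met": non_null_ratio >= sla["completeness"],
    }
```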
Notification Channels
SLA breach notifications can be routed to multiple channels:
| Channel | Configuration |
|---|---|
| Email | Per-product owner email from product.yaml |
| Webhook | Custom HTTP endpoint for integration |
| PagerDuty | Incident creation for critical breaches |
Lineage Tracking
Lineage answers: “Where did this data come from, and what depends on it?”
Four metadata pathways feed the lineage graph:
| Pathway | When | What |
|---|---|---|
| Manifest registration | Product creation | Product identity, schemas, classification |
| Asset graph | After deploy | Dependency edges, source-to-product lineage |
| Execution events | After each materialization | Freshness, row count, duration, quality scores |
| Deployment lineage | Every deploy | Manifest version to data snapshot mapping |
Entity Graph
The platform automatically derives an entity graph from column roles across all deployed products:
- Products with `role: identity` columns become nodes
- `role: event_key` columns create directed edges between products
- The graph spans all domains within a tenant
This graph powers impact analysis, deletion cascades, and the concept registry.
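Deriving such a graph from column roles could work roughly as sketched below. The source does not say how an `event_key` column is matched to its target, so this sketch assumes matching by column name and an edge direction from the referencing product to the product that owns the identity; `build_entity_graph` is a hypothetical name.

```python
from collections import defaultdict


def build_entity_graph(products):
    """products: name -> list of column dicts with "name" and optional "role"."""
    # Nodes: products exposing an identity column, indexed by that column's name.
    identity_owner = {}
    for product, columns in products.items():
        for col in columns:
            if col.get("role") == "identity":
                identity_owner[col["name"]] = product

    # Edges (assumed rule): an event_key column points at the product that
    # owns the identity column with the same name.
    edges = defaultdict(set)
    for product, columns in products.items():
        for col in columns:
            target = identity_owner.get(col["name"])
            if col.get("role") == "event_key" and target and target != product:
                edges[product].add(target)

    return set(identity_owner.values()), dict(edges)
```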
Concept Management
As products are deployed, the platform automatically builds a business ontology, a structured vocabulary of the organization’s data concepts.
| State | Meaning |
|---|---|
| `draft` | Auto-extracted, not yet reviewed |
| `proposed` | Reviewed by product owner, submitted for approval |
| `accepted` | Approved by domain owner |
| `canonical` | Organization-wide standard term |
| `deprecated` | No longer in active use |
Canonical concepts surface in akili init suggestions and trigger validation warnings when new products introduce colliding column names.
Related
- Governance Guide: step-by-step governance configuration
- Quality and Governance: quality check mechanics
- Event System: governance event types
- Security Architecture: authentication and access control