Governance Model

Akili implements federated computational governance — domain teams own their data product quality and classification, while the platform enforces global policies computationally. Policies are codified in YAML manifests and enforced automatically at build, deploy, and run time.

| Pillar | Purpose | Enforcement Point |
| --- | --- | --- |
| Classification | Control data access by sensitivity level | Deploy time + query time |
| Quality | Ensure data correctness and completeness | After every materialization |
| Lineage | Track data provenance end-to-end | Registration + deploy + run time |
| SLA Management | Monitor freshness and availability | Continuous (every 5 minutes) |

Governance rules are enforced at four distinct points in the data product lifecycle:

```mermaid
%%{init: {'flowchart': {'curve': 'basis'}}}%%
flowchart TB
    subgraph EP1["1. Ingestion"]
        I1[Schema validation]
        I2[Connection auth verification]
        I3[Input fitness functions]
    end

    subgraph EP2["2. Transformation"]
        T1[Classification propagation check]
        T2[Dependency graph validation]
        T3[Semantic contract verification]
    end

    subgraph EP3["3. Quality Gate"]
        Q1[Blocking checks must pass]
        Q2[SLA threshold evaluation]
        Q3[Quality score update]
    end

    subgraph EP4["4. Serving"]
        S1[Column-level access control]
        S2[Classification-based masking]
        S3[Rate limiting per tenant]
    end

    EP1 --> EP2 --> EP3 --> EP4
```

Enforcement Point 1: Ingestion (Build Time)

When data enters the platform through input ports, the following checks execute:

  • Schema validation — incoming data must match the declared schema in `inputs.yaml`
  • Connection verification — the source connection must be active and authorized
  • Input fitness functions — row count, null ratio, and freshness thresholds
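Concretely, these checks can be sketched as a single validation pass over an incoming batch. The function and threshold names here are illustrative assumptions, not the platform's actual API:

```python
# Sketch of ingestion-time fitness checks (hypothetical names; the real
# platform derives thresholds from inputs.yaml).
from datetime import datetime, timedelta, timezone

def run_input_fitness_checks(rows, latest_ts, thresholds, now):
    """Evaluate row count, null ratio, and freshness for a batch."""
    total = len(rows)
    nulls = sum(1 for r in rows if r.get("customer_id") is None)
    return {
        "row_count_ok": total >= thresholds["min_rows"],
        "null_ratio_ok": (nulls / total if total else 1.0) <= thresholds["max_null_ratio"],
        "freshness_ok": now - latest_ts <= thresholds["max_age"],
    }

batch = [{"customer_id": "c1"}, {"customer_id": None}, {"customer_id": "c3"}]
result = run_input_fitness_checks(
    batch,
    latest_ts=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
    thresholds={"min_rows": 2, "max_null_ratio": 0.5, "max_age": timedelta(hours=24)},
    now=datetime(2025, 1, 2, 0, 0, tzinfo=timezone.utc),
)
```

A batch failing any check would be rejected or quarantined at the input port.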

Enforcement Point 2: Transformation (Deploy Time)

At deploy time, the platform validates governance rules before allowing a data product to execute:

  • Classification propagation — the output classification must be greater than or equal to the highest classification of any input (transitive)
  • Dependency graph validation — all upstream products must be deployed and accessible
  • Semantic contract verification — referenced intents and tiers must still exist upstream
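The classification propagation check reduces to a comparison over the ordered set of sensitivity levels. A minimal sketch (the helper name is hypothetical; the level ordering comes from the classification section):

```python
# Deploy-time classification propagation check: the output level must be
# at least the highest level among the direct inputs.
LEVELS = ["public", "internal", "confidential", "restricted"]

def propagation_ok(output_level, input_levels):
    rank = LEVELS.index
    return rank(output_level) >= max(rank(lvl) for lvl in input_levels)

propagation_ok("confidential", ["internal", "confidential"])  # True
propagation_ok("public", ["confidential"])                    # False
```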

Enforcement Point 3: Quality Gate (Run Time)

After every materialization, quality checks from `quality.yaml` execute:

  • Blocking checks (`severity: critical`) — must pass before data is promoted to serving stores
  • Warning checks (`severity: warning`) — logged but do not block promotion
  • SLA evaluation — freshness, completeness, and availability thresholds checked
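The promotion decision can be sketched as: any failing critical check blocks promotion, while failing warnings are only counted. The check-result shape is an assumption for illustration:

```python
# Quality-gate decision sketch. check_results: list of (severity, passed).
def gate(check_results):
    blocked = any(sev == "critical" and not ok for sev, ok in check_results)
    warnings = sum(1 for sev, ok in check_results if sev == "warning" and not ok)
    return {"promote": not blocked, "warnings": warnings}

gate([("critical", True), ("warning", False)])  # promoted, 1 warning logged
gate([("critical", False)])                     # promotion blocked
```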

Enforcement Point 4: Serving (Query Time)

When consumers query data products:

  • Column-level access control — columns with high-sensitivity classifications are masked or omitted based on the consumer’s clearance level
  • Classification-based filtering — consumers can only access products at or below their clearance level
  • Rate limiting — per-tenant query rate limits prevent resource exhaustion
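A minimal sketch of classification-based masking at query time, assuming each column maps to one of the four sensitivity levels and that masked values are replaced with `***` (both are illustrative assumptions):

```python
# Query-time column masking sketch: a consumer sees a column only if their
# clearance is at or above the column's sensitivity level.
LEVELS = ["public", "internal", "confidential", "restricted"]

def mask_row(row, column_levels, clearance):
    rank = LEVELS.index
    return {
        col: (val if rank(column_levels[col]) <= rank(clearance) else "***")
        for col, val in row.items()
    }

mask_row({"email": "a@b.c", "country": "DE"},
         {"email": "restricted", "country": "public"},
         clearance="internal")
# → {"email": "***", "country": "DE"}
```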

Every data product declares a sensitivity level. Classification drives access control across the entire platform.

`public --> internal --> confidential --> restricted`

| Level | Who Can Access | Typical Use |
| --- | --- | --- |
| `public` | Any authenticated user in the tenant | Country codes, product catalogs |
| `internal` | Any team member in the tenant | Sales aggregates, operational metrics |
| `confidential` | Explicit team grant required | Customer segments, financial data |
| `restricted` | Named individuals only, audit logged | PII, salary data, health records |

The platform enforces a critical rule: the output classification must be greater than or equal to the highest classification of any input.

```
raw-orders (internal) + raw-payroll (confidential)
--> output MUST be >= confidential
```

This prevents classification laundering — creating a “public” product that reads from “confidential” inputs, effectively bypassing access controls through aggregation. Propagation is transitive: if product C depends on B, which depends on A (restricted), then C must be at least restricted.
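The transitive rule amounts to taking the maximum classification over all upstream products, direct or indirect. A sketch over a toy dependency graph (product names and graph shape are illustrative):

```python
# Transitive classification propagation: a product's minimum allowed level
# is the max over its declared level and all upstream levels, recursively.
LEVELS = ["public", "internal", "confidential", "restricted"]

def minimum_level(product, deps, levels):
    """deps: product -> upstream products; levels: declared levels."""
    rank = LEVELS.index
    best = rank(levels.get(product, "public"))
    for upstream in deps.get(product, []):
        best = max(best, minimum_level(upstream, deps, levels))
    return best

deps = {"C": ["B"], "B": ["A"]}
levels = {"A": "restricted", "B": "internal", "C": "public"}
LEVELS[minimum_level("C", deps, levels)]  # "restricted"
```

C declares `public`, but because A is `restricted` two hops upstream, deployment of C at anything below `restricted` would be rejected.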

For finer-grained control, individual columns can declare their sensitivity:

```yaml
schema:
  - name: customer_id
    type: string
    classification: pii.identifier
  - name: email
    type: string
    classification: pii.contact
  - name: country_code
    type: string
    classification: public
```

The serving layer dynamically masks or omits columns based on the querying consumer’s clearance level.

Products declare how long data should be retained:

```yaml
retention:
  period: "365d"
  basis: created_at
  review_date: "2026-06-01"
```

| Field | Required | Description |
| --- | --- | --- |
| `period` | yes | Duration: `"90d"`, `"365d"`, `"7y"` |
| `basis` | yes | Timestamp column for age calculation |
| `review_date` | no | ISO date for next retention review |

The platform evaluates retention daily. When data exceeds the retention period, a retention.expired governance event is emitted. The platform does not auto-delete — the product owner must explicitly trigger deletion or extend retention. This deliberate design prevents accidental data loss.
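A sketch of the daily retention evaluation, using the `period` and `basis` fields from the manifest above; the parsing helper and event shape are assumptions:

```python
# Daily retention sweep sketch: emit a retention.expired event when the
# oldest record (by the basis column) exceeds the declared period.
from datetime import datetime, timedelta, timezone

def parse_period(period):
    """Support the documented "90d"/"7y" style durations."""
    n, unit = int(period[:-1]), period[-1]
    return timedelta(days=n * 365 if unit == "y" else n)

def evaluate_retention(oldest_created_at, period, now):
    if now - oldest_created_at > parse_period(period):
        # Event only — the platform never auto-deletes.
        return {"event": "retention.expired"}
    return None

evaluate_retention(
    datetime(2024, 1, 1, tzinfo=timezone.utc), "365d",
    now=datetime(2025, 6, 1, tzinfo=timezone.utc),
)
# → {"event": "retention.expired"}
```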

When data needs to be deleted (retention expiry or regulatory request), the platform uses position delete files in the data lake:

  1. Identify data files containing records to delete
  2. Write position delete files listing row positions to logically delete
  3. Subsequent reads skip deleted positions
  4. Physical deletion occurs during data lake compaction
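The read path in step 3 can be sketched as filtering row positions against the delete file (heavily simplified relative to a real data lake format):

```python
# Position-delete read sketch: rows at listed positions in a data file are
# logically deleted and skipped; physical removal waits for compaction.
def read_with_deletes(data_file_rows, deleted_positions):
    skip = set(deleted_positions)
    return [row for pos, row in enumerate(data_file_rows) if pos not in skip]

read_with_deletes(["r0", "r1", "r2", "r3"], [1, 3])  # → ["r0", "r2"]
```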

Each data product defines freshness and availability SLAs. The platform monitors these continuously and emits alerts when thresholds are breached.

```yaml
sla:
  freshness: 24h
  completeness: 0.99
```

| SLA Type | Description | Alert Mechanism |
| --- | --- | --- |
| Freshness | Maximum age of the latest materialization | `sla.breach` event when threshold exceeded |
| Completeness | Minimum ratio of non-null values in key columns | Quality check after each run |
| Availability | Uptime of serving endpoints | Health check every 5 minutes |
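A sketch of the freshness check, assuming `24h`-style thresholds as declared in the `sla` block (the helper name is hypothetical):

```python
# Freshness SLA sketch: compare the age of the latest materialization
# against the declared threshold; a breach triggers an sla.breach event.
from datetime import datetime, timedelta, timezone

def freshness_breached(last_materialized_at, freshness, now):
    hours = int(freshness.rstrip("h"))
    return now - last_materialized_at > timedelta(hours=hours)

freshness_breached(
    datetime(2025, 1, 1, tzinfo=timezone.utc), "24h",
    now=datetime(2025, 1, 2, 1, 0, tzinfo=timezone.utc),
)  # True → emit sla.breach
```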

SLA breach notifications can be routed to multiple channels:

| Channel | Configuration |
| --- | --- |
| Email | Per-product owner email from `product.yaml` |
| Webhook | Custom HTTP endpoint for integration |
| PagerDuty | Incident creation for critical breaches |

Lineage answers: “Where did this data come from, and what depends on it?”

Four metadata pathways feed the lineage graph:

| Pathway | When | What |
| --- | --- | --- |
| Manifest registration | Product creation | Product identity, schemas, classification |
| Asset graph | After deploy | Dependency edges, source-to-product lineage |
| Execution events | After each materialization | Freshness, row count, duration, quality scores |
| Deployment lineage | Every deploy | Manifest version to data snapshot mapping |

The platform automatically derives an entity graph from column roles across all deployed products:

  • Products with `role: identity` columns become nodes
  • `role: event_key` columns create directed edges between products
  • The graph spans all domains within a tenant

This graph powers impact analysis, deletion cascades, and the concept registry.
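Entity-graph derivation can be sketched as two passes over the deployed products' column roles, matching each `event_key` column to the product that owns the same-named `identity` column (the name-matching rule is an assumption here):

```python
# Entity graph sketch: identity columns define nodes, event_key columns
# define directed edges toward the owning product.
def build_entity_graph(products):
    """products: name -> list of (column, role) pairs."""
    owner = {}  # identity column name -> owning product
    for name, cols in products.items():
        for col, role in cols:
            if role == "identity":
                owner[col] = name
    edges = set()
    for name, cols in products.items():
        for col, role in cols:
            if role == "event_key" and col in owner and owner[col] != name:
                edges.add((name, owner[col]))
    return edges

products = {
    "customers": [("customer_id", "identity")],
    "orders": [("order_id", "identity"), ("customer_id", "event_key")],
}
build_entity_graph(products)  # → {("orders", "customers")}
```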

As products are deployed, the platform automatically builds a business ontology — a structured vocabulary of the organization’s data concepts.

| State | Meaning |
| --- | --- |
| `draft` | Auto-extracted, not yet reviewed |
| `proposed` | Reviewed by product owner, submitted for approval |
| `accepted` | Approved by domain owner |
| `canonical` | Organization-wide standard term |
| `deprecated` | No longer in active use |
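The lifecycle can be read as a small state machine. A sketch with transitions inferred from the state table (the platform may allow others, such as rejecting a proposal back to draft):

```python
# Concept lifecycle sketch: allowed state transitions are an inference
# from the documented states, not a confirmed platform rule.
TRANSITIONS = {
    "draft": {"proposed"},
    "proposed": {"accepted", "draft"},
    "accepted": {"canonical", "deprecated"},
    "canonical": {"deprecated"},
    "deprecated": set(),
}

def can_transition(src, dst):
    return dst in TRANSITIONS.get(src, set())

can_transition("draft", "proposed")   # True
can_transition("draft", "canonical")  # False: must be reviewed first
```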

Canonical concepts surface in `akili init` suggestions and trigger validation warnings when new products introduce colliding column names.