Governance

Akili implements federated computational governance: domain teams own the quality and classification of their data products, while the platform enforces global policies in code. Policies are codified in YAML manifests and enforced automatically at build, deploy, and run time. This guide covers classification, retention, lineage, concept management, and compliance features.

Four Governance Pillars

Pillar	Purpose	Enforcement Point
Classification	Control who can access what	Deploy time + query time
Quality	Ensure data correctness	After every materialization
Lineage	Track data provenance	Registration + deploy + run time
SLA Management	Monitor freshness and availability	Continuous (sensor every 5 min)

Data Classification

Every data product declares a sensitivity level in product.yaml. Classification drives access control across the entire platform.

Classification Levels

public --> internal --> confidential --> restricted

Levels are ordered from least to most sensitive. Each level is more restrictive than the one before it, so access narrows as sensitivity increases.

Level	Who Can Access	Typical Use
`public`	Any authenticated user in the tenant	Country codes, product catalogs
`internal`	Any team member in the tenant	Sales aggregates, operational metrics
`confidential`	Explicit team grant required	Customer segments, financial data
`restricted`	Named individuals only, audit logged	PII, salary data, health records

Declaring Classification

metadata:
  name: daily-payroll-summary
  domain: finance
  owner: finance-team
  classification: confidential

High-Water Mark Propagation

The platform enforces a critical rule: the output classification of a data product must be greater than or equal to the highest classification of any input.

raw-orders (internal) + raw-payroll (confidential)
    --> output MUST be >= confidential

This prevents classification laundering — creating a “public” product that reads from “confidential” inputs, effectively bypassing access controls through aggregation.
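The high-water mark rule can be sketched in a few lines of Python. This is an illustrative sketch, not the platform's implementation: the function names `high_water_mark` and `is_valid_output` are hypothetical, and only the four level names and their order come from this guide.

```python
# Levels from least to most sensitive, as listed in this guide.
LEVELS = ["public", "internal", "confidential", "restricted"]


def high_water_mark(input_levels):
    """Return the most sensitive level among the inputs."""
    return max(input_levels, key=LEVELS.index)


def is_valid_output(output_level, input_levels):
    """True when the output is at least as sensitive as every input."""
    if not input_levels:
        return True  # a product with no inputs has nothing to inherit
    required = high_water_mark(input_levels)
    return LEVELS.index(output_level) >= LEVELS.index(required)


# raw-orders (internal) + raw-payroll (confidential)
inputs = ["internal", "confidential"]
print(high_water_mark(inputs))            # confidential
print(is_valid_output("public", inputs))  # False
```

Declaring a stricter level than required (for example `restricted` here) is always allowed; only a weaker one is rejected.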

Enforcement at deploy time:

POST /api/v1/products/{name}/deploy
    |
    v
Resolve ALL upstream products (transitively)
    |
    v
Compute max(input classifications)
    |
    v
Verify: product.classification >= max(inputs)
    |
    +-- PASS --> Continue deploy
    +-- FAIL --> 422 ClassificationLaundering error
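The deploy-time flow above might look roughly like the following Python sketch. It assumes the dependency graph is available as a plain dictionary; `resolve_upstreams`, `check_deploy`, the `ClassificationLaundering` exception class, and the product name `monthly-report` are hypothetical stand-ins, not the platform's actual API.

```python
RANK = {"public": 0, "internal": 1, "confidential": 2, "restricted": 3}


class ClassificationLaundering(Exception):
    """Hypothetical stand-in for the 422 error returned at deploy time."""


def resolve_upstreams(product, inputs_of):
    """Collect every transitive upstream of `product` (iterative DFS)."""
    seen, stack = set(), list(inputs_of.get(product, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        stack.extend(inputs_of.get(current, []))
    return seen


def check_deploy(product, inputs_of, classification_of):
    """Return the required level, or raise if the product is under-classified."""
    upstreams = resolve_upstreams(product, inputs_of)
    if not upstreams:
        return None
    strictest = max(upstreams, key=lambda name: RANK[classification_of[name]])
    required = classification_of[strictest]
    if RANK[classification_of[product]] < RANK[required]:
        raise ClassificationLaundering(
            f"{product} must be >= {required} because it reads {strictest}"
        )
    return required
```

Resolving upstreams transitively means a product cannot hide a sensitive source behind an intermediate product.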

Developer Clearance

In addition to product-level classification, the platform checks that the deploying developer has clearance to access all upstream products:

For each upstream product:
    if developer_clearance does not include upstream.classification:
        --> 403 InsufficientClearance

This prevents a developer from building a product that references data they cannot personally access.
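As a runnable version of the pseudocode above, the check could be written as follows. This sketch assumes a developer's clearance is modelled as a set of classification levels; the function name `check_clearance` and the `InsufficientClearance` exception class are hypothetical.

```python
class InsufficientClearance(Exception):
    """Hypothetical stand-in for the 403 error described above."""


def check_clearance(developer_clearance, upstream_classifications):
    """Raise if the developer lacks clearance for any upstream product.

    developer_clearance: set of levels the developer may access.
    upstream_classifications: mapping of upstream product name -> level.
    """
    for product, level in upstream_classifications.items():
        if level not in developer_clearance:
            raise InsufficientClearance(
                f"clearance does not include '{level}' required by {product}"
            )


upstreams = {"raw-orders": "internal", "raw-payroll": "confidential"}
check_clearance({"public", "internal", "confidential"}, upstreams)  # passes
```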

Column-Level Classification

For finer-grained control, individual columns can declare their sensitivity:

schema:
  - name: customer_id
    type: string
    role: identity
    classification: pii.identifier

  - name: full_name
    type: string
    classification: pii.name

  - name: email
    type: string
    classification: pii.contact

  - name: customer_segment
    type: string
    classification: business.internal

  - name: country_code
    type: string
    classification: public

Classification taxonomy (highest to lowest sensitivity):

Classification	Category	Example
`pii.identifier`	PII	National ID, SSN, passport number
`pii.name`	PII	First name, last name
`pii.contact`	PII	Email, phone, address
`business.confidential`	Business	Risk ratings, profit margins
`business.internal`	Business	Customer segments, product categories
`public`	Public	Country codes, currency codes

The serving layer dynamically masks or omits columns based on the querying consumer’s clearance level.
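A minimal sketch of how masking or omission could work at query time is shown below. How the platform maps a consumer's clearance to column classifications is not specified in this guide, so the sketch assumes the consumer's clearance has already been resolved to a set of allowed column classifications. The function name `apply_column_policy` and the `"***"` mask value are hypothetical.

```python
MASK = "***"  # hypothetical placeholder for masked values


def apply_column_policy(row, column_classification, allowed, omit=False):
    """Return a copy of `row` with disallowed columns masked or omitted.

    Columns with no declared classification are treated as disallowed,
    so the sketch fails closed.
    """
    result = {}
    for column, value in row.items():
        if column_classification.get(column) in allowed:
            result[column] = value
        elif not omit:
            result[column] = MASK
    return result


classes = {
    "customer_id": "pii.identifier",
    "email": "pii.contact",
    "customer_segment": "business.internal",
    "country_code": "public",
}
row = {"customer_id": "CUST-12345", "email": "a@example.com",
       "customer_segment": "gold", "country_code": "KE"}
print(apply_column_policy(row, classes, {"public", "business.internal"}))
```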

Retention Policies

Products declare how long their data should be retained. The platform evaluates retention daily and emits a governance event when data exceeds the retention period.

Writing Retention Policies

metadata:
  name: raw-transactions
  # ... other fields ...

retention:
  period: "365d"            # How long to retain data
  basis: created_at         # Which timestamp determines age
  review_date: "2026-06-01" # Next scheduled retention review

Field	Required	Description
`period`	yes	Duration string. Examples: `"90d"`, `"365d"`, `"7y"`
`basis`	yes	Timestamp column that determines record age: `created_at`, `event_time`, or `ingested_at`
`review_date`	no	ISO date for next retention review

Retention Behavior

The platform evaluates retention daily
When data exceeds the retention period, a retention.expired governance event is emitted
The platform does not auto-delete — the product owner must explicitly trigger deletion or extend retention
This deliberate design prevents accidental data loss from misconfigured retention
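The daily evaluation can be illustrated with a short Python sketch. It assumes the period grammar is limited to the `d` and `y` suffixes shown in the examples above and that a year counts as 365 days; both are assumptions, as are the function names `parse_period` and `expired_records`. In line with the behavior described above, the sketch only reports expired records and never deletes them.

```python
import re
from datetime import date, timedelta

# Assumption: only "d" and "y" suffixes exist, and a year is 365 days.
_UNIT_DAYS = {"d": 1, "y": 365}


def parse_period(period):
    """Convert a duration string such as "90d" or "7y" to a timedelta."""
    match = re.fullmatch(r"(\d+)([dy])", period)
    if match is None:
        raise ValueError(f"unsupported retention period: {period!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(days=amount * _UNIT_DAYS[unit])


def expired_records(records, period, basis, today):
    """Return records whose `basis` date is older than the retention period."""
    cutoff = today - parse_period(period)
    return [record for record in records if record[basis] < cutoff]


records = [
    {"id": 1, "created_at": date(2024, 1, 1)},
    {"id": 2, "created_at": date(2025, 5, 1)},
]
expired = expired_records(records, "365d", "created_at", date(2025, 6, 1))
print([record["id"] for record in expired])  # [1]
```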

Deletion Workflows

When data needs to be deleted (retention expiry or regulatory request), the platform uses position delete files in the data lake:

1. Identify data files containing records to delete
2. Write position delete files listing row positions to logically delete
3. Subsequent reads skip deleted positions, so the data is logically erased
4. Physical deletion occurs during data lake compaction (configurable schedule)

This approach is non-destructive (original files untouched), auditable (delete files serve as a record), and reversible before compaction.
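The mechanics can be shown with an in-memory model. This is a toy sketch, not the data lake's file format: data files are modelled as lists of rows keyed by path, and a position delete file as a set of row positions per path. The function names `read_with_deletes` and `compact` and the file path are hypothetical.

```python
def read_with_deletes(data_files, position_deletes):
    """Yield rows, skipping positions listed in the position deletes."""
    for path, rows in data_files.items():
        deleted = position_deletes.get(path, set())
        for position, row in enumerate(rows):
            if position not in deleted:
                yield row


def compact(data_files, position_deletes):
    """Rewrite data files without deleted rows (physical deletion)."""
    return {
        path: [row for position, row in enumerate(rows)
               if position not in position_deletes.get(path, set())]
        for path, rows in data_files.items()
    }


data_files = {"part-0001": ["CUST-1", "CUST-12345", "CUST-3"]}
position_deletes = {"part-0001": {1}}  # logically delete row position 1

print(list(read_with_deletes(data_files, position_deletes)))  # ['CUST-1', 'CUST-3']
print(data_files["part-0001"])  # original file is untouched until compaction
```

Dropping the entry from `position_deletes` before `compact` runs restores the row, which is what makes the deletion reversible up to that point.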

Lineage Tracking

Lineage answers: “Where did this data come from, and what depends on it?”

Four Metadata Pathways

Metadata flows into OpenMetadata (the governance catalog) from four sources:

Pathway	When	What
1. Manifest registration	`POST /api/v1/products`	Product identity, schemas, classification, access teams
2. Asset graph	After deploy	Dependency edges, source-to-product lineage, product-to-serving lineage
3. Execution events	After each materialization	Freshness, row count, duration, quality scores
4. Deployment lineage	Every `akili deploy`	Which manifest version produced which data snapshots

Lineage Queries

Impact analysis: “What breaks if raw-outlet-visits fails?”

The platform traverses the dependency graph to show all downstream products affected by a failure. This is available via the API (GET /api/v1/registry/entity-graph) and visualized in the Portal.

Provenance: “Where does monthly-cost-report get its data?”

Full upstream lineage shows every transformation step from external source to final output.

Deployment audit: “Which version of the transform logic produced the data in snapshot X?”

Deployment lineage tracks manifest versions alongside data lake snapshots, enabling precise debugging and backfill decisions.
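Impact analysis and provenance are both graph traversals, differing only in direction. The sketch below assumes lineage is available as a list of `(upstream, downstream)` edges; the function names and the intermediate products `visit-summary` and `raw-costs` are hypothetical, and the response shape of the real entity-graph endpoint is not shown here.

```python
from collections import deque


def downstream_of(product, edges):
    """Breadth-first list of every product that depends on `product`."""
    children = {}
    for upstream, downstream in edges:
        children.setdefault(upstream, []).append(downstream)
    affected, seen, queue = [], {product}, deque([product])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


def upstream_of(product, edges):
    """Provenance is the same traversal over reversed edges."""
    return downstream_of(product, [(down, up) for up, down in edges])


edges = [
    ("raw-outlet-visits", "visit-summary"),
    ("visit-summary", "monthly-cost-report"),
    ("raw-costs", "monthly-cost-report"),
]
print(downstream_of("raw-outlet-visits", edges))
# ['visit-summary', 'monthly-cost-report']
```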

Entity Graph

The platform automatically derives an entity graph from column roles across all deployed products:

Products with role: identity columns become nodes
role: event_key columns create directed edges between products
The graph spans all domains within a tenant

This graph powers impact analysis, deletion cascades, and the concept registry.
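A rough sketch of the derivation follows. This guide does not say how an `event_key` column is matched to its target product or which way the edge points, so the sketch assumes matching by column name and draws the edge from the referencing product to the product that owns the identity. If two products declared the same identity column name, this simplified version would keep only the last one. The function name `build_entity_graph` is hypothetical.

```python
def build_entity_graph(products):
    """Derive (nodes, edges) from column roles.

    products: mapping of product name -> list of column dicts
              with "name" and an optional "role".
    """
    identity_owner = {}
    for product, columns in products.items():
        for column in columns:
            if column.get("role") == "identity":
                identity_owner[column["name"]] = product

    nodes = set(identity_owner.values())
    edges = set()
    for product, columns in products.items():
        for column in columns:
            if column.get("role") != "event_key":
                continue
            # Assumption: an event_key refers to the identity of the same name.
            target = identity_owner.get(column["name"])
            if target is not None and target != product:
                edges.add((product, target))
    return nodes, edges


products = {
    "customers": [{"name": "customer_id", "role": "identity"}],
    "orders": [
        {"name": "order_id", "role": "identity"},
        {"name": "customer_id", "role": "event_key"},
    ],
}
print(build_entity_graph(products))
```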

Concept Management (Business Glossary)

As products are deployed, the platform automatically builds a business ontology — a structured vocabulary of the organization’s data concepts.

How Concepts Are Extracted

Source	Rule	Example
`identity` columns	Each unique identity column name becomes a concept	`customer_id` becomes concept “Customer”
Domain names	Each domain becomes a concept	`analytics` becomes concept “Analytics Domain”
Product tags	Tags in `product.yaml` become concept associations	`tags: [revenue]` becomes concept “Revenue”
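The three rules in the table could be approximated as below. The exact naming rules are not documented here, so stripping a trailing `_id` and title-casing the remainder is an assumption inferred from the `customer_id` example; `concept_from_identity` and `extract_concepts` are hypothetical names, and the product is modelled as a flat dictionary for brevity.

```python
def concept_from_identity(column_name):
    """Turn an identity column name into a concept label."""
    base = column_name[:-3] if column_name.endswith("_id") else column_name
    return base.replace("_", " ").title()


def extract_concepts(product):
    """Collect draft concepts from identity columns, domain, and tags."""
    concepts = set()
    for column in product.get("schema", []):
        if column.get("role") == "identity":
            concepts.add(concept_from_identity(column["name"]))
    concepts.add(f"{product['domain'].title()} Domain")
    concepts.update(tag.replace("-", " ").title() for tag in product.get("tags", []))
    return concepts


product = {
    "domain": "analytics",
    "tags": ["revenue"],
    "schema": [
        {"name": "customer_id", "role": "identity"},
        {"name": "email"},
    ],
}
print(sorted(extract_concepts(product)))
# ['Analytics Domain', 'Customer', 'Revenue']
```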

Concept Maturity Lifecycle

State	Meaning
`draft`	Auto-extracted, not yet reviewed
`proposed`	Reviewed by product owner, submitted for approval
`accepted`	Approved by domain owner
`canonical`	Organization-wide standard term (Published Language)
`deprecated`	No longer in active use

Canonical concepts surface in akili init suggestions (“Did you mean customer_id (canonical)?”) and trigger validation warnings when new products introduce colliding column names.

Managing Concepts

# List concepts in a domain
akili governance concepts --domain analytics

# Promote a concept
akili governance concept promote customer_id --to accepted

# Register a manual concept
akili governance concept create \
  --name "Net Revenue" \
  --domain finance \
  --description "Revenue after discounts and gift cards"

The concept graph is synced to OpenMetadata as a glossary and visualized in the Portal.

Compliance Features

For regulatory requests such as a GDPR right-to-erasure request, the platform supports structured deletion workflows that propagate through the lineage graph.

# Submit a deletion request
akili governance deletion-request \
  --entity-type customer \
  --identity-column customer_id \
  --identity-value "CUST-12345" \
  --reason "GDPR Art. 17 request" \
  --cascade

Deletion workflow:

1. Request: Deletion request submitted via API or CLI
2. Impact analysis: Platform traverses lineage graph to identify all affected products
3. Plan: Generates a deletion plan: which products, which records, which method
4. Execute: Position delete files written in the data lake for each affected product
5. Verify: Post-deletion check confirms no residual data in serving endpoints
6. Audit: Permanent audit record created (never deleted, exempt from retention)

Cascade behavior:

When --cascade is set, deletion propagates through the entity graph:

Scenario	Behavior
Downstream product has `event_key` to deleted entity	Delete matching rows
Downstream product aggregates the entity (SUM, COUNT)	Propagation stops — individual contributions are not identifiable
Downstream product has no aggregation	Continue propagation
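The cascade rules in the table can be sketched as a traversal that stops at aggregating products. This is one possible reading: it assumes propagation continues past any non-aggregating product whether or not it holds an `event_key`, which the table does not state explicitly. The function name `plan_cascade` and all product names are hypothetical.

```python
def plan_cascade(entity_product, downstream, aggregates, has_event_key):
    """Return (products_to_delete_from, products_where_propagation_stops).

    downstream: mapping of product -> list of direct downstream products.
    aggregates: set of products that aggregate the entity (SUM, COUNT).
    has_event_key: set of products holding an event_key to the entity.
    """
    to_delete, stopped, seen = [], [], set()
    stack = list(downstream.get(entity_product, []))
    while stack:
        product = stack.pop()
        if product in seen:
            continue
        seen.add(product)
        if product in aggregates:
            stopped.append(product)  # individual rows are not identifiable
            continue
        if product in has_event_key:
            to_delete.append(product)  # delete matching rows
        stack.extend(downstream.get(product, []))
    return to_delete, stopped


downstream = {
    "customers": ["orders"],
    "orders": ["order-totals", "order-details"],
    "order-totals": ["exec-dashboard"],
}
to_delete, stopped = plan_cascade(
    "customers", downstream,
    aggregates={"order-totals"},
    has_event_key={"orders", "order-details"},
)
print(sorted(to_delete), stopped)
# ['order-details', 'orders'] ['order-totals']
```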

Audit Trail

Every deletion is permanently recorded in the deletion_audit_log:

Field	Description
`entity_type`	Type of entity deleted
`identity_value`	Specific value deleted
`reason`	Regulatory basis
`requested_by`	Who requested the deletion
`products_affected`	List of products that were modified
`rows_deleted`	Total rows removed
`audit_hash`	SHA-256 hash used for tamper detection

The audit log is permanently retained and exempt from any retention policy.
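To illustrate how a SHA-256 hash supports tamper detection, the sketch below hashes a canonical JSON encoding of the other audit fields. Which fields the platform actually hashes, and in what encoding, is not documented here, so treat the canonicalization as an assumption. The function names `compute_audit_hash` and `verify_audit_record` are hypothetical.

```python
import hashlib
import json


def compute_audit_hash(record):
    """SHA-256 over a canonical JSON encoding of all fields except the hash."""
    payload = {key: value for key, value in record.items() if key != "audit_hash"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_audit_record(record):
    """True when the stored hash still matches the record's contents."""
    return record.get("audit_hash") == compute_audit_hash(record)


record = {
    "entity_type": "customer",
    "identity_value": "CUST-12345",
    "reason": "GDPR Art. 17 request",
    "requested_by": "jane.mwangi@example.com",
    "products_affected": ["orders", "order-details"],
    "rows_deleted": 42,
}
record["audit_hash"] = compute_audit_hash(record)
print(verify_audit_record(record))  # True

record["rows_deleted"] = 0          # simulate tampering
print(verify_audit_record(record))  # False
```

Sorting keys before hashing keeps the digest stable regardless of field order.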

Governance Events

All governance-relevant events flow through the event bus for audit and downstream processing:

Event	Trigger
`product.registered`	New product created
`product.deployed`	Product deployed to execution
`quality.check.failed`	Blocking quality check fails
`classification.violation`	Classification laundering attempt at deploy
`sla.breach`	Freshness or quality threshold exceeded
`deletion.requested`	Right-to-erasure request submitted
`deletion.completed`	Deletion fully executed and verified
`retention.expired`	Data exceeds retention period
`concept.created`	New business concept extracted
`semantic.contract.broken`	Upstream removed a referenced intent or tier
`semantic.contract.stale`	Upstream values changed since downstream compiled

All events include tenant_id, timestamp, and correlation_id for tracing.
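An event envelope carrying these three fields might be built as follows. Only `tenant_id`, `timestamp`, and `correlation_id` come from this guide; the `type` and `payload` keys, the ISO 8601 timestamp format, and the function name `make_event` are assumptions for illustration.

```python
import uuid
from datetime import datetime, timezone


def make_event(event_type, tenant_id, payload, correlation_id=None):
    """Build a governance event with the common tracing fields."""
    return {
        "type": event_type,                    # assumed key name
        "tenant_id": tenant_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Reuse the caller's correlation ID so related events can be traced.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "payload": payload,                    # assumed key name
    }


requested = make_event("deletion.requested", "tenant-a", {"entity_type": "customer"})
completed = make_event(
    "deletion.completed", "tenant-a", {"rows_deleted": 42},
    correlation_id=requested["correlation_id"],
)
print(requested["correlation_id"] == completed["correlation_id"])  # True
```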

Governance Dashboard

The Portal provides a governance dashboard with:

Classification overview: Products by classification level, propagation chain visualization
Quality scores: Rolling quality scores per product, trend charts, failure history
Lineage graph: Interactive visualization of data flow across products and domains
Concept browser: Business glossary with maturity filters and graph visualization
Deletion audit: History of deletion requests with status and verification results
SLA status: Freshness and availability monitoring per product

The dashboard is available to all team members. Actions like concept promotion and deletion requests require appropriate permissions.

Writing governance.yaml

While most governance configuration lives in product.yaml (classification, retention) and quality.yaml (quality rules), the governance.yaml file configures additional governance behaviors:

apiVersion: akili/v1
kind: Governance

# Access control
access:
  teams:
    - finance
    - analytics
  deny:
    - field-ops

# Ownership
steward: jane.mwangi@example.com

# Compliance tags
compliance:
  - gdpr
  - sox

Field	Required	Description
`access.teams`	no	Teams with explicit access (for `confidential`/`restricted` products)
`access.deny`	no	Teams explicitly denied access
`steward`	no	Data steward contact for governance questions
`compliance`	no	Regulatory frameworks this product falls under
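How `access.teams` and `access.deny` interact is not spelled out above. The sketch below assumes that an explicit deny always wins over a grant and that, for `confidential` and `restricted` products, a team with no grant has no access. The function name `can_access` is hypothetical.

```python
def can_access(team, access):
    """Evaluate a team against the `access` block of governance.yaml.

    Assumption: deny takes precedence, and absence of a grant means no access.
    """
    if team in access.get("deny", []):
        return False
    return team in access.get("teams", [])


access = {"teams": ["finance", "analytics"], "deny": ["field-ops"]}
print(can_access("finance", access))    # True
print(can_access("field-ops", access))  # False
print(can_access("marketing", access))  # False
```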

Next Steps

Writing Manifests — Classification and retention in product.yaml
Quality Rules — Quality enforcement details
Serving Configuration — Access control at the serving layer
End-to-End Tutorial — See governance in a full lifecycle