Governance
Akili implements federated computational governance — domain teams own their data product quality and classification, while the platform enforces global policies computationally. Policies are codified in YAML manifests and enforced automatically at build, deploy, and run time. This guide covers classification, retention, lineage, concept management, and compliance features.
Four Governance Pillars
| Pillar | Purpose | Enforcement Point |
|---|---|---|
| Classification | Control who can access what | Deploy time + query time |
| Quality | Ensure data correctness | After every materialization |
| Lineage | Track data provenance | Registration + deploy + run time |
| SLA Management | Monitor freshness and availability | Continuous (sensor every 5 min) |
Data Classification
Every data product declares a sensitivity level in product.yaml. Classification drives access control across the entire platform.
Classification Levels

    public --> internal --> confidential --> restricted

Ordered by sensitivity, from least to most. Each level adds restrictions on top of the previous one.
| Level | Who Can Access | Typical Use |
|---|---|---|
| public | Any authenticated user in the tenant | Country codes, product catalogs |
| internal | Any team member in the tenant | Sales aggregates, operational metrics |
| confidential | Explicit team grant required | Customer segments, financial data |
| restricted | Named individuals only, audit logged | PII, salary data, health records |
Declaring Classification

    metadata:
      name: daily-payroll-summary
      domain: finance
      owner: finance-team
      classification: confidential

High-Water Mark Propagation
The platform enforces a critical rule: the output classification of a data product must be greater than or equal to the highest classification of any input.

    raw-orders (internal) + raw-payroll (confidential) --> output MUST be >= confidential

This prevents classification laundering: creating a “public” product that reads from “confidential” inputs, effectively bypassing access controls through aggregation.
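The rule can be sketched in a few lines of Python. This is an illustrative model only, not the platform's implementation: the `Level` enum, the `inputs_of` mapping, and the `ClassificationLaundering` exception class are hypothetical names chosen to mirror the text.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered by sensitivity, matching the four classification levels.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


class ClassificationLaundering(Exception):
    """Hypothetical stand-in for the deploy-time laundering error."""


def transitive_inputs(product, inputs_of):
    """Collect every upstream product reachable from `product`."""
    seen, stack = set(), list(inputs_of.get(product, []))
    while stack:
        upstream = stack.pop()
        if upstream not in seen:
            seen.add(upstream)
            stack.extend(inputs_of.get(upstream, []))
    return seen


def check_high_water_mark(product, levels, inputs_of):
    """Return the highest input level, or raise if the product is below it."""
    upstream = transitive_inputs(product, inputs_of)
    required = max((levels[u] for u in upstream), default=Level.PUBLIC)
    if levels[product] < required:
        raise ClassificationLaundering(
            f"{product} is {levels[product].name.lower()} "
            f"but must be >= {required.name.lower()}"
        )
    return required


levels = {
    "raw-orders": Level.INTERNAL,
    "raw-payroll": Level.CONFIDENTIAL,
    "daily-payroll-summary": Level.CONFIDENTIAL,
}
inputs_of = {"daily-payroll-summary": ["raw-orders", "raw-payroll"]}
required = check_high_water_mark("daily-payroll-summary", levels, inputs_of)
```

Declaring `daily-payroll-summary` as `Level.PUBLIC` in this model would raise `ClassificationLaundering`, because one of its inputs is confidential.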
Enforcement at deploy time:

    POST /api/v1/products/{name}/deploy
      |
      v
    Resolve ALL upstream products (transitively)
      |
      v
    Compute max(input classifications)
      |
      v
    Verify: product.classification >= max(inputs)
      |
      +-- PASS --> Continue deploy
      +-- FAIL --> 422 ClassificationLaundering error

Developer Clearance
In addition to product-level classification, the platform checks that the deploying developer has clearance to access all upstream products:

    For each upstream product:
      if developer_clearance does not include upstream.classification:
        --> 403 InsufficientClearance

This prevents a developer from building a product that references data they cannot personally access.
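A minimal Python rendering of that check, assuming clearance is modeled as a set of level names. The function name, the exception class, and the data shapes are illustrative, not part of the platform API.

```python
class InsufficientClearance(Exception):
    """Hypothetical stand-in for the 403 InsufficientClearance response."""


def check_developer_clearance(developer_clearance, upstream_levels):
    """Raise if any upstream classification is outside the developer's clearance."""
    for product, level in upstream_levels.items():
        if level not in developer_clearance:
            raise InsufficientClearance(f"no clearance for {product} ({level})")


clearance = {"public", "internal"}
upstream = {"raw-orders": "internal", "raw-payroll": "confidential"}

try:
    check_developer_clearance(clearance, upstream)
    blocked = None
except InsufficientClearance as exc:
    blocked = str(exc)
```

Here the deploy is rejected because the developer holds no clearance for the confidential `raw-payroll` input.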
Column-Level Classification
For finer-grained control, individual columns can declare their sensitivity:

    schema:
      - name: customer_id
        type: string
        role: identity
        classification: pii.identifier

      - name: full_name
        type: string
        classification: pii.name

      - name: email
        type: string
        classification: pii.contact

      - name: customer_segment
        type: string
        classification: business.internal

      - name: country_code
        type: string
        classification: public

Classification taxonomy (highest to lowest sensitivity):
| Classification | Category | Example |
|---|---|---|
| pii.identifier | PII | National ID, SSN, passport number |
| pii.name | PII | First name, last name |
| pii.contact | PII | Email, phone, address |
| business.confidential | Business | Risk ratings, profit margins |
| business.internal | Business | Customer segments, product categories |
| public | Public | Country codes, currency codes |
The serving layer dynamically masks or omits columns based on the querying consumer’s clearance level.
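The sketch below illustrates the difference between masking and omitting a column. The text does not say how a consumer's clearance maps to column classifications, so this example assumes the clearance is simply a set of classification labels the consumer may see, and that unlabelled columns are hidden (fail closed). Both choices and all names are assumptions.

```python
MASK = "***"


def apply_column_policy(row, column_classification, allowed, omit=False):
    """Return a copy of `row` with disallowed columns masked or dropped."""
    result = {}
    for column, value in row.items():
        label = column_classification.get(column)  # None when unlabelled
        if label in allowed:
            result[column] = value
        elif not omit:
            result[column] = MASK  # keep the column, hide the value
        # when omit=True, the column is left out entirely
    return result


column_classification = {
    "customer_id": "pii.identifier",
    "email": "pii.contact",
    "customer_segment": "business.internal",
    "country_code": "public",
}
row = {
    "customer_id": "CUST-12345",
    "email": "a@example.com",
    "customer_segment": "retail",
    "country_code": "KE",
}
allowed = {"public", "business.internal"}

masked = apply_column_policy(row, column_classification, allowed)
omitted = apply_column_policy(row, column_classification, allowed, omit=True)
```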
Retention Policies
Products declare how long data should be retained. The platform evaluates retention daily and notifies when data exceeds the retention period.
Writing Retention Policies

    metadata:
      name: raw-transactions
      # ... other fields ...

    retention:
      period: "365d"             # How long to retain data
      basis: created_at          # Which timestamp determines age
      review_date: "2026-06-01"  # Next scheduled retention review

| Field | Required | Description |
|---|---|---|
| period | yes | Duration string. Examples: "90d", "365d", "7y" |
| basis | yes | Timestamp column that determines record age: created_at, event_time, or ingested_at |
| review_date | no | ISO date for next retention review |
Retention Behavior
- The platform evaluates retention daily
- When data exceeds the retention period, a retention.expired governance event is emitted
- The platform does not auto-delete: the product owner must explicitly trigger deletion or extend retention
- This deliberate design prevents accidental data loss from misconfigured retention
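The daily evaluation described above could look roughly like the following. This is a sketch under stated assumptions: only the `d` and `y` suffixes shown in the examples are parsed, a year is approximated as 365 days, and the event payload fields other than the `retention.expired` name are hypothetical.

```python
import re
from datetime import date, timedelta


def parse_period(period):
    """Convert a duration string such as "90d" or "7y" into a timedelta."""
    match = re.fullmatch(r"(\d+)([dy])", period)
    if match is None:
        raise ValueError(f"unsupported retention period: {period!r}")
    amount, unit = int(match.group(1)), match.group(2)
    # Assumption: one year is treated as 365 days.
    return timedelta(days=amount * (365 if unit == "y" else 1))


def evaluate(product, records, basis, period, today):
    """Build a retention.expired event if any record is past retention.

    Nothing is deleted here; the event only signals the product owner.
    """
    limit = parse_period(period)
    expired = [r for r in records if today - r[basis] > limit]
    if not expired:
        return None
    return {
        "event": "retention.expired",
        "product": product,
        "expired_rows": len(expired),
    }


records = [{"created_at": date(2024, 1, 1)}, {"created_at": date(2025, 6, 1)}]
event = evaluate("raw-transactions", records, "created_at", "365d", date(2025, 7, 1))
```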
Deletion Workflows
When data needs to be deleted (retention expiry or regulatory request), the platform uses position delete files in the data lake:
1. Identify data files containing records to delete
2. Write position delete files listing row positions to logically delete
3. Subsequent reads skip deleted positions, so the data is logically erased
4. Physical deletion occurs during data lake compaction (configurable schedule)
This approach is non-destructive (original files untouched), auditable (delete files serve as a record), and reversible before compaction.
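To make the mechanism concrete, here is a toy in-memory model of position deletes. Data files are plain Python lists and a delete file is a set of row positions per file; real table formats store these as files on disk, and every name below is illustrative.

```python
def write_position_deletes(data_files, should_delete):
    """Return {file_name: {row positions}} for rows matching the predicate."""
    deletes = {}
    for name, rows in data_files.items():
        positions = {i for i, row in enumerate(rows) if should_delete(row)}
        if positions:
            deletes[name] = positions
    return deletes


def read_live(data_files, deletes):
    """Yield rows, skipping any position listed in the delete files."""
    for name, rows in data_files.items():
        skipped = deletes.get(name, set())
        for i, row in enumerate(rows):
            if i not in skipped:
                yield row


def compact(data_files, deletes):
    """Rewrite files without deleted rows; after this the deletion is physical."""
    return {
        name: [row for i, row in enumerate(rows) if i not in deletes.get(name, set())]
        for name, rows in data_files.items()
    }


data_files = {
    "part-0": [{"customer_id": "CUST-1"}, {"customer_id": "CUST-12345"}],
    "part-1": [{"customer_id": "CUST-12345"}, {"customer_id": "CUST-2"}],
}
deletes = write_position_deletes(data_files, lambda r: r["customer_id"] == "CUST-12345")
live = list(read_live(data_files, deletes))
compacted = compact(data_files, deletes)
```

Until `compact` runs, the original lists are unchanged, which is what makes the deletion reversible: discarding `deletes` restores every row.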
Lineage Tracking
Lineage answers: “Where did this data come from, and what depends on it?”
Four Metadata Pathways
Metadata flows into OpenMetadata (the governance catalog) from four sources:
| Pathway | When | What |
|---|---|---|
| 1. Manifest registration | POST /api/v1/products | Product identity, schemas, classification, access teams |
| 2. Asset graph | After deploy | Dependency edges, source-to-product lineage, product-to-serving lineage |
| 3. Execution events | After each materialization | Freshness, row count, duration, quality scores |
| 4. Deployment lineage | Every akili deploy | Which manifest version produced which data snapshots |
Lineage Queries
Section titled “Lineage Queries”Impact analysis: “What breaks if raw-outlet-visits fails?”
The platform traverses the dependency graph to show all downstream products affected by a failure. This is available via the API (GET /api/v1/registry/entity-graph) and visualized in the Portal.
Provenance: “Where does monthly-cost-report get its data?”
Full upstream lineage shows every transformation step from external source to final output.
Deployment audit: “Which version of the transform logic produced the data in snapshot X?”
Deployment lineage tracks manifest versions alongside data lake snapshots, enabling precise debugging and backfill decisions.
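Impact analysis and provenance are the same graph walk in opposite directions. The sketch below shows this on a small hand-written graph; `raw-outlet-visits` and `monthly-cost-report` come from the questions above, while `outlet-visit-summary`, `raw-costs`, and the edge layout are invented for illustration. It does not call the platform API.

```python
from collections import deque


def reachable(start, edges):
    """Breadth-first walk returning every node reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def invert(edges):
    """Flip every edge so downstream edges become upstream edges."""
    inverted = {}
    for source, targets in edges.items():
        for target in targets:
            inverted.setdefault(target, []).append(source)
    return inverted


# product -> products that read from it (hypothetical example graph)
downstream = {
    "raw-outlet-visits": ["outlet-visit-summary"],
    "raw-costs": ["monthly-cost-report"],
    "outlet-visit-summary": ["monthly-cost-report"],
}

impact = reachable("raw-outlet-visits", downstream)
provenance = reachable("monthly-cost-report", invert(downstream))
```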
Entity Graph
The platform automatically derives an entity graph from column roles across all deployed products:
- Products with role: identity columns become nodes
- role: event_key columns create directed edges between products
- The graph spans all domains within a tenant
This graph powers impact analysis, deletion cascades, and the concept registry.
Concept Management (Business Glossary)
As products are deployed, the platform automatically builds a business ontology: a structured vocabulary of the organization’s data concepts.
How Concepts Are Extracted
| Source | Rule | Example |
|---|---|---|
| identity columns | Each unique identity column name becomes a concept | customer_id becomes concept “Customer” |
| Domain names | Each domain becomes a concept | analytics becomes concept “Analytics Domain” |
| Product tags | Tags in product.yaml become concept associations | tags: [revenue] becomes concept “Revenue” |
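The three rules in the table can be approximated as follows. The naming convention (drop a trailing `_id`, then title-case) is inferred from the single `customer_id` example and may not match the platform's actual rule; the flat `product` dictionary is likewise a simplification of the manifest.

```python
def concept_from_identity(column):
    """Derive a concept name from an identity column (assumed convention)."""
    base = column[:-3] if column.endswith("_id") else column
    return base.replace("_", " ").title()


def extract_concepts(product):
    """Apply the three extraction rules to one product description."""
    concepts = {f"{product['domain'].title()} Domain"}
    concepts.update(tag.title() for tag in product.get("tags", []))
    concepts.update(
        concept_from_identity(column["name"])
        for column in product.get("schema", [])
        if column.get("role") == "identity"
    )
    return concepts


product = {
    "domain": "analytics",
    "tags": ["revenue"],
    "schema": [
        {"name": "customer_id", "role": "identity"},
        {"name": "email"},
    ],
}
concepts = extract_concepts(product)
```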
Concept Maturity Lifecycle
| State | Meaning |
|---|---|
| draft | Auto-extracted, not yet reviewed |
| proposed | Reviewed by product owner, submitted for approval |
| accepted | Approved by domain owner |
| canonical | Organization-wide standard term (Published Language) |
| deprecated | No longer in active use |
Canonical concepts surface in akili init suggestions (“Did you mean customer_id (canonical)?”) and trigger validation warnings when new products introduce colliding column names.
Managing Concepts

    # List concepts in a domain
    akili governance concepts --domain analytics

    # Promote a concept
    akili governance concept promote customer_id --to accepted

    # Register a manual concept
    akili governance concept create \
      --name "Net Revenue" \
      --domain finance \
      --description "Revenue after discounts and gift cards"

The concept graph is synced to OpenMetadata as a glossary and visualized in the Portal.
Compliance Features
GDPR Right-to-Erasure
The platform supports structured deletion workflows that propagate through the lineage graph.

    # Submit a deletion request
    akili governance deletion-request \
      --entity-type customer \
      --identity-column customer_id \
      --identity-value "CUST-12345" \
      --reason "GDPR Art. 17 request" \
      --cascade

Deletion workflow:
1. Request: Deletion request submitted via API or CLI
2. Impact analysis: Platform traverses lineage graph to identify all affected products
3. Plan: Generates a deletion plan: which products, which records, which method
4. Execute: Position delete files written in the data lake for each affected product
5. Verify: Post-deletion check confirms no residual data in serving endpoints
6. Audit: Permanent audit record created (never deleted, exempt from retention)
Cascade behavior:
When --cascade is set, deletion propagates through the entity graph:
| Scenario | Behavior |
|---|---|
| Downstream product has event_key to deleted entity | Delete matching rows |
| Downstream product aggregates the entity (SUM, COUNT) | Propagation stops — individual contributions are not identifiable |
| Downstream product has no aggregation | Continue propagation |
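One way to read the table is as a graph walk that halts at aggregating products. The sketch below models that reading; the product names, the `aggregates` set, and the function itself are hypothetical and only illustrate where propagation stops.

```python
def plan_cascade(root, downstream, aggregates):
    """Return the products whose matching rows would be deleted."""
    plan, stack, seen = [], [root], {root}
    while stack:
        product = stack.pop()
        plan.append(product)
        for child in downstream.get(product, []):
            if child in seen or child in aggregates:
                # Aggregated outputs no longer carry identifiable rows,
                # so the walk does not enter or pass through them.
                continue
            seen.add(child)
            stack.append(child)
    return plan


downstream = {
    "customers": ["customer-orders", "daily-revenue"],
    "customer-orders": ["order-lines"],
    "daily-revenue": ["revenue-dashboard"],
}
aggregates = {"daily-revenue"}  # e.g. SUM or COUNT over customers

plan = plan_cascade("customers", downstream, aggregates)
```

In this model `revenue-dashboard` is never visited, because the only path to it runs through the aggregating `daily-revenue` product.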
Audit Trail
Every deletion is permanently recorded in the deletion_audit_log:
| Field | Description |
|---|---|
| entity_type | Type of entity deleted |
| identity_value | Specific value deleted |
| reason | Regulatory basis |
| requested_by | Who requested the deletion |
| products_affected | List of products that were modified |
| rows_deleted | Total rows removed |
| audit_hash | SHA-256 for tamper detection |
The audit log is permanently retained and exempt from any retention policy.
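The text does not specify which fields feed the `audit_hash`. A plausible sketch, assuming the hash covers a canonical JSON encoding of every other field in the record, uses Python's standard `hashlib` and `json` modules; the `seal` and `is_intact` helpers and the sample values are illustrative.

```python
import hashlib
import json


def audit_hash(record):
    """SHA-256 over a canonical JSON encoding of all fields except the hash."""
    payload = {key: value for key, value in record.items() if key != "audit_hash"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(record):
    """Return a copy of the record with its audit_hash attached."""
    return {**record, "audit_hash": audit_hash(record)}


def is_intact(record):
    """True when the stored hash still matches the record's contents."""
    return record.get("audit_hash") == audit_hash(record)


entry = seal({
    "entity_type": "customer",
    "identity_value": "CUST-12345",
    "reason": "GDPR Art. 17 request",
    "requested_by": "jane.mwangi@example.com",
    "products_affected": ["customer-orders"],
    "rows_deleted": 42,
})
tampered = {**entry, "rows_deleted": 0}
```

A bare hash only detects tampering if the hash value is also kept somewhere an attacker cannot rewrite it alongside the record.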
Governance Events
All governance-relevant events flow through the event bus for audit and downstream processing:
| Event | Trigger |
|---|---|
| product.registered | New product created |
| product.deployed | Product deployed to execution |
| quality.check.failed | Blocking quality check fails |
| classification.violation | Classification laundering attempt at deploy |
| sla.breach | Freshness or quality threshold exceeded |
| deletion.requested | Right-to-erasure request submitted |
| deletion.completed | Deletion fully executed and verified |
| retention.expired | Data exceeds retention period |
| concept.created | New business concept extracted |
| semantic.contract.broken | Upstream removed a referenced intent or tier |
| semantic.contract.stale | Upstream values changed since downstream compiled |
All events include tenant_id, timestamp, and correlation_id for tracing.
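As an illustration of how a shared `correlation_id` ties related events together, here is a hypothetical event envelope. Only `tenant_id`, `timestamp`, and `correlation_id` come from the text; the `type` and `payload` keys and the `make_event` helper are assumptions.

```python
import uuid
from datetime import datetime, timezone


def make_event(event_type, tenant_id, payload, correlation_id=None):
    """Wrap a payload with the three fields every governance event carries."""
    return {
        "type": event_type,
        "tenant_id": tenant_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Reuse the caller's id so related events can be traced together.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "payload": payload,
    }


requested = make_event("deletion.requested", "acme", {"entity_type": "customer"})
completed = make_event(
    "deletion.completed",
    "acme",
    {"rows_deleted": 42},
    correlation_id=requested["correlation_id"],
)
```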
Governance Dashboard
The Portal provides a governance dashboard with:
- Classification overview: Products by classification level, propagation chain visualization
- Quality scores: Rolling quality scores per product, trend charts, failure history
- Lineage graph: Interactive visualization of data flow across products and domains
- Concept browser: Business glossary with maturity filters and graph visualization
- Deletion audit: History of deletion requests with status and verification results
- SLA status: Freshness and availability monitoring per product
The dashboard is available to all team members. Actions like concept promotion and deletion requests require appropriate permissions.
Writing governance.yaml
While most governance configuration lives in product.yaml (classification, retention) and quality.yaml (quality rules), the governance.yaml file configures additional governance behaviors:

    apiVersion: akili/v1
    kind: Governance

    # Access control
    access:
      teams:
        - finance
        - analytics
      deny:
        - field-ops

    # Ownership
    steward: jane.mwangi@example.com

    # Compliance tags
    compliance:
      - gdpr
      - sox

| Field | Required | Description |
|---|---|---|
| access.teams | no | Teams with explicit access (for confidential/restricted products) |
| access.deny | no | Teams explicitly denied access |
| steward | no | Data steward contact for governance questions |
| compliance | no | Regulatory frameworks this product falls under |
Next Steps
- Writing Manifests: Classification and retention in product.yaml
- Quality Rules: Quality enforcement details
- Serving Configuration: Access control at the serving layer
- End-to-End Tutorial: See governance in a full lifecycle