Governance

Akili implements federated computational governance — domain teams own their data product quality and classification, while the platform enforces global policies computationally. Policies are codified in YAML manifests and enforced automatically at build, deploy, and run time. This guide covers classification, retention, lineage, concept management, and compliance features.

| Pillar | Purpose | Enforcement Point |
|---|---|---|
| Classification | Control who can access what | Deploy time + query time |
| Quality | Ensure data correctness | After every materialization |
| Lineage | Track data provenance | Registration + deploy + run time |
| SLA Management | Monitor freshness and availability | Continuous (sensor every 5 min) |

Every data product declares a sensitivity level in product.yaml. Classification drives access control across the entire platform.

public --> internal --> confidential --> restricted

Levels are ordered from least to most sensitive. Each level applies every access restriction of the level before it and adds its own.

| Level | Who Can Access | Typical Use |
|---|---|---|
| public | Any authenticated user in the tenant | Country codes, product catalogs |
| internal | Any team member in the tenant | Sales aggregates, operational metrics |
| confidential | Explicit team grant required | Customer segments, financial data |
| restricted | Named individuals only, audit logged | PII, salary data, health records |

product.yaml
metadata:
  name: daily-payroll-summary
  domain: finance
  owner: finance-team
  classification: confidential

The platform enforces a critical rule: the output classification of a data product must be greater than or equal to the highest classification of any input.

raw-orders (internal) + raw-payroll (confidential)
--> output MUST be >= confidential

This prevents classification laundering — creating a “public” product that reads from “confidential” inputs, effectively bypassing access controls through aggregation.

Enforcement at deploy time:

POST /api/v1/products/{name}/deploy
        |
        v
Resolve ALL upstream products (transitively)
        |
        v
Compute max(input classifications)
        |
        v
Verify: product.classification >= max(inputs)
        |
        +-- PASS --> Continue deploy
        +-- FAIL --> 422 ClassificationLaundering error
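The deploy-time check above can be sketched in Python as follows. This is a minimal illustration rather than the platform's implementation: the `UPSTREAMS` and `CLASSIFICATION` dictionaries, the product name `order-payroll-join`, the function names, and the `ClassificationLaundering` exception class are all hypothetical stand-ins for the registry lookup and the 422 response.

```python
# Hypothetical sketch of the deploy-time laundering check; all names are illustrative.
LEVELS = ["public", "internal", "confidential", "restricted"]  # least to most sensitive
RANK = {level: rank for rank, level in enumerate(LEVELS)}


class ClassificationLaundering(Exception):
    """Stand-in for the 422 ClassificationLaundering error."""


def transitive_upstreams(product, upstreams):
    """Collect every product reachable by following input edges from `product`."""
    seen, stack = set(), list(upstreams.get(product, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(upstreams.get(current, []))
    return seen


def check_deploy(product, classification, upstreams):
    """Raise if `product` is classified below its most sensitive input."""
    inputs = transitive_upstreams(product, upstreams)
    required = max((RANK[classification[name]] for name in inputs), default=0)
    if RANK[classification[product]] < required:
        raise ClassificationLaundering(
            f"{product} must be classified at least {LEVELS[required]}"
        )


# Assumed example registry: one product reading the two inputs shown above.
UPSTREAMS = {"order-payroll-join": ["raw-orders", "raw-payroll"]}
CLASSIFICATION = {
    "raw-orders": "internal",
    "raw-payroll": "confidential",
    "order-payroll-join": "confidential",
}

check_deploy("order-payroll-join", CLASSIFICATION, UPSTREAMS)  # passes silently
```

Declaring `order-payroll-join` as `internal` instead would raise `ClassificationLaundering`, because one of its inputs is `confidential`.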

In addition to product-level classification, the platform checks that the deploying developer has clearance to access all upstream products:

For each upstream product:
    if developer_clearance does not include upstream.classification:
        --> 403 InsufficientClearance

This prevents a developer from building a product that references data they cannot personally access.
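A runnable version of the clearance rule might look like the sketch below. It assumes a developer's clearance is represented as a set of classification levels; that representation, the function name, and the `InsufficientClearance` exception class are hypothetical.

```python
# Hypothetical sketch of the developer clearance check; names are illustrative.
class InsufficientClearance(Exception):
    """Stand-in for the 403 InsufficientClearance error."""


def check_clearance(developer_clearance, upstream_classifications):
    """Raise if any upstream product's level is missing from the developer's clearance.

    developer_clearance: set of levels the developer may access (assumed shape).
    upstream_classifications: mapping of upstream product name -> level.
    """
    for name, level in upstream_classifications.items():
        if level not in developer_clearance:
            raise InsufficientClearance(f"{name} requires {level} clearance")


upstream = {"raw-orders": "internal", "raw-payroll": "confidential"}
check_clearance({"public", "internal", "confidential"}, upstream)  # passes silently
```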

For finer-grained control, individual columns can declare their sensitivity:

output.yaml
schema:
  - name: customer_id
    type: string
    role: identity
    classification: pii.identifier
  - name: full_name
    type: string
    classification: pii.name
  - name: email
    type: string
    classification: pii.contact
  - name: customer_segment
    type: string
    classification: business.internal
  - name: country_code
    type: string
    classification: public

Classification taxonomy (highest to lowest sensitivity):

| Classification | Category | Example |
|---|---|---|
| pii.identifier | PII | National ID, SSN, passport number |
| pii.name | PII | First name, last name |
| pii.contact | PII | Email, phone, address |
| business.confidential | Business | Risk ratings, profit margins |
| business.internal | Business | Customer segments, product categories |
| public | Public | Country codes, currency codes |

The serving layer dynamically masks or omits columns based on the querying consumer’s clearance level.
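To illustrate masking versus omission, here is a small sketch. How consumer clearance maps to column classifications is not specified in this guide, so the sketch assumes a consumer is granted an explicit set of column classifications; the `MASK` value and the function name are also hypothetical.

```python
# Hypothetical sketch of column-level masking; clearance is modeled as an
# explicit set of allowed column classifications (an assumption).
MASK = "***"

COLUMN_CLASSIFICATION = {
    "customer_id": "pii.identifier",
    "full_name": "pii.name",
    "email": "pii.contact",
    "customer_segment": "business.internal",
    "country_code": "public",
}


def apply_column_policy(row, allowed, omit=False):
    """Return a copy of `row` with disallowed columns masked, or dropped if `omit`."""
    result = {}
    for column, value in row.items():
        if COLUMN_CLASSIFICATION[column] in allowed:
            result[column] = value
        elif not omit:
            result[column] = MASK
    return result


row = {"customer_id": "CUST-12345", "email": "a@example.com", "country_code": "KE"}
masked = apply_column_policy(row, allowed={"public", "business.internal"})
omitted = apply_column_policy(row, allowed={"public", "business.internal"}, omit=True)
```

With these inputs, `masked` keeps all three keys but replaces `customer_id` and `email` with the mask value, while `omitted` contains only `country_code`.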


Products declare how long data should be retained. The platform evaluates retention daily and emits a governance event when data exceeds the retention period.

product.yaml
metadata:
  name: raw-transactions
  # ... other fields ...
  retention:
    period: "365d"              # How long to retain data
    basis: created_at           # Which timestamp determines age
    review_date: "2026-06-01"   # Next scheduled retention review

| Field | Required | Description |
|---|---|---|
| period | yes | Duration string. Examples: "90d", "365d", "7y" |
| basis | yes | Timestamp column that determines record age: created_at, event_time, or ingested_at |
| review_date | no | ISO date for next retention review |

  1. The platform evaluates retention daily
  2. When data exceeds the retention period, a retention.expired governance event is emitted
  3. The platform does not auto-delete — the product owner must explicitly trigger deletion or extend retention
  4. This deliberate design prevents accidental data loss from misconfigured retention
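The daily evaluation described above could be sketched as follows. This is an illustration under stated assumptions: a `y` unit is treated as 365 days, the event payload fields other than the `retention.expired` type are invented, and the function names are hypothetical. Note that the sketch only reports expiry; it never deletes anything, mirroring the no-auto-delete rule.

```python
# Hypothetical sketch of a daily retention check; payload shape is an assumption.
import re
from datetime import datetime, timedelta, timezone

UNIT_DAYS = {"d": 1, "y": 365}  # assumption: one year counts as 365 days


def parse_period(period):
    """Convert a duration string such as "90d" or "7y" into a timedelta."""
    match = re.fullmatch(r"(\d+)([dy])", period)
    if match is None:
        raise ValueError(f"unsupported retention period: {period!r}")
    return timedelta(days=int(match.group(1)) * UNIT_DAYS[match.group(2)])


def evaluate_retention(product, period, basis, records, now):
    """Return a list with one retention.expired event if any record is too old."""
    cutoff = now - parse_period(period)
    expired = [record for record in records if record[basis] < cutoff]
    if not expired:
        return []
    return [{"type": "retention.expired", "product": product, "expired_rows": len(expired)}]


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
records = [
    {"created_at": datetime(2023, 6, 1, tzinfo=timezone.utc)},
    {"created_at": datetime(2024, 12, 1, tzinfo=timezone.utc)},
]
events = evaluate_retention("raw-transactions", "365d", "created_at", records, now)
```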

When data needs to be deleted (retention expiry or regulatory request), the platform uses position delete files in the data lake:

  1. Identify data files containing records to delete

  2. Write position delete files listing row positions to logically delete

  3. Subsequent reads skip deleted positions — data is logically erased

  4. Physical deletion occurs during data lake compaction (configurable schedule)

This approach is non-destructive (original files untouched), auditable (delete files serve as a record), and reversible before compaction.
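The four steps can be modeled in memory to show why deletion stays reversible until compaction. The `Table` class below is a toy model, not a data lake client: files are plain lists, and all names are hypothetical.

```python
# Toy in-memory model of position deletes; not a real data lake API.
class Table:
    def __init__(self, data_files):
        self.data_files = dict(data_files)   # file name -> list of rows
        self.position_deletes = {}           # file name -> set of deleted row positions

    def delete_where(self, predicate):
        """Steps 1-2: find matching rows and record their positions; data is untouched."""
        for name, rows in self.data_files.items():
            positions = {i for i, row in enumerate(rows) if predicate(row)}
            if positions:
                self.position_deletes.setdefault(name, set()).update(positions)

    def scan(self):
        """Step 3: reads skip every position listed in a delete file."""
        for name, rows in self.data_files.items():
            deleted = self.position_deletes.get(name, set())
            for i, row in enumerate(rows):
                if i not in deleted:
                    yield row

    def compact(self):
        """Step 4: rewrite files without deleted rows; deletion becomes physical."""
        for name, deleted in self.position_deletes.items():
            rows = self.data_files[name]
            self.data_files[name] = [row for i, row in enumerate(rows) if i not in deleted]
        self.position_deletes.clear()


table = Table({"file-1": [{"customer_id": "CUST-12345"}, {"customer_id": "CUST-2"}]})
table.delete_where(lambda row: row["customer_id"] == "CUST-12345")
visible_before_compaction = list(table.scan())
stored_before_compaction = len(table.data_files["file-1"])
table.compact()
stored_after_compaction = len(table.data_files["file-1"])
```

Before `compact()` runs, removing the entry from `position_deletes` would make the row visible again, which is the reversibility the text refers to.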


Lineage answers: “Where did this data come from, and what depends on it?”

Metadata flows into OpenMetadata (the governance catalog) from four sources:

| Pathway | When | What |
|---|---|---|
| 1. Manifest registration | POST /api/v1/products | Product identity, schemas, classification, access teams |
| 2. Asset graph | After deploy | Dependency edges, source-to-product lineage, product-to-serving lineage |
| 3. Execution events | After each materialization | Freshness, row count, duration, quality scores |
| 4. Deployment lineage | Every akili deploy | Which manifest version produced which data snapshots |

Impact analysis: “What breaks if raw-outlet-visits fails?”

The platform traverses the dependency graph to show all downstream products affected by a failure. This is available via the API (GET /api/v1/registry/entity-graph) and visualized in the Portal.
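As an illustration of the traversal, the sketch below runs a breadth-first search over a consumer map. The graph shape and the downstream product names are invented for the example; only `raw-outlet-visits` comes from this guide, and the function is not the platform's API.

```python
# Hypothetical sketch of impact analysis as a breadth-first traversal.
from collections import deque


def downstream_impact(failed, consumers):
    """Return every product downstream of `failed`, nearest consumers first."""
    affected, seen = [], {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in consumers.get(current, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


# Assumed dependency edges: product -> products that read from it.
CONSUMERS = {
    "raw-outlet-visits": ["visits-daily", "visits-by-region"],
    "visits-daily": ["visits-weekly"],
    "visits-by-region": ["visits-weekly"],
}

impact = downstream_impact("raw-outlet-visits", CONSUMERS)
```

The `seen` set ensures a product reachable through several paths, such as `visits-weekly` here, is reported once.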

Provenance: “Where does monthly-cost-report get its data?”

Full upstream lineage shows every transformation step from external source to final output.

Deployment audit: “Which version of the transform logic produced the data in snapshot X?”

Deployment lineage tracks manifest versions alongside data lake snapshots, enabling precise debugging and backfill decisions.

The platform automatically derives an entity graph from column roles across all deployed products:

  • Products with role: identity columns become nodes
  • role: event_key columns create directed edges between products
  • The graph spans all domains within a tenant

This graph powers impact analysis, deletion cascades, and the concept registry.
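One plausible reading of these rules is sketched below. The guide does not say how an `event_key` column is matched to its target, so the sketch assumes a match on column name against another product's `identity` column, with the edge pointing from the referencing product to the identified one. Both the matching rule and the edge direction are assumptions.

```python
# Hypothetical sketch of entity graph derivation from column roles.
def build_entity_graph(products):
    """products: mapping of product name -> list of column dicts with name and role."""
    identity_owners = {}
    for product, columns in products.items():
        for column in columns:
            if column.get("role") == "identity":
                identity_owners.setdefault(column["name"], []).append(product)

    # Products that declare at least one identity column become nodes.
    nodes = sorted({p for owners in identity_owners.values() for p in owners})

    # Assumption: an event_key column links to products whose identity column shares its name.
    edges = set()
    for product, columns in products.items():
        for column in columns:
            if column.get("role") == "event_key":
                for target in identity_owners.get(column["name"], []):
                    if target != product:
                        edges.add((product, target))
    return nodes, sorted(edges)


PRODUCTS = {
    "customers": [{"name": "customer_id", "role": "identity"}],
    "orders": [
        {"name": "order_id", "role": "identity"},
        {"name": "customer_id", "role": "event_key"},
    ],
}

nodes, edges = build_entity_graph(PRODUCTS)
```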


As products are deployed, the platform automatically builds a business ontology — a structured vocabulary of the organization’s data concepts.

| Source | Rule | Example |
|---|---|---|
| identity columns | Each unique identity column name becomes a concept | customer_id becomes concept “Customer” |
| Domain names | Each domain becomes a concept | analytics becomes concept “Analytics Domain” |
| Product tags | Tags in product.yaml become concept associations | tags: [revenue] becomes concept “Revenue” |

Each concept carries a maturity state:

| State | Meaning |
|---|---|
| draft | Auto-extracted, not yet reviewed |
| proposed | Reviewed by product owner, submitted for approval |
| accepted | Approved by domain owner |
| canonical | Organization-wide standard term (Published Language) |
| deprecated | No longer in active use |

Canonical concepts surface in akili init suggestions (“Did you mean customer_id (canonical)?”) and trigger validation warnings when new products introduce colliding column names.
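The extraction rules can be illustrated with a short sketch that reproduces the three examples from the table. The naming heuristics (stripping an `_id` suffix, title-casing) are inferred from those examples and may not match the platform's actual rules; the function names and the input shape are hypothetical.

```python
# Hypothetical sketch of concept extraction; heuristics inferred from the examples.
def concept_from_identity(column):
    """customer_id -> "Customer" (drop a trailing _id, then title-case)."""
    base = column[:-3] if column.endswith("_id") else column
    return base.replace("_", " ").title()


def concept_from_domain(domain):
    """analytics -> "Analytics Domain"."""
    return f"{domain.replace('_', ' ').title()} Domain"


def concept_from_tag(tag):
    """revenue -> "Revenue"."""
    return tag.replace("_", " ").title()


def extract_concepts(product):
    """Return draft concepts for one product (assumed input shape)."""
    names = [concept_from_identity(c) for c in product["identity_columns"]]
    names.append(concept_from_domain(product["domain"]))
    names.extend(concept_from_tag(t) for t in product["tags"])
    # Auto-extracted concepts start in the draft state.
    return [{"name": name, "state": "draft"} for name in names]


concepts = extract_concepts(
    {"domain": "analytics", "tags": ["revenue"], "identity_columns": ["customer_id"]}
)
```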

# List concepts in a domain
akili governance concepts --domain analytics

# Promote a concept
akili governance concept promote customer_id --to accepted

# Register a manual concept
akili governance concept create \
  --name "Net Revenue" \
  --domain finance \
  --description "Revenue after discounts and gift cards"

The concept graph is synced to OpenMetadata as a glossary and visualized in the Portal.


The platform supports structured deletion workflows that propagate through the lineage graph.

# Submit a deletion request
akili governance deletion-request \
  --entity-type customer \
  --identity-column customer_id \
  --identity-value "CUST-12345" \
  --reason "GDPR Art. 17 request" \
  --cascade

Deletion workflow:

  1. Request — Deletion request submitted via API or CLI

  2. Impact analysis — Platform traverses lineage graph to identify all affected products

  3. Plan — Generates a deletion plan: which products, which records, which method

  4. Execute — Position delete files written in the data lake for each affected product

  5. Verify — Post-deletion check confirms no residual data in serving endpoints

  6. Audit — Permanent audit record created (never deleted, exempt from retention)

Cascade behavior:

When --cascade is set, deletion propagates through the entity graph:

| Scenario | Behavior |
|---|---|
| Downstream product has event_key to deleted entity | Delete matching rows |
| Downstream product aggregates the entity (SUM, COUNT) | Propagation stops; individual contributions are not identifiable |
| Downstream product has no aggregation | Continue propagation |
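The cascade rules can be sketched as a graph walk that stops at aggregating products. The product names, the `aggregates` set, and the function name are hypothetical; a real plan would also record which rows and which method apply to each product.

```python
# Hypothetical sketch of cascade planning; product names are invented.
def cascade_plan(root, downstream, aggregates):
    """Return (products to delete from, products where propagation stopped)."""
    plan, stopped = [root], []
    seen, stack = {root}, [root]
    while stack:
        current = stack.pop()
        for child in downstream.get(current, []):
            if child in seen:
                continue
            seen.add(child)
            if child in aggregates:
                # Aggregated rows cannot be traced to one entity: stop here.
                stopped.append(child)
            else:
                plan.append(child)
                stack.append(child)
    return plan, stopped


DOWNSTREAM = {
    "customers": ["orders", "customer-counts"],
    "orders": ["order-lines"],
    "customer-counts": ["kpi-board"],
}

plan, stopped = cascade_plan("customers", DOWNSTREAM, aggregates={"customer-counts"})
```

In this example `kpi-board` appears in neither list, because it is only reachable through the aggregating product `customer-counts`.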

Every deletion is permanently recorded in the deletion_audit_log:

| Field | Description |
|---|---|
| entity_type | Type of entity deleted |
| identity_value | Specific value deleted |
| reason | Regulatory basis |
| requested_by | Who requested the deletion |
| products_affected | List of products that were modified |
| rows_deleted | Total rows removed |
| audit_hash | SHA-256 for tamper detection |

The audit log is permanently retained and exempt from any retention policy.
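The guide does not state which bytes the `audit_hash` covers. As one possible approach, the sketch below hashes a canonical JSON encoding of every other field, so any later change to the record is detectable. The canonicalization choice and the function names are assumptions, and the sample values are invented.

```python
# Hypothetical sketch of a tamper-evident audit record; hashing scheme is an assumption.
import hashlib
import json


def compute_audit_hash(record):
    """SHA-256 over a canonical JSON encoding of all fields except audit_hash."""
    payload = {key: value for key, value in record.items() if key != "audit_hash"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def seal(record):
    """Return a copy of the record with its audit_hash filled in."""
    return {**record, "audit_hash": compute_audit_hash(record)}


def verify(record):
    """True if the stored hash still matches the record contents."""
    return record.get("audit_hash") == compute_audit_hash(record)


sealed = seal({
    "entity_type": "customer",
    "identity_value": "CUST-12345",
    "reason": "GDPR Art. 17 request",
    "requested_by": "privacy-officer",   # invented sample value
    "products_affected": ["customers", "orders"],
    "rows_deleted": 42,                  # invented sample value
})
tampered = {**sealed, "rows_deleted": 0}
```

Here `verify(sealed)` holds, while `verify(tampered)` fails because the stored hash no longer matches the modified contents.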


All governance-relevant events flow through the event bus for audit and downstream processing:

| Event | Trigger |
|---|---|
| product.registered | New product created |
| product.deployed | Product deployed to execution |
| quality.check.failed | Blocking quality check fails |
| classification.violation | Classification laundering attempt at deploy |
| sla.breach | Freshness or quality threshold exceeded |
| deletion.requested | Right-to-erasure request submitted |
| deletion.completed | Deletion fully executed and verified |
| retention.expired | Data exceeds retention period |
| concept.created | New business concept extracted |
| semantic.contract.broken | Upstream removed a referenced intent or tier |
| semantic.contract.stale | Upstream values changed since downstream compiled |

All events include tenant_id, timestamp, and correlation_id for tracing.
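A consumer of these events might model the common envelope as shown below. Only `tenant_id`, `timestamp`, and `correlation_id` come from this guide; the `type` and `payload` field names, the ISO 8601 timestamp format, and the UUID correlation ID are assumptions made for the sketch.

```python
# Hypothetical sketch of a governance event envelope; wire format is an assumption.
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class GovernanceEvent:
    type: str        # e.g. "retention.expired"
    tenant_id: str
    correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    payload: dict = field(default_factory=dict)


event = GovernanceEvent(
    type="retention.expired",
    tenant_id="tenant-a",                      # invented sample value
    payload={"product": "raw-transactions"},
)
envelope = asdict(event)
```

Reusing one `correlation_id` across related events, for example `deletion.requested` and `deletion.completed`, is what would let a trace tie them together.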


The Portal provides a governance dashboard with:

  • Classification overview: Products by classification level, propagation chain visualization
  • Quality scores: Rolling quality scores per product, trend charts, failure history
  • Lineage graph: Interactive visualization of data flow across products and domains
  • Concept browser: Business glossary with maturity filters and graph visualization
  • Deletion audit: History of deletion requests with status and verification results
  • SLA status: Freshness and availability monitoring per product

The dashboard is available to all team members. Actions like concept promotion and deletion requests require appropriate permissions.


While most governance configuration lives in product.yaml (classification, retention) and quality.yaml (quality rules), the governance.yaml file configures additional governance behaviors:

apiVersion: akili/v1
kind: Governance

# Access control
access:
  teams:
    - finance
    - analytics
  deny:
    - field-ops

# Ownership
steward: jane.mwangi@example.com

# Compliance tags
compliance:
  - gdpr
  - sox

| Field | Required | Description |
|---|---|---|
| access.teams | no | Teams with explicit access (for confidential/restricted products) |
| access.deny | no | Teams explicitly denied access |
| steward | no | Data steward contact for governance questions |
| compliance | no | Regulatory frameworks this product falls under |