Governance Model

Akili implements federated computational governance — domain teams own their data product quality and classification, while the platform enforces global policies computationally. Policies are codified in YAML manifests and enforced automatically at build, deploy, and run time.

| Pillar | Purpose | Enforcement Point |
| --- | --- | --- |
| Classification | Control data access by sensitivity level | Deploy time + query time |
| Quality | Ensure data correctness and completeness | After every materialization |
| Lineage | Track data provenance end-to-end | Registration + deploy + run time |
| SLA Management | Monitor freshness and availability | Continuous (every 5 minutes) |

Governance rules are enforced at four distinct points in the data product lifecycle:

```mermaid
%%{init: {'flowchart': {'curve': 'basis'}}}%%
flowchart TB
    subgraph EP1["1. Ingestion"]
        I1[Schema validation]
        I2[Connection auth verification]
        I3[Input fitness functions]
    end

    subgraph EP2["2. Transformation"]
        T1[Classification propagation check]
        T2[Dependency graph validation]
        T3[Semantic contract verification]
    end

    subgraph EP3["3. Quality Gate"]
        Q1[Blocking checks must pass]
        Q2[SLA threshold evaluation]
        Q3[Quality score update]
    end

    subgraph EP4["4. Serving"]
        S1[Column-level access control]
        S2[Classification-based masking]
        S3[Rate limiting per tenant]
    end

    EP1 --> EP2 --> EP3 --> EP4
```

Enforcement Point 1: Ingestion (Build Time)

When data enters the platform through input ports, the following checks execute:

  • Schema validation — incoming data must match the declared schema in `inputs.yaml`
  • Connection verification — the source connection must be active and authorized
  • Input fitness functions — row count, null ratio, and freshness thresholds
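Concretely, these checks can be sketched as a single validation pass over an incoming batch. The function and threshold names here are illustrative assumptions, not the platform's actual API:

```python
# Sketch of ingestion-time fitness checks (hypothetical names; the real
# platform derives thresholds from inputs.yaml).
from datetime import datetime, timedelta, timezone

def run_input_fitness_checks(rows, latest_ts, thresholds, now):
    """Evaluate row count, null ratio, and freshness for a batch."""
    total = len(rows)
    nulls = sum(1 for r in rows if r.get("customer_id") is None)
    return {
        "row_count_ok": total >= thresholds["min_rows"],
        "null_ratio_ok": (nulls / total if total else 1.0) <= thresholds["max_null_ratio"],
        "freshness_ok": now - latest_ts <= thresholds["max_age"],
    }

batch = [{"customer_id": "c1"}, {"customer_id": None}, {"customer_id": "c3"}]
result = run_input_fitness_checks(
    batch,
    latest_ts=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
    thresholds={"min_rows": 2, "max_null_ratio": 0.5, "max_age": timedelta(hours=24)},
    now=datetime(2025, 1, 2, 0, 0, tzinfo=timezone.utc),
)
```

A batch failing any check would be rejected or quarantined at the input port.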

Enforcement Point 2: Transformation (Deploy Time)

At deploy time, the platform validates governance rules before allowing a data product to execute:

  • Classification propagation — the output classification must be greater than or equal to the highest classification of any input (transitive)
  • Dependency graph validation — all upstream products must be deployed and accessible
  • Semantic contract verification — referenced intents and tiers must still exist upstream
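The classification propagation check reduces to a comparison over the ordered set of sensitivity levels. A minimal sketch (the helper name is hypothetical; the level ordering comes from the classification section):

```python
# Deploy-time classification propagation check: the output level must be
# at least the highest level among the direct inputs.
LEVELS = ["public", "internal", "confidential", "restricted"]

def propagation_ok(output_level, input_levels):
    rank = LEVELS.index
    return rank(output_level) >= max(rank(lvl) for lvl in input_levels)

propagation_ok("confidential", ["internal", "confidential"])  # True
propagation_ok("public", ["confidential"])                    # False
```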

Enforcement Point 3: Quality Gate (Run Time)

After every materialization, quality checks from `quality.yaml` execute:

  • Blocking checks (`severity: critical`) — must pass before data is promoted to serving stores
  • Warning checks (`severity: warning`) — logged but do not block promotion
  • SLA evaluation — freshness, completeness, and availability thresholds checked
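The promotion decision can be sketched as: any failing critical check blocks promotion, while failing warnings are only counted. The check-result shape is an assumption for illustration:

```python
# Quality-gate decision sketch. check_results: list of (severity, passed).
def gate(check_results):
    blocked = any(sev == "critical" and not ok for sev, ok in check_results)
    warnings = sum(1 for sev, ok in check_results if sev == "warning" and not ok)
    return {"promote": not blocked, "warnings": warnings}

gate([("critical", True), ("warning", False)])  # promoted, 1 warning logged
gate([("critical", False)])                     # promotion blocked
```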

Enforcement Point 4: Serving (Query Time)

When consumers query data products:

  • Column-level access control — columns with high-sensitivity classifications are masked or omitted based on the consumer’s clearance level
  • Classification-based filtering — consumers can only access products at or below their clearance level
  • Rate limiting — per-tenant query rate limits prevent resource exhaustion
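A minimal sketch of classification-based masking at query time, assuming each column maps to one of the four sensitivity levels and that masked values are replaced with `***` (both are illustrative assumptions):

```python
# Query-time column masking sketch: a consumer sees a column only if their
# clearance is at or above the column's sensitivity level.
LEVELS = ["public", "internal", "confidential", "restricted"]

def mask_row(row, column_levels, clearance):
    rank = LEVELS.index
    return {
        col: (val if rank(column_levels[col]) <= rank(clearance) else "***")
        for col, val in row.items()
    }

mask_row({"email": "a@b.c", "country": "DE"},
         {"email": "restricted", "country": "public"},
         clearance="internal")
# → {"email": "***", "country": "DE"}
```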

Every data product declares a sensitivity level. Classification drives access control across the entire platform.

`public --> internal --> confidential --> restricted`

| Level | Who Can Access | Typical Use |
| --- | --- | --- |
| `public` | Any authenticated user in the tenant | Country codes, product catalogs |
| `internal` | Any team member in the tenant | Sales aggregates, operational metrics |
| `confidential` | Explicit team grant required | Customer segments, financial data |
| `restricted` | Named individuals only, audit logged | PII, salary data, health records |

The platform enforces a critical rule: the output classification must be greater than or equal to the highest classification of any input.

```
raw-orders (internal) + raw-payroll (confidential)
--> output MUST be >= confidential
```

This prevents classification laundering — creating a “public” product that reads from “confidential” inputs, effectively bypassing access controls through aggregation. Propagation is transitive: if product C depends on B, which depends on A (restricted), then C must be at least restricted.
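The transitive rule amounts to taking the maximum classification over all upstream products, direct or indirect. A sketch over a toy dependency graph (product names and graph shape are illustrative):

```python
# Transitive classification propagation: a product's minimum allowed level
# is the max over its declared level and all upstream levels, recursively.
LEVELS = ["public", "internal", "confidential", "restricted"]

def minimum_level(product, deps, levels):
    """deps: product -> upstream products; levels: declared levels."""
    rank = LEVELS.index
    best = rank(levels.get(product, "public"))
    for upstream in deps.get(product, []):
        best = max(best, minimum_level(upstream, deps, levels))
    return best

deps = {"C": ["B"], "B": ["A"]}
levels = {"A": "restricted", "B": "internal", "C": "public"}
LEVELS[minimum_level("C", deps, levels)]  # "restricted"
```

C declares `public`, but because A is `restricted` two hops upstream, deployment of C at anything below `restricted` would be rejected.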

For finer-grained control, individual columns can declare their sensitivity:

```yaml
schema:
  - name: customer_id
    type: string
    classification: pii.identifier
  - name: email
    type: string
    classification: pii.contact
  - name: country_code
    type: string
    classification: public
```

The serving layer dynamically masks or omits columns based on the querying consumer’s clearance level.

Products declare how long data should be retained:

```yaml
retention:
  period: "365d"
  basis: created_at
  review_date: "2026-06-01"
```

| Field | Required | Description |
| --- | --- | --- |
| `period` | yes | Duration: `"90d"`, `"365d"`, `"7y"` |
| `basis` | yes | Timestamp column for age calculation |
| `review_date` | no | ISO date for next retention review |

The platform evaluates retention daily. When data exceeds the retention period, a retention.expired governance event is emitted. The platform does not auto-delete — the product owner must explicitly trigger deletion or extend retention. This deliberate design prevents accidental data loss.
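A sketch of the daily retention evaluation, using the `period` and `basis` fields from the manifest above; the parsing helper and event shape are assumptions:

```python
# Daily retention sweep sketch: emit a retention.expired event when the
# oldest record (by the basis column) exceeds the declared period.
from datetime import datetime, timedelta, timezone

def parse_period(period):
    """Support the documented "90d"/"7y" style durations."""
    n, unit = int(period[:-1]), period[-1]
    return timedelta(days=n * 365 if unit == "y" else n)

def evaluate_retention(oldest_created_at, period, now):
    if now - oldest_created_at > parse_period(period):
        # Event only — the platform never auto-deletes.
        return {"event": "retention.expired"}
    return None

evaluate_retention(
    datetime(2024, 1, 1, tzinfo=timezone.utc), "365d",
    now=datetime(2025, 6, 1, tzinfo=timezone.utc),
)
# → {"event": "retention.expired"}
```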

When data needs to be deleted (retention expiry or regulatory request), the platform uses position delete files in the data lake:

  1. Identify data files containing records to delete
  2. Write position delete files listing row positions to logically delete
  3. Subsequent reads skip deleted positions
  4. Physical deletion occurs during data lake compaction
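The read path in step 3 can be sketched as filtering row positions against the delete file (heavily simplified relative to a real data lake format):

```python
# Position-delete read sketch: rows at listed positions in a data file are
# logically deleted and skipped; physical removal waits for compaction.
def read_with_deletes(data_file_rows, deleted_positions):
    skip = set(deleted_positions)
    return [row for pos, row in enumerate(data_file_rows) if pos not in skip]

read_with_deletes(["r0", "r1", "r2", "r3"], [1, 3])  # → ["r0", "r2"]
```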

Each data product defines freshness and availability SLAs. The platform monitors these continuously and emits alerts when thresholds are breached.

```yaml
sla:
  freshness: 24h
  completeness: 0.99
```

| SLA Type | Description | Alert Mechanism |
| --- | --- | --- |
| Freshness | Maximum age of the latest materialization | `sla.breach` event when threshold exceeded |
| Completeness | Minimum ratio of non-null values in key columns | Quality check after each run |
| Availability | Uptime of serving endpoints | Health check every 5 minutes |
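A sketch of the freshness check, assuming `24h`-style thresholds as declared in the `sla` block (the helper name is hypothetical):

```python
# Freshness SLA sketch: compare the age of the latest materialization
# against the declared threshold; a breach triggers an sla.breach event.
from datetime import datetime, timedelta, timezone

def freshness_breached(last_materialized_at, freshness, now):
    hours = int(freshness.rstrip("h"))
    return now - last_materialized_at > timedelta(hours=hours)

freshness_breached(
    datetime(2025, 1, 1, tzinfo=timezone.utc), "24h",
    now=datetime(2025, 1, 2, 1, 0, tzinfo=timezone.utc),
)  # True → emit sla.breach
```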

SLA breach notifications can be routed to multiple channels:

| Channel | Configuration |
| --- | --- |
| Email | Per-product owner email from `product.yaml` |
| Webhook | Custom HTTP endpoint for integration |
| PagerDuty | Incident creation for critical breaches |

Lineage answers: “Where did this data come from, and what depends on it?”

Four metadata pathways feed the lineage graph:

| Pathway | When | What |
| --- | --- | --- |
| Manifest registration | Product creation | Product identity, schemas, classification |
| Asset graph | After deploy | Dependency edges, source-to-product lineage |
| Execution events | After each materialization | Freshness, row count, duration, quality scores |
| Deployment lineage | Every deploy | Manifest version to data snapshot mapping |

The platform automatically derives an entity graph from column roles across all deployed products:

  • Products with `role: identity` columns become nodes
  • `role: event_key` columns create directed edges between products
  • The graph spans all domains within a tenant

This graph powers impact analysis, deletion cascades, and the concept registry.
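Entity-graph derivation can be sketched as two passes over the deployed products' column roles, matching each `event_key` column to the product that owns the same-named `identity` column (the name-matching rule is an assumption here):

```python
# Entity graph sketch: identity columns define nodes, event_key columns
# define directed edges toward the owning product.
def build_entity_graph(products):
    """products: name -> list of (column, role) pairs."""
    owner = {}  # identity column name -> owning product
    for name, cols in products.items():
        for col, role in cols:
            if role == "identity":
                owner[col] = name
    edges = set()
    for name, cols in products.items():
        for col, role in cols:
            if role == "event_key" and col in owner and owner[col] != name:
                edges.add((name, owner[col]))
    return edges

products = {
    "customers": [("customer_id", "identity")],
    "orders": [("order_id", "identity"), ("customer_id", "event_key")],
}
build_entity_graph(products)  # → {("orders", "customers")}
```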

As products are deployed, the platform automatically builds a business ontology — a structured vocabulary of the organization’s data concepts.

| State | Meaning |
| --- | --- |
| `draft` | Auto-extracted, not yet reviewed |
| `proposed` | Reviewed by product owner, submitted for approval |
| `accepted` | Approved by domain owner |
| `canonical` | Organization-wide standard term |
| `deprecated` | No longer in active use |
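The lifecycle can be read as a small state machine. A sketch with transitions inferred from the state table (the platform may allow others, such as rejecting a proposal back to draft):

```python
# Concept lifecycle sketch: allowed state transitions are an inference
# from the documented states, not a confirmed platform rule.
TRANSITIONS = {
    "draft": {"proposed"},
    "proposed": {"accepted", "draft"},
    "accepted": {"canonical", "deprecated"},
    "canonical": {"deprecated"},
    "deprecated": set(),
}

def can_transition(src, dst):
    return dst in TRANSITIONS.get(src, set())

can_transition("draft", "proposed")   # True
can_transition("draft", "canonical")  # False: must be reviewed first
```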

Canonical concepts surface in `akili init` suggestions and trigger validation warnings when new products introduce colliding column names.