Troubleshooting
This guide covers common problems you may encounter when using the Akili platform and how to diagnose and resolve them.
Quick Diagnostic Commands
Before diving into specific issues, use these commands to assess the current state:

    # Platform health check
    akili status

    # Product deployment status
    akili product status <product-name>

    # Recent executions for a product
    akili run list <product-name>

    # Quality check results
    akili governance quality <product-name>

    # SLA status
    akili governance sla <product-name>

    # DLQ entries
    akili dlq list --product <product-name>

Authentication Issues
Token Expired or Invalid
Symptom: All commands fail with 401 Unauthorized.

    Error: API returned 401: Token expired or invalid

Resolution:

    # Check current auth status
    akili auth status

    # Re-authenticate with a fresh token
    akili auth login <new-token>

    # Verify
    akili status

No Token Configured
Symptom: akili status shows WARN Auth: no token configured.
Resolution:

    # Initialize config if needed
    akili config init --api-url https://api.akili.io

    # Log in with your token
    akili auth login <token>

Wrong Tenant
Symptom: Commands succeed but return empty results or unexpected data.
Resolution:
Check which tenant your token is scoped to:

    akili auth status
    # Look at the email/tenant displayed

    # If using multiple tenants, verify your profile
    akili config show

Deployment Failures
Classification Laundering
Symptom: akili product deploy fails with 422 ClassificationLaundering.

    Error: Classification violation: product 'daily-report' is classified as
    'internal' but depends on 'raw-payroll' which is 'confidential'

Resolution:
Raise the product’s classification to at least match its highest-classified input:

    metadata:
      classification: confidential  # was 'internal', must be >= 'confidential'

Classification propagation is transitive: check the full dependency chain, not just direct inputs.
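Because propagation is transitive, the classification a product needs is the most restrictive label found anywhere in its dependency chain. The following is a minimal shell sketch of that rule, not the platform's implementation. Only 'internal' and 'confidential' appear in this guide; the 'public' and 'restricted' levels, their ordering, and both function names are assumptions made for illustration.

```shell
# Map a classification label to a numeric rank (higher is more restrictive).
# Assumption: 'public' and 'restricted' are hypothetical levels; only
# 'internal' < 'confidential' is taken from the error message above.
classification_rank() {
  case "$1" in
    public) echo 0 ;;
    internal) echo 1 ;;
    confidential) echo 2 ;;
    restricted) echo 3 ;;
    *) echo "unknown classification: $1" >&2; return 1 ;;
  esac
}

# Print the most restrictive label among the arguments.
# Pass the label of every product in the dependency chain, not just direct inputs.
required_classification() {
  local best="" best_rank=-1 label rank
  for label in "$@"; do
    rank="$(classification_rank "${label}")" || return 1
    if [ "${rank}" -gt "${best_rank}" ]; then
      best="${label}"
      best_rank="${rank}"
    fi
  done
  echo "${best}"
}
```

For example, `required_classification internal confidential internal` prints `confidential`, which is the value the manifest would need in `metadata.classification`.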
Dependency Not Found
Symptom: Deploy fails because an upstream product is not deployed.

    Error: Upstream product 'raw-orders' is not deployed

Resolution:

    # Check upstream product status
    akili product status raw-orders

    # Deploy the upstream first
    akili product deploy raw-orders

    # Then deploy your product
    akili product deploy daily-summary

Manifest Validation Errors
Symptom: akili validate or akili product deploy --dry-run reports errors.
Resolution:

    # Validate locally to see all errors
    akili validate .akili/

    # Common fixes:
    # - Missing required fields in product.yaml
    # - Invalid strategy in inputs.yaml
    # - Malformed SQL in quality.yaml checks
    # - Unknown serving intent in serving.yaml

Execution Failures
Transform SQL Error
Symptom: Execution fails with a SQL error in the execution logs.

    # Check recent runs
    akili run list daily-orders

    # Get details on the failed run
    akili run get run-abc123

    # View execution steps
    akili run steps run-abc123

Common causes:
- Column name mismatch between inputs.yaml and the actual source schema
- SQL syntax errors in transform.sql
- Missing {{ ref() }} macro usage
- Source schema changed without updating the product
Resource Exhaustion
Symptom: Execution times out or is killed.
Resolution: Increase resource limits in compute.yaml:

    compute:
      resources:
        cpu: "2"        # was "1"
        memory: 4Gi     # was 2Gi
      timeout: 3600     # was 1800 (30 min -> 60 min)

Retry Exhaustion
Symptom: Execution retries are exhausted and the event is in the DLQ.

    # Check DLQ for the product
    akili dlq list --product daily-orders

    # Inspect the failed entry
    akili dlq get dlq-entry-abc123

See DLQ Management for detailed replay and recovery procedures.
Quality Check Failures
Blocking Check Failed
Symptom: Data is not promoted to serving stores. Consumers see stale data.

    # Check quality results
    akili governance quality daily-orders

    # View quality history for trends
    akili governance quality-history daily-orders

Resolution depends on the check type:
| Check Type | Typical Fix |
|---|---|
| not_null | Fix the transform to handle NULL values or update the source |
| unique | Add deduplication to the transform |
| freshness | Investigate why the pipeline is not running on schedule |
| row_count | Check if the source has data for the expected period |
| custom SQL | Review the SQL check logic for false positives |
SLA Breach
Symptom: An sla.breach alert fires, or akili governance sla shows that a threshold was exceeded.

    akili governance sla daily-orders

    # Check if the pipeline is running
    akili run list daily-orders

    # Check if there are DLQ entries
    akili dlq list --product daily-orders

Common causes of SLA breaches:
- Pipeline not executing (scheduling issue)
- Pipeline executing but failing (check DLQ)
- Pipeline succeeding but quality gate blocking promotion
- Upstream product delayed (cascading freshness breach)
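Each cause in the list above maps to a command already used in this guide, so the checks can be run in one pass. The function below is a rough sketch under stated assumptions: the name `sla_triage` is hypothetical, the caller supplies upstream product names by hand (this guide does not show a command for listing them), and command output is printed as-is rather than parsed.

```shell
# Walk the SLA breach causes in order for one product.
# Usage: sla_triage <product> [upstream-product ...]
# Assumption: akili is on PATH; upstream names are supplied by the caller.
sla_triage() {
  local product="$1"
  shift

  echo "== SLA status for ${product}"
  akili governance sla "${product}"

  echo "== Recent runs (is the pipeline executing?)"
  akili run list "${product}"

  echo "== DLQ entries (is the pipeline failing?)"
  akili dlq list --product "${product}"

  echo "== Quality results (is a quality gate blocking promotion?)"
  akili governance quality "${product}"

  # Remaining arguments are upstream products to check for cascading breaches.
  local upstream
  for upstream in "$@"; do
    echo "== SLA status for upstream ${upstream}"
    akili governance sla "${upstream}"
  done
}
```

A call such as `sla_triage daily-orders raw-orders` checks the product first and then the SLA status of the upstream it depends on.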
Connection Issues
Connection Test Fails

    akili connection test conn-abc123
    # FAIL Connection 'production-db': timeout after 30s

Diagnosis:
- Network connectivity: is the source reachable from the platform?
- Credentials: have the Kubernetes secrets been updated?
- Firewall rules: does the source allow connections from the platform’s IP range?
- SSL configuration: is the ssl_mode correct?

    # Get connection details
    akili connection get conn-abc123 --json

    # Verify the connection is listed
    akili connection list

Connection Credential Rotation
After rotating credentials in the source system:

    # 1. Update the Kubernetes secret (cluster admin task)
    # 2. Test the connection
    akili connection test conn-abc123

    # 3. No product redeployment needed

Platform Connectivity
API Unreachable
Symptom: akili status shows FAIL API Health.

    FAIL API Health: Connection refused

Possible causes:
- Platform is down or undergoing maintenance
- Network configuration changed
- API URL is incorrect

    # Check your configured API URL
    akili config show

    # Try with an explicit URL
    akili status --api-url https://api.akili.io

Slow Responses
Symptom: Commands take longer than expected.

    # Increase timeout
    akili product list --timeout 60

    # Check if the issue is specific to one command
    akili status  # Quick health check

Governance Issues
Retention Policy Expired
Symptom: A retention.expired event appears in governance events.

    akili governance retention daily-orders

Resolution: Either extend the retention period or acknowledge and delete the expired data:

    # Extend retention
    akili governance set-retention \
      --product daily-orders \
      --period 730d \
      --basis created_at

    # Or investigate and approve deletion via the governance workflow

Concept Conflicts
Symptom: Validation warns that column names collide with canonical concepts.
Resolution: Align your column names with the canonical concepts in the business glossary:

    # List concepts in your domain
    akili governance concepts daily-orders

    # Use canonical names in your output schema

Getting Help
If the issue is not covered here:
- Check execution logs: akili run steps <run-id>
- Check DLQ entries: akili dlq list --product <name>
- Check governance events: akili governance quality <name>
- Verify platform health: akili status
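When escalating an issue, it can help to capture the output of all four checks at once. The sketch below saves each command's output to its own file in a fresh temporary directory. The function name `collect_diagnostics` and the file names are hypothetical, and the sketch assumes only that the commands listed above are available on PATH.

```shell
# Save the output of each "Getting Help" check into its own file.
# Usage: collect_diagnostics <product-name> <run-id>
# Assumption: function and file names are illustrative, not part of the akili CLI.
collect_diagnostics() {
  local product="$1" run_id="$2"
  local out_dir
  out_dir="$(mktemp -d "${TMPDIR:-/tmp}/akili-diag.XXXXXX")" || return 1

  # Capture stdout and stderr together; keep going if a command fails.
  akili run steps "${run_id}" > "${out_dir}/run-steps.txt" 2>&1 || true
  akili dlq list --product "${product}" > "${out_dir}/dlq.txt" 2>&1 || true
  akili governance quality "${product}" > "${out_dir}/quality.txt" 2>&1 || true
  akili status > "${out_dir}/status.txt" 2>&1 || true

  # Print the directory so the caller can archive or attach it.
  echo "${out_dir}"
}
```

For example, `collect_diagnostics daily-orders run-abc123` prints the path of the directory holding the four output files.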
Related
- DLQ Management: failed event handling
- Data Lifecycle: failure handling at each stage
- akili status: platform health check