Crosscutting Concepts
Multi-Tenancy
Multi-tenancy is not a feature; it is an invariant. Every layer of the platform enforces tenant isolation.
| Layer | Mechanism |
|---|---|
| API | JWT extraction, tenant_id on every service call |
| Database | PostgreSQL Row-Level Security (SET LOCAL app.tenant_id) |
| Events | Per-tenant Redpanda topics (tenant.{id}.{domain}) |
| Storage | Ceph path prefix ({tenant_id}/{domain}/{product}/) |
| Serving | Tenant-scoped queries, per-tenant resource limits |
| CRDs | Kubernetes CRDs include tenant_id in spec |
No request can read or write another tenant's data. Isolation is enforced in the service layer as well as in the database, so the database is never the only line of defense.
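As a rough illustration of the naming conventions in the table, the sketch below derives the per-tenant topic name, storage prefix, and session setting from a tenant ID. The function names and the tenant ID validation pattern are assumptions made for this example, not the platform's API. PostgreSQL's `SET` command does not accept bind parameters, so the sketch uses the built-in `set_config(name, value, is_local)` function with `is_local = true`, which has the same transaction-local scope as `SET LOCAL`.

```python
import re

# Hypothetical validation rule: lowercase alphanumerics, hyphens, underscores.
_TENANT_ID = re.compile(r"[a-z0-9][a-z0-9_-]{0,62}")


def _check(tenant_id: str) -> str:
    """Reject IDs that could escape a topic name or a path prefix."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant_id: {tenant_id!r}")
    return tenant_id


def topic_name(tenant_id: str, domain: str) -> str:
    """Per-tenant event topic, following tenant.{id}.{domain}."""
    return f"tenant.{_check(tenant_id)}.{domain}"


def storage_prefix(tenant_id: str, domain: str, product: str) -> str:
    """Object storage prefix, following {tenant_id}/{domain}/{product}/."""
    return f"{_check(tenant_id)}/{domain}/{product}/"


def tenant_session_sql(tenant_id: str) -> tuple[str, tuple[str]]:
    """Parameterized statement that sets app.tenant_id for the current
    transaction only (the same scope as SET LOCAL app.tenant_id)."""
    return "SELECT set_config('app.tenant_id', %s, true)", (_check(tenant_id),)
```

Validating the tenant ID before it is interpolated into a topic name or path keeps a malformed ID such as `../other` from reaching another tenant's prefix.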
Security
Classification Propagation
Every column in the platform has a classification level. Classification propagates through the data pipeline using the high-water mark rule: when data from multiple sources is combined, the output inherits the highest classification of any input.
| Level | Access | Masking |
|---|---|---|
| Public | All consumers | None |
| Internal | Authenticated users | None |
| Confidential (PII) | Role-based | SHA-256 hash, last-4, or REDACTED |
| Restricted | Named principals only | Full column masking |
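The high-water mark rule reduces to taking a maximum over ordered levels. The following is a minimal sketch, assuming the four levels in the table are ordered from least to most restrictive; the enum, its numeric values, and the `propagate` function are illustrative names, and the behavior for an empty input list is an assumption.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Levels from the table, ordered least to most restrictive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def propagate(inputs: list[Classification]) -> Classification:
    """High-water mark: the output takes the highest input level."""
    if not inputs:
        # Assumption: with no known inputs, pick the most restrictive level.
        return Classification.RESTRICTED
    return max(inputs)
```

For example, joining a Public column with a Confidential column yields a Confidential output under this rule.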
Column Masking
When a consumer’s clearance is below the column’s classification, the platform applies automatic masking:
- PII Name: SHA-256 hash (first 16 hex characters)
- PII Identifier: Last 4 characters visible, rest masked
- PII Contact: [REDACTED]
- Business Confidential: Column omitted entirely
Masking is applied at query time in the serving layer, not at rest.
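The four masking rules can be sketched as small functions applied per column at read time. This is an illustrative sketch, not the serving layer's implementation: the category keys (`pii_name` and so on), the `*` mask character, and the handling of values of four characters or fewer are all assumptions. It also assumes the caller has already determined that the consumer's clearance is below the column's classification.

```python
import hashlib


def mask_name(value: str) -> str:
    """PII Name: first 16 hex characters of the SHA-256 digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:16]


def mask_identifier(value: str, mask_char: str = "*") -> str:
    """PII Identifier: keep the last 4 characters, mask the rest."""
    if len(value) <= 4:
        # Assumption: short values are masked entirely.
        return mask_char * len(value)
    return mask_char * (len(value) - 4) + value[-4:]


def mask_contact(value: str) -> str:
    """PII Contact: replaced by a fixed marker."""
    return "[REDACTED]"


# Hypothetical category keys mapped to their masking function.
MASKERS = {
    "pii_name": mask_name,
    "pii_identifier": mask_identifier,
    "pii_contact": mask_contact,
}


def mask_row(row: dict[str, str], categories: dict[str, str]) -> dict[str, str]:
    """Mask one result row; business_confidential columns are omitted."""
    out = {}
    for column, value in row.items():
        category = categories.get(column)
        if category == "business_confidential":
            continue  # column omitted entirely
        masker = MASKERS.get(category)
        out[column] = masker(value) if masker else value
    return out
```

Because the functions run on the query result, the stored data stays unmasked, which matches masking at query time rather than at rest.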
Authentication
- Protocol: OIDC via Authentik
- Portal: NextAuth v5 with BFF token relay (ADR-039)
- API: JWT Bearer token validation
- CLI: Device authorization flow
Resilience (Residuality Theory)
The platform applies different failure strategies based on the criticality of the operation.
Fail-Closed (Security)
Security operations never degrade. If the auth service is unavailable, requests are rejected rather than permitted with reduced security.
- JWT validation failure → 401 Unauthorized
- Classification check failure → most restrictive level applied
- Masking pipeline failure → column omitted
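The last two rules above can be sketched as wrappers that turn any failure into the safest outcome. The names below are hypothetical and the level strings are assumptions; the point is only that an exception never results in a less restrictive answer.

```python
from typing import Callable, Optional

# Hypothetical level names, ordered least to most restrictive.
LEVELS = ["public", "internal", "confidential", "restricted"]


def classify_fail_closed(lookup: Callable[[str], str], column: str) -> str:
    """Return the column's level, or the most restrictive level on failure."""
    try:
        level = lookup(column)
    except Exception:
        return LEVELS[-1]
    # An unrecognized answer is treated as a failure too.
    return level if level in LEVELS else LEVELS[-1]


def mask_fail_closed(mask: Callable[[str], str], value: str) -> Optional[str]:
    """Return the masked value, or None (column omitted) if masking fails."""
    try:
        return mask(value)
    except Exception:
        return None
```

A caller would drop any column for which `mask_fail_closed` returns `None`, so a masking error can never expose the raw value.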
Fail-Open (Quality)
Quality operations degrade gracefully. If a quality check cannot run, the platform serves the last-known-good data rather than returning nothing.
- Quality check timeout → warning logged, data served
- Serving store unavailable → fallback to next tier
- Analytics query timeout → partial results with degradation notice
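The first two behaviors in this list can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the logger name, and the use of `TimeoutError` and `ConnectionError` to signal a timeout or an unavailable store are all hypothetical. The sketch covers only the case where a check cannot run, not the case where a check runs and reports bad data.

```python
import logging
from typing import Any, Callable, Sequence

log = logging.getLogger("platform.quality")  # hypothetical logger name


def serve_fail_open(data: list, check: Callable[[list], Any]) -> tuple[list, list[str]]:
    """Serve data even if the quality check times out.

    Returns the data plus a list of warnings for the caller to surface.
    """
    warnings: list[str] = []
    try:
        check(data)
    except TimeoutError as exc:
        log.warning("quality check timed out, serving data anyway: %s", exc)
        warnings.append("quality check skipped: timeout")
    return data, warnings


def read_with_fallback(
    tiers: Sequence[tuple[str, Callable[[str], Any]]], key: str
) -> tuple[str, Any]:
    """Try each serving tier in order, falling back when one is unavailable."""
    for name, read in tiers:
        try:
            return name, read(key)
        except ConnectionError:
            log.warning("serving store %s unavailable, falling back", name)
    raise LookupError(f"no serving tier could answer {key!r}")
```

Returning the warnings alongside the data lets the response carry a degradation notice instead of hiding the skipped check.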
Circuit Breakers
All external dependencies have circuit breakers:
- Serving stores (StarRocks, Redis, TimescaleDB)
- Notification service
- Intelligence service (Claude API)
- KServe inference endpoints
States: Closed (normal) → Open (failing, fast-fail) → Half-Open (probe).
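The three states can be sketched as a small state machine. This is a minimal, single-threaded illustration of the general pattern, not the platform's middleware; the class name, the failure threshold, and the reset timeout are assumed values, and the clock is injectable only to make the behavior easy to test.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected without reaching the dependency."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,      # assumed default
        reset_timeout: float = 30.0,     # seconds, assumed default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("dependency unavailable, failing fast")
            self.state = "half_open"  # timeout elapsed: allow one probe
        try:
            result = fn()
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed probe, or too many failures while closed, opens the circuit.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self) -> None:
        # A successful call (including a probe) closes the circuit.
        self.state = "closed"
        self.failures = 0
```

A real deployment would need locking around the state transitions and one breaker instance per dependency.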
Observability
| Signal | Tool | Purpose |
|---|---|---|
| Metrics | Prometheus + Grafana | Infrastructure and application metrics |
| Logs | Loki + Alloy | Structured logs from all services |
| Traces | Tempo | Distributed request tracing |
| Execution | Dagster UI | Pipeline monitoring and debugging |
| Alerts | Alertmanager | On-call notification routing |
Every execution produces structured events that are queryable in the control-plane API.
GitOps
All infrastructure changes follow the GitOps pattern:
- Commit changes to git
- ArgoCD detects the change
- ArgoCD syncs the cluster to match git state
- Drift is detected and auto-healed
Manual kubectl patches are never applied in production. Drift between git and the cluster is treated as a P0 issue.
Specification Depth
Each crosscutting concern has a detailed specification in the platform design documents:
| Concern | Specification | Key sections |
|---|---|---|
| Multi-tenancy | API Authentication | JWT extraction, RLS enforcement, tenant scoping |
| Classification | Governance Model | Classification taxonomy, propagation rules, column masking |
| Serving isolation | Serving Layer | 5-point enforcement, store-level namespacing |
| Event isolation | Orchestration | Per-tenant topics, event contracts |
| Quality gates | Quality & Governance | Blocking vs warning severity, SLA tracking |
| Resilience | API Middleware | Circuit breakers, rate limiting, request tracing |