Hash chain and residency model

This page covers the chain row's life cycle from Sink.Record to durable storage and the per-Domain residency contract that scopes every chain to exactly one Domain. For the underlying byte format see Storage topology; for the entry point and ubiquitous language see the index.

Hash-chain state machine

A chain row has three externally observable states (committed, archived, quarantined) once the application service hands an Entry to the Sink. The pending and appending states are internal to the append path: pending covers resolution and validation before the transaction opens, and appending covers the locked transaction itself. From the operator's HTTP-response vantage, a successful Sink.Record returns with the row already in committed.

mermaid
stateDiagram-v2
    [*] --> pending: Sink.Record(entry) — DomainResolver, Pepper resolved
    pending --> appending: BEGIN; pg_advisory_xact_lock(hashtext('plexsphere.audit.' || domain_id))
    appending --> committed: INSERT audit_log_entry RETURNING; UPDATE chain_head; COMMIT
    committed --> archived: archiver drains row → zstd → blobstore PUT → archive_etag stamped
    archived --> [*]
    committed --> quarantined: ReconcileChain detects entry_hash divergence
    quarantined --> [*]
    note right of pending
      Resolver failure -> ErrDomainUnresolved (fail-closed).
      Pepper unavailable -> ErrPepperUnavailable (fail-closed).
      Neither path lands a row on a default chain.
    end note
    note right of appending
      Per-Domain advisory lock scopes the seq+head_hash
      read-compute-write to one writer at a time per
      Domain. Other Domains append concurrently.
    end note
    note right of committed
      First externally-observable state. seq is dense and
      monotonic; entry_hash anchors row N to row N-1; the
      chain row is bit-for-bit immutable from here on.
    end note
    note right of quarantined
      Detection lands in a SIBLING table (audit_tamper_quarantine).
      The chain row itself is NEVER mutated — see the
      DECISION block in 0011_audit_log.sql.
    end note

Each transition is bound to a specific code seam:

  • [*] → pending: audit.Service.Record resolves the Domain and pepper, computes the subject pseudonym, and validates the Entry (caveat names, lengths) BEFORE the transaction opens. A resolver failure surfaces as ErrDomainUnresolved and the Sink fails closed.
  • pending → appending: audit/repo.AppendEntry opens the transaction and acquires pg_advisory_xact_lock(hashtext('plexsphere.audit.' || domain_id)). The lock scope is per-Domain audit appends only; concurrent heartbeat ingestions, registration commits, and other writers proceed unhindered.
  • appending → committed: within the lock the application reads (next_seq, head_hash) from audit_log_chain_head, computes entry_hash = sha256(head_hash ‖ sha256(canonical_bytes)), INSERTs the chain row (RETURNING the assigned id), upserts the audit_subject_pii row, advances the chain head, and commits. The (domain_id, seq) UNIQUE and (domain_id, entry_hash) UNIQUE constraints catch programmer error if the in-process state ever drifts from the database.
  • committed → archived: the archiver worker pull-drains rows where archived_at IS NULL (the partial index audit_log_entry_unarchived_idx keeps the scan small), zstd-compresses the JSON projection, uploads to audit/<domain_id>/<seq:020d>.json.zst, and stamps archived_at and archive_etag. Idempotent on the (domain_id, seq) tuple.
  • committed → quarantined: ReconcileChain (boot-time and every /readyz poll) re-derives entry_hash for every row in a bounded segment, compares against the stored value, and writes a row into audit_tamper_quarantine on divergence. The chain row itself is never mutated; the file-header DECISION block in 0011_audit_log.sql records why.
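
The hash-chaining and re-derivation steps above can be sketched in memory as follows. This is a minimal illustration, not the repository's code: the Row type, the Append and FirstDivergence helpers, the all-zero head for an empty chain, and seq starting at 1 are all assumptions made for the example. Only the formula entry_hash = sha256(head_hash ‖ sha256(canonical_bytes)) is taken from this page.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// Row is a hypothetical, trimmed-down chain row; the real table has more columns.
type Row struct {
	Seq       uint64
	Canonical []byte   // canonical_bytes of the Entry
	EntryHash [32]byte // stored entry_hash
}

// EntryHash computes sha256(head_hash ‖ sha256(canonical_bytes)).
func EntryHash(headHash [32]byte, canonical []byte) [32]byte {
	inner := sha256.Sum256(canonical)
	buf := make([]byte, 0, 64)
	buf = append(buf, headHash[:]...)
	buf = append(buf, inner[:]...)
	return sha256.Sum256(buf)
}

// Append models the read-compute-write step that the real system performs
// under the per-Domain advisory lock. Here the "chain head" is simply the
// last row of the slice. Assumption: an empty chain starts at seq 1.
func Append(chain []Row, genesis [32]byte, canonical []byte) []Row {
	head, next := genesis, uint64(1)
	if n := len(chain); n > 0 {
		head, next = chain[n-1].EntryHash, chain[n-1].Seq+1
	}
	return append(chain, Row{Seq: next, Canonical: canonical, EntryHash: EntryHash(head, canonical)})
}

// FirstDivergence re-derives entry_hash for each row and reports the seq of
// the first row whose stored hash does not match. It never mutates a row.
func FirstDivergence(chain []Row, genesis [32]byte) (uint64, bool) {
	head := genesis
	for _, r := range chain {
		if EntryHash(head, r.Canonical) != r.EntryHash {
			return r.Seq, true
		}
		head = r.EntryHash
	}
	return 0, false
}

func main() {
	var genesis [32]byte // assumption: all-zero head for an empty chain
	var chain []Row
	for _, e := range []string{`{"a":1}`, `{"a":2}`, `{"a":3}`} {
		chain = Append(chain, genesis, []byte(e))
	}
	_, bad := FirstDivergence(chain, genesis)
	fmt.Println("divergence before edit:", bad)

	chain[1].Canonical = []byte(`{"a":99}`) // simulate tampering with row 2
	seq, bad := FirstDivergence(chain, genesis)
	fmt.Println("divergence after edit:", bad, "at seq", seq)
}
```

In this sketch a divergence is only reported; in the system described here the corresponding finding would be written to the sibling audit_tamper_quarantine table while the chain row stays untouched.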

Residency model

The Platform Audit Log runs a dual residency model. A Domain-scoped privileged action lands on that Domain's chain — the AuditDomainID path parameter on the per-Domain read endpoints is not a filter, it is the chain identity. A platform-scoped action that no Domain owns (Cloud lifecycle, platform-scope label definitions, invitation expiry sweeps) lands on the single platform-residency chain anchored at a reserved, non-zero platform anchor, read through the dedicated platform read endpoints. Cross-Domain decisions still fan out: when a single API request affects Domain A and Domain B, the DomainResolver.Resolve port returns a primary plus a secondary set, and the Sink emits one chain row per affected Domain with a shared correlation_id.
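
The cross-Domain fan-out can be pictured with a short sketch. The Residency and ChainRow types and the FanOut function are hypothetical names chosen for illustration, and the local ErrDomainUnresolved is a stand-in for the real sentinel; the only behaviour taken from this page is "one chain row per affected Domain, shared correlation_id, fail closed when residency is unknown".

```go
package main

import (
	"errors"
	"fmt"
)

// ErrDomainUnresolved is a local stand-in for the sentinel named on this page.
var ErrDomainUnresolved = errors.New("audit: domain unresolved")

// Residency is a hypothetical shape for a resolver result:
// one primary chain anchor plus zero or more secondary anchors.
type Residency struct {
	Primary   string
	Secondary []string
}

// ChainRow is a hypothetical, trimmed-down row used only in this sketch.
type ChainRow struct {
	DomainID      string
	CorrelationID string
	Action        string
}

// FanOut returns one row per distinct affected Domain, all sharing the same
// correlation id. Any empty anchor fails the whole call closed, so nothing
// is ever written under a guessed or default anchor.
func FanOut(res Residency, correlationID, action string) ([]ChainRow, error) {
	anchors := append([]string{res.Primary}, res.Secondary...)
	seen := make(map[string]bool, len(anchors))
	rows := make([]ChainRow, 0, len(anchors))
	for _, d := range anchors {
		if d == "" {
			return nil, ErrDomainUnresolved
		}
		if seen[d] {
			continue // a Domain listed twice still gets exactly one row
		}
		seen[d] = true
		rows = append(rows, ChainRow{DomainID: d, CorrelationID: correlationID, Action: action})
	}
	return rows, nil
}

func main() {
	rows, err := FanOut(Residency{Primary: "domain-a", Secondary: []string{"domain-b"}}, "corr-1", "link.create")
	fmt.Println(len(rows), err)

	_, err = FanOut(Residency{}, "corr-2", "link.create")
	fmt.Println(errors.Is(err, ErrDomainUnresolved))
}
```

Each returned row would then go through its own per-Domain append, so the two chains advance independently and are tied together only by the correlation id.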

The four audit tables key on domain_id, but that column is the audit context's own chain anchor — a Domain id for per-Domain residency, or the reserved sentinel for platform residency — not a tenancy foreign key. The original per-Domain REFERENCES plexsphere.domains(id) constraint on audit_log_entry, audit_log_chain_head, audit_subject_pii and audit_tamper_quarantine was lifted in 0024_audit_platform_anchor.sql: it rejected the reserved anchor, and an immutable audit trail must outlive the entity it audits rather than couple its retention to Domain lifecycle. Referential integrity for the live tenancy aggregates is unaffected — those foreign keys never lived on the audit tables.

Three load-bearing invariants follow from this rule:

  • Explicit residency, never a guessed anchor. A platform-scoped object resolves deliberately to the reserved platform anchor; the platform chain is hash-chained with the exact same machinery as a per-Domain chain (head, verify, reconcile, pepper, pseudonym, erasure) and is not a catch-all that swallows resolver bugs. An object whose residency cannot be determined at all (ErrDomainUnresolved) MUST fail closed at the Sink rather than be written under an arbitrary anchor. The integration test tests/integration/audit_per_domain_residency_test.go pins the cross-Domain fan-out path, the platform-anchor routing path, and the fail-closed path.
  • No cross-Domain reads. GetAuditEntry returns 404 for a (domainId, seq) that exists on a different Domain's chain — the same shape as a truly-unknown seq, so the endpoint cannot be used as a cross-Domain enumeration oracle.
  • Cursor scoping. The opaque cursor returned by ListAuditEntries carries the domain_id inside the HMAC-signed payload, not out-of-band in the URL. A cursor minted for Domain A surfaces as ErrCursorInvalid (code: cursor_invalid, HTTP 400) when replayed against Domain B (internal/audit/query/cursor.go).
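
The cursor-scoping invariant can be illustrated with a small sketch of a Domain-bound, HMAC-signed cursor. The wire format (base64url JSON body, a dot, base64url signature), the payload field names, and the Mint and Parse helpers are assumptions for this example and may differ from internal/audit/query/cursor.go; the local ErrCursorInvalid is a stand-in for the real error.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// ErrCursorInvalid is a local stand-in for the error named on this page.
var ErrCursorInvalid = errors.New("cursor_invalid")

// payload is a hypothetical cursor body; domain_id sits inside the signed bytes.
type payload struct {
	DomainID string `json:"domain_id"`
	AfterSeq uint64 `json:"after_seq"`
}

func sign(key, body []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(body)
	return m.Sum(nil)
}

// Mint builds "<base64url(body)>.<base64url(hmac)>" for the given Domain.
func Mint(key []byte, domainID string, afterSeq uint64) (string, error) {
	body, err := json.Marshal(payload{DomainID: domainID, AfterSeq: afterSeq})
	if err != nil {
		return "", err
	}
	enc := base64.RawURLEncoding
	return enc.EncodeToString(body) + "." + enc.EncodeToString(sign(key, body)), nil
}

// Parse verifies the signature first, then checks that the Domain inside the
// cursor matches the Domain the request addresses. Every failure collapses
// into the same ErrCursorInvalid.
func Parse(key []byte, cursor, requestDomainID string) (uint64, error) {
	b, s, ok := strings.Cut(cursor, ".")
	if !ok {
		return 0, ErrCursorInvalid
	}
	enc := base64.RawURLEncoding
	body, errBody := enc.DecodeString(b)
	sig, errSig := enc.DecodeString(s)
	if errBody != nil || errSig != nil || !hmac.Equal(sig, sign(key, body)) {
		return 0, ErrCursorInvalid
	}
	var p payload
	if err := json.Unmarshal(body, &p); err != nil || p.DomainID != requestDomainID {
		return 0, ErrCursorInvalid
	}
	return p.AfterSeq, nil
}

func main() {
	key := []byte("demo-key-not-for-production")
	cur, _ := Mint(key, "domain-a", 41)

	seq, err := Parse(key, cur, "domain-a")
	fmt.Println(seq, err)

	_, err = Parse(key, cur, "domain-b") // replay against another Domain
	fmt.Println(errors.Is(err, ErrCursorInvalid))
}
```

Because the Domain id is part of the signed body, a caller cannot retarget a cursor by editing the URL: changing the body invalidates the signature, and replaying the untouched cursor against another Domain fails the equality check.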