
Retention and right to erasure

This page maps every data-protection invariant to the seam that enforces it, and walks through the right-to-erasure flow that preserves chain integrity while honouring GDPR Article 17 / SOC 2 deletion requests. For the chain mechanics that make the design possible, see Hash chain and residency model; for the entry point and ubiquitous language, see the index.

Retention matrix

The README's data-protection invariants are the contract this context implements. The matrix below maps every value to the seam that enforces it.

| Property | Value | Enforcement |
| --- | --- | --- |
| Default retention | 7 years (configurable per Domain) | Operator policy, surfaced through the Domain settings; the chain itself does not expire rows. |
| Hot retention (Postgres) | Full retention window | audit_log_entry rows are never deleted by an automated process. A migration Down is refused with SQLSTATE 0A000 (0011_audit_log.sql). |
| Cold retention (object store) | Full retention window, 7-year P0 restore priority | The archiver mirrors every chain row to audit/<domain_id>/<seq:020d>.json.zst; the Chainsaw audit-archive-restore suite proves the round-trip. |
| Write-once | Non-negotiable | Chain rows have no UPDATE path: every code path is INSERT … RETURNING. The (domain_id, seq) UNIQUE and (domain_id, entry_hash) UNIQUE constraints catch attempted re-writes. |
| Right-to-erasure | Pseudonymise-from-inception, then DELETE the PII row | The chain row references only the 32-byte pseudonym. Erasure DELETEs the audit_subject_pii mapping row; the chain remains mathematically valid. See the next section. |
| PII minimisation | Pseudonym + structured PII row only | Entry.Subject plaintext does not land on the chain. The audit_subject_pii table holds (subject_plaintext, email_claim, source_ip, idp_claim_names); nothing else touches PII. |
| Residency | Mandatory | Every chain row resolves to an owning anchor: a Domain chain for a Domain-scoped action (DomainResolver fans out cross-Domain decisions across each affected Domain) or the single platform-residency chain for a platform-scoped action that no Domain owns. An object whose residency cannot be determined fails closed. The platform chain is erasure-capable on the same mechanics, keyed under the reserved platform anchor. |
| Caveat values | NEVER persisted | Names-only enforced at three layers (DB CHECK, OpenAPI extension, Spectral rule). |
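The cold-retention key layout can be illustrated with a short sketch. It assumes that `<seq:020d>` means the sequence number zero-padded to 20 decimal digits; the function name `archiveKey` and its parameter types are hypothetical and not taken from the archiver.

```go
package main

import "fmt"

// archiveKey renders an object-store key in the
// audit/<domain_id>/<seq:020d>.json.zst layout from the retention matrix.
// Hypothetical helper: the name and parameter types are assumptions.
func archiveKey(domainID string, seq uint64) string {
	// %020d zero-pads seq to 20 digits, which is wide enough for any uint64,
	// so lexicographic key order matches numeric seq order.
	return fmt.Sprintf("audit/%s/%020d.json.zst", domainID, seq)
}

func main() {
	fmt.Println(archiveKey("dom-1", 42))
	// Prints: audit/dom-1/00000000000000000042.json.zst
}
```

Under that reading, listing a Domain's prefix in key order would return rows in chain order, which is convenient for a restore.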

Right-to-erasure flow — pseudonymise-from-inception rationale

The right-to-erasure workflow (GDPR Article 17, SOC 2 trust services criteria) requires the operator to delete an Identity's personal data on request. The Platform Audit Log honours that request without breaking the chain, by a structural choice: chain rows NEVER reference plaintext.

   1. Identity exists. audit_subject_pii row binds
      (domain_id, subject_pseudonym) → (subject_plaintext, email_claim, ...).
      Chain rows for this Identity reference subject_pseudonym only.

   2. Right-to-erasure request lands on
      POST /v1/domains/{domainId}/audit/erase-identity
      with body { identity_id: "..." }.

   3. erasure.Service derives subject_pseudonym = sha256(pepper(domain_id) || identity_id),
      DELETEs the audit_subject_pii row (idempotent — second call is a no-op),
      and APPENDs a self-audit row with relation="audit.erase-identity"
      (the erasure event is itself auditable).

   4. The chain still verifies end-to-end: chain rows still reference
      subject_pseudonym, the pseudonym is preserved, only the plaintext
      lookup is gone. ListAuditEntries continues to return the rows
      with subject.identity_id_ref = null (the pseudonym is now an
      orphan — by design).
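The four steps above can be sketched with an in-memory model. This is an illustration under stated assumptions, not the erasure.Service implementation: the `store` type and its methods are hypothetical stand-ins for the two Postgres tables, the demo pepper is all zeros, and the source does not say whether a repeat call appends another self-audit row or which subject that row references (this sketch appends on every call and reuses the erased pseudonym).

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// piiRow stands in for one audit_subject_pii row (fields trimmed).
type piiRow struct{ SubjectPlaintext string }

// chainRow stands in for one chain row: pseudonym only, never plaintext.
type chainRow struct {
	Relation         string
	SubjectPseudonym [32]byte
}

// store is a hypothetical in-memory stand-in for both tables in one Domain.
type store struct {
	pepper []byte
	pii    map[[32]byte]piiRow
	chain  []chainRow
}

func newStore(pepper []byte) *store {
	return &store{pepper: pepper, pii: map[[32]byte]piiRow{}}
}

// pseudonym computes sha256(pepper || identityID), as in step 3.
func (s *store) pseudonym(identityID string) [32]byte {
	buf := make([]byte, 0, len(s.pepper)+len(identityID))
	buf = append(buf, s.pepper...)
	buf = append(buf, identityID...)
	return sha256.Sum256(buf)
}

// record upserts the PII row on first sighting and appends a chain row (step 1).
func (s *store) record(identityID, relation string) {
	p := s.pseudonym(identityID)
	if _, ok := s.pii[p]; !ok {
		s.pii[p] = piiRow{SubjectPlaintext: identityID}
	}
	s.chain = append(s.chain, chainRow{Relation: relation, SubjectPseudonym: p})
}

// eraseIdentity deletes the PII row and appends a self-audit row (step 3).
// It reports whether a PII row was actually removed.
func (s *store) eraseIdentity(identityID string) bool {
	p := s.pseudonym(identityID)
	_, existed := s.pii[p]
	delete(s.pii, p) // deleting a missing key is a no-op, so repeat calls are safe
	s.chain = append(s.chain, chainRow{Relation: "audit.erase-identity", SubjectPseudonym: p})
	return existed
}

// identityRef resolves a pseudonym back to plaintext, if the PII row still exists (step 4).
func (s *store) identityRef(p [32]byte) (string, bool) {
	row, ok := s.pii[p]
	return row.SubjectPlaintext, ok
}

func main() {
	s := newStore(make([]byte, 32)) // all-zero demo pepper, not a real secret
	s.record("identity-123", "document.read")
	s.record("identity-123", "document.write")

	fmt.Println("PII row removed:", s.eraseIdentity("identity-123"))
	fmt.Println("PII row removed on repeat:", s.eraseIdentity("identity-123"))

	_, known := s.identityRef(s.chain[0].SubjectPseudonym)
	fmt.Println("chain rows:", len(s.chain), "plaintext resolvable:", known)
}
```

After erasure the chain rows are untouched and still carry the pseudonym; only the lookup from pseudonym to plaintext fails, which corresponds to `subject.identity_id_ref = null` in step 4.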

The pseudonymise-from-inception design was chosen over two alternatives, both recorded in the file-header DECISION block of 0011_audit_log.sql:

  • Subject plaintext on the chain row. Erasure would force an UPDATE to redact the bytes that produced entry_hash, either breaking the chain or forcing a re-hash that destroys the tamper-evidence property. Rejected.
  • Subject id as an opaque UUID joining a sibling table. A random UUID is not derivable from the subject identifier; two appends for the same subject in different transactions would produce different UUIDs, and the verifier could not group rows by subject without a sibling join. Rejected.
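The first rejection can be made concrete with a toy hash chain. This sketch assumes a generic layout in which each entry hash is sha256(previous hash || row content); the real entry_hash layout is defined by the chain itself and is not reproduced here, and all names below are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// row is a toy chain row: Subject is whatever the row stores about the subject.
type row struct {
	Subject   string
	EntryHash [32]byte // sha256(previous EntryHash || Subject) in this toy layout
}

func hashRow(prev [32]byte, subject string) [32]byte {
	return sha256.Sum256(append(prev[:], subject...))
}

// appendRow links a new row to the hash of the last row (zero hash for the first).
func appendRow(chain []row, subject string) []row {
	var prev [32]byte
	if n := len(chain); n > 0 {
		prev = chain[n-1].EntryHash
	}
	return append(chain, row{Subject: subject, EntryHash: hashRow(prev, subject)})
}

// verify recomputes every hash from the stored content.
func verify(chain []row) bool {
	var prev [32]byte
	for _, r := range chain {
		if hashRow(prev, r.Subject) != r.EntryHash {
			return false
		}
		prev = r.EntryHash
	}
	return true
}

func main() {
	// Rejected alternative: plaintext is part of the hashed bytes,
	// so redacting it in place invalidates the stored hash.
	plain := appendRow(appendRow(nil, "alice@example.com"), "alice@example.com")
	plain[0].Subject = "[redacted]"
	fmt.Println("plaintext chain verifies after redaction:", verify(plain)) // false

	// Chosen design: only a pseudonym is hashed; erasure touches a sibling map.
	pii := map[string]string{"pseudonym-1": "alice@example.com"}
	pseudo := appendRow(appendRow(nil, "pseudonym-1"), "pseudonym-1")
	delete(pii, "pseudonym-1")
	fmt.Println("pseudonym chain verifies after erasure:", verify(pseudo)) // true
}
```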

The 32-byte HMAC-style pseudonym is deterministic for a given (subject, audit pepper) pair, so the same subject always maps to the same 32 bytes on the chain, and the sibling audit_subject_pii table is a straight upsert on first sighting. Bare concatenation without a separator is safe because peppers are fixed-length 32-byte OpenBao secrets in production, so the boundary between pepper and identity_id is never ambiguous (the file-header DECISION block on pseudonym.go records why).
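A minimal sketch of the derivation follows, assuming the sha256(pepper || identity_id) form given in the flow. The function name `derivePseudonym` and its explicit length check are additions for illustration; the source only states that peppers are fixed-length in production, not that pseudonym.go validates the length.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

const pepperLen = 32 // fixed pepper length stated for production secrets

// derivePseudonym computes sha256(pepper || identityID).
// Hypothetical helper: the name, signature and length check are assumptions.
func derivePseudonym(pepper []byte, identityID string) ([32]byte, error) {
	if len(pepper) != pepperLen {
		return [32]byte{}, errors.New("pepper must be exactly 32 bytes")
	}
	buf := make([]byte, 0, pepperLen+len(identityID))
	buf = append(buf, pepper...)
	buf = append(buf, identityID...)
	return sha256.Sum256(buf), nil
}

func main() {
	// With a variable-length prefix, bare concatenation is ambiguous:
	// "ab"+"c" and "a"+"bc" feed identical bytes to the hash.
	fmt.Println(sha256.Sum256([]byte("ab"+"c")) == sha256.Sum256([]byte("a"+"bc"))) // true

	pepper := make([]byte, pepperLen) // all-zero demo pepper, not a real secret
	p1, _ := derivePseudonym(pepper, "identity-123")
	p2, _ := derivePseudonym(pepper, "identity-123")
	fmt.Println(p1 == p2) // true: same (subject, pepper) pair, same 32 bytes

	_, err := derivePseudonym([]byte("short"), "identity-123")
	fmt.Println(err) // pepper must be exactly 32 bytes
}
```

Because the first 32 bytes are always the pepper, two different (pepper, identity_id) pairs cannot produce the same concatenated input, which is what makes the missing separator harmless.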

The integration test tests/integration/audit_pseudonymisation_roundtrip_test.go proves the round-trip: append 100 entries for an Identity, call EraseIdentity, assert the audit_subject_pii row is gone, the 100 entries' subject_pseudonym is intact, the chain still verifies end-to-end, and the pseudonym is deterministic across calls in the same Domain.