Cloud Credentials Custodian

This is the per-port API reference for the Cloud Credentials Custodian. It maps each value object, port, and outbox event to its field-level shape. The reference is a map, not a duplicate contract — the authoritative source is the Go code under internal/provisioning/cloudcredentials/ and the SQL schema under 0022_cloud_credentials.sql.

For the bounded-context narrative (the aggregate state machine, the deterministic KV-v2 path, the at-most-once outbox token, the sweeper cadence, the threat model) see ../../contexts/provisioning/credential-pool.md.

The custodian exposes no /v1 HTTP surface of its own. Callers reach it through the in-process Custodian facade. A workspace-level drift gate scans the generated OpenAPI document for any /v1/cloud-credentials* path and asserts the count is zero.

Custodian — facade methods

The Custodian interface (internal/provisioning/cloudcredentials/ports.go) is the only port application services reach directly. Implementations orchestrate the Repository, Materialiser, and AuditSink in one Postgres transaction.

| Method | Input | Output | Mandatory failure modes |
| --- | --- | --- | --- |
| Issue(ctx, IssueInput) | IssueInput | IssueResult | ErrMaterialiserUnavailable, ErrIssueAtomicityViolated, ErrCloudCredentialAlreadyExists, ErrPathAlreadyMaterialised, ErrCloudNotFound, ErrKVStoreCASConflict, ErrInvalidPathInput |
| Rotate(ctx, RotateInput) | RotateInput | RotateResult | ErrCloudCredentialNotFound, ErrCloudCredentialRevoked, ErrBrokerCASConflict, ErrKVStoreCASConflict, ErrMaterialiserUnavailable |
| Revoke(ctx, RevokeInput) | RevokeInput | error | ErrCloudCredentialNotFound (the already-revoked outcome is a no-op, not an error) |
| Lookup(ctx, CloudCredentialID) | CloudCredentialID | CloudCredentialRow | ErrCloudCredentialNotFound |

NewCustodian(CustodianConfig) refuses construction when any load-bearing port is nil and returns the matching sentinel (ErrCustodianRepositoryRequired, ErrCustodianMaterialiserRequired, or ErrCustodianAuditSinkRequired), so a misconfigured composition root fails at build time rather than on the first call.

Value objects

CloudCredentialID

Binary form of the credential UUIDv7 used as the broker-row primary key. Defined as type CloudCredentialID [16]byte.

| Field / method | Type | Notes |
| --- | --- | --- |
| underlying | [16]byte | UUIDv7 raw bytes. The zero value is treated as "not yet assigned" and is rejected by every aggregate invariant that requires a concrete reference. |
| String() string | method | Returns the canonical 8-4-4-4-12 hyphenated UUID textual form. The output matches the format used elsewhere in the platform so logs join cleanly with sibling contexts. |
| IsZero() bool | method | Reports whether the CloudCredentialID has not been assigned. |
| ParseCloudCredentialID(s string) | function | Parses a canonical hyphenated UUID string into a CloudCredentialID. Returned errors are package-local so the caller does not need to import an external UUID library to branch on the failure mode. |

Material

Opaque secret value the caller hands the custodian on Issue and Rotate. Caller-owned bytes — the custodian MUST defensively copy both Payload and KeyValues before any persistence call so subsequent caller mutation cannot reach the stored row. The type's String() method returns a redacted descriptor ("cloudcredentials.Material{REDACTED}") so accidental %v / %+v / slog formatting cannot leak the secret bytes into a log surface.

| Field | Go type | Persistence | Notes |
| --- | --- | --- | --- |
| Payload | []byte | data.payload in the KV-v2 row. | The opaque bytes the caller wants stored. |
| TTL | time.Duration | expires_at = now() + TTL on the broker row. | The custodian's expiry budget; drives the Sweeper's eligibility predicate. Non-positive falls back to CustodianConfig.DefaultTTL, which itself falls back to DefaultDefaultTTL = 24h. |
| KeyValues | map[string]string | data.* in the KV-v2 row. | The flat data map written alongside Payload. |

IssueInput

Value object Custodian.Issue accepts.

| Field | Go type | Required | Notes |
| --- | --- | --- | --- |
| CloudID | [16]byte | yes | The residency pivot. Drives kv_path derivation and the FK target. Zero value is rejected before any persistence call. |
| DisplayName | string | yes | Operator-supplied human-readable name. Empty string is rejected before any persistence call. |
| Material | Material | yes | The bytes the custodian writes to KV-v2 plus the TTL from which expires_at is derived. |

IssueResult

Value object Custodian.Issue returns on success.

| Field | Go type | Source |
| --- | --- | --- |
| Row | CloudCredentialRow | The freshly-inserted broker row, with Version = 1 and KVVersion = 1. |
| Material | Material | The defensive copy of the caller's Material that the custodian wrote, observable at the call site without round-tripping to KV-v2. |

The defensive copy is returned so the caller can confirm exactly which bytes landed in KV-v2 without re-reading the secret. The custodian never retains a reference to the caller's original Material.Payload or Material.KeyValues.

RotateInput

Value object Custodian.Rotate accepts.

| Field | Go type | Required | Notes |
| --- | --- | --- | --- |
| ID | CloudCredentialID | yes | The aggregate to rotate. Zero value is rejected. |
| ExpectedVersion | int64 | yes | The broker-row version the caller observed via Lookup. A mismatch surfaces ErrBrokerCASConflict so concurrent rotates fail closed. Must be >= 0. |
| Material | Material | yes | The new bytes; expires_at is recomputed from Material.TTL (or the DefaultTTL fallback). |

RotateResult

Value object Custodian.Rotate returns on success.

| Field | Go type | Source |
| --- | --- | --- |
| Version | int64 | New broker-row version (ExpectedVersion + 1). |
| KVVersion | int64 | New KV-v2 version reported by Materialiser.Put. |
| ExpiresAt | time.Time | UTC; now() + Material.TTL. |

The CloudCredentialID is implicit in the input and therefore not echoed on the result.

RevokeInput

Value object Custodian.Revoke accepts.

| Field | Go type | Required | Notes |
| --- | --- | --- | --- |
| ID | CloudCredentialID | yes | The aggregate to revoke. Zero value is rejected. |
| Reason | string | yes | Operator-supplied audit string. Recorded on the CloudCredentialRevoked outbox event payload and on the audit Entry. |

Revocation is terminal — once revoked_at is non-null no further rotation is possible. A second Revoke is an idempotent no-op: the custodian observes alreadyRevoked = true from Repository.Revoke and skips the outbox append, the token record, and the audit Entry.

CloudCredentialRow

Projection Repository.FindByID and Repository.ListExpired return. The shape mirrors the plexsphere.cloud_credential row one-for-one so the repository adapter is a thin marshalling layer with no domain computation.

| Field | Go type | Column | Notes |
| --- | --- | --- | --- |
| ID | CloudCredentialID | cloud_credential_id | UUIDv7 primary key. |
| CloudID | [16]byte | cloud_id | UUIDv7. ON DELETE RESTRICT: a Cloud with non-expired credentials cannot be deleted. |
| DisplayName | string | display_name | Operator-supplied human-readable name. |
| KVMount | string | kv_mount | The custodian's configured Config.KVMount at the time of issuance. |
| KVPath | string | kv_path | The deterministic projection (see path derivation). UNIQUE per (kv_mount, kv_path). |
| KVVersion | int64 | kv_version | The OpenBao KV-v2 store's own version, mirrored on the broker row so operators can correlate without round-tripping to OpenBao. |
| Version | int64 | version | Broker-row CAS counter. Incremented by every application-side mutation. |
| ExpiresAt | time.Time | expires_at | Wall-clock instant the credential ceases to be valid. |
| RevokedAt | *time.Time | revoked_at | Non-nil once the operator has issued a soft-delete. SQL CHECK gates revoked_at / expired_at exclusivity. |
| ExpiredAt | *time.Time | expired_at | Non-nil once the Sweeper has observed the credential's expiry. |
| CreatedAt | time.Time | created_at | UTC. |
| UpdatedAt | time.Time | updated_at | UTC; bumped by every RotateCAS / Revoke / MarkExpired. |

Materialiser — KV-v2 adapter

The Materialiser interface is the narrow port the cloudcredentials package consumes for KV-v2 access. The default in-package adapter returns ErrMaterialiserUnavailable from Put/Delete and exposes a working DerivePath; the OpenBao-backed adapter ships under the cloudcredentials_openbao build tag in materialiser/openbao/.

| Method | Signature | Returns | Notes |
| --- | --- | --- | --- |
| Put | Put(ctx, mount, path string, data Material, cas int64) (newVersion int64, err error) | New KV-v2 version on success. | Writes data at /<mount>/data/<path> with the supplied CAS expectation. CAS mismatch → ErrKVStoreCASConflict. Network / transport failure → ErrMaterialiserUnavailable. The default stub always returns ErrMaterialiserUnavailable so a binary that has not been wired with the OpenBao adapter cannot accidentally issue credentials with no secret material behind them. |
| Delete | Delete(ctx, mount, path string) error | nil on success. | Soft-deletes the secret at /<mount>/data/<path>. Non-CAS failure → ErrMaterialiserUnavailable. |
| DerivePath | DerivePath(cloudID, credentialID [16]byte) (mount, path string) | The deterministic (mount, path) pair. | Pure logic. Zero cloudID or credentialID → empty ("", "") pair; the custodian short-circuits with ErrInvalidPathInput on observing it. |

Repository — Postgres port

The Repository interface is the persistence port the Custodian writes through. The Postgres adapter (repo/credentials_pg.go) is a thin wrapper over the sqlc-generated queries from E0_cloud_credentials.sql. Constraint-name dispatch maps SQLSTATE 23505 collisions and SQLSTATE 23503 FK violations to the canonical sentinels.

| Method | Returns | Mandatory failure modes |
| --- | --- | --- |
| Create(ctx, CloudCredentialRow) | error | ErrCloudCredentialAlreadyExists, ErrPathAlreadyMaterialised, ErrCloudNotFound, ErrInvalidPathInput |
| FindByID(ctx, CloudCredentialID) | (CloudCredentialRow, error) | ErrCloudCredentialNotFound |
| RotateCAS(ctx, id, expectedVersion, newKVVersion, expiresAt) | (newVersion int64, error) | ErrCloudCredentialNotFound, ErrBrokerCASConflict, ErrCloudCredentialRevoked |
| Revoke(ctx, CloudCredentialID) | (alreadyRevoked bool, error) | ErrCloudCredentialNotFound (the already-revoked outcome is a value, not an error). |
| ListExpired(ctx, now, limit) | ([]CloudCredentialRow, error) | None; returns an empty page when there are no results. |
| MarkExpired(ctx, id, when) | error | ErrCloudCredentialNotFound, ErrCloudCredentialRevoked |
| AppendOutboxEvent(ctx, id, eventType, payload) | (outboxEventID [16]byte, error) | Returns the freshly-minted outbox event id for the caller to anchor the token record on. Writes the row with aggregate_type = "cloud_credential". |
| RecordOutboxToken(ctx, id, eventType, outboxEventID) | (recorded bool, error) | recorded = false signals an idempotent skip (the unique violation on (cloud_credential_id, event_type) was caught) so the caller knows the outbox event was already emitted previously. |
| FindOutboxToken(ctx, id, eventType) | (outboxEventID [16]byte, found bool, error) | Read-side seam exposed for the relay's reconciliation path; found = false means no token row exists yet. |
| RunInTx(ctx, fn func(tx Repository) error) | error | Propagates fn's error verbatim; rolls back on any non-nil return. The wrapped Repository shares the transaction's MVCC snapshot. |

Sweeper — TTL expiry worker

The Sweeper ports and configuration. See Sweeper cadence for the boot vs steady-state contract.

| Surface | Signature | Notes |
| --- | --- | --- |
| New(repo, audit, clock, opts...) | (*Sweeper, error) | Refuses construction on any nil collaborator. |
| (*Sweeper).Run(ctx) | (scanned, expired int, err error) | Idempotent and self-terminating. Walks ListExpired in pages of defaultPageSize = 256 until an empty page; per row applies MarkExpired + AppendOutboxEvent + RecordOutboxToken inside one RunInTx. First clean return flips the /readyz readiness flag. |
| (*Sweeper).ProbeFunc(ctx) | error | Returns errProbePending until the first Run completes; nil thereafter. Does NOT re-trigger Run. |
| WithPageSize(n int32) | Option | Overrides the per-tick ListExpired page width. Non-positive values are ignored (degrade to default). |
| WithLogger(logger *slog.Logger) | Option | Overrides the structured logger; nil ignored. |
| WithRegisterer(reg prometheus.Registerer) | Option | Wires Prometheus counters; nil keeps zero-value mode. The prometheus.AlreadyRegisteredError branch reuses the existing collector so multiple service instances sharing one registry do not panic on the second registration. |
| ProbeName | const string = "cloud-credentials-sweeper" | Operator-facing probe identifier. Part of the wire contract: runbooks grep for this string verbatim. |

The Sweeper also declares package-local AuditEntry and AuditSinkPort types whose shapes mirror the package-level cloudcredentials.AuditEntry and cloudcredentials.AuditSink. The composition root wires a one-method shim so the sub-package stays free of an internal/audit import.

Configuration

The Config value type carried by the cloudcredentials package and the production composition root.

| Field | Go type | Source | Effect |
| --- | --- | --- | --- |
| SweepInterval | time.Duration | PLEXSPHERE_CLOUD_CREDENTIALS_SWEEP_INTERVAL | Steady-state period between Sweeper.Run invocations. Zero falls back to defaultCloudCredentialsSweepInterval = 30s. |
| KVMount | string | PLEXSPHERE_CLOUD_CREDENTIALS_KV_MOUNT | The OpenBao KV-v2 mount prefix every derived credential path is anchored under. An empty value while DSN is set is a build-time error (ErrCloudCredentialsKVMountRequired). |

The CustodianConfig value type (custodian.go) holds the per-instance wiring:

| Field | Go type | Required | Effect |
| --- | --- | --- | --- |
| Repository | Repository | yes | Persistence port. Nil → ErrCustodianRepositoryRequired. |
| Materialiser | Materialiser | yes | KV-v2 adapter port. Nil → ErrCustodianMaterialiserRequired. |
| AuditSink | AuditSink | yes | Audit emission port. Nil → ErrCustodianAuditSinkRequired. |
| Clock | Clock | no | Wall-clock seam. Nil falls back to a time.Now().UTC() implementation. |
| DefaultTTL | time.Duration | no | Fallback TTL when IssueInput.Material.TTL is non-positive. Non-positive falls back to DefaultDefaultTTL = 24h. |
| Logger | *slog.Logger | no | Structured logger. Nil falls back to slog.Default(). |

The production composition root holds additional knobs on productionCloudCredentialsConfig (DSN, KVAddress, AllowInsecureMaterialiser, SweepInterval) — see cmd/plexsphere/cloudcredentials_factory_prod.go for the full inventory and the build-time gate.

Error sentinels

Closed enumeration. Callers branch on these via errors.Is — wrapping is fine, identity must remain intact. The set is authoritative; adding a twelfth sentinel without updating errors_test.go trips the TestErrors_AreClosedSet drift gate at build time.

| Sentinel | Source layer | Trigger |
| --- | --- | --- |
| ErrCloudCredentialNotFound | Repository / Custodian | FindByID / Lookup for an absent CloudCredentialID. |
| ErrCloudCredentialAlreadyExists | Repository (PRIMARY KEY collision on cloud_credential_pkey) | A second Issue for the same credential id with mismatched fields. |
| ErrPathAlreadyMaterialised | Repository (constraint-name dispatch on cloud_credential_kv_path_unique UNIQUE on (kv_mount, kv_path)) | Chosen-credential-id collision against an existing Cloud's deterministic path. Distinct from ErrCloudCredentialAlreadyExists: the two are returned for different SQL constraints and have different remediations. |
| ErrCloudNotFound | Repository (constraint-name dispatch on SQLSTATE 23503 FK violation on cloud_id) | Issue against a Cloud row that does not exist in plexsphere.clouds. |
| ErrCloudCredentialRevoked | Custodian / Repository | Rotate or MarkExpired on a row whose revoked_at is non-null. |
| ErrBrokerCASConflict | Repository.RotateCAS | Broker-row version advanced past caller's ExpectedVersion. |
| ErrKVStoreCASConflict | Materialiser.Put | KV-v2 store's kv_version advanced past broker's expected value. Distinct from ErrBrokerCASConflict, with a different remediation. |
| ErrMaterialiserUnavailable | Materialiser.Put / Delete | Non-CAS KV-v2 failure (network, transport timeout, OpenBao unsealed-but-blocked); also the default in-package stub's Put/Delete posture when no OpenBao adapter is wired. |
| ErrIssueAtomicityViolated | Custodian | Compensating Materialiser.Delete fired after a Postgres rollback AND the delete itself also failed, leaving the KV-v2 row orphaned. The Custodian re-raises with the original tx error AND the compensating-delete error joined via errors.Join. |
| ErrAuditUnavailable | Custodian (counter only) | AuditSink.Record fails after the custodian has committed. The custodian decision is durable but the audit chain has gapped. Counter-only: operators alert on AuditSinkFailuresTotal(). |
| ErrInvalidPathInput | Repository / Materialiser | kv_path violates the path-format invariant before SQL UNIQUE runs (typically zero-UUID input to DerivePath). |

Construction-time sentinels (refused at NewCustodian):

| Sentinel | Trigger |
| --- | --- |
| ErrCustodianRepositoryRequired | CustodianConfig.Repository is nil. |
| ErrCustodianMaterialiserRequired | CustodianConfig.Materialiser is nil. |
| ErrCustodianAuditSinkRequired | CustodianConfig.AuditSink is nil. |

Composition-root sentinel (refused at BuildProductionCloudCredentialsFactory):

| Sentinel | Trigger |
| --- | --- |
| ErrCloudCredentialsKVMountRequired | productionCloudCredentialsConfig.KVMount is empty while DSN is set. |

Outbox event schemas

Closed enumeration of four event types. The event_type column value is the discriminator stored verbatim in plexsphere.outbox_events; each row carries aggregate_type = "cloud_credential". The payload is the JSON marshal of the matching Go struct under internal/provisioning/cloudcredentials/events/.

The package-local drift gate TestEventTypesAreClosedSet enforces the four-event allow-list — adding a fifth value is a breaking schema change, not a switch-statement extension.

The token-table event_type column carries a parallel snake_case discriminator gated by a CHECK allow-list. The in-Go mapping from the outbox literal to the token-table discriminator lives on events.TokenEventType.

| Outbox event_type | Token-table event_type |
| --- | --- |
| cloudcredentials.CloudCredentialIssued | cloud_credential_issued |
| cloudcredentials.CloudCredentialRotated | cloud_credential_rotated |
| cloudcredentials.CloudCredentialRevoked | cloud_credential_revoked |
| cloudcredentials.CloudCredentialExpired | cloud_credential_expired |
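
The mapping in the table above can be expressed as a closed lookup. The string literals are copied from the table; the function signature is an assumption, since this reference only states that the mapping lives on events.TokenEventType without giving its shape.

```go
package main

import "fmt"

// tokenEventTypes is the closed allow-list: outbox literal → token-table value.
var tokenEventTypes = map[string]string{
	"cloudcredentials.CloudCredentialIssued":  "cloud_credential_issued",
	"cloudcredentials.CloudCredentialRotated": "cloud_credential_rotated",
	"cloudcredentials.CloudCredentialRevoked": "cloud_credential_revoked",
	"cloudcredentials.CloudCredentialExpired": "cloud_credential_expired",
}

// TokenEventType returns the token-table discriminator for an outbox
// event_type; ok is false for anything outside the allow-list.
func TokenEventType(outboxEventType string) (token string, ok bool) {
	token, ok = tokenEventTypes[outboxEventType]
	return token, ok
}

func main() {
	fmt.Println(TokenEventType("cloudcredentials.CloudCredentialRotated"))
	fmt.Println(TokenEventType("cloudcredentials.SomethingElse"))
}
```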

CloudCredentialIssued — event_type = "cloudcredentials.CloudCredentialIssued"

| JSON field | Go type | Notes |
| --- | --- | --- |
| event_id | [16]byte | UUIDv7 minted at emission. |
| occurred_at | time.Time | UTC. |
| credential_id | [16]byte | UUIDv7. |
| cloud_id | [16]byte | UUIDv7. The residency pivot. |
| kv_mount | string | Verbatim from Config.KVMount. |
| kv_path | string | The deterministic projection (see path derivation). |
| version | int64 | Broker-row version. Always 1 for issuance. |
| kv_version | int64 | KV-v2 version. Always 1 for issuance. |
| expires_at | time.Time | UTC. |

CloudCredentialRotated — event_type = "cloudcredentials.CloudCredentialRotated"

| JSON field | Go type | Notes |
| --- | --- | --- |
| event_id | [16]byte | UUIDv7 minted at emission. |
| occurred_at | time.Time | UTC. |
| credential_id | [16]byte | UUIDv7. |
| version | int64 | New broker-row version. |
| kv_version | int64 | New KV-v2 version. |
| expires_at | time.Time | UTC; recomputed from Material.TTL. |

kv_mount and kv_path are deliberately not carried — they are stable across rotations (the path derivation is invariant on the CloudCredentialID) and would just bloat the outbox row. Consumers that need them re-derive from (cloud_id, credential_id) or look them up on the broker row.

CloudCredentialRevoked — event_type = "cloudcredentials.CloudCredentialRevoked"

| JSON field | Go type | Notes |
| --- | --- | --- |
| event_id | [16]byte | UUIDv7 minted at emission. |
| occurred_at | time.Time | UTC. |
| credential_id | [16]byte | UUIDv7. |
| reason | string | Operator-supplied audit string from RevokeInput.Reason. |

CloudCredentialExpired — event_type = "cloudcredentials.CloudCredentialExpired"

| JSON field | Go type | Notes |
| --- | --- | --- |
| event_id | [16]byte | UUIDv7 minted at emission. |
| occurred_at | time.Time | UTC. The expiry timestamp by definition; the credential's expires_at is <= occurred_at. |
| credential_id | [16]byte | UUIDv7. |

The event carries only the credential identity — downstream consumers re-derive everything else from the broker row or from the prior CloudCredentialIssued payload.

Metrics

The Sweeper exports two zero-value-tolerant counter vectors via metrics.go.

| Metric | Type | Increments when |
| --- | --- | --- |
| plexsphere_cloud_credentials_sweeper_invocations_total | counter | Sweeper.Run is entered (per call). |
| plexsphere_cloud_credentials_sweeper_expirations_total | counter | A row is successfully MarkExpired and the matching CloudCredentialExpired outbox row is appended. |

A nil prometheus.Registerer keeps the counters in zero-value mode — Run still walks the in-memory loop but no scrape surface is registered. The integration test cloudcredentials_pool_sweeper_test.go asserts the invocations counter advances as the ticker fires.

The Custodian also exposes a process-wide counter via AuditSinkFailuresTotal() uint64 — the composition root bridges it onto a Prometheus counter without making the cloudcredentials module import a metrics library. Each increment represents a custodian decision whose audit row never landed even though the broker row + outbox event are durable.

Cross-references