Cloud Credentials Custodian
This is the per-port API reference for the Cloud Credentials Custodian. It maps each value object, port, and outbox event to its field-level shape. The reference is a map, not a duplicate contract — the authoritative source is the Go code under internal/provisioning/cloudcredentials/ and the SQL schema under 0022_cloud_credentials.sql.
For the bounded-context narrative (the aggregate state machine, the deterministic KV-v2 path, the at-most-once outbox token, the sweeper cadence, the threat model) see ../../contexts/provisioning/credential-pool.md.
The custodian exposes no /v1 HTTP surface of its own. Callers reach it through the in-process Custodian facade. A workspace-level drift gate scans the generated OpenAPI document for any /v1/cloud-credentials* path and asserts the count is zero.
Custodian — facade methods
The Custodian interface (internal/provisioning/cloudcredentials/ports.go) is the only port application services reach directly. Implementations orchestrate the Repository, Materialiser, and AuditSink in one Postgres transaction.
| Method | Input | Output | Mandatory failure modes |
|---|---|---|---|
| Issue(ctx, IssueInput) | IssueInput | IssueResult | ErrMaterialiserUnavailable, ErrIssueAtomicityViolated, ErrCloudCredentialAlreadyExists, ErrPathAlreadyMaterialised, ErrCloudNotFound, ErrKVStoreCASConflict, ErrInvalidPathInput |
| Rotate(ctx, RotateInput) | RotateInput | RotateResult | ErrCloudCredentialNotFound, ErrCloudCredentialRevoked, ErrBrokerCASConflict, ErrKVStoreCASConflict, ErrMaterialiserUnavailable |
| Revoke(ctx, RevokeInput) | RevokeInput | error | ErrCloudCredentialNotFound (the already-revoked outcome is a no-op, not an error) |
| Lookup(ctx, CloudCredentialID) | CloudCredentialID | CloudCredentialRow | ErrCloudCredentialNotFound |
NewCustodian(CustodianConfig) refuses construction when any load-bearing port is nil and surfaces one of ErrCustodianRepositoryRequired / ErrCustodianMaterialiserRequired / ErrCustodianAuditSinkRequired, so a misconfigured composition root fails at build time rather than on the first call.
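The nil-port refusal can be pictured with a minimal sketch. The placeholder interfaces, the validatePorts helper, the error messages, and the order of the checks are assumptions made for illustration; only the sentinel and field names come from this reference.

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholder ports: the real interfaces in ports.go carry methods.
type (
	Repository   interface{}
	Materialiser interface{}
	AuditSink    interface{}
)

// Sentinel names come from this reference; the messages are invented.
var (
	ErrCustodianRepositoryRequired   = errors.New("custodian: repository required")
	ErrCustodianMaterialiserRequired = errors.New("custodian: materialiser required")
	ErrCustodianAuditSinkRequired    = errors.New("custodian: audit sink required")
)

type CustodianConfig struct {
	Repository   Repository
	Materialiser Materialiser
	AuditSink    AuditSink
}

// validatePorts is a hypothetical helper; the check order is an assumption.
func validatePorts(cfg CustodianConfig) error {
	switch {
	case cfg.Repository == nil:
		return ErrCustodianRepositoryRequired
	case cfg.Materialiser == nil:
		return ErrCustodianMaterialiserRequired
	case cfg.AuditSink == nil:
		return ErrCustodianAuditSinkRequired
	}
	return nil
}

func main() {
	// Repository is wired, Materialiser is missing.
	err := validatePorts(CustodianConfig{Repository: struct{}{}})
	fmt.Println(errors.Is(err, ErrCustodianMaterialiserRequired)) // true
}
```

Returning a distinct sentinel per port lets the composition root report exactly which collaborator was left unwired.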
Value objects
CloudCredentialID
Binary form of the credential UUIDv7 used as the broker-row primary key. Defined as type CloudCredentialID [16]byte.
| Field / method | Type | Notes |
|---|---|---|
| underlying | [16]byte | UUIDv7 raw bytes. The zero value is treated as "not yet assigned" and is rejected by every aggregate invariant that requires a concrete reference. |
| String() string | method | Returns the canonical 8-4-4-4-12 hyphenated UUID textual form. The output matches the format used elsewhere in the platform so logs join cleanly with sibling contexts. |
| IsZero() bool | method | Reports whether the CloudCredentialID has not been assigned. |
| ParseCloudCredentialID(s string) | function | Parses a canonical hyphenated UUID string into a CloudCredentialID. Returned errors are package-local so the caller does not need to import an external UUID library to branch on the failure mode. |
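As a rough illustration of the table above, the following sketch formats and parses the 16-byte identifier using only the standard library. The (CloudCredentialID, error) return shape and the errMalformedID name are assumptions; the sketch also does not check UUIDv7 version bits, which the real package may do.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

type CloudCredentialID [16]byte

// errMalformedID is a hypothetical package-local parse error.
var errMalformedID = errors.New("cloudcredentials: malformed credential id")

// IsZero reports whether the id is still the unassigned zero value.
func (id CloudCredentialID) IsZero() bool { return id == CloudCredentialID{} }

// String renders the canonical 8-4-4-4-12 hyphenated form.
func (id CloudCredentialID) String() string {
	var buf [36]byte
	hex.Encode(buf[0:8], id[0:4])
	buf[8] = '-'
	hex.Encode(buf[9:13], id[4:6])
	buf[13] = '-'
	hex.Encode(buf[14:18], id[6:8])
	buf[18] = '-'
	hex.Encode(buf[19:23], id[8:10])
	buf[23] = '-'
	hex.Encode(buf[24:36], id[10:16])
	return string(buf[:])
}

// ParseCloudCredentialID accepts only the hyphenated 36-character form.
func ParseCloudCredentialID(s string) (CloudCredentialID, error) {
	if len(s) != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-' {
		return CloudCredentialID{}, errMalformedID
	}
	compact := s[0:8] + s[9:13] + s[14:18] + s[19:23] + s[24:36]
	var id CloudCredentialID
	if _, err := hex.Decode(id[:], []byte(compact)); err != nil {
		return CloudCredentialID{}, errMalformedID
	}
	return id, nil
}

func main() {
	id := CloudCredentialID{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}
	s := id.String()
	back, err := ParseCloudCredentialID(s)
	fmt.Println(s, back == id, err)
}
```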
Material
Opaque secret value the caller hands the custodian on Issue and Rotate. Caller-owned bytes — the custodian MUST defensively copy both Payload and KeyValues before any persistence call so subsequent caller mutation cannot reach the stored row. The type's String() method returns a redacted descriptor ("cloudcredentials.Material{REDACTED}") so accidental %v / %+v / slog formatting cannot leak the secret bytes into a log surface.
| Field | Go type | Persistence | Notes |
|---|---|---|---|
| Payload | []byte | data.payload in the KV-v2 row. | The opaque bytes the caller wants stored. |
| TTL | time.Duration | expires_at = now() + TTL on the broker row. | The custodian's expiry budget; drives the Sweeper's eligibility predicate. Non-positive falls back to CustodianConfig.DefaultTTL, which itself falls back to DefaultDefaultTTL = 24h. |
| KeyValues | map[string]string | data.* in the KV-v2 row. | The flat data map written alongside Payload. |
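The redaction and defensive-copy rules for Material might look like the sketch below. The clone helper name is hypothetical; the field names and the redacted descriptor string are taken from this reference.

```go
package main

import (
	"fmt"
	"time"
)

type Material struct {
	Payload   []byte
	TTL       time.Duration
	KeyValues map[string]string
}

// String hides the secret from %v, %+v and other Stringer-aware formatters.
func (Material) String() string { return "cloudcredentials.Material{REDACTED}" }

// clone (hypothetical name) deep-copies the slice and the map so later
// caller mutation cannot reach the copy.
func (m Material) clone() Material {
	out := Material{TTL: m.TTL}
	if m.Payload != nil {
		out.Payload = append([]byte(nil), m.Payload...)
	}
	if m.KeyValues != nil {
		out.KeyValues = make(map[string]string, len(m.KeyValues))
		for k, v := range m.KeyValues {
			out.KeyValues[k] = v
		}
	}
	return out
}

func main() {
	m := Material{Payload: []byte("s3cret"), KeyValues: map[string]string{"region": "eu"}}
	c := m.clone()
	m.Payload[0] = 'X'          // caller mutates after handing the value over
	m.KeyValues["region"] = "us" // the copy is unaffected
	fmt.Printf("%v\n", m)
	fmt.Println(string(c.Payload), c.KeyValues["region"])
}
```

One caveat worth knowing: the %#v verb consults GoString rather than String, so a Stringer alone does not cover it.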
IssueInput
Value object Custodian.Issue accepts.
| Field | Go type | Required | Notes |
|---|---|---|---|
| CloudID | [16]byte | yes | The residency pivot. Drives kv_path derivation and the FK target. Zero value is rejected before any persistence call. |
| DisplayName | string | yes | Operator-supplied human-readable name. Empty string is rejected before any persistence call. |
| Material | Material | yes | The bytes the custodian writes to KV-v2 plus the TTL from which expires_at is derived. |
IssueResult
Value object Custodian.Issue returns on success.
| Field | Go type | Source |
|---|---|---|
| Row | CloudCredentialRow | The freshly-inserted broker row, with Version = 1 and KVVersion = 1. |
| Material | Material | The defensive copy of the caller's Material that the custodian wrote, observable at the call site without round-tripping to KV-v2. |
The defensive copy is returned so the caller can confirm exactly which bytes landed in KV-v2 without re-reading the secret. The custodian never retains a reference to the caller's original Material.Payload or Material.KeyValues.
RotateInput
Value object Custodian.Rotate accepts.
| Field | Go type | Required | Notes |
|---|---|---|---|
| ID | CloudCredentialID | yes | The aggregate to rotate. Zero value is rejected. |
| ExpectedVersion | int64 | yes | The broker-row version the caller observed via Lookup. A mismatch surfaces ErrBrokerCASConflict so concurrent rotates fail closed. Must be >= 0. |
| Material | Material | yes | The new bytes; expires_at is recomputed from Material.TTL (or the DefaultTTL fallback). |
RotateResult
Value object Custodian.Rotate returns on success.
| Field | Go type | Source |
|---|---|---|
| Version | int64 | New broker-row version (ExpectedVersion + 1). |
| KVVersion | int64 | New KV-v2 version reported by Materialiser.Put. |
| ExpiresAt | time.Time | UTC; now() + Material.TTL, or the DefaultTTL fallback when Material.TTL is non-positive. |
The CloudCredentialID is implicit in the input and therefore not echoed on the result.
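The ExpectedVersion contract can be illustrated with two callers that both observed the same version. The fakeCustodian type below is a hypothetical in-memory stand-in with a simplified signature (no context, no Material); it only models the version check, not the KV-v2 write.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel name from this reference; the message is invented.
var ErrBrokerCASConflict = errors.New("cloudcredentials: broker CAS conflict")

type RotateInput struct{ ExpectedVersion int64 }

type RotateResult struct{ Version int64 }

// fakeCustodian is a hypothetical stand-in tracking one row's version.
type fakeCustodian struct{ version int64 }

func (f *fakeCustodian) Lookup() int64 { return f.version }

// Rotate succeeds only when the caller's observed version is current.
func (f *fakeCustodian) Rotate(in RotateInput) (RotateResult, error) {
	if in.ExpectedVersion != f.version {
		return RotateResult{}, ErrBrokerCASConflict
	}
	f.version++
	return RotateResult{Version: f.version}, nil
}

func main() {
	c := &fakeCustodian{version: 1}
	seen := c.Lookup() // both callers observe version 1
	_, errA := c.Rotate(RotateInput{ExpectedVersion: seen})
	_, errB := c.Rotate(RotateInput{ExpectedVersion: seen})
	fmt.Println(errA == nil, errors.Is(errB, ErrBrokerCASConflict)) // true true
}
```

The losing caller would typically call Lookup again before deciding whether to retry; that policy is left to the caller here.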
RevokeInput
Value object Custodian.Revoke accepts.
| Field | Go type | Required | Notes |
|---|---|---|---|
| ID | CloudCredentialID | yes | The aggregate to revoke. Zero value is rejected. |
| Reason | string | yes | Operator-supplied audit string. Recorded on the CloudCredentialRevoked outbox event payload and on the audit Entry. |
Revocation is terminal — once revoked_at is non-null no further rotation is possible. A second Revoke is an idempotent no-op: the custodian observes alreadyRevoked = true from Repository.Revoke and skips the outbox append, the token record, and the audit Entry.
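A small sketch of the idempotent second Revoke follows. The fakeRepo type and the revoke function are hypothetical simplifications: the outbox is a plain slice, and the token record and audit Entry are omitted.

```go
package main

import "fmt"

// fakeRepo is a hypothetical in-memory stand-in for one credential row.
type fakeRepo struct {
	revoked bool
	outbox  []string
}

// Revoke reports alreadyRevoked as a value rather than an error.
func (r *fakeRepo) Revoke() (alreadyRevoked bool, err error) {
	if r.revoked {
		return true, nil
	}
	r.revoked = true
	return false, nil
}

// revoke appends the outbox event only on the first, state-changing call.
func revoke(r *fakeRepo, reason string) error {
	already, err := r.Revoke()
	if err != nil {
		return err
	}
	if already {
		return nil // idempotent no-op: nothing is emitted
	}
	r.outbox = append(r.outbox, "cloudcredentials.CloudCredentialRevoked reason="+reason)
	return nil
}

func main() {
	r := &fakeRepo{}
	_ = revoke(r, "key leaked")
	_ = revoke(r, "key leaked")
	fmt.Println(len(r.outbox)) // 1
}
```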
CloudCredentialRow
Projection Repository.FindByID and Repository.ListExpired return. The shape mirrors the plexsphere.cloud_credential row one-for-one so the repository adapter is a thin marshalling layer with no domain computation.
| Field | Go type | Column | Notes |
|---|---|---|---|
| ID | CloudCredentialID | cloud_credential_id | UUIDv7 primary key. |
| CloudID | [16]byte | cloud_id | UUIDv7. ON DELETE RESTRICT: a Cloud with non-expired credentials cannot be deleted. |
| DisplayName | string | display_name | Operator-supplied human-readable name. |
| KVMount | string | kv_mount | The custodian's configured Config.KVMount at the time of issuance. |
| KVPath | string | kv_path | The deterministic projection; see path derivation. UNIQUE per (kv_mount, kv_path). |
| KVVersion | int64 | kv_version | The OpenBao KV-v2 store's own version, mirrored on the broker row so operators can correlate without round-tripping to OpenBao. |
| Version | int64 | version | Broker-row CAS counter. Incremented by every application-side mutation. |
| ExpiresAt | time.Time | expires_at | Wall-clock instant the credential ceases to be valid. |
| RevokedAt | *time.Time | revoked_at | Non-nil once the operator has issued a soft-delete. SQL CHECK gates revoked_at / expired_at exclusivity. |
| ExpiredAt | *time.Time | expired_at | Non-nil once the Sweeper has observed the credential's expiry. |
| CreatedAt | time.Time | created_at | UTC. |
| UpdatedAt | time.Time | updated_at | UTC; bumped by every RotateCAS / Revoke / MarkExpired. |
Materialiser — KV-v2 adapter
The Materialiser interface is the narrow port the cloudcredentials package consumes for KV-v2 access. The default in-package adapter returns ErrMaterialiserUnavailable from Put/Delete and exposes a working DerivePath; the OpenBao-backed adapter ships under the cloudcredentials_openbao build tag in materialiser/openbao/.
| Method | Signature | Returns | Notes |
|---|---|---|---|
| Put | Put(ctx, mount, path string, data Material, cas int64) (newVersion int64, err error) | New KV-v2 version on success. | Writes data at /<mount>/data/<path> with the supplied CAS expectation. CAS mismatch → ErrKVStoreCASConflict. Network / transport failure → ErrMaterialiserUnavailable. The default stub always returns ErrMaterialiserUnavailable so a binary that has not been wired with the OpenBao adapter cannot accidentally issue credentials with no secret material behind them. |
| Delete | Delete(ctx, mount, path string) error | nil on success. | Soft-deletes the secret at /<mount>/data/<path>. Non-CAS failure → ErrMaterialiserUnavailable. |
| DerivePath | DerivePath(cloudID, credentialID [16]byte) (mount, path string) | The deterministic (mount, path) pair. | Pure logic. Zero cloudID or credentialID → empty ("", "") pair; the custodian short-circuits with ErrInvalidPathInput on observing it. |
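The fail-closed posture of the default stub, and the custodian-side guard on an empty derived pair, could be sketched as follows. The stubMaterialiser and checkDerived names are hypothetical, Material is trimmed to one field, and the actual path format is deliberately not reproduced because this reference does not state it. Treating either empty component as invalid is also an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Sentinel names from this reference; the messages are invented.
var (
	ErrMaterialiserUnavailable = errors.New("cloudcredentials: materialiser unavailable")
	ErrInvalidPathInput        = errors.New("cloudcredentials: invalid path input")
)

type Material struct{ Payload []byte }

// stubMaterialiser (hypothetical name) refuses every write and delete.
type stubMaterialiser struct{}

func (stubMaterialiser) Put(ctx context.Context, mount, path string, data Material, cas int64) (int64, error) {
	return 0, ErrMaterialiserUnavailable
}

func (stubMaterialiser) Delete(ctx context.Context, mount, path string) error {
	return ErrMaterialiserUnavailable
}

// checkDerived is a hypothetical guard applied to DerivePath's result.
func checkDerived(mount, path string) error {
	if mount == "" || path == "" {
		return ErrInvalidPathInput
	}
	return nil
}

func main() {
	var m stubMaterialiser
	_, err := m.Put(context.Background(), "kv", "some/path", Material{}, 0)
	fmt.Println(errors.Is(err, ErrMaterialiserUnavailable)) // true
	fmt.Println(checkDerived("", "") == ErrInvalidPathInput) // true
}
```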
Repository — Postgres port
The Repository interface is the persistence port the Custodian writes through. The Postgres adapter (repo/credentials_pg.go) is a thin wrapper over the sqlc-generated queries from E0_cloud_credentials.sql. Constraint-name dispatch maps SQLSTATE 23505 collisions and SQLSTATE 23503 FK violations to the canonical sentinels.
| Method | Returns | Mandatory failure modes |
|---|---|---|
| Create(ctx, CloudCredentialRow) | error | ErrCloudCredentialAlreadyExists, ErrPathAlreadyMaterialised, ErrCloudNotFound, ErrInvalidPathInput |
| FindByID(ctx, CloudCredentialID) | (CloudCredentialRow, error) | ErrCloudCredentialNotFound |
| RotateCAS(ctx, id, expectedVersion, newKVVersion, expiresAt) | (newVersion int64, error) | ErrCloudCredentialNotFound, ErrBrokerCASConflict, ErrCloudCredentialRevoked |
| Revoke(ctx, CloudCredentialID) | (alreadyRevoked bool, error) | ErrCloudCredentialNotFound (the already-revoked outcome is a value, not an error). |
| ListExpired(ctx, now, limit) | ([]CloudCredentialRow, error) | None; an empty page is returned when there are no results. |
| MarkExpired(ctx, id, when) | error | ErrCloudCredentialNotFound, ErrCloudCredentialRevoked |
| AppendOutboxEvent(ctx, id, eventType, payload) | (outboxEventID [16]byte, error) | Returns the freshly-minted outbox event id for the caller to anchor the token record on. Writes the row with aggregate_type = "cloud_credential". |
| RecordOutboxToken(ctx, id, eventType, outboxEventID) | (recorded bool, error) | recorded = false signals an idempotent skip (the unique violation on (cloud_credential_id, event_type) was caught), so the caller knows the outbox event was already emitted. |
| FindOutboxToken(ctx, id, eventType) | (outboxEventID [16]byte, found bool, error) | Read-side seam exposed for the relay's reconciliation path; found = false means no token row exists yet. |
| RunInTx(ctx, fn func(tx Repository) error) | error | Propagates fn's error verbatim; rolls back on any non-nil return. The wrapped Repository shares the transaction's MVCC snapshot. |
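The constraint-name dispatch mentioned above can be approximated as below. The pgError struct is a hypothetical stand-in for whatever error type the Postgres driver returns, and mapConstraintError is an invented helper name. The constraint names for the two unique violations come from the Error sentinels table; the FK constraint name is not given in this reference, so the sketch maps every 23503 to ErrCloudNotFound, which is coarser than real constraint-name dispatch.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names from this reference; the messages are invented.
var (
	ErrCloudCredentialAlreadyExists = errors.New("cloud credential already exists")
	ErrPathAlreadyMaterialised      = errors.New("path already materialised")
	ErrCloudNotFound                = errors.New("cloud not found")
)

// pgError is a hypothetical stand-in for the driver's error type.
type pgError struct {
	Code           string // SQLSTATE
	ConstraintName string
}

func (e *pgError) Error() string { return "pg: " + e.Code + " " + e.ConstraintName }

// mapConstraintError translates known violations into sentinels and
// passes everything else through untouched.
func mapConstraintError(err error) error {
	var pe *pgError
	if !errors.As(err, &pe) {
		return err
	}
	switch pe.Code {
	case "23505": // unique_violation
		switch pe.ConstraintName {
		case "cloud_credential_pkey":
			return ErrCloudCredentialAlreadyExists
		case "cloud_credential_kv_path_unique":
			return ErrPathAlreadyMaterialised
		}
	case "23503": // foreign_key_violation (constraint name not modelled)
		return ErrCloudNotFound
	}
	return err
}

func main() {
	err := mapConstraintError(&pgError{Code: "23505", ConstraintName: "cloud_credential_kv_path_unique"})
	fmt.Println(errors.Is(err, ErrPathAlreadyMaterialised)) // true
}
```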
Sweeper — TTL expiry worker
This section lists the Sweeper's ports and configuration options. See Sweeper cadence for the boot vs steady-state contract.
| Surface | Signature | Notes |
|---|---|---|
| New(repo, audit, clock, opts...) | (*Sweeper, error) | Refuses construction on any nil collaborator. |
| (*Sweeper).Run(ctx) | (scanned, expired int, err error) | Idempotent and self-terminating. Walks ListExpired in pages of defaultPageSize = 256 until an empty page; per row applies MarkExpired + AppendOutboxEvent + RecordOutboxToken inside one RunInTx. First clean return flips the /readyz readiness flag. |
| (*Sweeper).ProbeFunc(ctx) | error | Returns errProbePending until the first Run completes; nil thereafter. Does NOT re-trigger Run. |
| WithPageSize(n int32) | Option | Overrides the per-tick ListExpired page width. Non-positive values are ignored (degrade to default). |
| WithLogger(logger *slog.Logger) | Option | Overrides the structured logger; nil ignored. |
| WithRegisterer(reg prometheus.Registerer) | Option | Wires Prometheus counters; nil keeps zero-value mode. The prometheus.AlreadyRegisteredError branch reuses the existing collector so multiple service instances sharing one registry do not panic on the second registration. |
| ProbeName | const string = "cloud-credentials-sweeper" | Operator-facing probe identifier. Part of the wire contract: runbooks grep for this string verbatim. |
The Sweeper also declares package-local AuditEntry and AuditSinkPort types whose shapes mirror the package-level cloudcredentials.AuditEntry and cloudcredentials.AuditSink. The composition root wires a one-method shim so the sub-package stays free of an internal/audit import.
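The paging loop and the readiness probe described in the table can be sketched against an in-memory repository. Everything here is a simplification under stated assumptions: fakeRepo is hypothetical, rows are plain ints, the context, clock, transaction, outbox append and token record are left out, and the atomic flag is just one possible way to back ProbeFunc.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

var errProbePending = errors.New("cloud-credentials-sweeper: first run pending")

// fakeRepo holds ids of rows that are past expiry and not yet marked.
type fakeRepo struct{ expired []int }

func (r *fakeRepo) ListExpired(limit int) []int {
	if limit > len(r.expired) {
		limit = len(r.expired)
	}
	return append([]int(nil), r.expired[:limit]...)
}

// MarkExpired removes the row from the eligible set.
func (r *fakeRepo) MarkExpired(id int) {
	for i, v := range r.expired {
		if v == id {
			r.expired = append(r.expired[:i], r.expired[i+1:]...)
			return
		}
	}
}

type Sweeper struct {
	repo     *fakeRepo
	pageSize int
	ready    atomic.Bool
}

// Run pages through eligible rows until an empty page comes back.
func (s *Sweeper) Run() (scanned, expired int, err error) {
	for {
		page := s.repo.ListExpired(s.pageSize)
		if len(page) == 0 {
			break
		}
		for _, id := range page {
			scanned++
			s.repo.MarkExpired(id)
			expired++
		}
	}
	s.ready.Store(true) // first clean return flips readiness
	return scanned, expired, nil
}

// ProbeFunc reports readiness without triggering a sweep.
func (s *Sweeper) ProbeFunc() error {
	if !s.ready.Load() {
		return errProbePending
	}
	return nil
}

func main() {
	s := &Sweeper{repo: &fakeRepo{expired: []int{1, 2, 3, 4, 5}}, pageSize: 2}
	fmt.Println(s.ProbeFunc() != nil) // true: no run has completed yet
	fmt.Println(s.Run())              // 5 5 <nil>
	fmt.Println(s.ProbeFunc())        // <nil>
}
```

Because marked rows drop out of the eligible set, a second Run finds an empty first page and returns immediately, which is what makes the loop idempotent and self-terminating.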
Configuration
The Config value type is carried by the cloudcredentials package and the production composition root.
| Field | Go type | Source | Effect |
|---|---|---|---|
| SweepInterval | time.Duration | PLEXSPHERE_CLOUD_CREDENTIALS_SWEEP_INTERVAL | Steady-state period between Sweeper.Run invocations. Zero falls back to defaultCloudCredentialsSweepInterval = 30s. |
| KVMount | string | PLEXSPHERE_CLOUD_CREDENTIALS_KV_MOUNT | The OpenBao KV-v2 mount prefix every derived credential path is anchored under. An empty value while DSN is set is a build-time error (ErrCloudCredentialsKVMountRequired). |
The CustodianConfig value type (custodian.go) holds the per-instance wiring:
| Field | Go type | Required | Effect |
|---|---|---|---|
| Repository | Repository | yes | Persistence port. Nil → ErrCustodianRepositoryRequired. |
| Materialiser | Materialiser | yes | KV-v2 adapter port. Nil → ErrCustodianMaterialiserRequired. |
| AuditSink | AuditSink | yes | Audit emission port. Nil → ErrCustodianAuditSinkRequired. |
| Clock | Clock | no | Wall-clock seam. Nil falls back to a time.Now().UTC() implementation. |
| DefaultTTL | time.Duration | no | Fallback TTL when IssueInput.Material.TTL is non-positive. Non-positive falls back to DefaultDefaultTTL = 24h. |
| Logger | *slog.Logger | no | Structured logger. Nil falls back to slog.Default(). |
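The two-step TTL fallback (Material.TTL, then CustodianConfig.DefaultTTL, then DefaultDefaultTTL) reduces to a few lines. The resolveTTL and expiresAt helper names are hypothetical; the 24h constant and the fallback order come from the tables above.

```go
package main

import (
	"fmt"
	"time"
)

const DefaultDefaultTTL = 24 * time.Hour

// resolveTTL picks the first positive value in the fallback chain.
func resolveTTL(materialTTL, configDefaultTTL time.Duration) time.Duration {
	if materialTTL > 0 {
		return materialTTL
	}
	if configDefaultTTL > 0 {
		return configDefaultTTL
	}
	return DefaultDefaultTTL
}

// expiresAt derives the broker row's expires_at in UTC.
func expiresAt(now time.Time, materialTTL, configDefaultTTL time.Duration) time.Time {
	return now.UTC().Add(resolveTTL(materialTTL, configDefaultTTL))
}

func main() {
	fmt.Println(resolveTTL(time.Hour, 0)) // 1h0m0s
	fmt.Println(resolveTTL(0, 6*time.Hour)) // 6h0m0s
	fmt.Println(resolveTTL(-1, 0))          // 24h0m0s
}
```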
The production composition root holds additional knobs on productionCloudCredentialsConfig (DSN, KVAddress, AllowInsecureMaterialiser, SweepInterval) — see cmd/plexsphere/cloudcredentials_factory_prod.go for the full inventory and the build-time gate.
Error sentinels
Closed enumeration. Callers branch on these via errors.Is — wrapping is fine, identity must remain intact. The set is authoritative; adding a twelfth sentinel without updating errors_test.go trips the TestErrors_AreClosedSet drift gate at build time.
| Sentinel | Source layer | Trigger |
|---|---|---|
| ErrCloudCredentialNotFound | Repository / Custodian | FindByID / Lookup for an absent CloudCredentialID. |
| ErrCloudCredentialAlreadyExists | Repository (PRIMARY KEY collision on cloud_credential_pkey) | A second Issue for the same credential id with mismatched fields. |
| ErrPathAlreadyMaterialised | Repository (constraint-name dispatch on cloud_credential_kv_path_unique UNIQUE on (kv_mount, kv_path)) | Chosen-credential-id collision against an existing Cloud's deterministic path. Distinct from ErrCloudCredentialAlreadyExists: the two are raised for different SQL constraints and have different remediations. |
| ErrCloudNotFound | Repository (constraint-name dispatch on SQLSTATE 23503 FK violation on cloud_id) | Issue against a Cloud row that does not exist in plexsphere.clouds. |
| ErrCloudCredentialRevoked | Custodian / Repository | Rotate or MarkExpired on a row whose revoked_at is non-null. |
| ErrBrokerCASConflict | Repository.RotateCAS | Broker-row version advanced past caller's ExpectedVersion. |
| ErrKVStoreCASConflict | Materialiser.Put | KV-v2 store's kv_version advanced past broker's expected value. Distinct from ErrBrokerCASConflict, with a different remediation. |
| ErrMaterialiserUnavailable | Materialiser.Put / Delete | Non-CAS KV-v2 failure (network, transport timeout, OpenBao unsealed-but-blocked); also the default in-package stub's Put/Delete posture when no OpenBao adapter is wired. |
| ErrIssueAtomicityViolated | Custodian | Compensating Materialiser.Delete fired after a Postgres rollback AND the delete itself also failed, leaving the KV-v2 row orphaned. The Custodian re-raises with the original tx error AND the compensating-delete error joined via errors.Join. |
| ErrAuditUnavailable | Custodian (counter only) | AuditSink.Record fails after the custodian has committed. The custodian decision is durable but the audit chain has gapped. Counter-only: operators alert on AuditSinkFailuresTotal(). |
| ErrInvalidPathInput | Repository / Materialiser | kv_path violates the path-format invariant before SQL UNIQUE runs (typically zero-UUID input to DerivePath). |
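Branching with errors.Is keeps working through both fmt.Errorf wrapping and errors.Join, which is what the ErrIssueAtomicityViolated row relies on. In the sketch below the wrapIssue, orphaned and matches helpers are hypothetical, and joining the sentinel together with the two causes is an assumed construction; the reference only states that the causes are joined via errors.Join.

```go
package main

import (
	"errors"
	"fmt"
)

// Sentinel names from this reference; the messages are invented.
var (
	ErrIssueAtomicityViolated  = errors.New("issue atomicity violated")
	ErrKVStoreCASConflict      = errors.New("kv store CAS conflict")
	ErrMaterialiserUnavailable = errors.New("materialiser unavailable")
)

// wrapIssue adds context while preserving sentinel identity via %w.
func wrapIssue(cause error) error {
	return fmt.Errorf("issue credential: %w", cause)
}

// orphaned joins the sentinel with the tx error and the failed
// compensating-delete error (assumed shape).
func orphaned(txErr, deleteErr error) error {
	return errors.Join(ErrIssueAtomicityViolated, txErr, deleteErr)
}

// matches is a thin wrapper over errors.Is for readability.
func matches(err, target error) bool { return errors.Is(err, target) }

func main() {
	w := wrapIssue(ErrKVStoreCASConflict)
	fmt.Println(w == ErrKVStoreCASConflict, matches(w, ErrKVStoreCASConflict)) // false true

	j := orphaned(ErrKVStoreCASConflict, ErrMaterialiserUnavailable)
	fmt.Println(matches(j, ErrIssueAtomicityViolated), matches(j, ErrMaterialiserUnavailable)) // true true
}
```

Direct == comparison fails on the wrapped value, which is why callers are asked to branch with errors.Is.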
Construction-time sentinels (refused at NewCustodian):
| Sentinel | Trigger |
|---|---|
| ErrCustodianRepositoryRequired | CustodianConfig.Repository is nil. |
| ErrCustodianMaterialiserRequired | CustodianConfig.Materialiser is nil. |
| ErrCustodianAuditSinkRequired | CustodianConfig.AuditSink is nil. |
Composition-root sentinel (refused at BuildProductionCloudCredentialsFactory):
| Sentinel | Trigger |
|---|---|
| ErrCloudCredentialsKVMountRequired | productionCloudCredentialsConfig.KVMount is empty while DSN is set. |
Outbox event schemas
Closed enumeration of four event types. The event_type column value is the discriminator stored verbatim in plexsphere.outbox_events; each row carries aggregate_type = "cloud_credential". The payload is the JSON marshal of the matching Go struct under internal/provisioning/cloudcredentials/events/.
The package-local drift gate TestEventTypesAreClosedSet enforces the four-event allow-list — adding a fifth value is a breaking schema change, not a switch-statement extension.
The token-table event_type column carries a parallel snake_case discriminator gated by a CHECK allow-list. The in-Go mapping from the outbox literal to the token-table discriminator lives on events.TokenEventType.
| Outbox event_type | Token-table event_type |
|---|---|
| cloudcredentials.CloudCredentialIssued | cloud_credential_issued |
| cloudcredentials.CloudCredentialRotated | cloud_credential_rotated |
| cloudcredentials.CloudCredentialRevoked | cloud_credential_revoked |
| cloudcredentials.CloudCredentialExpired | cloud_credential_expired |
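The mapping table above translates directly into a closed switch. The tokenEventType function below is a hypothetical shape: the real events.TokenEventType may have a different signature. Unknown literals are rejected rather than passed through, mirroring the closed allow-list.

```go
package main

import "fmt"

// tokenEventType maps an outbox event_type literal to the token-table
// discriminator; ok is false for anything outside the four known values.
func tokenEventType(outboxType string) (token string, ok bool) {
	switch outboxType {
	case "cloudcredentials.CloudCredentialIssued":
		return "cloud_credential_issued", true
	case "cloudcredentials.CloudCredentialRotated":
		return "cloud_credential_rotated", true
	case "cloudcredentials.CloudCredentialRevoked":
		return "cloud_credential_revoked", true
	case "cloudcredentials.CloudCredentialExpired":
		return "cloud_credential_expired", true
	}
	return "", false
}

func main() {
	fmt.Println(tokenEventType("cloudcredentials.CloudCredentialRotated"))
	fmt.Println(tokenEventType("cloudcredentials.Unknown"))
}
```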
CloudCredentialIssued — event_type = "cloudcredentials.CloudCredentialIssued"
| JSON field | Go type | Notes |
|---|---|---|
| event_id | [16]byte | UUIDv7 minted at emission. |
| occurred_at | time.Time | UTC. |
| credential_id | [16]byte | UUIDv7. |
| cloud_id | [16]byte | UUIDv7. The residency pivot. |
| kv_mount | string | Verbatim from Config.KVMount. |
| kv_path | string | The deterministic projection (see path derivation). |
| version | int64 | Broker-row version. Always 1 for issuance. |
| kv_version | int64 | KV-v2 version. Always 1 for issuance. |
| expires_at | time.Time | UTC. |
CloudCredentialRotated — event_type = "cloudcredentials.CloudCredentialRotated"
| JSON field | Go type | Notes |
|---|---|---|
| event_id | [16]byte | UUIDv7 minted at emission. |
| occurred_at | time.Time | UTC. |
| credential_id | [16]byte | UUIDv7. |
| version | int64 | New broker-row version. |
| kv_version | int64 | New KV-v2 version. |
| expires_at | time.Time | UTC; recomputed from Material.TTL. |
kv_mount and kv_path are deliberately not carried — they are stable across rotations (the path derivation is invariant on the CloudCredentialID) and would just bloat the outbox row. Consumers that need them re-derive from (cloud_id, credential_id) or look them up on the broker row.
CloudCredentialRevoked — event_type = "cloudcredentials.CloudCredentialRevoked"
| JSON field | Go type | Notes |
|---|---|---|
| event_id | [16]byte | UUIDv7 minted at emission. |
| occurred_at | time.Time | UTC. |
| credential_id | [16]byte | UUIDv7. |
| reason | string | Operator-supplied audit string from RevokeInput.Reason. |
CloudCredentialExpired — event_type = "cloudcredentials.CloudCredentialExpired"
| JSON field | Go type | Notes |
|---|---|---|
| event_id | [16]byte | UUIDv7 minted at emission. |
| occurred_at | time.Time | UTC. The expiry timestamp by definition; the credential's expires_at is <= occurred_at. |
| credential_id | [16]byte | UUIDv7. |
The event carries only the credential identity — downstream consumers re-derive everything else from the broker row or from the prior CloudCredentialIssued payload.
Metrics
The Sweeper exports two zero-value-tolerant counter vectors via metrics.go.
| Metric | Type | Increments when |
|---|---|---|
| plexsphere_cloud_credentials_sweeper_invocations_total | counter | Sweeper.Run is entered (per call). |
| plexsphere_cloud_credentials_sweeper_expirations_total | counter | A row is successfully MarkExpired and the matching CloudCredentialExpired outbox row is appended. |
A nil prometheus.Registerer keeps the counters in zero-value mode — Run still walks the in-memory loop but no scrape surface is registered. The integration test cloudcredentials_pool_sweeper_test.go asserts the invocations counter advances as the ticker fires.
The Custodian also exposes a process-wide counter via AuditSinkFailuresTotal() uint64 — the composition root bridges it onto a Prometheus counter without making the cloudcredentials module import a metrics library. Each increment represents a custodian decision whose audit row never landed even though the broker row + outbox event are durable.
Cross-references
- ../../contexts/provisioning/credential-pool.md: the bounded-context narrative (aggregates, state machine, Sweeper cadence, KV-v2 path derivation, threat model).
- ../../../internal/provisioning/cloudcredentials/ports.go: the authoritative Go declarations for every type listed in the Value objects section.
- ../../../internal/provisioning/cloudcredentials/errors.go: the authoritative Go declarations and DECISION blocks for every sentinel listed in the Error sentinels section.
- ../../../internal/provisioning/cloudcredentials/events/events.go: the authoritative event payload struct definitions and the TokenEventType mapping.
- ../../../internal/platform/db/migrations/0022_cloud_credentials.sql: the persistence schema for plexsphere.cloud_credential and plexsphere.cloud_credential_outbox_token.
- ../../../internal/platform/db/queries/E0_cloud_credentials.sql: the sqlc-annotated queries the Postgres adapter wraps.
- ../../../cmd/plexsphere/cloudcredentials_factory_prod.go: the production composition root, the env-var inventory, and the build-time validation sentinel.
- ../../../tests/e2e/provisioning/credential-pool-rotation/chainsaw-test.yaml: Chainsaw e2e suite covering Issue + Rotate against a real Postgres + OpenBao stack.