Key and Peer Manager
This document is the authoritative bounded-context reference for the peers sub-context that ships under ../../../internal/mesh/peers/. It covers the ubiquitous language, the schema overview for the plexsphere.peer and plexsphere.psk tables, the closed set of six outbox event types the Manager emits, the AES-256-GCM encrypt-at-rest envelope the wrapped pairwise PSK travels in, the threat model that keeps plaintext PSK material off both the wire and the database, and the rationale behind the internal/mesh/peers depguard carve-out.
The peers sub-context is the Key and Peer Manager: it owns the per-Domain Peer aggregate (one row per (Domain, Node) pair) and the wrapped pairwise PSK lifecycle that travels alongside it. The aggregate is addressed by surrogate id and kept in plexsphere.peer; the live PSK is kept in plexsphere.psk, wrapped at rest under a per-Domain AES-256 wrap key resolved from plexsphere.domain_nsk_wrap_key. The wrap-key ledger itself is owned by the Node-registration context; peers consumes wrap-key rows but does not mint or rotate them.
Ubiquitous language
The terms below travel verbatim across the Go code, the SQL schema, the outbox event payloads, and the per-Domain audit trail. Keep the names stable in error messages, structured-log attributes, and any operator-facing tooling.
| Term | Definition |
|---|---|
| Peer | The aggregate root for one (Domain, Node) registration the mesh owns. Surrogate id (peer.id, UUIDv7) plus a natural-key UNIQUE on (domain_id, node_id). Lifecycle: registered (deregistered_at IS NULL) -> deregistered (deregistered_at IS NOT NULL); soft-delete keeps the row available for forensic replay. |
| PSK (pairwise pre-shared key) | The symmetric secret a plexd agent and its peer share for WireGuard handshake authentication. Wrapped at rest under the per-Domain wrap key; plaintext PSK is never persisted. |
| Wrap Key | A per-Domain AES-256 key whose lifecycle is owned by the Node-registration context. The peers package consumes wrap-key rows from plexsphere.domain_nsk_wrap_key via Repository.GetActiveWrapKey but does not mint or rotate them. |
| kid | Opaque human-readable wrap-key identifier the Manager stamps onto every freshly-issued PSK row. Mirrors the JOSE kid convention. SQL CHECK on plexsphere.psk.kid pins the regex ^[A-Za-z0-9._:-]+$. The software wrapper defaults to psk-software-v1. |
| wrap_key_version | Per-Domain monotonic version of the wrap key the wrap was produced under. Composite FK from plexsphere.psk(domain_id, wrap_key_version) to plexsphere.domain_nsk_wrap_key(domain_id, version) guarantees a PSK row cannot reference a wrap-key version that does not exist for its Domain. |
| Wrapped PSK | The AES-256-GCM ciphertext of the PSK plaintext, formatted as <12-byte nonce> || <ciphertext+16-byte tag>. SQL CHECK pins the length to BETWEEN 28 AND 4096 bytes. |
| Manager | The sole command-side entry point for the sub-context (Register, AssignPSK, Deregister). Defined in ../../../internal/mesh/peers/manager.go; emits exactly six outbox event types from the closed wire-contract set. |
| Wrapper | The AES-256-GCM seam (Wrap / Unwrap) the Manager uses to encrypt PSK plaintext under the active wrap key. Production wires *softwareWrapper; a future HSM-backed adapter can override the wrap path without touching the Manager. |
| Repository | The per-transaction port the Manager calls inside the Store callback. Production wires a sqlc-generated adapter; tests inject a peers-package value-type fake without importing pgx. |
| Store | The transaction-callback seam (InTx(ctx, fn) error) the Manager runs every command inside. Mirrors the relay convention in ../../../internal/mesh/sse/relay.go so the repo-wide convention stays uniform. |
| Provider | The read-side projection (SnapshotForDomain) the registration service consumes. Lands in the follow-on peer-snapshot integration; owns the read port and keeps the Manager / Provider split between command and query surfaces. |
| Aggregate id | peer.id (UUIDv7). The id the Manager stamps onto every outbox row's aggregate_id column so consumers can route per-peer events without joining back to plexsphere.peer. |
| Endpoint observation | The publicly-observed (host, port) tuple a plexd agent reports through the NAT-discovery intake surface, captured on the Peer aggregate as the (last_endpoint, last_endpoint_port, last_endpoint_reported_at) triple. The control plane gates reported_at against the server's wall clock before persisting; the wire layer renders the typed columns as a host:port string. See ./endpoints.md for the full intake surface. |
| Endpoint sweeper | The periodic background sweep that stamps last_endpoint_stale_at on every Peer whose last_endpoint_reported_at is older than the per-Domain endpoint_ttl. Each transition emits a peer_endpoint_changed event with an empty endpoint so downstream consumers know the direct path is no longer trustworthy. Implemented in ../../../internal/mesh/peers/endpoint_sweeper.go; see ./endpoints.md for the operational contract. |
Schema
The peers sub-context owns four tables in the plexsphere schema — plexsphere.peer, plexsphere.psk, plexsphere.peer_relay_assignment, and plexsphere.peer_key_rotation. The authoritative sources are the migrations at ../../../internal/platform/db/migrations/0012_peers.sql, ../../../internal/platform/db/migrations/0030_peer_relay_assignment.sql, and ../../../internal/platform/db/migrations/0032_peer_key_rotation.sql; the column meanings below paraphrase those files' DECISION blocks. All three migrations' Down blocks deliberately refuse the downgrade with SQLSTATE 0A000 because the wrapped-PSK ciphertext chain, the per-peer relay-assignment history, and the per-Node key-rotation history are all security-critical audit evidence.
plexsphere.peer — Peer aggregate row
| Column | Type | Meaning |
|---|---|---|
| id | uuid PRIMARY KEY | Surrogate aggregate id (UUIDv7). Stamped onto every outbox row's aggregate_id so consumers route per-peer without joining back to this table. |
| domain_id | uuid NOT NULL | FK to plexsphere.domains(id) ON DELETE RESTRICT. Half of the natural key. |
| node_id | uuid NOT NULL | FK to plexsphere.nodes(id) ON DELETE RESTRICT. The other half of the natural key. RESTRICT prevents an in-flight Node deletion from orphaning a live peer row; the Deregister flow runs first. |
| created_at | timestamptz NOT NULL DEFAULT now() | Insertion timestamp. |
| updated_at | timestamptz NOT NULL DEFAULT now() | Maintained by the BEFORE UPDATE trigger plexsphere.peer_set_updated_at so the application layer cannot accidentally drift the column off the wall-clock. |
| deregistered_at | timestamptz NULL | Soft-delete tombstone. NULL means "live"; non-NULL means "no longer live" but kept for forensic replay. |
| last_endpoint | inet NULL | Host portion of the most recent NAT-observed endpoint the agent reported, stored as a Postgres inet so address-family is validated at the storage boundary and containment queries (<<) work for triage. NULL means "no observation has ever landed". Added by migration 0029_peer_endpoint.sql. |
| last_endpoint_port | integer NULL CHECK (last_endpoint_port IS NULL OR last_endpoint_port BETWEEN 1 AND 65535) | Port portion of the most recent NAT-observed endpoint, paired with last_endpoint. The CHECK pins the valid non-zero port range (1 to 65535) so a malformed handler push cannot smuggle a zero or out-of-range port into the column. |
| last_endpoint_reported_at | timestamptz NULL | Wall-clock instant the agent attached to its endpoint PUT body, coerced to UTC. Backs the staleness range scan and the freshness gate; NULL means "no observation has ever landed". |
| last_endpoint_stale_at | timestamptz NULL | Stale-tombstone timestamp the endpoint sweeper stamps when an observation passes its per-Domain endpoint_ttl without refresh. NULL means "the live observation is still within TTL"; non-NULL means "the sweeper has tombstoned this row and emitted a stale peer_endpoint_changed event". |
Constraints and indexes:
- `peer_domain_node_uq UNIQUE (domain_id, node_id)`: the natural-key invariant Manager.Register relies on. A duplicate INSERT trips pgerrcode 23505 and the repository classifies it as `peers.ErrPeerAlreadyRegistered`.
- `peer_deregistered_after_created CHECK (deregistered_at IS NULL OR deregistered_at >= created_at)`: pins time monotonicity for the soft-delete.
- `peer_domain_idx ON plexsphere.peer (domain_id) WHERE deregistered_at IS NULL`: backs the Provider's `SnapshotForDomain` read so the snapshot stays a cheap b-tree range scan even after long-running Domains accumulate soft-deleted rows.
- `peer_endpoint_stale_idx ON plexsphere.peer (last_endpoint_reported_at) WHERE deregistered_at IS NULL AND last_endpoint_stale_at IS NULL`: backs the endpoint sweeper's range scan "Peers whose `last_endpoint_reported_at` is older than the per-Domain `endpoint_ttl` and which are still in a watchable state". The partial predicate keeps deregistered and already-stale rows out of the index so each sweeper tick stays a cheap b-tree range scan rather than a sequential scan across the whole tenancy. Added by migration `0029_peer_endpoint.sql`.
plexsphere.psk — wrapped pairwise PSK
| Column | Type | Meaning |
|---|---|---|
| peer_id | uuid PRIMARY KEY | 1:1 with plexsphere.peer(id), ON DELETE RESTRICT. The 1:1 design lets the partial unique index psk_live_peer_uq enforce "at most one live PSK per peer" while retired rows sit alongside. |
| domain_id | uuid NOT NULL | Half of the composite FK to the wrap-key ledger. |
| kid | text NOT NULL CHECK (kid ~ '^[A-Za-z0-9._:-]+$') | Opaque wrap-key identifier. |
| wrapped_psk | bytea NOT NULL CHECK (length(wrapped_psk) BETWEEN 28 AND 4096) | The AES-256-GCM ciphertext (<nonce> || <ct+tag>). The lower bound is the GCM floor (12 nonce + 16 tag); the upper bound caps any future HSM/PKCS#11 wrap form without inviting unbounded blob writes. |
| wrap_key_version | integer NOT NULL CHECK (wrap_key_version >= 1) | Other half of the composite FK to the wrap-key ledger. |
| issued_at | timestamptz NOT NULL DEFAULT now() | Insertion timestamp. |
| retired_at | timestamptz NULL | Soft-retirement tombstone. NULL means "live"; non-NULL means "retired" but kept for forensic replay of the wrap-key chain. |
Constraints and indexes:
- `psk_retired_after_issued CHECK (retired_at IS NULL OR retired_at >= issued_at)`: time-monotonic retirement.
- `psk_wrap_key_fk FOREIGN KEY (domain_id, wrap_key_version) REFERENCES plexsphere.domain_nsk_wrap_key (domain_id, version) ON DELETE RESTRICT`: the composite FK that pins the wrap-key chain at the SQL layer. A single-column FK on `wrap_key_version` alone would let a PSK row reference a wrap-key version belonging to a different Domain, silently breaking the wrap chain on forensic replay; the composite FK forecloses that drift.
- `psk_live_peer_uq UNIQUE INDEX ON plexsphere.psk (peer_id) WHERE retired_at IS NULL`: the partial unique index that surfaces duplicate live-PSK INSERTs as pgerrcode 23505. The repository classifies it onto `peers.ErrPSKAlreadyAssigned`.
- `psk_wrap_key_idx ON plexsphere.psk (domain_id, wrap_key_version) WHERE retired_at IS NULL`: backs the rotation sweep that lists every PSK still wrapped under a given `(domain_id, wrap_key_version)` so a future rotation pass can re-wrap in batches without a sequential scan.
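The column-level CHECKs on plexsphere.psk can be mirrored in application code so a bad row is rejected before it reaches the database. The sketch below is an assumption about how such a pre-flight check might look; `validatePSKRow` and its constants are hypothetical names, and the SQL CHECKs remain the authoritative gate.

```go
package main

import (
	"fmt"
	"regexp"
)

// kidPattern mirrors the SQL CHECK regex on plexsphere.psk.kid.
var kidPattern = regexp.MustCompile(`^[A-Za-z0-9._:-]+$`)

const (
	minWrappedLen = 28   // 12-byte nonce + 16-byte GCM tag, empty plaintext
	maxWrappedLen = 4096 // upper bound from the SQL CHECK
)

// validatePSKRow is a hypothetical pre-flight check that applies the same
// three bounds the schema pins: kid shape, envelope length, version >= 1.
func validatePSKRow(kid string, wrapped []byte, wrapKeyVersion int) error {
	if !kidPattern.MatchString(kid) {
		return fmt.Errorf("kid %q does not match %s", kid, kidPattern)
	}
	if n := len(wrapped); n < minWrappedLen || n > maxWrappedLen {
		return fmt.Errorf("wrapped_psk length %d outside [%d, %d]", n, minWrappedLen, maxWrappedLen)
	}
	if wrapKeyVersion < 1 {
		return fmt.Errorf("wrap_key_version %d must be >= 1", wrapKeyVersion)
	}
	return nil
}

func main() {
	// A 32-byte PSK wraps to 12 + 32 + 16 = 60 bytes.
	fmt.Println(validatePSKRow("psk-software-v1", make([]byte, 60), 1))
	fmt.Println(validatePSKRow("bad kid!", make([]byte, 60), 1) != nil)
}
```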
plexsphere.peer_relay_assignment — relay-fallback assignment row
The relay-fallback aggregate the Mesh Fabric sub-story attaches to: one live row per Peer aggregate carrying the bridge Node that serves as the peer's relay-fallback, the (inet, port) pair the plexd agent dials when the direct WireGuard handshake times out, and the wall-clock window (assigned_at, retired_at) the assignment was active in. Soft-retired rows accumulate alongside live rows so a forensic replay can reconstruct "which bridge served this peer at 03:14 UTC". At most one live row per Peer is enforced by the partial unique index peer_relay_assignment_live_uq, mirroring the psk_live_peer_uq pattern. Added by migration 0030_peer_relay_assignment.sql.
| Column | Type | Meaning |
|---|---|---|
| id | uuid PRIMARY KEY | Surrogate aggregate id (UUIDv7). |
| domain_id | uuid NOT NULL | FK to plexsphere.domains(id) ON DELETE RESTRICT. Anchors the assignment to its tenant. |
| peer_id | uuid NOT NULL | FK to plexsphere.peer(id) ON DELETE RESTRICT. Names the Peer aggregate the assignment is bound to. RESTRICT prevents an in-flight Peer deletion from orphaning a live relay row; the Deregister flow runs first. |
| bridge_node_id | uuid NOT NULL | FK to plexsphere.nodes(id) ON DELETE RESTRICT. Names the bridge Node that serves as the peer's relay-fallback. RESTRICT prevents a bridge teardown from dropping live assignments out from under their peers; the relay-assigner sweep must retire every dependent assignment first. |
| fallback_endpoint | inet NOT NULL | Host portion of the dial address plexd uses when the direct handshake times out. Sourced from the selected bridge Node's last_endpoint (or its mesh_ip when no NAT observation has landed yet). The (inet, integer) split lets an operator triage a relay outage with a real containment query (fallback_endpoint << inet '10.0.0.0/8') rather than a regex hack on a text column. |
| fallback_endpoint_port | integer NOT NULL CHECK (fallback_endpoint_port BETWEEN 1 AND 65535) | Port portion paired with fallback_endpoint. Defaults to peers.DefaultBridgeRelayPort (51820) when no per-bridge override has been observed; a future story will surface a per-bridge configured listen port. |
| assigned_at | timestamptz NOT NULL DEFAULT now() | Wall-clock instant the assignment became live. |
| retired_at | timestamptz NULL | Soft-retirement tombstone. NULL means "live"; non-NULL means "retired" but kept for forensic replay of the per-peer relay chain. |
Constraints and indexes:
- `peer_relay_assignment_retired_after_assigned CHECK (retired_at IS NULL OR retired_at >= assigned_at)`: time-monotonic retirement, mirroring the `psk_retired_after_issued` shape.
- `peer_relay_assignment_live_uq UNIQUE INDEX ON plexsphere.peer_relay_assignment (peer_id) WHERE retired_at IS NULL`: partial unique index that surfaces duplicate live-INSERTs as pgerrcode 23505. The Manager always soft-retires the prior row inside the same transaction before upserting a fresh assignment; a 23505 here is a programmer error, not a steady-state outcome.
- `peer_relay_assignment_bridge_idx ON plexsphere.peer_relay_assignment (bridge_node_id) WHERE retired_at IS NULL`: partial helper index backing the relay assigner's bridge-churn sweep ("every peer whose currently-assigned bridge just transitioned to unreachable"). The partial predicate keeps soft-retired tombstones out of the index so each sweep tick stays a cheap b-tree range scan rather than a sequential scan across every historical assignment.
The migration's Down block refuses the downgrade with SQLSTATE 0A000. The assignment history is the forensic evidence chain that links every peer-to-bridge relay decision to the wall-clock window the assignment was active in; dropping the table is a regulatory-retention regression mirroring the 0012_peers and 0008_node_secret_keys stance. Operators performing a legitimate wipe-and-reinstall must drop the Postgres database itself; migrations.Down is not a destructive teardown tool.
plexsphere.peer_key_rotation — KeyRotation aggregate row
The key-rotation aggregate the Key Rotation Workflow sub-story attaches to: one row per key-rotation attempt a Node makes, carrying the lifecycle the rotation walks (pending → completed, or pending → superseded when a fresh rotation request pre-empts an in-flight one), the wall-clock window (requested_at, completed_at) the attempt spanned, and the before/after WireGuard public keys so a forensic replay can reconstruct which public key a Node presented at any past instant. At most one pending row per Node is enforced by the partial unique index peer_key_rotation_pending_uq, mirroring the psk_live_peer_uq pattern. Added by migration 0032_peer_key_rotation.sql. See ./key-rotation.md for the full lifecycle, the two-channel dispatch model, and the rotation-ordering invariants.
| Column | Type | Meaning |
|---|---|---|
| id | uuid PRIMARY KEY | Surrogate aggregate id (UUIDv7). Stamped onto the outbox row's aggregate_id. |
| domain_id | uuid NOT NULL | FK to plexsphere.domains(id) ON DELETE RESTRICT. Carries the Domain isolation boundary. |
| node_id | uuid NOT NULL | FK to plexsphere.nodes(id) ON DELETE RESTRICT. The Node whose mesh key is rotating; the column the partial unique index keys on. RESTRICT keeps the rotation evidence chain intact across a Node teardown. |
| peer_id | uuid NOT NULL | FK to plexsphere.peer(id) ON DELETE RESTRICT. The live Peer aggregate the rotation is bound to. |
| state | text NOT NULL CHECK (state IN ('pending', 'completed', 'superseded')) | The lifecycle discriminator. The SQL CHECK pins the closed set so a mis-wired INSERT fails at the database rather than promoting an unknown value into the domain. |
| requested_at | timestamptz NOT NULL DEFAULT now() | Wall-clock instant the rotation was requested. |
| previous_public_key | bytea NULL CHECK (previous_public_key IS NULL OR octet_length(previous_public_key) = 32) | The Node's Curve25519 public key before the rotation. NULL when the caller did not supply it. |
| new_public_key | bytea NULL CHECK (new_public_key IS NULL OR octet_length(new_public_key) = 32) | The Node's Curve25519 public key after the rotation. NULL while the row is pending; the completion flow stamps it. |
| completed_at | timestamptz NULL | Wall-clock instant the rotation completed. NULL marks the row in flight. |
Constraints and indexes:
- `peer_key_rotation_completed_after_requested CHECK (completed_at IS NULL OR completed_at >= requested_at)`: time-monotonic completion, mirroring the `psk_retired_after_issued` and `peer_relay_assignment_retired_after_assigned` shapes.
- `peer_key_rotation_pending_uq UNIQUE INDEX ON plexsphere.peer_key_rotation (node_id) WHERE state = 'pending'`: the partial unique index that is the durable name of the one-pending-rotation-per-Node invariant. A second rotation request against a Node with an in-flight rotation first flips the prior row to `superseded` inside the same transaction so the index admits the fresh `pending` row.
The migration's Down block refuses the downgrade with SQLSTATE 0A000. The rotation history is the forensic evidence chain that links every Node WireGuard public-key transition to the wall-clock window the rotation spanned; dropping it is a security-audit regression mirroring the 0012_peers, 0029_peer_endpoint, and 0030_peer_relay_assignment stance.
Event types
The Manager emits exactly six outbox event types — no more, no less. The literals below are the bare snake_case strings the relay routes peer events on; downstream consumers switch on these exact literals. The closed set is enforced by the AST gate at ../../../tests/workspace/peers_event_type_set_test.go; new peer-side event types must land in ../../../internal/identity/tenancy/events/events.go first and be added to the gate's allow-list.
The exact literals the wire contract pins are:
- `peer_registered`
- `peer_psk_assigned`
- `peer_deregistered`
- `peer_endpoint_changed`
- `rotate_keys`
- `peer_key_rotated`
Five of the six — every literal except rotate_keys — are past-tense state-change notifications: something has happened to a Peer aggregate and the bus carries the fact. rotate_keys is the lone exception: it is an imperative command addressed to a single Node, telling it to generate a fresh Curve25519 keypair and call POST /v1/keys/rotate. The wire shape is identical (an outbox row routed on its bare snake_case event_type); only the semantics differ. See ./key-rotation.md for the command's role in the rotation workflow.
The bare snake_case form is deliberate. The earlier tenancy events use the aggregate-prefixed CamelCase form (tenancy.NodeRegistered etc.); the wire-contract spec explicitly fixes the bare snake_case strings as the closed set the relay routes peer events on, so the new constants land in this form to satisfy the spec without churning the prior wire contract.
peer_registered
Constructed by events.NewPeerRegistered; constant TypePeerRegistered = "peer_registered". Emitted by Manager.Register when a fresh Peer aggregate lands for a (DomainID, NodeID) pair. The payload deliberately omits any key material: the wire-shape spec is "ID triple plus event metadata, plus the additive relay-fallback hint when a bridge candidate is available".
JSON payload shape:
```json
{
  "event_id": "<uuidv7>",
  "occurred_at": "<RFC3339Nano UTC>",
  "peer_id": "<uuidv7>",
  "domain_id": "<uuidv7>",
  "node_id": "<uuidv7>",
  "fallback_endpoint": "<host:port, omitted when no bridge candidate is available>"
}
```

Validation invariants on the constructor: peer_id, domain_id, and node_id must all be non-zero. A zero now defaults to time.Now().UTC(); a non-zero now is coerced to UTC.
The fallback_endpoint field is additive and optional: a non-empty value carries the dial address the plexd agent uses when the direct WireGuard handshake times out, sourced from the bridge Node the relay-assigner chose for this peer; an empty value is dropped from the JSON wire-form by the omitempty tag so a legacy consumer that pre-dates the relay-fallback surface observes the original 5-field payload shape byte-for-byte. The empty-string case is the canonical no-bridge-candidate signal; see ./relay-fallback.md for the chooser's decision tree and the composition of the empty fallback_endpoint with the (separate) endpoint-snapshot field on the silent-unreachability path.
peer_psk_assigned
Constructed by events.NewPeerPSKAssigned; constant TypePeerPSKAssigned = "peer_psk_assigned". Emitted by Manager.AssignPSK after the Manager has wrapped the PSK plaintext and persisted the row. The payload deliberately carries NO ciphertext and NO plaintext — only kid and wrap_key_version travel on the bus. A bus consumer that needs to unwrap the PSK reads the canonical wrapped row out of plexsphere.psk and threads (kid, wrap_key_version) from the event into the unwrap call. This keeps the "no plaintext on the wire" invariant load-bearing at the event boundary, not just at the storage boundary.
JSON payload shape:
```json
{
  "event_id": "<uuidv7>",
  "occurred_at": "<RFC3339Nano UTC>",
  "peer_id": "<uuidv7>",
  "domain_id": "<uuidv7>",
  "node_id": "<uuidv7>",
  "kid": "<wrap-key identifier, e.g. psk-software-v1>",
  "wrap_key_version": 1
}
```

Validation invariants on the constructor: peer_id, domain_id, and node_id must all be non-zero; kid must be non-empty (after trim); wrap_key_version must be >= 1 because the SQL CHECK on plexsphere.psk.wrap_key_version requires a positive integer.
peer_deregistered
Constructed by events.NewPeerDeregistered; constant TypePeerDeregistered = "peer_deregistered". Emitted by Manager.Deregister after the soft-delete UPDATE has stamped deregistered_at. The payload carries the timestamp the SQL UPDATE stamped onto the row so consumers can distinguish "was deregistered now" from "was already deregistered before this replay" without joining back to plexsphere.peer.
JSON payload shape:
```json
{
  "event_id": "<uuidv7>",
  "occurred_at": "<RFC3339Nano UTC>",
  "peer_id": "<uuidv7>",
  "domain_id": "<uuidv7>",
  "node_id": "<uuidv7>",
  "deregistered_at": "<RFC3339Nano UTC>"
}
```

Validation invariants on the constructor: peer_id, domain_id, and node_id must all be non-zero; deregistered_at must be set, because a soft-delete with a zero timestamp would silently desync the bus from the SQL row.
peer_endpoint_changed
Constructed by events.NewPeerEndpointChanged; constant TypePeerEndpointChanged = "peer_endpoint_changed". Emitted by the endpoint-intake handler on the first observation, on every subsequent observation whose (host, port) differs from the persisted value, on an un-stale transition where the prior row was tombstoned, and by the endpoint sweeper as a stale-tombstone signal (endpoint == "" in that case). The payload carries the new host:port string and the prior host:port so a downstream consumer can distinguish "first observation" (previous_endpoint is empty), "endpoint changed" (both are non-empty and differ), and "observation went stale" (endpoint is empty) without joining back to plexsphere.peer. See ./endpoints.md for the intake surface and the sweeper's tick contract.
JSON payload shape:
```json
{
  "event_id": "<uuidv7>",
  "occurred_at": "<RFC3339Nano UTC>",
  "peer_id": "<uuidv7>",
  "domain_id": "<uuidv7>",
  "node_id": "<uuidv7>",
  "endpoint": "<host:port, or empty on the stale-tombstone signal>",
  "endpoint_reported_at": "<RFC3339Nano UTC>",
  "previous_endpoint": "<host:port, or empty on the first observation>",
  "fallback_endpoint": "<host:port, omitted when no bridge candidate is available>"
}
```

Validation invariants on the constructor: peer_id, domain_id, and node_id must all be non-zero; endpoint_reported_at must be set, because a zero observation timestamp would silently desync the bus from the SQL row. The endpoint, previous_endpoint, and fallback_endpoint strings MAY be empty: an empty endpoint is the stale-tombstone signal emitted by the sweeper when an observation passes TTL without refresh, an empty previous_endpoint is the first-observation signal where no prior endpoint exists, and an empty fallback_endpoint is the no-bridge-candidate signal where the relay-assigner could not find a healthy bridge to compose with the observation. Handler-side parsing of host:port lives at the intake boundary; the constructor deliberately does not re-parse so the same payload shape carries both a live observation and a stale tombstone.
The fallback_endpoint field is additive and optional in the same sense as on peer_registered above: a non-empty value carries the dial address the plexd agent uses when the direct WireGuard handshake times out; an empty value is dropped from the JSON wire-form by the omitempty tag so a legacy consumer that pre-dates the relay-fallback surface observes the original 8-field payload shape byte-for-byte. The two empty-string fields compose: endpoint == "" AND fallback_endpoint == "" is the silent-unreachability signal: the direct path has gone stale AND no relay-fallback is available, so the plexd agent has nowhere left to dial. See ./relay-fallback.md for the relay-assigner's decision tree and the composition rules.
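A consumer can tell the documented signals apart from the three strings alone. The sketch below is one possible reading of those rules; the `endpointSignal` type, its constant names, and `classify` are hypothetical, and the sketch does not try to distinguish an un-stale transition from an ordinary change.

```go
package main

import "fmt"

// endpointSignal names the cases a peer_endpoint_changed consumer can
// derive without joining back to plexsphere.peer.
type endpointSignal string

const (
	signalFirstObservation    endpointSignal = "first_observation"
	signalChanged             endpointSignal = "endpoint_changed"
	signalStaleRelayAvailable endpointSignal = "stale_relay_available"
	signalSilentUnreachable   endpointSignal = "silent_unreachability"
)

// classify applies the composition rules: an empty endpoint is the
// stale-tombstone signal, and it becomes silent unreachability when the
// fallback is empty too; an empty previous endpoint marks a first sighting.
func classify(endpoint, previous, fallback string) endpointSignal {
	switch {
	case endpoint == "" && fallback == "":
		return signalSilentUnreachable
	case endpoint == "":
		return signalStaleRelayAvailable
	case previous == "":
		return signalFirstObservation
	default:
		return signalChanged
	}
}

func main() {
	fmt.Println(classify("203.0.113.7:51820", "", ""))
	fmt.Println(classify("", "203.0.113.7:51820", ""))
}
```

Checking the stale cases first matters: a stale tombstone carries a non-empty previous_endpoint, so testing that field first would misread it as an ordinary change.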
rotate_keys
Constructed by events.NewRotateKeys; constant TypeRotateKeys = "rotate_keys". Appended by Manager.RequestRotation when an operator requests a mesh-key rotation for a Node. Unlike the five past-tense peer notifications, rotate_keys is an imperative command dispatched to the addressed Node's per-node SSE subject leaf — it instructs the Node to generate a fresh Curve25519 keypair and call POST /v1/keys/rotate. The payload carries only the (Peer, Domain, Node) ID triple; the command needs no further parameters because the Node owns its own keypair generation.
JSON payload shape:
```json
{
  "event_id": "<uuidv7>",
  "occurred_at": "<RFC3339Nano UTC>",
  "peer_id": "<uuidv7>",
  "domain_id": "<uuidv7>",
  "node_id": "<uuidv7>"
}
```

Validation invariants on the constructor: peer_id, domain_id, and node_id must all be non-zero: rotate_keys is addressed at a single Node, so a zero ID would dispatch the command at no Node at all. A zero now defaults to time.Now().UTC(); a non-zero now is coerced to UTC.
peer_key_rotated
Constructed by events.NewPeerKeyRotated; constant TypePeerKeyRotated = "peer_key_rotated". Appended by Manager.CompleteRotation as the last write of the rotation transaction, so a consumer observing the event is guaranteed every aggregate state change it describes has already landed. Every peer of the rotated Node observes it so each peer can update its WireGuard configuration with the rotated Node's new public key and the re-issued PSK reference. The payload carries the new public key plus the (kid, wrap_key_version) reference to the re-issued PSK row, but deliberately carries NO PSK plaintext and NO ciphertext — exactly like peer_psk_assigned. A consumer that needs the PSK reads the canonical wrapped row out of plexsphere.psk and threads (kid, wrap_key_version) into the unwrap call.
JSON payload shape:
```json
{
  "event_id": "<uuidv7>",
  "occurred_at": "<RFC3339Nano UTC>",
  "peer_id": "<uuidv7>",
  "domain_id": "<uuidv7>",
  "node_id": "<uuidv7>",
  "new_public_key": "<base64 of the 32-byte Curve25519 public key>",
  "kid": "<wrap-key identifier, e.g. psk-software-v1>",
  "wrap_key_version": 1
}
```

Validation invariants on the constructor: peer_id, domain_id, and node_id must all be non-zero; new_public_key must be exactly 32 bytes (a wrong-length value is a Curve25519 key the Node never generated); kid must be non-empty (after trim); wrap_key_version must be >= 1. The constructor stores a defensive copy of new_public_key so a caller mutation cannot reach into the event after construction. See ./key-rotation.md for the rotation lifecycle this event terminates.
Encrypt-at-rest envelope
The *softwareWrapper in ../../../internal/mesh/peers/wrapper.go is the dev-only AES-256-GCM PSK wrapper. The wire layout the wrapper emits and the unwrap path parses is:
```text
<12-byte nonce> || <ciphertext + 16-byte GCM tag>
```

The constants pinned at the package boundary:

- `wrapKeySize = 32`: the AES-256 key length. A misconfigured Secret mount surfaces as `ErrWrapKeyMisconfigured` from `NewSoftwareWrapper`, not as a confused mid-`Wrap` panic.
- `gcmNonceSize = 12`: the standard AES-256-GCM nonce length per NIST SP 800-38D §5.2.1.1 (96 bits). Pinning it as a package-level constant keeps the layout self-documenting.
- GCM tag length is the cipher-suite-fixed 16 bytes; the floor ciphertext (empty plaintext) is therefore 12 + 0 + 16 = 28 bytes, which is exactly the lower bound the SQL CHECK on `plexsphere.psk.wrapped_psk` accepts (`length BETWEEN 28 AND 4096`).
Wrap(ctx, plaintext):
- Reads a fresh 12-byte nonce from the wrapper's random source (`crypto/rand.Reader` in production; tests inject deterministic readers for stable per-call assertions).
- Calls `gcm.Seal` with no additional-authenticated-data (AAD); see the DECISION block on AAD below.
- Returns `(<nonce>||<ct+tag>, kid, wrapKeyVersion, nil)`.
Unwrap(ctx, wrapped, kid, wrapKeyVersion):
- Rejects buffers shorter than `gcmNonceSize + gcm.Overhead()` (28 bytes) by wrapping `ErrPSKNotFound` so the caller can branch on "row present but unwrappable" the same way it branches on "row missing".
- Logs a `slog.Warn` if the supplied `(kid, wrapKeyVersion)` diverges from the wrapper's bound values but does NOT short-circuit the GCM-Open; the "retired wrap key tolerated for read" path leans on this tolerance during a rotation.
- Calls `gcm.Open`. A tag-check failure surfaces as an error wrapping `ErrPSKNotFound` so the caller distinguishes "missing row" from "row present but the bytes don't authenticate".
DECISION (carried over from wrapper.go): kid and wrap_key_version are NOT bound into the AAD. Binding them would force a rotation flow to re-wrap every PSK in place rather than carry the (kid, version) tuple in the row, and the "retired wrap key tolerated for read" rule implies the row's persisted version may not match the wrapper's currently-active version. The tolerant unwrap matches that requirement; the warn-log surfaces drift to operators without blocking the read.
The Manager additionally fail-closes on a wrapper-version / active-row divergence at write time: Manager.AssignPSK resolves the active wrap-key row first, calls Wrapper.Wrap, and then asserts version == active.Version. A drift here means the boot wiring is misconfigured (or a rotation is mid-flight) and persisting a row whose wrap_key_version FK references a different version than the bytes were sealed under would silently corrupt the unwrap path; the Manager refuses with ErrNoActiveWrapKey rather than write the divergence.
The software wrapper emits exactly one slog.Warn at construction time announcing its dev-only posture. The exact line the constructor emits (pinned in bootWarningMessage so tests can assert on the substring without re-specifying the text) is:
peers: software PSK wrapper is dev-only and MUST NOT be used in production
A future HSM-backed Wrapper adapter slots in here without touching Manager.
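A test can assert on the boot warning by pointing a slog logger at a buffer and checking for the pinned substring. The sketch below assumes the constructor accepts an injected *slog.Logger; newDevWrapper and devWrapper are hypothetical names, and the real constructor may log through the default logger instead.

```go
package main

import (
	"bytes"
	"fmt"
	"log/slog"
	"strings"
)

// bootWarningMessage repeats the line quoted above.
const bootWarningMessage = "peers: software PSK wrapper is dev-only and MUST NOT be used in production"

type devWrapper struct{}

// newDevWrapper is a hypothetical constructor that warns exactly once.
func newDevWrapper(logger *slog.Logger) *devWrapper {
	logger.Warn(bootWarningMessage)
	return &devWrapper{}
}

// captureBootLog constructs a wrapper and returns everything it logged.
func captureBootLog() string {
	var buf bytes.Buffer
	newDevWrapper(slog.New(slog.NewTextHandler(&buf, nil)))
	return buf.String()
}

// warnedExactlyOnce reports whether the pinned substring appears once.
func warnedExactlyOnce(out string) bool {
	return strings.Count(out, bootWarningMessage) == 1
}

func main() {
	out := captureBootLog()
	fmt.Println(warnedExactlyOnce(out))
	fmt.Println(strings.Contains(out, "level=WARN"))
}
```

Matching on the substring rather than the whole line keeps the assertion independent of the handler's timestamp and attribute formatting.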
Threat model
The peers sub-context defends against six classes of attack along the register-assign-deregister path. Each mitigation is implemented in one named place so a reader chasing a security claim does not have to assemble it from multiple files.
- Plaintext PSK exfiltration via the database. Plaintext PSK bytes never reach plexsphere.psk. The Manager calls Wrapper.Wrap exactly once per AssignPSK; only the <nonce>||<ct+tag> envelope is persisted, alongside the (kid, wrap_key_version) tuple needed to identify the wrap key. The SQL CHECK on plexsphere.psk.wrapped_psk (length BETWEEN 28 AND 4096) rejects only values too short to hold a nonce and tag or implausibly long; a bare 32-byte plaintext PSK falls inside that range and would pass, so the real gate is the Wrapper interface itself, which refuses to return less than nonce+tag overhead.
- Plaintext PSK exfiltration via the bus. The peer_psk_assigned event payload deliberately carries NO ciphertext and NO plaintext; only kid and wrap_key_version travel on the wire. A bus consumer that needs the wrapped bytes reads plexsphere.psk directly. The constant TypePeerPSKAssigned = "peer_psk_assigned" and the PeerPSKAssigned payload struct in ../../../internal/identity/tenancy/events/events.go pin the wire shape so a future contributor cannot accidentally widen it.
- Plaintext PSK exfiltration via in-process state. Manager.AssignPSK defers destroyPlaintext(cmd.Plaintext) so the caller's plaintext slice is wiped on every return path, error or success. The wrapper takes its own copy during Wrap, so zeroing the original is the last opportunity for the Manager to scrub the secret. The Manager's structured-log line for AssignPSK deliberately logs only (peer_id, domain_id, kid, wrap_key_version); the plaintext bytes never reach slog and the wrapped ciphertext is not interesting to operators tailing the log.
- Wrap-key drift / rotation footgun. Manager.AssignPSK fail-closes on a wrapper-version vs active-row divergence (see the encrypt-at-rest section above) so a misconfigured boot wiring or a mid-flight rotation cannot persist a PSK row whose wrap_key_version FK references a different version than the bytes were sealed under. The composite FK psk_wrap_key_fk (domain_id, wrap_key_version) REFERENCES plexsphere.domain_nsk_wrap_key (domain_id, version) adds a second line of defence at the SQL layer: a PSK row can never reference a wrap-key version that does not exist for its Domain.
- Retired-key intolerance. Wrapper.Unwrap tolerates a (kid, wrap_key_version) that diverges from the wrapper's currently-bound values: it emits a slog.Warn but proceeds with gcm.Open. The "retired wrap key tolerated for read" path leans on this so that a rotation in flight does not immediately invalidate every plexd-side WireGuard handshake the relay is trying to deliver. AAD is deliberately NOT bound to (kid, wrap_key_version) for the same reason (see the encrypt-at-rest DECISION above).
- Audit-trail loss. The migration's Down block REFUSES the downgrade with SQLSTATE 0A000 (feature_not_supported). The wrapped-PSK ciphertext chain links every peer-to-peer WireGuard handshake to the wrap-key version that produced the ciphertext; dropping the tables is a security regression that mirrors the NSK invariant 0008 protects. Soft-delete on both peer and psk keeps the rows available for forensic replay; a hard DELETE would orphan the outbox events that consumers replay.
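The in-process mitigation above relies on a deferred wipe running on every return path. A minimal sketch of that pattern follows; destroyPlaintext is named in the text, but its body here, the simplified assignPSK signature, and the two stand-in wrap functions are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// destroyPlaintext overwrites every byte of b with zero in place.
func destroyPlaintext(b []byte) {
	for i := range b {
		b[i] = 0
	}
}

// assignPSK wipes the caller's slice whether wrap succeeds or fails.
func assignPSK(plaintext []byte, wrap func([]byte) ([]byte, error)) ([]byte, error) {
	defer destroyPlaintext(plaintext)
	wrapped, err := wrap(plaintext)
	if err != nil {
		return nil, fmt.Errorf("wrap psk: %w", err)
	}
	return wrapped, nil
}

// copyingWrap stands in for a wrapper that takes its own copy; it does NOT
// encrypt and exists only to show the copy surviving the wipe.
func copyingWrap(p []byte) ([]byte, error) {
	return append([]byte(nil), p...), nil
}

// failingWrap simulates a wrapper error.
func failingWrap([]byte) ([]byte, error) {
	return nil, errors.New("wrap failed")
}

// allZero reports whether every byte of b is zero.
func allZero(b []byte) bool {
	for _, v := range b {
		if v != 0 {
			return false
		}
	}
	return true
}

func main() {
	ok := []byte("secret-psk")
	wrapped, _ := assignPSK(ok, copyingWrap)
	fmt.Println(allZero(ok), string(wrapped)) // true secret-psk

	bad := []byte("secret-psk")
	_, err := assignPSK(bad, failingWrap)
	fmt.Println(allZero(bad), err != nil) // true true
}
```

The defer captures the slice header, which shares its backing array with the caller's slice, so the wipe is visible to the caller. It does not reach copies made elsewhere, which is why the wrapper's own copy has to be handled by the wrapper.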
Sentinel errors callers branch on (defined in ../../../internal/mesh/peers/errors.go):
| Sentinel | Path | Caller branch |
|---|---|---|
| ErrPeerAlreadyRegistered | Pre-flight read or 23505 race in Register | Idempotent re-registration |
| ErrPeerAlreadyDeregistered | No-op SoftDeletePeer UPDATE in Deregister | Idempotent re-deregistration |
| ErrPSKAlreadyAssigned | 23505 against psk_live_peer_uq in AssignPSK | Refuse double-assign |
| ErrNoActiveWrapKey | No state = 'active' row, 23503 against psk_wrap_key_fk, or wrapper-version drift | Fail-closed; surface misconfiguration |
| ErrNilManagerCollaborator | NewManager with nil Store or Wrapper | Boot-time misconfig alert |
| ErrPeerNotFound | pgx.ErrNoRows on GetPeerByID / GetPeerByDomainAndNode | Translated to ErrPeerAlreadyRegistered / ErrPeerAlreadyDeregistered by Manager paths |
| ErrPSKNotFound | pgx.ErrNoRows on GetActivePSKByPeer or short/tag-failed buffer in Unwrap | Distinguish "no live PSK" from a hard repository failure |
| ErrWrapKeyMisconfigured | NewSoftwareWrapper with non-32-byte WrapKey | Boot-time misconfig alert |
| ErrEndpointObservationStale | Manager.UpdateEndpoint when reported_at falls outside the per-Domain endpoint_ttl window | Surface 400 with a dedicated problem-code so observation-replay attempts dashboard separately from clock-skew rejections |
| ErrEndpointUnparseable | Manager.UpdateEndpoint when the supplied endpoint string is not a parseable host:port | Refuse the malformed PUT at the boundary; intake handler surfaces 400 endpoint_unparseable |
| ErrEndpointClockSkew | Manager.UpdateEndpoint when reported_at drifts past MaxEndpointSkew of the server's wall clock | Refuse before any aggregate write so a forged or drifted client clock cannot smuggle a misdated observation past the freshness invariant |
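Callers would typically branch on these sentinels with errors.Is, since the Manager may wrap them with context. The sketch below redeclares a few sentinels locally and maps them to coarse labels; the classify function, its labels, and the wrapErr helper are hypothetical and only illustrate the branching style in the table.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for a subset of the sentinels in errors.go.
var (
	ErrPeerAlreadyRegistered    = errors.New("peers: peer already registered")
	ErrPeerAlreadyDeregistered  = errors.New("peers: peer already deregistered")
	ErrNoActiveWrapKey          = errors.New("peers: no active wrap key")
	ErrEndpointObservationStale = errors.New("peers: endpoint observation stale")
	ErrEndpointUnparseable      = errors.New("peers: endpoint unparseable")
)

// classify maps an error to a hypothetical caller-side outcome label.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrPeerAlreadyRegistered),
		errors.Is(err, ErrPeerAlreadyDeregistered):
		return "idempotent" // treat the repeat as a no-op
	case errors.Is(err, ErrEndpointObservationStale),
		errors.Is(err, ErrEndpointUnparseable):
		return "bad-request" // the table documents a 400 for both
	case errors.Is(err, ErrNoActiveWrapKey):
		return "misconfigured" // fail closed and alert
	default:
		return "internal"
	}
}

// wrapErr adds context the way a Manager method might.
func wrapErr(err error) error {
	return fmt.Errorf("manager: %w", err)
}

func main() {
	fmt.Println(classify(nil))                               // ok
	fmt.Println(classify(wrapErr(ErrPeerAlreadyRegistered))) // idempotent
	fmt.Println(classify(wrapErr(ErrEndpointUnparseable)))   // bad-request
	fmt.Println(classify(wrapErr(ErrNoActiveWrapKey)))       // misconfigured
	fmt.Println(classify(errors.New("connection reset")))    // internal
}
```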
Depguard rationale
The peers sub-context lives at internal/mesh/peers and is governed by a dedicated depguard rule (no-cross-context-imports-mesh-peers) in ../../../.golangci.yml. The pattern mirrors the existing no-cross-context-imports-mesh-sse and no-cross-context-imports-mesh-reachability carve-outs — one rule per allowing-context, single allow list overriding the broader internal/identity deny entry. The generic no-cross-context-imports rule excludes internal/mesh/peers/** from its files list so the two rules do not collide.
The peers publish pipeline legitimately needs a narrow set of cross-context imports the rule permits:
- internal/access: SignerClient (Ed25519 envelope signing) for the follow-on integration that extends the publish path with operator audit signatures.
- internal/audit: append-only audit row for peer registration, PSK assignment, and deregistration decisions.
- internal/signing (and its subtree, such as internal/signing/envelope): the canonical Envelope value type and its hash primitives.
- internal/identity/tenancy: the strongly typed tenancy.ID Domain/Node/Peer identifiers the Manager threads through every command. Allow-listed explicitly to override the broader internal/identity deny entry.
- internal/identity/tenancy/events: the peer_registered / peer_psk_assigned / peer_deregistered event constructors and payload structs the Manager emits. Allow-listed explicitly.
- internal/identity/nodes/nsk: wrap-key envelope precedent the PSK custody path mirrors. Allow-listed explicitly.
- internal/mesh/sse: the Signed Event Bus the peers Manager hands envelopes to.
internal/access, internal/audit, internal/signing, and internal/mesh/sse are intentionally absent from the deny list (they need no allow entry); listing them in the allow: block would be redundant. The three internal/identity/... allow entries are the load-bearing ones: without them the broader internal/identity deny entry would block the Manager from importing the strongly typed IDs and the event constructors it cannot function without.
The workspace alignment test ../../../tests/workspace/mesh_peers_depguard_test.go parses .golangci.yml and asserts that the carve-out's allow-list matches the documented seven import roots above. A drift between this document, the code's actual imports, and the depguard rule fails CI before review.
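The core of such an alignment check is a set comparison between the documented roots and whatever list the test extracts from the lint configuration. The sketch below shows only that comparison and skips the YAML parsing entirely; diffRoots and the internal/billing entry are hypothetical, and the real test's structure may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// documentedRoots lists the seven import roots named in this section.
var documentedRoots = []string{
	"internal/access",
	"internal/audit",
	"internal/signing",
	"internal/identity/tenancy",
	"internal/identity/tenancy/events",
	"internal/identity/nodes/nsk",
	"internal/mesh/sse",
}

// diffRoots returns the documented roots absent from allowed (missing) and
// the allowed roots that are not documented (extra), both sorted.
func diffRoots(documented, allowed []string) (missing, extra []string) {
	inDoc := make(map[string]bool, len(documented))
	for _, r := range documented {
		inDoc[r] = true
	}
	inAllow := make(map[string]bool, len(allowed))
	for _, r := range allowed {
		inAllow[r] = true
		if !inDoc[r] {
			extra = append(extra, r)
		}
	}
	for _, r := range documented {
		if !inAllow[r] {
			missing = append(missing, r)
		}
	}
	sort.Strings(missing)
	sort.Strings(extra)
	return missing, extra
}

func main() {
	// Simulate drift: one documented root dropped, one undocumented root added.
	allowed := append([]string{"internal/billing"}, documentedRoots[1:]...)
	missing, extra := diffRoots(documentedRoots, allowed)
	fmt.Println(missing, extra) // [internal/access] [internal/billing]
}
```

A test built on this would fail whenever either slice is non-empty, which catches drift in both directions.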
Cross-references
- ./relay-fallback.md: the relay-fallback assignment surface that owns the plexsphere.peer_relay_assignment aggregate and the additive fallback_endpoint field on the peer_registered and peer_endpoint_changed payloads. The chooser's decision tree, the bridge-candidate ranking heuristic, the silent-unreachability signal (empty endpoint AND empty fallback_endpoint), and the relay-assigner sweep that recomputes assignments on a bridge reachability transition all live there.
- ./key-rotation.md: the key-rotation workflow that owns the plexsphere.peer_key_rotation aggregate and the rotate_keys / peer_key_rotated event types. The RequestRotation / CompleteRotation Manager commands, the pending-to-completed-or-superseded lifecycle, the two-channel SSE-plus-heartbeat dispatch model, the rotation-ordering invariants, and the no-sweeper rationale all live there.
- ./endpoints.md: the NAT endpoint intake surface that drives Manager.UpdateEndpoint and the per-Peer endpoint snapshot the relay-assigner reads when composing a peer_endpoint_changed payload.
- ../../contexts/mesh/sse.md: the Signed SSE Event Bus the peers Manager hands envelopes to. Same audience and quadrant; the wire-level Type discriminator the bus pins (node_state_updated) is the envelope-shape consumer of the six peer event types this document defines.
- ../../how-to/mesh/operate-peers.md: operator how-to for inspecting the live plexsphere.peer and plexsphere.psk rows, the wrap-key ledger, and the publisher metrics. Lands with the follow-on operator integration; the link is forward-referenced so this document stays the entry point once the how-to ships.
- ../../../internal/mesh/peers/doc.go: package-level overview and the load-bearing DECISION blocks (sub-context boundary, Manager / Wrapper / Repository triple, closed event-type set).
- ../../../.golangci.yml: the no-cross-context-imports-mesh-peers depguard rule pinning the allow-list of seven cross-context import roots.
- ../../../tests/workspace/mesh_peers_depguard_test.go: workspace alignment test that asserts the depguard rule's allow-list matches the documented set above.
- ../../reference/cli/plexctl/peer.md, ../../reference/cli/plexctl/mesh.md: the matching plexctl per-command references.