Relay Fallback Assignment
This document is the authoritative bounded-context reference for the relay-fallback surface that ships alongside the peers sub-context under ../../../internal/mesh/peers/. It covers the ubiquitous language, the plexsphere.peer_relay_assignment schema, the bridge-selection heuristic, the reactive RelayAssigner sweep, the additive fallback_endpoint payload extension on the two peer-aggregate event types (peer_registered, peer_endpoint_changed), and the deferred-latency / deferred-bridge-relay-port seams that future stories own.
Relay fallback is the control-plane surface that pre-computes, per peer pair, a fallback path via one of the Domain's reachable bridge Nodes. The assigned bridge's mesh-relay listener endpoint is included as fallback_endpoint on the two peer-aggregate event payloads the SSE bus already carries. The agent attempts the direct WireGuard endpoint first; if the handshake times out, it dials the relay locally. The control plane re-evaluates assignments when a bridge becomes unreachable or a better candidate appears, and rolls changes out via peer_endpoint_changed with an updated (or empty) fallback_endpoint.
Ubiquitous language
The terms below travel verbatim across Go code, SQL schema, outbox event payloads, structured-log attributes, and any operator-facing tooling. Keep the names stable.
| Term | Definition |
|---|---|
| Relay assignment | The per-peer-pair binding recording which bridge currently serves as the peer's fallback. Persisted as a row in plexsphere.peer_relay_assignment keyed on the peer's surrogate id. Lifecycle: live (retired_at IS NULL) → retired (retired_at IS NOT NULL); soft-retirement keeps the row available for forensic replay. |
| Bridge candidate | A bridge Node admissible by the chooser. The repository projection joins plexsphere.nodes → plexsphere.resources (kind = bridge) → plexsphere.peer and surfaces (node_id, mesh_ip, last_endpoint, last_endpoint_port, reachability_state, last_endpoint_stale_at) per candidate. |
| Fallback endpoint | The canonical host:port wire-form plexd dials when the direct WireGuard handshake fails. Carried as an optional omitempty string field on the peer_registered and peer_endpoint_changed outbox payloads, and as a netip.AddrPort value-object on the peers.NodePeer / state.Peer projections. Empty value means "no relay assigned" — a legitimate steady state, not corruption. |
| Relay assigner | The reactive background sweeper that consumes a (bridge_node_id, domain_id) pair (driven by a bridge's reachability transition) and recomputes every dependent peer's assignment in one bounded batch. Implemented in ../../../internal/mesh/peers/relay_assigner.go; the L4 dispatcher in the production binary translates tenancy.NodeReachabilityChanged events into Sweep calls. |
| Chooser | The pure function chooseBridge(candidates, observer, peer, logger, domain) → (BridgeCandidate, bool) that ranks the candidate slice and picks one (or zero) winner. Lives in ../../../internal/mesh/peers/relay.go; no I/O, no mutation. |
| Silent unreachability | The composed signal observed by a downstream consumer when BOTH the direct endpoint observation has gone stale (empty endpoint string) AND no fallback bridge is available (empty fallback_endpoint string) on the latest peer_endpoint_changed payload. The dashboard's reachability badge and the future mesh-map render the degraded state directly without inferring it. |
| Default bridge relay port | The canonical UDP port the chooser stamps onto a BridgeCandidate.Endpoint when no per-bridge override has been observed. Currently the WireGuard well-known port 51820; a future Bridge Orchestrator story owns the per-bridge listener-port lifecycle and replaces the constant with a per-row value carried through ListBridgeNodesByDomain. |
Schema
The relay-fallback surface owns one new table in the plexsphere schema. The authoritative source is the migration at ../../../internal/platform/db/migrations/0030_peer_relay_assignment.sql; the column meanings below paraphrase that file's DECISION blocks. The migration's Down block deliberately refuses the downgrade with SQLSTATE 0A000 because the assignment history is forensic evidence of past relay decisions.
plexsphere.peer_relay_assignment — relay-assignment aggregate
| Column | Type | Meaning |
|---|---|---|
| id | uuid PRIMARY KEY | Surrogate id (UUIDv7) the application layer mints per fresh assignment row. |
| domain_id | uuid NOT NULL | FK to plexsphere.domains(id) ON DELETE RESTRICT. Carries the Domain isolation boundary the chooser scopes every read against. |
| peer_id | uuid NOT NULL | FK to plexsphere.peer(id) ON DELETE RESTRICT. The half of the natural key the partial unique index keys on. |
| bridge_node_id | uuid NOT NULL | FK to plexsphere.nodes(id) ON DELETE RESTRICT. Names the bridge Node serving as the peer's current fallback. |
| fallback_endpoint | inet NOT NULL | Address portion of the bridge's mesh-relay listener endpoint, stored as a Postgres inet so address-family is validated at the storage boundary. |
| fallback_endpoint_port | integer NOT NULL CHECK (fallback_endpoint_port BETWEEN 1 AND 65535) | Port portion of the relay listener endpoint, paired with fallback_endpoint. |
| assigned_at | timestamptz NOT NULL DEFAULT now() | Wall-clock instant the assignment landed. |
| retired_at | timestamptz NULL CHECK (retired_at IS NULL OR retired_at >= assigned_at) | Soft-retirement tombstone. NULL means "live"; non-NULL means "no longer serving" but kept for forensic replay. |
The schema carries two indexes:
- peer_relay_assignment_live_uq UNIQUE (peer_id) WHERE retired_at IS NULL: at most one live assignment per peer aggregate. Mirrors the psk_live_peer_uq posture on plexsphere.psk; the assignment row is the same write-once / soft-retire shape as the wrapped PSK row.
- peer_relay_assignment_bridge_idx (bridge_node_id) WHERE retired_at IS NULL: the assigner's per-bridge range scan when a bridge transitions to unreachable. Partial so the largest cohort (retired rows kept for forensic replay) stays out of the index.
Bridge-selection heuristic
The chooser is a pure function. Given a slice of BridgeCandidate rows projected by ListBridgeNodesByDomain plus the observer's and the peer's Node ids, the chooser applies the following rules in order:
- Self-exclusion: any candidate whose NodeID equals the observer's or the peer's Node id is filtered before ranking. A bridge cannot relay its own traffic.
- Reachability filter: any candidate whose ReachabilityState is neither healthy nor stale is dropped. The canonical unreachable string lands here, as does any future state value the chooser has not learnt about.
- Freshness preference: healthy candidates beat stale candidates outright. A stale candidate is admitted only when no healthy candidate survives the prior filters; on that branch the chooser emits a WARN log naming the Domain so an operator can correlate a relay-quality dip with the specific Domain it affects.
- Deterministic tiebreaker: within the winning cohort (healthy if non-empty, otherwise stale), the lowest NodeID (compared lexicographically as the canonical UUIDv7 string) wins. Two consecutive runs against the same MVCC view produce a byte-identical pick.
A zero-survivor outcome returns (BridgeCandidate{}, false) — the canonical "first-bridge-pending" or "no-bridges-left" signal. Callers distinguish the two by reading the peer's prior assignment row, not by interpreting the chooser return value.
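
The four rules above can be sketched as a small pure function. This is a minimal illustration, not the code in relay.go: the trimmed BridgeCandidate struct, the plain-string state values, and the warn callback (standing in for the logger) are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// BridgeCandidate is a trimmed, hypothetical stand-in for the projected row.
type BridgeCandidate struct {
	NodeID            string // canonical UUID string
	ReachabilityState string // "healthy", "stale", or anything else
}

// chooseBridge applies self-exclusion, the reachability filter, the
// healthy-over-stale preference, and the lowest-NodeID tiebreaker.
// warn is a hypothetical hook that receives the Domain on the stale-only branch.
func chooseBridge(candidates []BridgeCandidate, observer, peer, domain string, warn func(domain string)) (BridgeCandidate, bool) {
	var healthy, stale []BridgeCandidate
	for _, c := range candidates {
		if c.NodeID == observer || c.NodeID == peer {
			continue // a bridge cannot relay its own traffic
		}
		switch c.ReachabilityState {
		case "healthy":
			healthy = append(healthy, c)
		case "stale":
			stale = append(stale, c)
		}
		// any other state (including unknown future values) is dropped
	}
	cohort := healthy
	if len(cohort) == 0 {
		cohort = stale
		if len(cohort) > 0 {
			warn(domain)
		}
	}
	if len(cohort) == 0 {
		return BridgeCandidate{}, false
	}
	sort.Slice(cohort, func(i, j int) bool { return cohort[i].NodeID < cohort[j].NodeID })
	return cohort[0], true
}

func main() {
	cands := []BridgeCandidate{
		{NodeID: "b3", ReachabilityState: "stale"},
		{NodeID: "b2", ReachabilityState: "healthy"},
		{NodeID: "b1", ReachabilityState: "healthy"},
		{NodeID: "b0", ReachabilityState: "unreachable"},
	}
	// b1 is the observer, so it is excluded even though it sorts first.
	winner, ok := chooseBridge(cands, "b1", "n9", "d1", func(d string) { fmt.Println("WARN stale-only pick in", d) })
	fmt.Println(winner.NodeID, ok) // b2 true
}
```

Because the function sorts a freshly built cohort and touches no shared state, repeated calls with the same input return the same winner.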
```text
chooseBridge(candidates, observer, peer)
               │
               ▼
┌─────────────────────────────┐
│ Filter: NodeID != observer  │
│ Filter: NodeID != peer      │
└──────────────┬──────────────┘
               ▼
┌─────────────────────────────┐
│ Filter: ReachabilityState   │
│         in {healthy, stale} │
└──────────────┬──────────────┘
      healthy │     │ stale-only
              ▼     ▼
lowest NodeID │     │ lowest NodeID + WARN
              ▼     ▼
         winner     winner
              │
              ▼
   (BridgeCandidate, true)
```
Manager command — AssignRelay
Manager.AssignRelay(ctx, AssignRelayCmd) → (AssignRelayResult, error) is the sole command-side entry point for relay assignment. Inside a single Store.InTx callback it:
- Reads the candidate slice via ListBridgeNodesByDomain.
- Calls chooseBridge to pick a winner (or none).
- Reads the peer's current live assignment row via GetActivePeerRelayAssignment.
- Branches on the (proposed, live) pair:
  - proposed == live, live is non-stale: no-op. Zero events emitted, zero rows written. The EmittedEvent flag on the result is false.
  - proposed exists, differs from live (or no live row): soft-retire any prior live row, upsert the fresh row, append one outbox event of the caller-supplied EmitAs event type. EmittedEvent is true.
  - no proposed candidate, live row exists: soft-retire the prior row, emit one outbox event with an empty fallback_endpoint, surface ErrRelayBridgeUnavailable so the caller dashboards the "lost the last bridge" state.
  - no proposed candidate, no live row: for the registration emitter (EmitAs = peer_registered), still emit one event so registration always lands exactly one peer_registered. For the endpoint-change emitter (EmitAs = peer_endpoint_changed), the call is a zero-event no-op.
The hot-path callers are Manager.Register (registration-time assignment) and Manager.UpdateEndpoint (re-evaluation on each endpoint observation); both invoke assignRelayInTx inside their own existing transaction so the assignment row, any retirement stamp, and the outbox event all land atomically.
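
The branch table for the (proposed, live) pair can be sketched as a pure decision function. The outcome type, the decide helper, and the use of empty strings for "no candidate" and "no live row" are hypothetical names chosen for this example. The source does not say what happens when proposed == live but the live row is stale; this sketch assumes that case is treated as a rotation.

```go
package main

import "fmt"

// outcome names the documented branches; the type itself is hypothetical.
type outcome string

const (
	noop            outcome = "noop"              // zero events, zero rows
	rotate          outcome = "rotate"            // retire prior (if any), upsert fresh, emit EmitAs
	retireAndSignal outcome = "retire-and-signal" // retire prior, emit empty fallback, return ErrRelayBridgeUnavailable
	emitEmpty       outcome = "emit-empty"        // registration with no bridge: one event, no row
)

// decide maps the chooser result and the live row onto one branch.
// proposed == "" means the chooser returned false; live == "" means no live row.
func decide(proposed, live string, liveStale bool, emitAs string) outcome {
	switch {
	case proposed != "" && proposed == live && !liveStale:
		return noop
	case proposed != "":
		return rotate // assumption: a stale live row on the same bridge is refreshed
	case live != "":
		return retireAndSignal
	case emitAs == "peer_registered":
		return emitEmpty
	default:
		return noop
	}
}

func main() {
	fmt.Println(decide("b1", "b1", false, "peer_endpoint_changed")) // noop
	fmt.Println(decide("b2", "b1", false, "peer_endpoint_changed")) // rotate
	fmt.Println(decide("", "b1", false, "peer_endpoint_changed"))   // retire-and-signal
	fmt.Println(decide("", "", false, "peer_registered"))           // emit-empty
	fmt.Println(decide("", "", false, "peer_endpoint_changed"))     // noop
}
```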
Reactive sweep — RelayAssigner.Sweep
RelayAssigner.Sweep(ctx, bridgeNodeID, domainID) → (processed, rotated, error) is the cold-path entry the L4 dispatcher invokes when a bridge transitions to unreachable. The sweep:
- Paginates peer_relay_assignment_bridge_idx for every live row naming the bridge, bounded by the configurable batch-size cap (default 256).
- For each row, calls Manager.AssignRelay with EmitAs = peer_endpoint_changed so the per-peer rotation is transactional and the bus observes one event per affected peer.
- Returns (processed, rotated) so the dispatcher can decide whether to fire a follow-up tick (the contract: "drain until processed < batchSize").
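
The "drain until processed < batchSize" contract can be sketched from the dispatcher's side as a simple loop. The sweepFn type, the drain helper, and the fakeSweep stand-in are hypothetical; the real Sweep also takes a context plus the bridge and Domain ids, which are omitted here.

```go
package main

import "fmt"

// sweepFn mirrors the documented (processed, rotated, error) return shape.
type sweepFn func() (processed, rotated int, err error)

// drain keeps invoking sweep until a batch comes back short of batchSize.
func drain(sweep sweepFn, batchSize int) (int, int, error) {
	totalProcessed, totalRotated := 0, 0
	for {
		p, r, err := sweep()
		totalProcessed += p
		totalRotated += r
		if err != nil {
			return totalProcessed, totalRotated, err
		}
		if p < batchSize {
			return totalProcessed, totalRotated, nil
		}
	}
}

// fakeSweep simulates a bridge with `remaining` live rows, all of which rotate.
func fakeSweep(remaining, batchSize int) sweepFn {
	return func() (int, int, error) {
		n := remaining
		if n > batchSize {
			n = batchSize
		}
		remaining -= n
		return n, n, nil
	}
}

func main() {
	// 600 rows with the default cap of 256 drain in three batches: 256, 256, 88.
	p, r, err := drain(fakeSweep(600, 256), 256)
	fmt.Println(p, r, err) // 600 600 <nil>
}
```

When the row count is an exact multiple of the batch size, the loop makes one extra call that processes zero rows before it stops.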
The sweeper carries single-flight semantics keyed on bridgeNodeID — a second Sweep for the same bridge while the first is still running is an idempotent no-op (zero processed, zero rotated, nil error). Two reachability events for the same bridge coalesce into a single recomputation pass without standing up a per-bridge mutex in the dispatcher itself.
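
One way to get single-flight semantics keyed on bridgeNodeID is a mutex-guarded in-flight set. This is a sketch of the general technique, not the sweeper's actual implementation; the singleFlight type and its do method are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// singleFlight tracks which keys currently have a call in progress.
type singleFlight struct {
	mu       sync.Mutex
	inFlight map[string]struct{}
}

func newSingleFlight() *singleFlight {
	return &singleFlight{inFlight: make(map[string]struct{})}
}

// do runs fn unless a call for the same key is already running, in which
// case it returns (0, 0, nil) immediately without invoking fn.
func (s *singleFlight) do(key string, fn func() (int, int, error)) (int, int, error) {
	s.mu.Lock()
	if _, busy := s.inFlight[key]; busy {
		s.mu.Unlock()
		return 0, 0, nil
	}
	s.inFlight[key] = struct{}{}
	s.mu.Unlock()

	defer func() {
		s.mu.Lock()
		delete(s.inFlight, key)
		s.mu.Unlock()
	}()
	return fn() // the lock is not held while fn runs
}

func main() {
	sf := newSingleFlight()
	p, r, err := sf.do("bridge-1", func() (int, int, error) {
		// A second sweep for the same bridge arrives while the first is running.
		ip, ir, ierr := sf.do("bridge-1", func() (int, int, error) { return 99, 99, nil })
		fmt.Println("nested:", ip, ir, ierr) // nested: 0 0 <nil>
		return 3, 2, nil
	})
	fmt.Println("outer:", p, r, err) // outer: 3 2 <nil>
}
```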
A two-counter Prometheus port (IncProcessed, IncRotated) lets the production binary expose peers_relay_assigner_processed_total and peers_relay_assigner_pending metrics; the unit tests inject a recording fake so the assertion budget stays free of a metrics registry.
```text
                 NodeReachabilityChanged
                            │
                            ▼
          dispatcher coalesces by bridgeNodeID
                            │
                            ▼
         RelayAssigner.Sweep(bridgeID, domainID)
                            │
       ┌────────────────────┴─────────────────────┐
       ▼                                          ▼
ListPeerRelayAssignmentsByBridge        single-flight guard
       │                                          │
       ▼                                          ▼
for each row:                           second Sweep for same bridge
  Manager.AssignRelay(peer, ENDPOINT)   short-circuits to (0,0,nil)
       │
       ▼
retire prior + upsert fresh + emit peer_endpoint_changed
```
Wire-shape extensions
The relay-fallback surface DOES NOT add a new outbox event type. It extends the two existing peer-aggregate event payloads with an additive optional field. The closed-set AST gate enforced by ../../../tests/workspace/peers_event_type_set_test.go continues to admit exactly the six peer-side event-type literals (peer_registered, peer_psk_assigned, peer_deregistered, peer_endpoint_changed, rotate_keys, peer_key_rotated) — no seventh literal is introduced.
peer_registered payload
```jsonc
{
  "event_id": "...",
  "occurred_at": "...",
  "peer_id": "...",
  "domain_id": "...",
  "node_id": "...",
  "fallback_endpoint": "10.77.0.50:51820" // optional, omitempty
}
```
peer_endpoint_changed payload
```jsonc
{
  "event_id": "...",
  "occurred_at": "...",
  "peer_id": "...",
  "domain_id": "...",
  "node_id": "...",
  "endpoint": "203.0.113.10:51820",
  "endpoint_reported_at": "...",
  "previous_endpoint": "203.0.113.9:51820",
  "fallback_endpoint": "10.77.0.50:51820" // optional, omitempty
}
```
The omitempty choice is load-bearing. Existing JetStream consumers and the chainsaw fixture under ../../../tests/e2e/mesh/peers-events/ continue to parse the envelope without modification because an absent field deserialises to the zero value. Two empty signals on the same payload (empty endpoint AND empty fallback_endpoint) together communicate the silent-unreachability state.
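
A short sketch of how the omitempty tag behaves with Go's encoding/json, and how a consumer might detect the silent-unreachability composite. The endpointChanged struct is a trimmed, hypothetical subset of the payload; leaving omitempty off the endpoint field is an assumption of this example, not a statement about the real struct.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// endpointChanged is a hypothetical, trimmed view of peer_endpoint_changed.
type endpointChanged struct {
	PeerID           string `json:"peer_id"`
	Endpoint         string `json:"endpoint"`
	FallbackEndpoint string `json:"fallback_endpoint,omitempty"`
}

// encode marshals the payload; an empty FallbackEndpoint drops the key.
func encode(p endpointChanged) string {
	b, err := json.Marshal(p)
	if err != nil {
		panic(err)
	}
	return string(b)
}

// decode parses a payload; an absent key leaves the field at its zero value.
func decode(raw string) endpointChanged {
	var p endpointChanged
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		panic(err)
	}
	return p
}

// silentlyUnreachable reports the composed signal: no direct endpoint and no relay.
func silentlyUnreachable(p endpointChanged) bool {
	return p.Endpoint == "" && p.FallbackEndpoint == ""
}

func main() {
	fmt.Println(encode(endpointChanged{PeerID: "p1", Endpoint: "203.0.113.10:51820"}))
	// {"peer_id":"p1","endpoint":"203.0.113.10:51820"}

	p := decode(`{"peer_id":"p1","endpoint":""}`)
	fmt.Println(silentlyUnreachable(p)) // true
}
```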
/v1/register and /v1/nodes/{id}/state
Both wire surfaces gain an optional fallback_endpoint string field on every RegisterPeer / NodeStatePeer entry. The OpenAPI schemas declare the field as optional string; the JSON wire form drops the key when the value is empty. The corresponding Go projections on peers.NodePeer, registration.Peer, and state.Peer carry a netip.AddrPort value-object — a zero AddrPort is the canonical "no fallback assigned" signal. The composition-root adapters at ../../../cmd/plexsphere/state_factory_prod.go and ../../../cmd/plexsphere/registration_factory_prod.go LEFT JOIN plexsphere.peer_relay_assignment WHERE retired_at IS NULL and project the (fallback_endpoint, fallback_endpoint_port) columns into the per-peer projection.
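
The mapping between the netip.AddrPort projection and the host:port wire string can be sketched with the standard net/netip package. The helper names (fromColumns, wireFallback, parseFallback) are hypothetical; only the zero-value convention comes from the text above.

```go
package main

import (
	"fmt"
	"net/netip"
)

// fromColumns joins the (fallback_endpoint, fallback_endpoint_port) column
// pair into one value-object.
func fromColumns(addr string, port uint16) (netip.AddrPort, error) {
	a, err := netip.ParseAddr(addr)
	if err != nil {
		return netip.AddrPort{}, err
	}
	return netip.AddrPortFrom(a, port), nil
}

// wireFallback renders the projection as the wire string; the zero AddrPort
// ("no fallback assigned") becomes the empty string so the key can be omitted.
func wireFallback(ap netip.AddrPort) string {
	if !ap.IsValid() {
		return ""
	}
	return ap.String()
}

// parseFallback is the inverse: an empty string maps to the zero AddrPort.
func parseFallback(s string) (netip.AddrPort, error) {
	if s == "" {
		return netip.AddrPort{}, nil
	}
	return netip.ParseAddrPort(s)
}

func main() {
	ap, err := fromColumns("10.77.0.50", 51820)
	if err != nil {
		panic(err)
	}
	fmt.Println(wireFallback(ap)) // 10.77.0.50:51820

	none, _ := parseFallback("")
	fmt.Printf("%q\n", wireFallback(none)) // ""
}
```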
Lifecycle states
The assignment row's lifecycle is symmetric to the wrapped-PSK row's: write-once, then either churn-by-soft-retirement or remain live indefinitely.
```text
(no row)
   │
   │ Manager.AssignRelay (proposed candidate available)
   ▼
live row (retired_at IS NULL)
   │
   │ bridge transitions unreachable
   │   OR
   │ chooser picks a different bridge
   │
   ▼
retired row (retired_at IS NOT NULL)
   │
   │ Manager.AssignRelay (next pick)
   ▼
fresh live row (new id, new bridge or same bridge with fresh assigned_at)
```
The partial unique index peer_relay_assignment_live_uq keeps at most one live row per peer at any moment; the retired cohort is unbounded but cheap because the partial index lives only on the live cohort.
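
The write-once / soft-retire lifecycle can be modelled in memory to show the "at most one live row per peer" invariant. This is an illustration only: the history type, integer ids, and Unix-second timestamps are assumptions standing in for the table, its UUIDv7 ids, and timestamptz columns.

```go
package main

import "fmt"

// assignment is a hypothetical in-memory stand-in for one table row.
type assignment struct {
	ID         int
	PeerID     string
	BridgeID   string
	AssignedAt int64
	RetiredAt  *int64 // nil means live
}

type history struct {
	rows   []assignment
	nextID int
}

// retire stamps retired_at on the peer's live row, if one exists.
func (h *history) retire(peerID string, now int64) {
	for i := range h.rows {
		if h.rows[i].PeerID == peerID && h.rows[i].RetiredAt == nil {
			t := now
			h.rows[i].RetiredAt = &t
		}
	}
}

// rotate soft-retires any live row, then appends a fresh live row with a new id.
func (h *history) rotate(peerID, bridgeID string, now int64) {
	h.retire(peerID, now)
	h.nextID++
	h.rows = append(h.rows, assignment{ID: h.nextID, PeerID: peerID, BridgeID: bridgeID, AssignedAt: now})
}

// liveCount returns how many live rows the peer currently has.
func (h *history) liveCount(peerID string) int {
	n := 0
	for _, r := range h.rows {
		if r.PeerID == peerID && r.RetiredAt == nil {
			n++
		}
	}
	return n
}

func main() {
	h := &history{}
	h.rotate("p1", "b1", 100)
	h.rotate("p1", "b2", 200) // b1 row is retired, b2 row is live
	h.retire("p1", 300)       // last bridge lost: no live row remains
	fmt.Println(len(h.rows), h.liveCount("p1")) // 2 0
}
```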
Refusal codes
The Manager.AssignRelay body and the underlying repository return the following canonical sentinels. Callers MUST use errors.Is(...) against the package-level variables exported from ../../../internal/mesh/peers/errors.go; the string forms are diagnostic only and may evolve.
| Sentinel | Meaning | When surfaced |
|---|---|---|
| ErrPeerAlreadyDeregistered | The peer's soft-delete tombstone was stamped before the assignment write landed. | Upsert under a deregistered peer; the FK preserves the integrity even when the application layer races. |
| ErrRelayBridgeUnavailable | A live assignment existed but no replacement candidate is available. | Sweep observing a bridge that has gone unreachable and finding zero healthy or stale bridges in the Domain; the call still soft-retires the prior row and emits a peer_endpoint_changed with an empty fallback_endpoint before surfacing the sentinel. |
| ErrNilRelayAssignerCollaborator | The RelayAssigner constructor was handed a nil Store or Manager. | Misconfigured composition root; the production boot path treats this as a fatal startup error. |
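
Matching on the sentinels with errors.Is might look like the sketch below. The two variables are declared locally as stand-ins for the package-level values in errors.go, and their message strings are invented for the example; classify and wrap are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for the exported sentinels; the message text is made up.
var (
	ErrRelayBridgeUnavailable  = errors.New("relay bridge unavailable")
	ErrPeerAlreadyDeregistered = errors.New("peer already deregistered")
)

// wrap adds call-site context while keeping the sentinel matchable via %w.
func wrap(peerID string, err error) error {
	return fmt.Errorf("assign relay for peer %s: %w", peerID, err)
}

// classify branches on identity, never on the string form.
func classify(err error) string {
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrRelayBridgeUnavailable):
		return "lost-last-bridge"
	case errors.Is(err, ErrPeerAlreadyDeregistered):
		return "peer-gone"
	default:
		return "unexpected"
	}
}

func main() {
	fmt.Println(classify(wrap("p1", ErrRelayBridgeUnavailable)))  // lost-last-bridge
	fmt.Println(classify(wrap("p2", ErrPeerAlreadyDeregistered))) // peer-gone
	fmt.Println(classify(nil))                                    // ok
}
```

Comparing with errors.Is keeps callers working when intermediate layers wrap the error or when the diagnostic string changes.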
Cross-references
- ./peers.md: Peer aggregate and the closed event-type set the additive fallback_endpoint field extends.
- ./endpoints.md: NAT-endpoint intake surface and the empty-endpoint tombstone signal that composes with empty fallback_endpoint to produce the silent-unreachability composite.
- ./reachability.md: per-Node reachability projection the chooser ranks on.
- ./sse.md: SSE relay's pass-through event-type behaviour; the fallback_endpoint field travels verbatim across the relay boundary.
- ../bridge/events.md: the bridge events surface that dispatches the per-bridge relay listen port onto the bridge_config_updated mesh SSE wire literal.
Deferred surfaces
Two seams in this surface are deliberately under-specified and owned by later stories.
Latency-aware ranking
The chooser ignores BridgeCandidate.LatencyMillis today; the field is a *int32 carried through the projection so a future ranking adapter can populate it without re-shaping the candidate type. The latency telemetry itself depends on an observability ingest path that is not yet in place. The Phase-2 operator surfaces (the plexctl peer / mesh command families) have landed and project the directed-edge mode and the per-Node reachability projection; they are the operator-facing entry point that names this residual follow-up. Latency-aware ranking sits on top of that operator contract and waits on the per-edge handshake-latency telemetry the ingest path will produce. Until that telemetry lands, the existing healthy / stale / ascending-id heuristic is the contract.
Per-bridge relay-listener port
The chooser's fallback endpoint defaults its port to the canonical WireGuard port via the package-level constant DefaultBridgeRelayPort in ../../../internal/mesh/peers/relay.go. The Bridge Orchestrator surface that materialises a per-bridge relay listen port has now landed: the EffectiveConfigBuilder folds each bridge's BridgeRelay configuration together with the per-peer relay-assignment rows into the relay block of the effective config it projects, and dispatches that block onto the bridge_config_updated mesh SSE wire literal (see ../bridge/events.md for the closed-outbox-vs-wire-literal split and the per-Node fan-out). The chooser itself is unchanged: it still stamps DefaultBridgeRelayPort onto any BridgeCandidate.Endpoint for which ListBridgeNodesByDomain carries no per-row override, and the bridge effective config is the channel that delivers the per-bridge listen port to a hosted Node.