Skip to content

Per-peer NAT endpoint intake

This document is the authoritative bounded-context reference for the NAT endpoint intake surface that ships under ../../../internal/mesh/peers/ and is exposed through PUT /v1/nodes/{id}/endpoint. It covers the ubiquitous language, the wire envelopes, the per-Peer endpoint snapshot schema, the per-Domain endpoint TTL policy, the stale-endpoint sweeper, the peer_endpoint_changed outbox event, the clock-skew admission rule, the six-gate handler ordering, the audit contract, the OpenAPI cross-reference, and the threat model.

The endpoint intake surface is the plexd → plexsphere NAT observation channel: a per-Node, NSK-authenticated PUT endpoint that stamps the agent's observed (host, port) tuple onto the addressed Peer aggregate and emits a peer_endpoint_changed event whenever the tuple differs from the prior observation or transitions out of a stale window. Anything outside that surface — the Signed SSE Event Bus that fans out the event payload, the reachability evaluator that owns the per-Node liveness verdict, the Key & Peer Manager's Register / AssignPSK / Deregister commands, the relay-fallback assignment surface that consumes the endpoint snapshot — is a collaborator the endpoint sub-context emits to or projects from, not a concern of this document. The three surfaces this document pins are the in-process command pipeline (Manager.UpdateEndpoint), the read-only side (Provider.SnapshotEndpointsForDomain), and the periodic stale-endpoint sweeper (EndpointSweeper.Sweep).

Status — partial delivery

The writer-side closure for the endpoint surface is load-bearing in the production binary:

  • Per-Peer aggregate writeManager.UpdateEndpoint resolves the prior endpoint snapshot, gates the observation through the clock-skew and per-Domain TTL windows, runs the UpdatePeerEndpoint UPDATE inside a single Store.InTx, and conditionally appends a peer_endpoint_changed outbox event in the same transaction.
  • Stale-endpoint sweeperEndpointSweeper.Sweep paginates the peer_endpoint_stale_idx partial index by per-Domain TTL, stamps last_endpoint_stale_at on each tombstoned row, and appends a peer_endpoint_changed event with an empty endpoint string per row. The same Store.InTx boundary guards the batch.
  • HTTP intakePUT /v1/nodes/{id}/endpoint runs the six-gate ordering (NSK auth, path-id, body cap, body decode, clock-skew, AddrPort parse) before invoking the recorder. The endpoint_dispatch.go fail-closed gate returns 501 endpoint_not_provisioned until every load-bearing port is wired.

One downstream surface remains deferred: the SSE relay drain for peer_endpoint_changed that surfaces the outbox row on GET /v1/nodes/{id}/events, tracked in ../../architecture/mesh-event-bus-roadmap.md. The outbox row is durable on every transition today; the wire envelope reaches a subscriber once the relay drain lands.

Load-bearing today (each detailed in its own section below): the Manager.UpdateEndpoint command path, the EndpointSweeper goroutine, the Provider.SnapshotEndpointsForDomain read path, the handler with its six refusal codes, the 0029_peer_endpoint.sql migration (four Peer columns, one Domain column, the peer_endpoint_stale_idx partial index), the MaxEndpointSkew golden constant, the per-Domain EndpointPolicy value object with its 30s floor and 1h ceiling, and the OpenAPI surface. The exact file references are in Cross-references.

Downstream consumers (relay-fallback assignment, audit log, dashboard endpoint badges) MUST treat the deferred SSE drain as scaffolding: every wire shape is stable from day one, and the peer_endpoint_changed outbox row is the single dependency they consume the observation stream from rather than the read-side projection.

Cross-references

  • ./peers.md — the Key and Peer Manager that owns the Peer aggregate the endpoint snapshot rides on. The endpoint surface extends the same aggregate with the (last_endpoint, last_endpoint_port, last_endpoint_reported_at, last_endpoint_stale_at) quadruple; the peer_endpoint_changed event sits inside the closed six-event peer-side set (peer_registered, peer_psk_assigned, peer_deregistered, peer_endpoint_changed, rotate_keys, peer_key_rotated) documented there.
  • ./relay-fallback.md — the relay-fallback assignment surface that composes the endpoint snapshot with the per-peer bridge-node assignment. The fallback_endpoint field on peer_endpoint_changed (and on peer_registered) is the additive optional dial address plexd uses when the direct WireGuard handshake times out; the silent-unreachability signal — the composite where endpoint == "" AND fallback_endpoint == "" — is documented there. The endpoint-stale-tombstone path emitted by the EndpointSweeper and the relay-assigner's bridge-churn sweep are independent observers of the same per-peer state, so a silent-unreachability transition can be reconstructed from either surface's outbox stream.
  • ./reachability.md — the per-Node liveness sibling. The endpoint intake and the heartbeat intake share the NSK middleware, the MaxEndpointSkew / MaxHeartbeatSkew 60-second bound, and the path-id defense-in-depth gate; the two surfaces are independent observation channels on the same per-Node admission envelope.
  • ./sse.md — the Signed SSE Event Bus the peer_endpoint_changed outbox row travels onto once the relay drain lands. The wire-level Type discriminator the bus pins (node_state_updated) is the envelope-shape consumer of the endpoint event.
  • ../../architecture/mesh-event-bus-roadmap.md — the deferred-wiring tracker that records the SSE relay's production-wiring gap the endpoint outbox-emitter currently rides on.
  • ../../how-to/mesh/operate-peers.md — operator how-to for inspecting the live plexsphere.peer rows, the per-Domain endpoint_ttl knob, and the sweeper metrics. The same operator guide that covers Peer registration also covers the endpoint observation snapshot.
  • ../../../api/openapi/plexsphere-v1.yaml — OpenAPI spec for PUT /v1/nodes/{id}/endpoint, including the EndpointRequest, EndpointResponse, Problem, PermissionDenied schemas and the six refusal codes.
  • ../../../internal/mesh/peers/endpoint.goManager.UpdateEndpoint, the UpdateEndpointCmd / UpdateEndpointResult shapes, the renderEndpoint canonical "host:port" formatter, and the DomainPolicyLookup port.
  • ../../../internal/mesh/peers/endpoint_sweeper.goEndpointSweeper, the tick loop, the per-tick batch cap, and the EndpointSweeperCounter metrics port.
  • ../../../internal/mesh/peers/provider.goProvider.SnapshotEndpointsForDomain and the NodePeerEndpoint read-side projection.
  • ../../../internal/identity/tenancy/domain.go — the per-Domain EndpointPolicy value object, the endpointTTLMinInterval / endpointTTLMaxInterval floor/ceiling constants, and DefaultEndpointPolicy.
  • ../../../internal/identity/tenancy/events/events.go — the PeerEndpointChanged payload struct, the TypePeerEndpointChanged = "peer_endpoint_changed" constant, and the NewPeerEndpointChanged constructor.
  • ../../../internal/platform/clock/skew.go — the MaxEndpointSkew constant and the WithinEndpointSkew helper.
  • ../../../internal/platform/db/migrations/0029_peer_endpoint.sql — the schema migration adding the four Peer columns, the per-Domain endpoint_ttl column, and the peer_endpoint_stale_idx partial index.
  • ../../../internal/transport/http/v1/handlers/endpoint.go — the six-gate handler body for putNodeEndpoint.
  • ../../../internal/transport/http/v1/handlers/endpoint_deps.go — the EndpointRecorder / PeerLookup ports, the problem-code constants, the audit-relation constants, and the EndpointMaxBodyBytes 4 KiB cap.
  • ../../../internal/transport/http/v1/handlers/endpoint_dispatch.go — the fail-closed gate that returns 501 endpoint_not_provisioned until every load-bearing port is wired.
  • ../../../cmd/plexsphere/peers_factory_prod.go — the composition root that wires Manager.UpdateEndpoint behind the EndpointRecorder adapter, the pool-bound PeerLookup adapter, and the EndpointSweeper goroutine.

Ubiquitous language

The terms below travel verbatim across the Go code, the SQL schema, the OpenAPI contract, the outbox event payloads, the audit log, the structured-log attributes, and the per-Domain audit trail. Keep the names stable in error messages and operator-facing tooling.

TermDefinition
Endpoint observationA (host, port) tuple plexd reports via PUT /v1/nodes/{id}/endpoint. The wire envelope is EndpointRequest{endpoint, nat_type, reported_at}. The ingestion stamps the (host, port, reported_at) triple on the addressed Peer row.
EndpointThe publicly-observed host:port wire form. Stored as the typed pair (last_endpoint inet, last_endpoint_port int) on plexsphere.peer; the wire string is composed at the boundary by renderEndpoint, which brackets IPv6 hosts via netip.AddrPort.String() semantics.
reported_atThe wall-clock instant the agent attached to its PUT body. Compared against the server's now via clock.WithinEndpointSkew|delta| > MaxEndpointSkew rejects with 400 endpoint_clock_skew. Also compared against now() - TTL to refuse observations older than the per-Domain freshness window.
StaleAn endpoint observation older than now() - EndpointPolicy.TTL for its Domain. The sweeper stamps last_endpoint_stale_at on the Peer row and emits a peer_endpoint_changed event with an empty endpoint string so downstream consumers know to fall back to the relay assignment.
EndpointPolicyThe per-Domain value object ({TTL}) on the Tenancy aggregate that drives the freshness window. Persisted as the endpoint_ttl INTERVAL column on plexsphere.domains.
EndpointPolicy.TTLThe per-Domain endpoint freshness window. Floor ≥ 30s (so a heartbeat-cadence agent always refreshes inside the window), ceiling ≤ 1h (mirrors the reachability policy ceilings). Default 5m.
MaxEndpointSkewThe inclusive upper bound on |client_reported_at − server_now| accepted by the endpoint handler. Pinned at 60 * time.Second in ../../../internal/platform/clock/skew.go against a golden literal; widening is a security regression, not a tuning knob.
NSK (Node Session Key)The Node's per-session bearer credential, presented as Authorization: Bearer nsk_<env>_<...>. The NSKAuthenticator middleware resolves it to a tenancy.Node and refuses path-id ≠ resolved-Node-id with 403 node_id_mismatch. Shared with the heartbeat surface.
Manager.UpdateEndpointThe sole command-side entry point that stamps a fresh observation onto the Peer aggregate and conditionally emits a peer_endpoint_changed event. Lives in ../../../internal/mesh/peers/endpoint.go.
EndpointSweeperThe periodic background ticker that paginates the peer_endpoint_stale_idx partial index, stamps last_endpoint_stale_at on each row past TTL, and emits a tombstone peer_endpoint_changed event per stamped row. Lives in ../../../internal/mesh/peers/endpoint_sweeper.go.
Provider.SnapshotEndpointsForDomainThe read-side projection that returns []NodePeerEndpoint for every live peer in a Domain whose observation is fresh (last_endpoint_stale_at IS NULL and last_endpoint IS NOT NULL). The relay-fallback assignment surface consumes the snapshot.
DomainPolicyLookupThe narrow read seam Manager.UpdateEndpoint consults to resolve the per-Domain EndpointPolicy. Production wires a sqlc-bound adapter against plexsphere.domains.endpoint_ttl; a nil at construction falls back to a stub returning tenancy.DefaultEndpointPolicy.
EndpointRecorderThe transport-layer port the handler depends on to persist an observation. Production binds it to Manager.UpdateEndpoint via a small adapter in cmd/plexsphere; tests substitute a recording fake.
PeerLookupThe transport-layer read-only port that translates the path-id (a Node id) to the live Peer row's id. Held as a separate port from EndpointRecorder so the writer fake stays trivial; the production adapter is a raw pgx query against plexsphere.peer.
AuditRelationNodeEndpointIngestThe canonical Relation string the endpoint handler stamps on every ingestion-phase audit row: node_endpoint.record. Stable so audit consumers can filter by relation without parsing free-form text.
AuditRelationNodeEndpointPathGateThe canonical Relation string the endpoint handler stamps when its defense-in-depth path-id gate fires: node_endpoint.path_gate. Distinct from the ingestion relation so dashboards can alert on "middleware was bypassed but handler caught it" without conflating it with ingestion-phase entries.
peer_endpoint_changedThe bare snake_case outbox event literal the Manager and the sweeper emit. One member of the closed six-event peer-side set documented in ./peers.mdpeer_registered, peer_psk_assigned, peer_deregistered, peer_endpoint_changed, rotate_keys, peer_key_rotated — the wire contract pins.

Wire shape — PUT /v1/nodes/{id}/endpoint

The request envelope is the EndpointRequest value type defined in ../../../internal/transport/http/v1/handlers/endpoint.go, mirrored on the OpenAPI spec at ../../../api/openapi/plexsphere-v1.yaml under components.schemas.EndpointRequest.

FieldTypeMeaning
endpointstring ("host:port")The publicly-observed (host, port) tuple. Parsed via netip.ParseAddrPort — an unparseable string, a port outside 1..65535, or an unparseable IPv6 host rejects with 400 endpoint_unparseable.
nat_typestring (cone / restricted / port_restricted / symmetric / unknown)The agent-classified NAT traversal posture. The handler does NOT branch on the value — it is forwarded verbatim for downstream observability.
reported_atRFC 3339 timestampThe agent's local wall-clock at observation time. Compared against the server's now via clock.WithinEndpointSkew|delta| > MaxEndpointSkew rejects with 400 endpoint_clock_skew. Also compared against now() - TTL; an observation older than the per-Domain TTL rejects with the same 400 endpoint_clock_skew code (with an audit reason disambiguating "stale" from "skew").

The success response envelope is the EndpointResponse value type in the same file.

FieldTypeMeaning
accepted_atRFC 3339 timestampThe server-side admission instant the recorder stamped on the row. Matches last_endpoint_reported_at for the row just written.
stale_afterRFC 3339 timestampaccepted_at + EndpointPolicy.TTL. The agent uses this to schedule the next observation before the sweeper would tombstone the row.

The six refusal codes the handler emits (Problem-Details code field):

CodeHTTPTriggerSource file
endpoint_clock_skew400Two distinct semantic flavours share one wire code. Clock-drift flavour|client_reported_at − server_now| > MaxEndpointSkew (60s); the agent's local clock has drifted off server time. Stale-observation flavourreported_at is fresh against the skew window but older than the per-Domain endpoint TTL; the agent's keep-alive cadence has lapsed. Operators disambiguate via the audit row's reason field — the recorder-side stale arm stamps reported_at older than per-Domain endpoint TTL while the handler-side drift arm stamps reported_at outside MaxEndpointSkew window. Agent remediation is the same in both flavours (refresh reported_at from a fresh time.Now() and retry), so the two share one wire code; widening the wire surface would add a code agents cannot act on differently.endpoint.go step 5; endpoint.go handleEndpointRecordError for the recorder-side stale path
endpoint_unparseable400netip.ParseAddrPort(req.Endpoint) fails (no colon, non-numeric port, port outside 1..65535, unparseable IPv6); OR the recorder's defense-in-depth aggregate invariant rejects the AddrPort.endpoint.go step 6; endpoint.go handleEndpointRecordError
malformed_endpoint_request400JSON decode fails (invalid JSON, unknown field, missing required field).endpoint.go step 4
endpoint_body_too_large413Body exceeds EndpointMaxBodyBytes (4 KiB) — caught by http.MaxBytesReader before the JSON decoder ever sees the bytes.endpoint.go step 3
node_id_mismatch403URL path id ≠ resolved NSK Node id (defense-in-depth; the NSK middleware also runs this gate).endpoint.go step 2
endpoint_peer_not_found404PeerLookup.ResolveLivePeer reports the authenticated Node has no live Peer row in any Domain.endpoint.go step 7
endpoint_peer_gone410The recorder returned ErrEndpointPeerAlreadyDeregistered — the Peer row was soft-deleted between the handler's PeerLookup admission and the recorder's UPDATE.endpoint.go handleEndpointRecordError

Two adjacent refusal arms are owned by collaborators rather than the handler body:

  • 401 nsk_revoked — the NSK middleware in ../../../internal/identity/authn/middleware/nsk.go refuses a missing, malformed, or revoked NSK envelope before the handler runs.
  • 501 endpoint_not_provisioned — the endpoint_dispatch.go fail-closed gate fires when ANY of EndpointRecorder, NSKResolver, NodeRepo, PeerLookup is nil at composition time, so a half-wired posture cannot admit traffic past gates that have not yet been wired.

Why 404 and 410 exist as endpoint-specific codes

endpoint_peer_not_found (404) and endpoint_peer_gone (410) are operator-visible response codes that complement the four-code refusal taxonomy quoted in the original task acceptance criteria. They are NOT scaffolding — both arms cover legitimate lifecycle states that the NSK-bearer surface must distinguish so a plexd agent can take the right remediation:

  • 404 endpoint_peer_not_found — the NSK authenticates a Node the schema knows about, but PeerLookup.ResolveLivePeer returns zero rows: the Node has not yet been bound to a Peer aggregate (a fresh enrollment whose Manager.Register flow has not yet landed) or the Peer was drained out of the Domain after the NSK was minted. A 404 here tells the agent its NSK is valid but the Peer side of its lifecycle has not yet caught up — the agent retries on its bootstrap cadence rather than treating the observation as poisoned. Distinguishing this from a 403 node_id_mismatch (cross-Node impersonation) lets a leaked NSK on a missing Peer surface differently from a leaked NSK on a sibling Peer.
  • 410 endpoint_peer_gone — the recorder returned peers.ErrPeerAlreadyDeregistered: the Peer existed when the handler's PeerLookup gate ran but was soft-deleted between the admission read and the recorder's UPDATE. A 410 (rather than a 404 retry) tells the agent the row will never accept further observations — the correct action is to stop reporting against the path-id and either acquire a fresh NSK against a new Peer or exit the loop. The narrowness of the lookup-vs-update race makes 410 the right code shape: the resource was there, then went permanently away.

The OpenAPI surface at ../../../api/openapi/plexsphere-v1.yaml documents both codes alongside the four task-AC codes; they are part of the stable wire contract and the v1 spec will not narrow them without a deprecation cycle.

Six-gate handler ordering

endpoint.go runs the handler body as a strict, cheapest-first ordering. Each gate is cheaper than the next; reversing any pair would either burn a SQL round-trip on a malformed envelope or admit a wire-shape failure past a security boundary.

  1. NSK admission read — the middleware ran upstream; step 1 reads the resolved Node off the request context. A missing Node defensively returns 401 unauthorized so a misconfigured route that mounted the handler without the middleware fails closed.
  2. Path-id gatemiddleware.MatchesPathID(node, id). A mismatch stamps AuditRelationNodeEndpointPathGate + AuditOutcomeNodeIDMismatch (distinct from the ingestion-phase relation so dashboards can split admission-versus-ingestion) and returns 403 node_id_mismatch.
  3. Body caphttp.MaxBytesReader(w, r.Body, EndpointMaxBodyBytes). An oversize body trips a *http.MaxBytesError from the decoder; the handler maps it to 413 endpoint_body_too_large with the (limit, body_size>=) pair in the audit reason.
  4. Body decodejson.NewDecoder(...).DisallowUnknownFields(). A malformed envelope returns 400 malformed_endpoint_request before the clock-skew gate or any persistence read runs.
  5. Clock-skew gateclock.WithinEndpointSkew(req.ReportedAt, now). A drifted or forged reported_at returns 400 endpoint_clock_skew before the AddrPort parse or peer lookup.
  6. AddrPort parsenetip.ParseAddrPort(req.Endpoint). A pure-CPU validation that rejects unparseable strings before any database touch. Returns 400 endpoint_unparseable.

After the six gates pass the handler runs the peer lookup (step 7) and the recorder call (step 8). The recorder owns the transactional surface — every failure inside the Store.InTx boundary surfaces through handleEndpointRecordError's sentinel switch, which maps peers.ErrEndpointObservationStale, peers.ErrEndpointUnparseable, peers.ErrEndpointClockSkew, and peers.ErrPeerAlreadyDeregistered (translated via the peersEndpointRecorderAdapter) onto the wire-stable HTTP codes listed in the refusal table.

DECISION (carried over from endpoint.go): PeerLookup runs AFTER the AddrPort parse gate. Alternative considered: resolve the peer first so a non-existent Node-to-Peer mapping fails fast. REJECTED because the cheaper pure-CPU AddrPort parse rejects a malformed wire envelope without touching the database; reversing the order would burn a SQL round-trip on every malformed request and let an attacker scanning for unparseable endpoints discover Node presence via the relative latency of the two refusal paths.

Schema

The endpoint surface extends the existing Peer aggregate from ./peers.md and the existing Domain aggregate. The authoritative source is the migration at ../../../internal/platform/db/migrations/0029_peer_endpoint.sql; the column meanings below paraphrase that file's DECISION blocks.

plexsphere.peer — endpoint observation columns

ColumnTypeMeaning
last_endpointinet NULLABLEThe publicly-observed IP address from the most recent admitted observation. NULL means "no observation yet". The schema uses Postgres' native inet type rather than text so an operator can run SELECT * FROM plexsphere.peer WHERE last_endpoint << inet '10.0.0.0/8' for NAT-outage triage.
last_endpoint_portinteger NULLABLE CHECK (... BETWEEN 1 AND 65535)The publicly-observed transport port. NULL means "no observation yet". The CHECK pins the RFC 6056 port range at the storage boundary.
last_endpoint_reported_attimestamptz NULLABLEThe agent's reported_at from the admitted observation. The sweeper compares now() - last_endpoint_reported_at against the per-Domain TTL.
last_endpoint_stale_attimestamptz NULLABLEThe sweeper's tombstone timestamp. NULL means "still fresh (or never observed)"; non-NULL means "the sweeper stamped this row as stale and emitted a tombstone event". An UpdateEndpoint for a row with non-NULL last_endpoint_stale_at clears the stamp on the next successful write (un-stale recovery).

Index:

  • peer_endpoint_stale_idx ON plexsphere.peer (last_endpoint_reported_at) WHERE deregistered_at IS NULL AND last_endpoint_stale_at IS NULL — backs the sweeper's range scan over rows still in the "watchable" cohort. Deregistered rows and already-stale rows are excluded by the partial predicate so the index stays small even as the tenancy grows.

plexsphere.domains — per-Domain endpoint TTL

ColumnTypeMeaning
endpoint_ttlinterval NOT NULL DEFAULT '5 minutes'The per-Domain endpoint freshness window. The schema-side default mirrors tenancy.DefaultEndpointPolicy.TTL; the application-layer EndpointPolicy value object enforces the 30s floor and 1h ceiling so a hand-edited row that violates the invariants fails fast at Hydrate time.

The migration's Down block REFUSES the downgrade with SQLSTATE 0A000 (feature_not_supported). The four endpoint observation columns and the per-Domain endpoint_ttl carry the forensic evidence chain that links every NAT rebind to the wall-clock window the relay fallback was assigned in; dropping them is a regulatory retention regression. This Down stance mirrors 0008_node_secret_keys.sql, 0010_node_reachability.sql, 0011_audit_log.sql, and 0012_peers.sql — the four prior migrations that protect security- and audit-critical material with the same RAISE 0A000 refusal.

TTL policy & per-Domain knob

The per-Domain EndpointPolicy value object on the Tenancy aggregate (../../../internal/identity/tenancy/domain.go) carries one threshold. Default and invariant gates:

ThresholdDefaultFloorCeilingInvariant
TTL5m≥ 30s≤ 1hHeartbeat-cadence agent (30s default) must be able to refresh inside the window

The default is pinned by DefaultEndpointPolicy in domain.go:

go
var DefaultEndpointPolicy = EndpointPolicy{
    TTL: 5 * time.Minute,
}

A fully-zero EndpointPolicy{} on NewDomain / Hydrate is treated as the DEFAULT signal and replaced with DefaultEndpointPolicy at build time. resolveEndpointPolicy rejects an out-of-range TTL via errInvariant("", "Domain", "EndpointPolicy.TTL", "...") so a hand-edited Domain row that violates the floor or ceiling fails at Hydrate.

DECISION (carried over from domain.go): the policy is a single field rather than a tuple. Alternative considered: a sentinel "use default" boolean alongside TTL. REJECTED because a zero duration is already the well-formed signal in Go and the downstream INTERVAL column carries its own NOT NULL DEFAULT, so a boolean flag would be redundant scaffolding. The IsZero / default-injection contract mirrors the ReachabilityPolicy pattern so a reader of buildDomain sees the same shape on both policies.

The persistence layer mirrors the value object on plexsphere.domains.endpoint_ttl as an interval column defaulted to '5 minutes' at the schema level (0029_peer_endpoint.sql). The Tenancy aggregate's Hydrate boundary re-validates the bound on load.

Stale-endpoint sweeper

The EndpointSweeper in ../../../internal/mesh/peers/endpoint_sweeper.go is the periodic background ticker that drives the tombstone path — the only writer of last_endpoint_stale_at. The composition root in ../../../cmd/plexsphere/peers_factory_prod.go wires it alongside the Manager; an unwired (nil) Store is rejected at construction with ErrNilEndpointSweeperCollaborator.

Tick loop

Start(ctx) drives Sweep(ctx) on a fixed-cadence ticker. The default interval is endpointSweeperDefaultInterval = 1 * time.Minute (EndpointSweeperConfig.Interval overrides). One minute matches the partial-index range scan's amortised cost and keeps the sweeper from staging a torrent of tombstone events on a healthy fleet. Per-tick errors are logged but never abort the loop — a transient outage of the Postgres pool must not silently park the sweeper.

Per-tick procedure

Each Sweep(ctx) call:

  1. Reads now = clock() via the injected Clock port (a virtual clock in tests so a tick advances "now" without sleeping the test goroutine).
  2. Opens a single Store.InTx — the whole batch lands as one transaction so a partial failure rolls back the entire batch (no half-stamped rows, no orphan tombstone events).
  3. Calls repo.ListStalePeerEndpoints(ctx, now, batchSize) — the query paginates the peer_endpoint_stale_idx partial index. The endpointSweeperDefaultBatchSize = 256 per-tick fan-out balances throughput (one tick clears 256 staleness transitions) against per-tx wall-clock (256 outbox INSERTs land before the tx commits).
  4. For each returned row:
  • Calls repo.StampPeerEndpointStale(ctx, peerID, now) to write last_endpoint_stale_at = now.
  • Constructs a tombstone PeerEndpointChanged event with Endpoint = "" (the stale-tombstone signal), PreviousEndpoint = renderEndpoint(prior), and EndpointReportedAt = row.LastEndpointReportedAt.
  • Calls repo.AppendOutboxEvent(...) to enqueue the event in the same tx.
  1. Calls counter.IncProcessed(len(rows)) and counter.IncStaleStamped(tombstoned) for observability.

Partial-index paging

The peer_endpoint_stale_idx partial b-tree on (last_endpoint_reported_at) WHERE deregistered_at IS NULL AND last_endpoint_stale_at IS NULL keeps the working set small: deregistered Peers and already-stale rows are excluded, so the sweeper's range scan never has to skip them on every tick. Alternatives considered (a plain b-tree over the full peer table; a sibling plexsphere.peer_endpoint_observation history table) are documented in the migration's DECISION block — both REJECTED for the indexing cost and the redundancy with the outbox event log.

Single-writer posture

The DECISION block at the top of endpoint_sweeper.go records why the sweeper is the SINGLE WRITER of last_endpoint_stale_at — the intake handler never flips the tombstone bit; only the un-stale recovery side of Manager.UpdateEndpoint clears it. The trade-off mirrors the reachability evaluator's read-path-rejector posture documented in ./reachability.md: a single writer per row keeps the transactional surface lockless, and the tombstone-vs-fresh state machine has exactly one home.

Event types

The endpoint surface emits exactly one outbox event type. The literal below is the bare snake_case string the relay routes endpoint events on; downstream consumers switch on this exact literal. The literal is one member of the closed six-event peer-side set (peer_registered, peer_psk_assigned, peer_deregistered, peer_endpoint_changed, rotate_keys, peer_key_rotated) documented in ./peers.md; the six together are the full peer-side wire contract.

  • peer_endpoint_changed

peer_endpoint_changed

Constructed by events.NewPeerEndpointChanged; constant TypePeerEndpointChanged = "peer_endpoint_changed". Emitted on four paths:

  • Manager.UpdateEndpoint on the first-ever observation for the Peer (prior.LastEndpoint not valid).
  • Manager.UpdateEndpoint on every change of the (endpoint, port) tuple from the prior observation.
  • Manager.UpdateEndpoint on the un-stale recovery edge — the prior row had last_endpoint_stale_at non-zero and the new observation cleared it.
  • EndpointSweeper.Sweep on every tombstone — the per-row last_endpoint_stale_at stamp is paired 1:1 with a peer_endpoint_changed event whose Endpoint is the empty string.

A same-(endpoint, port) refresh on a row that was NOT stale before the write is deliberately silent — the UPDATE lands so last_endpoint_reported_at stays fresh against the sweeper's TTL math, but no outbox row is enqueued. Downstream consumers do not need a wake-up for an identical refresh.

JSON payload shape:

json
{
  "event_id": "<uuidv7>",
  "occurred_at": "<RFC3339Nano UTC>",
  "peer_id": "<uuidv7>",
  "domain_id": "<uuidv7>",
  "node_id": "<uuidv7>",
  "endpoint": "<host:port or empty string>",
  "endpoint_reported_at": "<RFC3339Nano UTC>",
  "previous_endpoint": "<host:port or empty string>",
  "fallback_endpoint": "<host:port, omitted when no bridge candidate is available>"
}

The three empty-string variants the constructor explicitly allows:

  • endpoint == "" is the stale-tombstone signal emitted by the sweeper when an observation passes TTL without refresh.
  • previous_endpoint == "" is the first-observation signal — there is no prior endpoint to carry through.
  • fallback_endpoint == "" is the no-bridge-candidate signal where the relay-assigner could not pick a healthy bridge for this peer. Dropped from the JSON wire-form via the omitempty tag.

Validation invariants on the constructor: peer_id, domain_id, and node_id must all be non-zero; endpoint_reported_at must be set — a zero observation timestamp would silently desync the bus from the SQL row. The endpoint and previous strings MAY be empty.

The payload carries an additive optional fallback_endpoint string — the dial address plexd uses when the direct WireGuard handshake times out, sourced from the bridge Node the relay- assigner chose for this peer. The field is dropped from the JSON wire-form by the omitempty tag when no bridge candidate is available, so a legacy consumer that pre-dates the relay-fallback surface observes the original 8-field payload shape byte-for-byte. The two empty-string fields compose to form the silent-unreachability signal: endpoint == "" AND fallback_endpoint == "" means the direct path has gone stale AND no relay-fallback is available — the plexd agent has nowhere left to dial. The relay-assigner that owns the fallback_endpoint field and the rules for composing it with the endpoint snapshot are documented in ./relay-fallback.md.

Read-side projection

Provider.SnapshotEndpointsForDomain in ../../../internal/mesh/peers/provider.go returns []NodePeerEndpoint — the read-side projection of every live Peer in a Domain that carries a fresh observation. Stale rows (last_endpoint_stale_at IS NOT NULL) and first-observation-pending rows (last_endpoint IS NULL) are filtered out at the SQL boundary so the slice always carries an actionable observation set.

FieldTypeMeaning
NodeIDtenancy.IDThe peer's anchored Node id. Mirrors NodePeer.NodeID from the bootstrap snapshot so a relay-fallback consumer can join the two projections by Node id.
LastEndpointnetip.AddrThe NAT-observed IP.
LastEndpointPortintThe NAT-observed transport port. Pinned to 1..65535 by the SQL CHECK.
LastEndpointReportedAttime.TimeThe agent-reported observation instant. Lets the consumer prefer the freshest observation when two snapshots arrive close together.

DECISION (carried over from provider.go): NodePeerEndpoint is a distinct peers-package value object rather than a reuse of NodePeer or registration.Peer. The endpoint projection's shape (no MeshIP, no PublicKey, has reported_at) is orthogonal to the bootstrap projection; folding both onto one type would force the bootstrap path to carry endpoint fields it does not consume, and a downstream relay-fallback consumer to carry MeshIP / PublicKey it does not need.

excludeNodeID may be the zero value — a future audit-replay tool that wants the complete snapshot passes the zero value; the relay-fallback assignment surface passes the address of the consuming Node so the agent does not receive its own observation back.

Clock-skew rules

The endpoint handler admits an observation only when |server_now − client_reported_at| ≤ MaxEndpointSkew (60s). The constant is pinned in ../../../internal/platform/clock/skew.go against a golden literal (the test file pins the value at 60 * time.Second so an accidental adjustment surfaces in code review). Widening the bound is a security regression, not a tuning knob — see the file-level DECISION block.

The WithinEndpointSkew(t1, t2) helper is symmetric (callers may pass server and client time in either order) and inclusive at the boundary (delta == MaxEndpointSkew admits). The implementation branches on time.Time.Before to keep the subtraction in the non-negative half of the duration range — the same overflow-safety pattern documented for WithinSkew in ./reachability.md.

DECISION (carried over from skew.go): MaxEndpointSkew is pinned to a literal 60 * time.Second rather than aliased to MaxHeartbeatSkew. The two surfaces share a numeric value today but evolve independently — a future tightening of MaxHeartbeatSkew (because the heartbeat handler ships a real rate limiter) must NOT silently tighten the endpoint window the way a derived const would. Aliasing would hide that the two gates are independent security boundaries.

Manager.UpdateEndpoint runs the skew gate inside its own command body (defense-in-depth — the handler's gate already ran upstream) AND runs a second gate against now() - per-Domain TTL so a reported_at within the skew window but older than the freshness TTL surfaces as ErrEndpointObservationStale rather than landing the write.

Audit contract

Every decision branch in the endpoint surface emits exactly one audit.Entry per layer. The endpoint surface has TWO layers — the NSK middleware (admission) and the handler (ingestion) — so a granted observation produces TWO rows from the operator's perspective: one from the middleware (node_heartbeat.authenticate when the same middleware path-gates the endpoint route) and one with relation = node_endpoint.record, outcome = granted from the handler. Audit consumers therefore SHOULD NOT expect a 1:1 mapping between PUT requests and audit rows; they SHOULD filter by relation to scope to the layer they care about.

A nil AuditSink is tolerated — the slog lines stand in for observability — but the security gates still execute.

The relation strings live in ../../../internal/transport/http/v1/handlers/endpoint_deps.go:

ConstantValueLayer / origin
AuditRelationNodeEndpointIngestnode_endpoint.recordHandler ingestion phase: clock-skew, malformed body, body-too-large, AddrPort parse, peer lookup, recorder failure, granted.
AuditRelationNodeEndpointPathGatenode_endpoint.path_gateHandler defense-in-depth path-id gate. Distinct relation so dashboards can alert on "middleware was bypassed but handler caught it" without conflating it with ingestion-phase entries.

The closed AuditOutcome set is enumerated in ../../../internal/transport/http/v1/handlers/events_deps.go. The endpoint surface uses six distinct values so audit consumers triage clock-spoofing from malformed-body from invariant-violation from middleware-bypass from internal-error WITHOUT parsing free-form reason strings:

ValueEmitter / cause
grantedEvery layer on success.
node_id_mismatchHandler path-id gate (defense-in-depth).
clock_skewHandler clock-skew gate (reported_at outside MaxEndpointSkew); recorder-side ErrEndpointObservationStale (older than per-Domain TTL); recorder-side ErrEndpointClockSkew (defense-in-depth).
malformed_requestHandler JSON-decode arm; handler AddrPort parse arm.
invariant_violationPeerLookup reports no live Peer row; recorder returned ErrEndpointUnparseable or ErrEndpointPeerAlreadyDeregistered.
internal_errorPeerLookup or recorder returned a non-sentinel error (Postgres outage, adapter bug).
insufficient_relationBody-too-large path uses this outcome (mirrors the heartbeat surface's body-size contract; the handler does not have a dedicated body-size outcome).
BranchSourceRelationOutcomeReason
Endpoint observation admittedendpoint.go step 9node_endpoint.recordgrantedendpoint observation recorded
Path-id mismatch (defense-in-depth)endpoint.go step 2node_endpoint.path_gatenode_id_mismatchresolved Node id does not match URL path id (defense-in-depth)
Body too largeendpoint.go step 4 (http.MaxBytesError arm)node_endpoint.recordinsufficient_relationbody exceeds EndpointMaxBodyBytes: limit=<N> body_size_bytes>=<N+1>
Malformed bodyendpoint.go step 4node_endpoint.recordmalformed_requestmalformed endpoint envelope: <reason>
Clock-skew refusalendpoint.go step 5node_endpoint.recordclock_skewreported_at outside MaxEndpointSkew window
Unparseable endpointendpoint.go step 6node_endpoint.recordmalformed_requestunparseable endpoint: <reason>
Peer lookup infra failureendpoint.go step 7node_endpoint.recordinternal_errorpeer lookup failed: <reason>
No live Peer for Nodeendpoint.go step 7node_endpoint.recordinvariant_violationno live Peer row resolves for the authenticated Node
Recorder clock-skew (defense-in-depth)endpoint.go handleEndpointRecordErrornode_endpoint.recordclock_skewrecorder clock-skew invariant violated: <reason>
Recorder observation stale (older than TTL)endpoint.go handleEndpointRecordErrornode_endpoint.recordclock_skewreported_at older than per-Domain endpoint TTL: <reason>
Recorder endpoint unparseable (defense-in-depth)endpoint.go handleEndpointRecordErrornode_endpoint.recordinvariant_violationrecorder endpoint invariant violated: <reason>
Recorder peer already deregisteredendpoint.go handleEndpointRecordErrornode_endpoint.recordinvariant_violationtarget Peer was deregistered between lookup and update: <reason>
Recorder internal errorendpoint.go handleEndpointRecordErrornode_endpoint.recordinternal_errorendpoint persistence failed: <reason>

OpenAPI surface reference

The wire surface is pinned in ../../../api/openapi/plexsphere-v1.yaml under the mesh tag. The relevant operationIds and schemas:

OpenAPI artefactWhere in the specGo counterpart
PutNodeEndpoint operationpaths./v1/nodes/{id}/endpoint.putHandlers.PutNodeEndpoint in endpoint_dispatch.go (gated entry) and putNodeEndpoint in endpoint.go (body).
EndpointRequest schemacomponents.schemas.EndpointRequesthandlers.EndpointRequest in endpoint.go.
EndpointResponse schemacomponents.schemas.EndpointResponsehandlers.EndpointResponse in endpoint.go.
Problem schemacomponents.schemas.Problem (shared)The six endpoint refusal codes are endpoint_clock_skew, endpoint_unparseable, malformed_endpoint_request, endpoint_body_too_large, endpoint_peer_not_found, endpoint_peer_gone; the 501 fail-closed stub emits endpoint_not_provisioned; the 401 / 403 arms emit nsk_revoked / node_id_mismatch.
PermissionDenied schemacomponents.schemas.PermissionDenied (shared)The 403 node_id_mismatch arm uses the richer PermissionDenied shape that carries reason, relation_path, and correlation_id alongside the bare Problem fields.

The OpenAPI byte-equality drift gate at ../../../tests/workspace/openapi_drift_test.go asserts that the source spec and the embedded mirror under internal/transport/http/v1/handlers/plexsphere-v1.yaml are byte-identical, so a spec drift surfaces at go test time, not at runtime.

Threat model

The endpoint surface mitigates four classes of attack along the plexd → plexsphere ingestion path. Each mitigation is implemented in a single, named place so a reader chasing a security claim does not have to assemble it from multiple files.

  1. Observation-replay via clock spoofing. A compromised plexd with a forged client clock could replay a stale endpoint observation (masking that the agent has fallen silent on a NAT rebind) or pin a fresh-but-future timestamp (so the observation outlives its real reported_at window). Mitigation: the handler admits the observation only when |reported_at − server_now| ≤ MaxEndpointSkew (60s) — pinned at ../../../internal/platform/clock/skew.go — and Manager.UpdateEndpoint additionally rejects any reported_at older than now() - per-Domain TTL as ErrEndpointObservationStale so an observation that drifted inside the skew window but past the freshness window cannot smuggle a stale write past the sweeper. Both gates fire BEFORE the per-Peer aggregate's tx surface.
  2. Cross-Node impersonation. A Node with a valid NSK could attempt to overwrite a sibling's endpoint by PUTting to /v1/nodes/{otherId}/endpoint. Mitigation: the same NSKAuthenticator middleware the heartbeat surface uses (../../../internal/identity/authn/middleware/nsk.go) resolves the bearer to a tenancy.Node and enforces a path-id-equals-resolved-Node-id gate; a mismatch surfaces as 403 node_id_mismatch. The handler runs the same check as defense-in-depth (step 2) so a misconfigured router that mounts the handler without the middleware still rejects, AND the distinct AuditRelationNodeEndpointPathGate relation lets the audit log distinguish "middleware was bypassed but handler caught it" from a normal ingestion-phase entry.
  3. Pathological-body resource exhaustion. A hostile or buggy agent could stream an unbounded payload to tie up the JSON decoder or starve concurrent requests. Mitigation: the handler caps the body at EndpointMaxBodyBytes = 4 KiB via http.MaxBytesReader BEFORE the decoder ever sees the bytes (step 3). An overflow surfaces as 413 endpoint_body_too_large with a stable code so log scrapers can alert on the body-cap floor.
  4. Forensic-audit-trail loss. The migration's Down block REFUSES the downgrade with SQLSTATE 0A000. The four endpoint observation columns on plexsphere.peer and the per-Domain endpoint_ttl column on plexsphere.domains carry the forensic evidence chain that links every NAT rebind to the wall-clock window the relay fallback was assigned in; dropping the columns is a regulatory retention regression that mirrors the NSK invariant 0008 protects. Soft-tombstone on last_endpoint_stale_at (rather than overwriting the prior endpoint) keeps the audit chain intact through every sweeper tick.

Sentinel errors callers branch on (defined in ../../../internal/mesh/peers/errors.go):

SentinelPathCaller branch
ErrEndpointClockSkewManager.UpdateEndpoint clock-skew gateFail-closed; surface 400 endpoint_clock_skew
ErrEndpointObservationStaleManager.UpdateEndpoint TTL gate (reported_at older than now() - TTL)Fail-closed; surface 400 endpoint_clock_skew with a distinct audit reason
ErrEndpointUnparseableManager.UpdateEndpoint AddrPort invariantFail-closed; surface 400 endpoint_unparseable (defense-in-depth against the handler-side parse)
ErrPeerAlreadyDeregisteredManager.UpdateEndpoint after-resolve gate; UPDATE no-op raceSurface 410 endpoint_peer_gone so the agent stops reporting against the gone row
ErrNilEndpointSweeperCollaboratorNewEndpointSweeper with nil StoreBoot-time misconfig alert

What this context is NOT

This context ships the per-Node NAT endpoint intake surface and the per-Peer endpoint observation snapshot. The following adjacent concerns are explicit non-goals — each is owned by a downstream context:

  • Relay-fallback assignment. The relay-fallback orchestrator's choice of fallback bridge per peer pair MAY consume Provider.SnapshotEndpointsForDomain, but the heuristics (latency, load, churn) are owned by the relay-fallback assignment context. The endpoint surface simply maintains the snapshot. The relay-fallback context now also extends the peer_endpoint_changed payload with an additive fallback_endpoint field; the silent-unreachability composite signal occurs when both the endpoint and the fallback_endpoint strings on the same payload are empty — endpoint TTL has expired AND no bridge is available — and downstream consumers MAY render the degraded state directly without further inference. See ./relay-fallback.md for the chooser, the assignment table, and the silent-unreachability composition rule.
  • Endpoint history. This surface persists the per-Peer SNAPSHOT (the latest admitted observation plus the tombstone flag). The observation HISTORY lives in the outbox event log; a future plexsphere.peer_endpoint_observation history table is explicitly out of scope, and the migration's DECISION block records the alternative as rejected.
  • NAT-type classification verification. The nat_type field is forwarded verbatim from the agent's classifier for downstream observability. The control plane does NOT verify the classification; a future telemetry pipeline that cross-checks reported nat_type against observed endpoint behaviour is owned by a separate context.
  • Operator-tunable MaxEndpointSkew. The 60-second bound is a security invariant, not a tuning knob — the file-level DECISION block in internal/platform/clock/skew.go rejects the configuration alternative because widening would silently erode the replay-resistance the threat model relies on. A future per-Domain knob is NOT planned.
  • Rate limiting endpoint observations below the per-Domain TTL. An agent that PUTs faster than the per-Domain TTL cadence is NOT rate-limited here — the surface admits every well-formed observation and the recorder runs the full read/compare/write cycle on every call. Per-Node rate limiting is a future surface.
  • SSE drain of peer_endpoint_changed. The outbox row is durable on every transition today, but the SSE relay drain that surfaces the row on GET /v1/nodes/{id}/events is tracked separately in ../../architecture/mesh-event-bus-roadmap.md. The wire shape is stable today so downstream subscribers can pin against it.
  • Fallback-endpoint advisory in peer_endpoint_changed. The payload deliberately omits a fallback_endpoint field. The relay-fallback assignment surface owns that extension in a later story; widening the JSON shape with an additive optional field is a non-breaking change for downstream consumers when the time comes.