Per-node reachability — heartbeat and liveness tracking
This document is the authoritative bounded-context reference for the per-Node reachability sub-context that ships under internal/mesh/reachability and exposes through POST /v1/nodes/{id}/heartbeat and GET /v1/nodes/{id}/reachability. It covers the ubiquitous language, the wire envelopes, the three-state liveness machine, the per-Domain threshold policy, the periodic evaluator, the clock-skew admission rule, the audit contract, the OpenAPI cross-reference, and the threat model.
The reachability surface is the plexd → plexsphere liveness channel: a per-Node, NSK-authenticated heartbeat ingestion endpoint plus a synchronous read of the server's verdict on each Node's liveness state. Anything outside that surface — the underlying Signed SSE Event Bus that fans out tenancy.NodeReachabilityChanged events, the reconciliation pull that exposes a cold-start view, the per-Domain Tenancy aggregate that owns ReachabilityPolicy, the audit hash chain, the Dashboard's ReachabilityBadge — is a collaborator the reachability sub-context projects from or emits to, not a concern of this document. The two surfaces this document pins are the in-process liveness state machine and the wire-shape contracts that surround it.
Status — partial delivery
The writer-side closure for the reachability event bus is load-bearing in the production binary:
- Outbox-backed `tenancy.NodeReachabilityChanged` emission — the evaluator's `ReachabilityWriter` routes every transition through `NodeRepo.TransitionAndEmit`, folding the `UpdateReachabilityState` write and the `tenancy/events.NodeReachabilityChanged` outbox append into one Postgres transaction. `NewNodeReachabilityChanged` in `internal/identity/tenancy/events` is the canonical event shape.
- Chain-bound AuditSink — each transition lands on the per-Domain audit chain as `internal/audit.Entry` with `Relation = "node_reachability.transition"`, `Object = "domain:<hex16>"`, `RelationPath = [from, to, reason]`.
One downstream surface remains deferred: the SSE relay drain for node_state_updated that surfaces the outbox row on GET /v1/nodes/{id}/events, tracked in ../../architecture/mesh-event-bus-roadmap.md. The outbox row is durable on every transition today; the wire envelope reaches a subscriber once the relay drain lands.
Load-bearing today (each detailed in its own section below): the pure state machine (state.go, Apply), the heartbeat handler with its six-step gate ordering and four refusal codes, the production NSK plaintext resolver (heartbeat_nsk_resolver.go), the reachability read handler, the per-Domain ReachabilityPolicy value object and its invariant gates, the periodic evaluator, the 0010_node_reachability.sql migration (six Node columns, three Domain columns, the nodes_stale_idx partial index), the MaxHeartbeatSkew golden constant, the OpenAPI surface, and the Dashboard ReachabilityBadge. The exact file references are in Cross-references.
Downstream consumers (platform audit log, node deregistration, relay-fallback heuristics, rolled-up state reports) MUST treat the deferred SSE drain as scaffolding: every wire shape is stable from day one, and they consume the transition stream from the tenancy.NodeReachabilityChanged outbox row, their single dependency, rather than from the read-side projection.
Cross-references
- `./sse.md` — the push-channel sibling. The `tenancy.NodeReachabilityChanged` event-type the evaluator emits travels onto the bus as a `node_state_updated` envelope; the bus's ubiquitous-language table cross-links the event-type back to this document for state-machine semantics.
- `./reconciliation-pull.md` — the cold-start sibling. The reconciliation-pull surface carries a `Reachability` block on `NodeStateSnapshot` that mirrors the wire shape of `GET /v1/nodes/{id}/reachability` so plexd can recover the Node's liveness verdict without a second round-trip.
- `../../contributing/layout.md` — bounded-context map row enumerating `internal/mesh/reachability` and the depguard allow-list that keeps the sub-context boundary intact.
- `../../how-to/mesh/operate-reachability.md` — operator how-to for inspecting heartbeat ingest, reading the per-Node reachability projection, and chasing a stuck or flapping state transition. (Authored as task 5.4 in the same orchestration as this reference; forward link.)
- `../../architecture/mesh-event-bus-roadmap.md` — the deferred-wiring tracker that records the SSE relay's production-wiring gap the evaluator's outbox-emitter currently rides on.
- `../../../api/openapi/plexsphere-v1.yaml` — OpenAPI spec for `POST /v1/nodes/{id}/heartbeat` and `GET /v1/nodes/{id}/reachability`, including the `HeartbeatRequest`, `HeartbeatResponse`, `Reachability` schemas and the four refusal codes.
- `../../../internal/mesh/reachability/state.go` — the pure state machine and the `Apply` transition function.
- `../../../internal/mesh/reachability/evaluator.go` — the periodic sweeper, the read-path-rejector DECISION block, and the per-Node Update + Emit + Audit pipeline.
- `../../../internal/identity/tenancy/domain.go` — the per-Domain `ReachabilityPolicy` value object and its invariant gates.
- `../../../internal/platform/clock/skew.go` — the `MaxHeartbeatSkew` constant and the `WithinSkew` helper.
- `../../../internal/platform/db/migrations/0010_node_reachability.sql` — the schema migration adding the six Node columns, the three Domain columns, and the `nodes_stale_idx` partial index.
- `../../../tests/workspace/state_factory_wiring_receipt_test.go` — the wiring-receipt drift gate the heartbeat / reachability / evaluator factories share with the reconciliation-pull factory.
Ubiquitous language
The terms below travel together across the Go code, the OpenAPI contract, the audit log, the structured-log attributes, the Prometheus metric label values, and operator-facing tooling. Names are preserved verbatim in error messages and audit-row vocabulary so a reader chasing a string from a log line finds it in the source without translation.
| Term | Definition |
|---|---|
| Heartbeat | A signed liveness ping plexd POSTs to POST /v1/nodes/{id}/heartbeat at the per-Domain HeartbeatInterval cadence. The wire envelope is HeartbeatRequest{client_now, binary_checksum, binary_version, nat_summary}. The ingestion stamps the server's last_heartbeat_at on the addressed Node row. |
| Reachability | The server's verdict on a Node's liveness, derived purely from server-observed inputs. Persisted as a (state, last_heartbeat_at, changed_at) triple on plexsphere.nodes and projected onto the wire by GET /v1/nodes/{id}/reachability. |
| Healthy | A Reachability.state value (StateHealthy = "healthy"). The Node's last admitted heartbeat is fresher than the per-Domain StaleAfter threshold. The schema default for a freshly-registered Node. |
| Stale | A Reachability.state value (StateStale = "stale"). The Node's last admitted heartbeat is older than StaleAfter but younger than UnreachableAfter. The transition Healthy → Stale is driven by the periodic evaluator. |
| Unreachable | A Reachability.state value (StateUnreachable = "unreachable"). The Node's last admitted heartbeat is older than UnreachableAfter. The transition Stale → Unreachable is driven by the periodic evaluator. |
| ReachabilityPolicy | The per-Domain value object ({HeartbeatInterval, StaleAfter, UnreachableAfter}) on the Tenancy aggregate that drives the evaluator's thresholds. Persisted as the three reach_* columns on plexsphere.domains. |
| HeartbeatInterval | The cadence agents must honour. Floor ≥ 10s, ceiling ≤ 1h. Default 30s. Pinned in internal/identity/tenancy/domain.go's DefaultReachabilityPolicy. |
| StaleAfter | The grace window before the evaluator declares a Node Stale. Floor ≥ 3 × HeartbeatInterval (so a single dropped beat does not trip the verdict), ceiling ≤ 1h. Default 90s. |
| UnreachableAfter | The grace window before the evaluator declares a Node Unreachable. Floor ≥ 2 × StaleAfter (avoids flapping near the Stale boundary), ceiling ≤ 1h. Default 300s. |
| BinaryChecksum | The raw SHA-256 digest (exactly 32 bytes) of the agent binary, base64-encoded on the wire. The aggregate's RecordHeartbeat method enforces the 32-byte length invariant. Empty / wrong-length checksums are refused with 400 binary_checksum_empty. |
| BinaryVersion | The agent's semver string. Empty values (after trimming) are refused with 400 binary_version_empty. |
| NATSummary | The optional advisory NAT/firewall summary the agent observed at envelope creation. The handler does not parse it; the aggregate persists it verbatim as a JSON byte slice for downstream observability. |
| NSK (Node Session Key) | The Node's per-session bearer credential, presented as Authorization: Bearer nsk_<env>_<...>. The NSKAuthenticator middleware in internal/identity/authn/middleware/nsk.go resolves it to a tenancy.Node and refuses path-id ≠ resolved-Node-id with 403 node_id_mismatch. |
| MaxHeartbeatSkew | The inclusive upper bound on \|client_now − server_now\| accepted by the heartbeat handler. Pinned at 60 * time.Second in internal/platform/clock/skew.go against a golden literal; widening is a security regression, not a tuning knob. |
| Reachability evaluator | The per-Domain background ticker in internal/mesh/reachability/evaluator.go that paginates the partial-index-backed candidate set, runs Apply per row, and writes any transition through UpdateReachabilityState's single-statement UPDATE … RETURNING with idempotency guarded by the reachability_state <> $2 predicate. |
| AuditRelationNodeHeartbeat | The canonical Relation string the heartbeat handler stamps on every audit row: node_heartbeat.record. Stable so audit consumers can filter by relation without parsing free-form text. Pinned in internal/transport/http/v1/handlers/heartbeat_deps.go. |
| AuditRelationNodeReachabilityRead | The canonical Relation string the read-side handler stamps on every audit row: node_reachability.read. Pinned in heartbeat_deps.go alongside the heartbeat relation. |
| node-agent relation | The ReBAC relation enforced on GET /v1/nodes/{id}/reachability. Shared with GET /v1/nodes/{id}/state and GET /v1/nodes/{id}/events so a single principal cone authorises the cold-start view, the push channel, and the reachability projection. |
Wire shape — POST /v1/nodes/{id}/heartbeat
The request envelope is the HeartbeatRequest value type defined in internal/transport/http/v1/handlers/heartbeat.go, mirrored on the OpenAPI spec at api/openapi/plexsphere-v1.yaml under components.schemas.HeartbeatRequest.
| Field | Type | Meaning |
|---|---|---|
| `client_now` | RFC 3339 timestamp | The agent's local wall-clock at envelope creation. Compared against the server's now via clock.WithinSkew — \|delta\| > MaxHeartbeatSkew rejects with 400 clock_skew. NEVER drives state transitions; the server uses its own now for verdict logic. |
| `binary_checksum` | base64-encoded []byte (32 bytes raw SHA-256) | The aggregate's RecordHeartbeat invariant enforces the 32-byte length. Empty / wrong-length values reject with 400 binary_checksum_empty. |
| `binary_version` | string | The agent's semver string. Empty after trimming rejects with 400 binary_version_empty. |
| `nat_summary` | json.RawMessage (optional) | The agent's NAT/firewall advisory blob. Forwarded verbatim to the aggregate for persistence; the handler does not parse it. |
The success response envelope is the HeartbeatResponse value type in the same file.
| Field | Type | Meaning |
|---|---|---|
| `accepted_at` | RFC 3339 timestamp | The server-side admission instant the handler stamped after the clock-skew gate passed and before the aggregate write. |
| `reconcile` | bool | Hint to the agent that it should issue a reconciliation pull at GET /v1/nodes/{id}/state. ALWAYS false today — becomes load-bearing once the snapshot-divergence detection lands as a follow-up to the reconciliation-pull surface. |
| `rotate_keys` | bool | Hint to the agent that it should rotate its NSK. ALWAYS false today — becomes load-bearing once the NSK rotation workflow lands. |
The four refusal codes the handler emits (Problem-Details code field):
| Code | HTTP | Trigger | Source file |
|---|---|---|---|
| `clock_skew` | 400 | \|client_now − server_now\| > MaxHeartbeatSkew (60s) | heartbeat.go step 4 |
| `node_id_mismatch` | 403 | URL path id ≠ resolved NSK Node id (defense-in-depth; the NSK middleware also runs this gate) | heartbeat.go step 2 |
| `nsk_revoked` | 401 | NSK middleware resolves to a revoked NSK row (resolver responsibility) | internal/identity/authn/middleware/nsk.go |
| `binary_checksum_empty` | 400 | len(binary_checksum) ≠ 32 (also triggered by an empty / nil slice) | heartbeat.go step 5 (ErrHeartbeatChecksumInvalid translation) |
A binary_version_empty (400) variant fires for empty / whitespace-only binary_version values; it shares the same well-formed-envelope requirement row as binary_checksum_empty but is a distinct Problem code.
Wire shape — GET /v1/nodes/{id}/reachability
The success response envelope is the ReachabilityResponse value type defined in internal/transport/http/v1/handlers/reachability.go, mirrored on the OpenAPI spec under components.schemas.Reachability.
| Field | Type | Meaning |
|---|---|---|
| `state` | string ("healthy" / "stale" / "unreachable" / "") | The server's current verdict. The empty-string variant is bootstrap noise — the schema default before the first evaluator sweep. |
| `last_heartbeat_at` | RFC 3339 timestamp (UTC) | The most recent successful ingestion. Zero means "never observed"; pair (state="", last_heartbeat_at=0) is the bootstrap state. |
| `changed_at` | RFC 3339 timestamp (UTC) | The most recent state transition instant. Always present; matches last_heartbeat_at for a never-transitioned Node. |
The handler emits four refusal arms: 401 unauthorized (no Principal), 403 insufficient_relation (no node-agent relation on node:<id>), 404 node_not_found (the addressed Node id does not resolve to a row), and 501 reachability_not_provisioned (one or more of ReachabilityReader, RelationChecker, NodeRepo is nil at composition time — the reachability_dispatch.go gate fires).
State machine
The Apply function in internal/mesh/reachability/state.go is the SINGLE exported transition rule. Every consumer (the heartbeat handler's downstream observability, the evaluator's per-row loop, the reconciliation pull's reachability block) funnels through it rather than branching on elapsed time itself, so the matrix has exactly one home.

```mermaid
stateDiagram-v2
    [*] --> Healthy : schema default (`reachability_state = 'healthy'`)
    Healthy --> Stale : evaluator: elapsed >= StaleAfter && elapsed < UnreachableAfter
    Stale --> Unreachable : evaluator: elapsed >= UnreachableAfter
    Healthy --> Unreachable : evaluator: elapsed >= UnreachableAfter (skipped Stale)
    Stale --> Healthy : admitted heartbeat: elapsed < StaleAfter
    Unreachable --> Healthy : admitted heartbeat: elapsed < StaleAfter
    Unreachable --> Stale : admitted heartbeat: elapsed >= StaleAfter && elapsed < UnreachableAfter
```

The transition rules `Apply` encodes (in `transitionFor`):

| Elapsed (now − lastHeartbeatAt) | Result |
|---|---|
| elapsed >= UnreachableAfter | StateUnreachable |
| StaleAfter <= elapsed < UnreachableAfter | StateStale |
| elapsed < StaleAfter | StateHealthy |

Apply returns (next, changed) where changed = (next != prev), so a no-op transition (e.g. Healthy × elapsed < StaleAfter, Stale × elapsed between thresholds) returns (prev, false) and the caller short-circuits the eventing pipeline. The fuzz harness at internal/mesh/reachability/state_fuzz_test.go pins the contract that an invalid prev MUST NOT panic — the function may return any State for invalid input, but never crashes the caller.
The two transition origins:
- Recovery edges (`Stale → Healthy`, `Unreachable → Healthy`, `Unreachable → Stale`) are driven by an admitted heartbeat resetting the elapsed clock back below the relevant threshold. The heartbeat handler stamps `last_heartbeat_at` and the next evaluator sweep observes `elapsed < StaleAfter` (or between thresholds) and applies the transition. The handler itself does NOT write `reachability_state` — see the read-path-rejector DECISION below.
- Advance edges (`Healthy → Stale`, `Stale → Unreachable`, `Healthy → Unreachable`) are driven solely by the periodic evaluator. The handler is a thin admit-and-write that updates `last_heartbeat_at` and the NAT/binary advisories; the state-machine vocabulary lives entirely in the evaluator.
TTL / threshold defaults & per-Domain knobs
The per-Domain ReachabilityPolicy value object on the Tenancy aggregate (internal/identity/tenancy/domain.go) carries three thresholds. Defaults and invariant gates:
| Threshold | Default | Floor | Ceiling | Invariant |
|---|---|---|---|---|
| `HeartbeatInterval` | 30s | ≥ 10s | ≤ 1h | (none beyond floor / ceiling) |
| `StaleAfter` | 90s | ≥ 3 × HeartbeatInterval | ≤ 1h | A single dropped beat must not trip the verdict |
| `UnreachableAfter` | 300s | ≥ 2 × StaleAfter | ≤ 1h | Avoids flapping near the Stale boundary |
The defaults are pinned by DefaultReachabilityPolicy in domain.go:

```go
var DefaultReachabilityPolicy = ReachabilityPolicy{
	HeartbeatInterval: 30 * time.Second,
	StaleAfter:        90 * time.Second,
	UnreachableAfter:  300 * time.Second,
}
```

A fully-zero ReachabilityPolicy{} on NewDomain / Hydrate is treated as the DEFAULT signal and replaced with DefaultReachabilityPolicy at build time. A partial fill (e.g. only HeartbeatInterval set) is REJECTED with errInvariant rather than silently extended — a caller who meant to set a custom StaleAfter but spelled the field wrong sees the error in their face. Each gate returns errInvariant("REQ-006", "Domain", <field>, <bound>) so a grep on the requirement id surfaces every refusal site.
The persistence layer mirrors the value object on plexsphere.domains as three interval columns (reach_heartbeat_interval, reach_stale_after, reach_unreachable_after), defaulted to the same (30s, 90s, 300s) triple at the schema level (0010_node_reachability.sql). The Tenancy aggregate's Hydrate boundary re-validates the multipliers on load so a hand-edited Domain row that violates the invariants fails fast.
Evaluator scheduling
The reachability evaluator (internal/mesh/reachability/evaluator.go) is the per-Domain background ticker that drives advance-edge transitions. The composition root in cmd/plexsphere/evaluator_factory_prod.go wires it on the same PLEXSPHERE_DSN env-var as the other production-wired surfaces; an empty DSN leaves the evaluator unwired (no goroutine, no sweep, no transitions).
Tick loop
Run(ctx, tick) drives Tick(ctx) on a fixed-cadence ticker. The production tick is read from PLEXSPHERE_REACH_EVAL_TICK with a 5 * time.Second default (DefaultEvaluatorTick). Five seconds is the scaffold balance between responsiveness (a Stale → Unreachable transition is observable within one full cycle) and Postgres pressure (a 1024-row LoadStaleNodes page is a single index scan, so 5s ticks land < 1k SELECT/min on a quiet Domain). Cancelling the context terminates the sweep.
Per-tick procedure
Each Tick(ctx) call:
- Reads `now = clock.Now().UTC()` via the injected `Clock` port (a virtual clock in tests so a `Tick()` call advances "now" without sleeping the test goroutine).
- Calls `Lister.ListStale(ctx, now, limit, cursor)` — paginates the `nodes_stale_idx` partial index by `(reachability_changed_at, id)`. The cursor's zero value bootstraps the first page; a short page (`len < limit`) drains the candidate set for this tick.
- For each returned `tenancy.Node` row:
  - Resolves the Node's Domain `ReachabilityPolicy` via `Policies.LookupPolicy(ctx, node.DomainID())`. A non-nil error short-circuits the loop on this Node and surfaces as a WARN log; the evaluator CONTINUES to the next Node — one misconfigured Domain must not stall every other Domain's sweep.
  - Runs `Apply(prev, now, node.Reachability.LastHeartbeatAt, policy.StaleAfter, policy.UnreachableAfter)`. If `changed == false`, skips the row — the eventing pipeline short-circuits.
  - Calls `Writer.Update(ctx, nodeID, string(next), now)` — a single `UPDATE … RETURNING` with a `reachability_state <> $2` guard for idempotency. Returns `(transitioned bool, err)`. A `(false, nil)` arm means another writer (or an earlier sweep) already landed this transition — the evaluator suppresses event + audit so only the row that actually transitioned is the single emitter.
  - On a successful transition, calls `Events.Emit(ctx, ev)` to emit the `tenancy.NodeReachabilityChanged` event (slog-only today — see status section), then `Audit.Record(ctx, entry)` to emit the audit row.
- Advances the cursor to the last row of the page; loops if the page was full, returns otherwise.
Partial-index paging
The query that paginates the candidate set rides the nodes_stale_idx partial b-tree on (reachability_changed_at) WHERE reachability_state IN ('healthy','stale'). Excluding Unreachable rows from the index keeps the working set small: an Unreachable Node has no further advance transition, so including it in the index would only inflate the scan width.
The defaultStalePageLimit constant pins the per-tick fan-out at 1024 rows. The number is large enough to drain a small Domain in one tick and small enough to keep the worst-case tx duration bounded under heavy load. A future per-Domain knob slots into the EvaluatorDeps.Limit field without touching the evaluator.
Read-path-rejector posture
The DECISION block at the top of evaluator.go records why the evaluator is the SINGLE WRITER of reachability_state — the heartbeat handler does NOT flip the row to Healthy on every ingestion. The trade-off:
- Alternative considered: drive transitions from the heartbeat handler itself. Every successful ingestion would flip the row's state to `Healthy` in the same transaction.
- Rejected because (a) the handler would need to know the Domain's `StaleAfter` and `UnreachableAfter` to decide whether the ingestion implies a transition; (b) the state-machine vocabulary would leak into a hot HTTP path that is supposed to remain a thin admit-and-write; (c) two writers on the same row would force pessimistic locking on every heartbeat.
- Resolution: the evaluator owns the read path. It consults the Node row, runs `Apply`, and writes the transition (with idempotency guarded by the SQL-side `reachability_state <> $2` predicate in `UpdateReachabilityState`). The handler stamps `last_heartbeat_at` and the evaluator picks up the next sweep — clean separation of concerns, one writer per row.
Clock-skew rules
The heartbeat handler admits a heartbeat only when |server_now − client_now| ≤ MaxHeartbeatSkew (60s). The constant is pinned in internal/platform/clock/skew.go against a golden literal (the test file pins the value at 60 * time.Second so an accidental adjustment surfaces in code review). Widening the bound is a security regression, not a tuning knob — see the file-level DECISION block.
The WithinSkew(t1, t2) helper is symmetric (callers may pass server and client time in either order) and inclusive at the boundary (delta == MaxHeartbeatSkew admits). The implementation branches on time.Time.Before to keep the subtraction in the non-negative half of the duration range — a naive t1.Sub(t2) followed by a sign-flip would saturate at the int64 duration range and let a zero time.Time vs a real timestamp silently pass the gate.
The DECISION block in internal/mesh/reachability/state.go records why client-side timestamps NEVER drive a reachability transition. The only inputs Apply consumes are the server's now and the lastHeartbeatAt value the server itself stamped at admit time. A forged or drifted client clock could otherwise:
- Mask a genuinely absent Node by claiming a heartbeat in the future.
- Trigger a spurious `Unreachable` verdict by claiming a heartbeat in the deep past.
- Trigger a spurious recovery by claiming a heartbeat after a long quiet period.

All three failure modes would erode the state machine's trustworthiness as the source of operator alerts. The MaxHeartbeatSkew admission gate runs at the handler boundary; once that gate has passed, the server has no remaining reason to re-introduce client-clock dependence into the verdict logic.
Audit contract
Every decision branch in the heartbeat surface emits exactly one audit.Entry per layer. The heartbeat surface has TWO layers — the NSK middleware (admission) and the handler (ingestion) — so a granted heartbeat produces TWO rows from the operator's perspective: one with relation = node_heartbeat.authenticate, outcome = granted from the middleware (NSKAuditSink) and one with relation = node_heartbeat.record, outcome = granted from the handler (handlers.AuditSink). Audit consumers therefore SHOULD NOT expect a 1:1 mapping between heartbeat requests and audit rows; they SHOULD filter by relation to scope to the layer they care about (admission vs ingestion). A nil AuditSink is tolerated — the slog lines stand in for observability — but the security gates still execute.
The relation strings live in internal/transport/http/v1/handlers/heartbeat_deps.go:
| Constant | Value | Layer / origin |
|---|---|---|
| `AuditRelationNodeHeartbeatAuthenticate` | node_heartbeat.authenticate | NSK middleware admission gate (internal/identity/authn/middleware/nsk.go). |
| `AuditRelationNodeHeartbeat` | node_heartbeat.record | Handler ingestion phase: clock-skew, malformed body, invariant violation, recorder failure, granted. |
| `AuditRelationNodeHeartbeatPathGate` | node_heartbeat.path_gate | Handler defense-in-depth path-id gate. Distinct relation so dashboards can alert on "middleware was bypassed but handler caught it" without conflating it with ingestion-phase entries. |
| `AuditRelationNodeReachabilityRead` | node_reachability.read | Read-side handler. |
| `node_reachability.transition` | (literal in evaluator) | Per-Domain reachability evaluator (slogReachabilityAuditSink.Record at cmd/plexsphere/evaluator_factory_prod.go). |
The closed AuditOutcome set is enumerated in internal/transport/http/v1/handlers/events_deps.go. The heartbeat surface uses six distinct values so audit consumers triage clock-spoofing from malformed-body from invariant-violation from middleware-bypass from internal-error WITHOUT parsing free-form reason strings:
| Value | Emitter / cause |
|---|---|
| `granted` | Every layer on success. |
| `node_id_mismatch` | Handler path-id gate (defense-in-depth). |
| `clock_skew` | Handler clock-skew gate (client_now outside MaxHeartbeatSkew). |
| `malformed_request` | Handler JSON-decode arm. |
| `invariant_violation` | Recorder rejection on the 32-byte checksum or non-empty version invariant. |
| `internal_error` | Recorder returned a non-sentinel error (Postgres outage, adapter bug). |
| `insufficient_relation` | NSK middleware admission denial; reachability read-side ReBAC denial. |
| `signature_failure` | (SSE bus only.) |

| Branch | Source | Relation | Outcome | Reason |
|---|---|---|---|---|
| NSK middleware admission (granted) | nsk.go middleware | node_heartbeat.authenticate | granted | nsk_authenticated |
| NSK middleware refusal (revoked / invalid) | nsk.go middleware | node_heartbeat.authenticate | insufficient_relation | nsk_invalid / nsk_revoked |
| Heartbeat admitted | heartbeat.go step 6 | node_heartbeat.record | granted | heartbeat ingested |
| Clock-skew refusal | heartbeat.go step 4 | node_heartbeat.record | clock_skew | client_now outside MaxHeartbeatSkew window |
| Path-id mismatch (defense-in-depth) | heartbeat.go step 2 | node_heartbeat.path_gate | node_id_mismatch | resolved Node id does not match URL path id (defense-in-depth) |
| Malformed body | heartbeat.go step 3 | node_heartbeat.record | malformed_request | malformed heartbeat envelope: <reason> |
| Checksum invariant violation | heartbeat.go handleHeartbeatRecordError | node_heartbeat.record | invariant_violation | binary checksum invariant violated |
| Version invariant violation | heartbeat.go handleHeartbeatRecordError | node_heartbeat.record | invariant_violation | binary version invariant violated |
| Recorder internal error | heartbeat.go handleHeartbeatRecordError | node_heartbeat.record | internal_error | heartbeat persistence failed: <reason> |
| Reachability projection read (granted) | reachability.go step 3 | node_reachability.read | granted | reachability projection read |
| Reachability projection read (refused) | reachability.go step 2 | node_reachability.read | insufficient_relation | principal lacks node-agent relation |
| Transition Healthy → Stale | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat overdue (stale threshold exceeded) |
| Transition Stale → Unreachable | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat absent (unreachable threshold exceeded) |
| Transition Healthy → Unreachable | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat absent (skipped stale, hit unreachable) |
| Transition Stale → Healthy | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat resumed (back to healthy) |
| Transition Unreachable → Healthy | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat resumed (recovered from unreachable) |
| Transition Unreachable → Stale | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat resumed (partial recovery to stale) |
The full reason-string table lives in transitionReason in evaluator.go so audit consumers can switch on the exact strings without parsing free-form text. Idempotent no-op writes ((false, nil) from Writer.Update) suppress the audit + event arms — only the row that actually transitioned is the single emitter.
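A lookup keyed on the `(from, to)` pair is one plausible shape for such a table. The sketch below copies the six reason strings from the audit table above; the map layout, the lower-case `transitionReason` signature, and the `ok` return for unlisted edges are assumptions, not the code in `evaluator.go`.

```go
package main

import "fmt"

type State string

const (
	StateHealthy     State = "healthy"
	StateStale       State = "stale"
	StateUnreachable State = "unreachable"
)

// reasons maps each (from, to) edge to its audit reason string.
// Strings are taken verbatim from the audit contract table.
var reasons = map[[2]State]string{
	{StateHealthy, StateStale}:       "evaluator: heartbeat overdue (stale threshold exceeded)",
	{StateStale, StateUnreachable}:   "evaluator: heartbeat absent (unreachable threshold exceeded)",
	{StateHealthy, StateUnreachable}: "evaluator: heartbeat absent (skipped stale, hit unreachable)",
	{StateStale, StateHealthy}:       "evaluator: heartbeat resumed (back to healthy)",
	{StateUnreachable, StateHealthy}: "evaluator: heartbeat resumed (recovered from unreachable)",
	{StateUnreachable, StateStale}:   "evaluator: heartbeat resumed (partial recovery to stale)",
}

// transitionReason returns the reason for an edge, or ok == false for a
// pair that is not a real transition (for example Healthy -> Healthy).
func transitionReason(from, to State) (reason string, ok bool) {
	reason, ok = reasons[[2]State{from, to}]
	return reason, ok
}

func main() {
	r, ok := transitionReason(StateHealthy, StateStale)
	fmt.Println(ok, r)
}
```

An audit consumer can invert the same table to classify rows by exact string match, which is the property the stable reason strings exist to guarantee.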
OpenAPI surface reference
The wire surface is pinned in api/openapi/plexsphere-v1.yaml under the mesh tag. The relevant operationIds and schemas:
| OpenAPI artefact | Where in the spec | Go counterpart |
|---|---|---|
| `PostNodeHeartbeat` operation | paths./v1/nodes/{id}/heartbeat.post | Handlers.PostNodeHeartbeat in heartbeat_dispatch.go (gated entry) and postNodeHeartbeat in heartbeat.go (body). |
| `GetNodeReachability` operation | paths./v1/nodes/{id}/reachability.get | Handlers.GetNodeReachability in reachability_dispatch.go (gated entry) and getNodeReachability in reachability.go (body). |
| `HeartbeatRequest` schema | components.schemas.HeartbeatRequest | handlers.HeartbeatRequest in heartbeat.go. |
| `HeartbeatResponse` schema | components.schemas.HeartbeatResponse | handlers.HeartbeatResponse in heartbeat.go. |
| `Reachability` schema | components.schemas.Reachability | handlers.ReachabilityResponse in reachability.go; mirrored on the bounded-context value object tenancy.Reachability. |
| `NodeStateSnapshot.reachability` field | components.schemas.NodeStateSnapshot.reachability ($ref: Reachability) | The reconciliation-pull surface (./reconciliation-pull.md) carries the same Reachability projection on the cold-start envelope so plexd recovers the verdict without a second round-trip. |
| `Problem` schema | components.schemas.Problem (shared) | The four heartbeat refusal codes are clock_skew, node_id_mismatch, binary_checksum_empty, binary_version_empty; the read-side handler emits unauthorized, insufficient_relation, node_not_found; both surfaces emit heartbeat_not_provisioned / reachability_not_provisioned on the 501 fail-closed stub. |
The OpenAPI byte-equality drift gate at tests/workspace/openapi_drift_test.go asserts that the source spec and the embedded mirror under internal/transport/http/v1/handlers/plexsphere-v1.yaml are byte-identical, so a spec drift surfaces at go test time, not at runtime.
Threat model
The heartbeat surface mitigates four classes of attack along the plexd → plexsphere ingestion path. Each mitigation is implemented in a single, named place so a reader chasing a security claim does not have to assemble it from multiple files.
- Clock spoofing. A compromised plexd with a forged client clock could mask absence by stamping `client_now` in the future, or trigger a spurious `Unreachable` verdict by stamping it in the distant past, or trigger a spurious recovery by stamping it after a long quiet period. Mitigation: the handler admits the heartbeat only when `|client_now − server_now| ≤ MaxHeartbeatSkew (60s)` — pinned at `internal/platform/clock/skew.go` — and the state machine then drives transitions purely from the server's `now` and the server-stamped `last_heartbeat_at`. Client-claimed timestamps NEVER reach `Apply`'s input.
- Cross-Node impersonation. A Node with a valid NSK could attempt to heartbeat on behalf of another Node by POSTing to a sibling's `/v1/nodes/{otherId}/heartbeat`. Mitigation: the `NSKAuthenticator` middleware in `internal/identity/authn/middleware/nsk.go` resolves the bearer to a `tenancy.Node` and enforces a path-id-equals-resolved-Node-id gate; a mismatch surfaces as `403 node_id_mismatch`. The handler runs the same check as defense-in-depth (`heartbeat.go` step 2) so a misconfigured router that mounts the handler without the middleware still rejects. The per-handler-path mount in `cmd/plexsphere/heartbeat_factory_prod.go` pins the middleware on `/v1/nodes/{id}/heartbeat` only — see the file-level DECISION block on why a global NSK gate would either reject every cookie-authenticated dashboard call or admit an NSK envelope where cookie/JWT was expected.
- Checksum strip / replay. An attacker with a captured heartbeat could attempt to replay it to mask a paused or compromised agent, or strip the `binary_checksum` field to bypass downstream integrity checks. Mitigation: the aggregate's `RecordHeartbeat` invariant rejects empty / wrong-length checksums with `400 binary_checksum_empty`, and the persistence-layer CHECK constraint on `plexsphere.nodes.binary_checksum` is the schema-level backstop. The drifted-checksum case (a heartbeat whose checksum no longer matches the canonical artefact registry) is NOT refused here — that refusal lives in the artifact registry context once the canonical checksum store is in place; this context's invariant is "well-formed shape", not "matches the canonical artefact".
- Stale or revoked NSK admission. A stolen NSK that the operator has rotated could attempt to refresh a Node's liveness verdict after revocation. Mitigation: the NSK middleware's `Resolver` port refuses to admit a heartbeat whose NSK row is in the revoked set; the production binding will surface `401 nsk_invalid` (mapped to the `nsk_revoked` audit reason) once the real plaintext-resolver lands. Today the `failClosedNSKResolver` returns `ErrNSKResolverNotProvisioned` for every Resolve call so EVERY NSK envelope is refused — fail-closed by construction.
What this context is NOT
This context ships the heartbeat ingestion + per-Node liveness-tracking surface. The following adjacent concerns are explicit non-goals — each is owned by a downstream context:
- Hash-chained durable audit log. The audit entries this context emits flow through a slog-backed sink. The durable hash-chained sink, the audit relation registry, and the audit-log query API land in the Platform Audit Log context.
- Node deregistration / lifecycle. A Node that goes Unreachable is NOT removed from the Domain by this surface. The deregistration workflow, the per-Node lifecycle state machine, and the operator-driven force_deregister action are owned by the Key & Peer Manager context, the lifecycle and deregistration sibling for the per-Node liveness surface. The reachability state remains Unreachable indefinitely until either a recovery heartbeat arrives or an operator action removes the Node.
- Relay-fallback reachability heuristics. The bridge / relay orchestrator's choice of fallback bridge per peer pair MAY consult the reachability projection, but the heuristics themselves (latency, load, churn) are owned by the relay-fallback assignment context. The per-Node reachability projection is one of several inputs that consumer takes; this surface simply exposes the read.
- Node state service reports. The state and reports blocks on NodeStateSnapshot (the reconciliation-pull envelope) ship as JSON null placeholders today and become populated value objects in the node state service context. The reachability projection on the same envelope is the only block this context populates; the rolled-up state reports are the node state service's responsibility.
- Operator-tunable MaxHeartbeatSkew. The 60-second bound is a security invariant, not a tuning knob; the file-level DECISION block in internal/platform/clock/skew.go rejects the configuration alternative because widening would silently erode the replay-resistance the threat model relies on. A future per-Domain knob is NOT planned.
- Rate limiting heartbeats below HeartbeatInterval. A plexd that POSTs faster than the per-Domain HeartbeatInterval is NOT rate-limited here; the surface admits every well-formed heartbeat. Per-Node rate limiting is a future surface; the brief explicitly lists it as a deferred edge case.
- Driving the Reconcile / RotateKeys hint flags. The HeartbeatResponse envelope carries the two boolean hints today but the handler always emits them as false. Reconcile becomes load-bearing once snapshot-divergence detection lands as a follow-up to the reconciliation-pull surface; RotateKeys becomes load-bearing once the key rotation workflow lands. The wire shape is stable today so plexd's reconcile loop can pin against it.
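As an illustration of what pinning against the hint flags could look like on the plexd side, the sketch below decodes a response body into a two-field struct. Only the hint names Reconcile and RotateKeys come from the text above; the JSON keys reconcile and rotate_keys, the local struct definition, and the decodeHints helper are assumptions made for the example, and the authoritative wire shape is the OpenAPI spec.

```go
package main

import "encoding/json"

// HeartbeatResponse is a hypothetical client-side mirror of the envelope.
// The JSON keys are assumed for illustration and may differ from the spec.
type HeartbeatResponse struct {
	Reconcile  bool `json:"reconcile"`
	RotateKeys bool `json:"rotate_keys"`
}

// decodeHints (hypothetical) parses a heartbeat response body. Unknown
// fields are ignored by encoding/json, so fields added to the envelope
// later do not break a client pinned to these two hints.
func decodeHints(body []byte) (HeartbeatResponse, error) {
	var resp HeartbeatResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}
```

A client written this way treats both hints as no-ops while the handler emits false, and needs no wire-level change once either hint becomes load-bearing.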