Per-node reachability — heartbeat and liveness tracking

This document is the authoritative bounded-context reference for the per-Node reachability sub-context that ships under internal/mesh/reachability and exposes through POST /v1/nodes/{id}/heartbeat and GET /v1/nodes/{id}/reachability. It covers the ubiquitous language, the wire envelopes, the three-state liveness machine, the per-Domain threshold policy, the periodic evaluator, the clock-skew admission rule, the audit contract, the OpenAPI cross-reference, and the threat model.

The reachability surface is the plexd → plexsphere liveness channel: a per-Node, NSK-authenticated heartbeat ingestion endpoint plus a synchronous read of the server's verdict on each Node's liveness state. Anything outside that surface — the underlying Signed SSE Event Bus that fans out tenancy.NodeReachabilityChanged events, the reconciliation pull that exposes a cold-start view, the per-Domain Tenancy aggregate that owns ReachabilityPolicy, the audit hash chain, the Dashboard's ReachabilityBadge — is a collaborator the reachability sub-context projects from or emits to, not a concern of this document. The two surfaces this document pins are the in-process liveness state machine and the wire-shape contracts that surround it.

Status — partial delivery

The writer-side closure for the reachability event bus is load-bearing in the production binary:

  • Outbox-backed tenancy.NodeReachabilityChanged emission — the evaluator's ReachabilityWriter routes every transition through NodeRepo.TransitionAndEmit, folding the UpdateReachabilityState write and the tenancy/events.NodeReachabilityChanged outbox append into one Postgres transaction. NewNodeReachabilityChanged in internal/identity/tenancy/events is the canonical event shape.
  • Chain-bound AuditSink — each transition lands on the per-Domain audit chain as internal/audit.Entry with Relation = "node_reachability.transition", Object = "domain:<hex16>", RelationPath = [from, to, reason].

One downstream surface remains deferred: the SSE relay drain for node_state_updated that surfaces the outbox row on GET /v1/nodes/{id}/events, tracked in ../../architecture/mesh-event-bus-roadmap.md. The outbox row is durable on every transition today; the wire envelope reaches a subscriber once the relay drain lands.

Load-bearing today (each detailed in its own section below): the pure state machine (state.go, Apply), the heartbeat handler with its six-step gate ordering and four refusal codes, the production NSK plaintext resolver (heartbeat_nsk_resolver.go), the reachability read handler, the per-Domain ReachabilityPolicy value object and its invariant gates, the periodic evaluator, the 0010_node_reachability.sql migration (six Node columns, three Domain columns, the nodes_stale_idx partial index), the MaxHeartbeatSkew golden constant, the OpenAPI surface, and the Dashboard ReachabilityBadge. The exact file references are in Cross-references.

Downstream consumers (platform audit log, node deregistration, relay-fallback heuristics, rolled-up state reports) MUST treat the deferred SSE drain as scaffolding: every wire shape is stable from day one, and they consume the transition stream from a single dependency, the tenancy.NodeReachabilityChanged outbox row, rather than from the read-side projection.

Cross-references

Ubiquitous language

The terms below travel together across the Go code, the OpenAPI contract, the audit log, the structured-log attributes, the Prometheus metric label values, and operator-facing tooling. Names are preserved verbatim in error messages and audit-row vocabulary so a reader chasing a string from a log line finds it in the source without translation.

| Term | Definition |
| --- | --- |
| Heartbeat | A signed liveness ping plexd POSTs to POST /v1/nodes/{id}/heartbeat at the per-Domain HeartbeatInterval cadence. The wire envelope is HeartbeatRequest{client_now, binary_checksum, binary_version, nat_summary}. The ingestion stamps the server's last_heartbeat_at on the addressed Node row. |
| Reachability | The server's verdict on a Node's liveness, derived purely from server-observed inputs. Persisted as a (state, last_heartbeat_at, changed_at) triple on plexsphere.nodes and projected onto the wire by GET /v1/nodes/{id}/reachability. |
| Healthy | A Reachability.state value (StateHealthy = "healthy"). The Node's last admitted heartbeat is fresher than the per-Domain StaleAfter threshold. The schema default for a freshly-registered Node. |
| Stale | A Reachability.state value (StateStale = "stale"). The Node's last admitted heartbeat is older than StaleAfter but younger than UnreachableAfter. The transition Healthy → Stale is driven by the periodic evaluator. |
| Unreachable | A Reachability.state value (StateUnreachable = "unreachable"). The Node's last admitted heartbeat is older than UnreachableAfter. The transition Stale → Unreachable is driven by the periodic evaluator. |
| ReachabilityPolicy | The per-Domain value object ({HeartbeatInterval, StaleAfter, UnreachableAfter}) on the Tenancy aggregate that drives the evaluator's thresholds. Persisted as the three reach_* columns on plexsphere.domains. |
| HeartbeatInterval | The cadence agents must honour. Floor ≥ 10s, ceiling ≤ 1h. Default 30s. Pinned in internal/identity/tenancy/domain.go's DefaultReachabilityPolicy. |
| StaleAfter | The grace window before the evaluator declares a Node Stale. Floor ≥ 3 × HeartbeatInterval (so a single dropped beat does not trip the verdict), ceiling ≤ 1h. Default 90s. |
| UnreachableAfter | The grace window before the evaluator declares a Node Unreachable. Floor ≥ 2 × StaleAfter (avoids flapping near the Stale boundary), ceiling ≤ 1h. Default 300s. |
| BinaryChecksum | The raw SHA-256 digest (exactly 32 bytes) of the agent binary, base64-encoded on the wire. The aggregate's RecordHeartbeat method enforces the 32-byte length invariant. Empty / wrong-length checksums are refused with 400 binary_checksum_empty. |
| BinaryVersion | The agent's semver string. Empty values (after trimming) are refused with 400 binary_version_empty. |
| NATSummary | The optional advisory NAT/firewall summary the agent observed at envelope creation. The handler does not parse it; the aggregate persists it verbatim as a JSON byte slice for downstream observability. |
| NSK (Node Session Key) | The Node's per-session bearer credential, presented as Authorization: Bearer nsk_<env>_<...>. The NSKAuthenticator middleware in internal/identity/authn/middleware/nsk.go resolves it to a tenancy.Node and refuses path-id ≠ resolved-Node-id with 403 node_id_mismatch. |
| MaxHeartbeatSkew | The inclusive upper bound on \|client_now − server_now\| accepted by the heartbeat handler. Pinned at 60 * time.Second in internal/platform/clock/skew.go against a golden literal; widening is a security regression, not a tuning knob. |
| Reachability evaluator | The per-Domain background ticker in internal/mesh/reachability/evaluator.go that paginates the partial-index-backed candidate set, runs Apply per row, and writes any transition through UpdateReachabilityState's single-statement UPDATE … RETURNING with idempotency guarded by the reachability_state <> $2 predicate. |
| AuditRelationNodeHeartbeat | The canonical Relation string the heartbeat handler stamps on every audit row: node_heartbeat.record. Stable so audit consumers can filter by relation without parsing free-form text. Pinned in internal/transport/http/v1/handlers/heartbeat_deps.go. |
| AuditRelationNodeReachabilityRead | The canonical Relation string the read-side handler stamps on every audit row: node_reachability.read. Pinned in heartbeat_deps.go alongside the heartbeat relation. |
| node-agent relation | The ReBAC relation enforced on GET /v1/nodes/{id}/reachability. Shared with GET /v1/nodes/{id}/state and GET /v1/nodes/{id}/events so a single principal cone authorises the cold-start view, the push channel, and the reachability projection. |

Wire shape — POST /v1/nodes/{id}/heartbeat

The request envelope is the HeartbeatRequest value type defined in internal/transport/http/v1/handlers/heartbeat.go, mirrored on the OpenAPI spec at api/openapi/plexsphere-v1.yaml under components.schemas.HeartbeatRequest.

| Field | Type | Meaning |
| --- | --- | --- |
| client_now | RFC 3339 timestamp | The agent's local wall-clock at envelope creation. Compared against the server's now via clock.WithinSkew; \|delta\| > MaxHeartbeatSkew rejects with 400 clock_skew. NEVER drives state transitions; the server uses its own now for verdict logic. |
| binary_checksum | base64-encoded []byte (32 bytes raw SHA-256) | The aggregate's RecordHeartbeat invariant enforces the 32-byte length. Empty / wrong-length values reject with 400 binary_checksum_empty. |
| binary_version | string | The agent's semver string. Empty after trimming rejects with 400 binary_version_empty. |
| nat_summary | json.RawMessage (optional) | The agent's NAT/firewall advisory blob. Forwarded verbatim to the aggregate for persistence; the handler does not parse it. |

The success response envelope is the HeartbeatResponse value type in the same file.

| Field | Type | Meaning |
| --- | --- | --- |
| accepted_at | RFC 3339 timestamp | The server-side admission instant the handler stamped after the clock-skew gate passed and before the aggregate write. |
| reconcile | bool | Hint to the agent that it should issue a reconciliation pull at GET /v1/nodes/{id}/state. ALWAYS false today; becomes load-bearing once the snapshot-divergence detection lands as a follow-up to the reconciliation-pull surface. |
| rotate_keys | bool | Hint to the agent that it should rotate its NSK. ALWAYS false today; becomes load-bearing once the NSK rotation workflow lands. |
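
As a rough illustration of the two envelopes, the sketch below models them as Go structs and round-trips a request through encoding/json. Only the JSON keys come from the tables above; the Go field names, the sampleBody and decodeHeartbeat helpers, and every sample value are assumptions made for this example, not the definitions in heartbeat.go.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// HeartbeatRequest: JSON keys follow the request table; the Go field names
// are assumptions made for this sketch.
type HeartbeatRequest struct {
	ClientNow      time.Time       `json:"client_now"`
	BinaryChecksum []byte          `json:"binary_checksum"` // encoding/json emits []byte as base64
	BinaryVersion  string          `json:"binary_version"`
	NATSummary     json.RawMessage `json:"nat_summary,omitempty"` // opaque, never parsed here
}

// HeartbeatResponse: both hints are documented as always false today.
type HeartbeatResponse struct {
	AcceptedAt time.Time `json:"accepted_at"`
	Reconcile  bool      `json:"reconcile"`
	RotateKeys bool      `json:"rotate_keys"`
}

// sampleBody builds an illustrative envelope; every value in it is made up.
func sampleBody() []byte {
	body, err := json.Marshal(HeartbeatRequest{
		ClientNow:      time.Date(2024, 5, 1, 12, 0, 0, 0, time.UTC),
		BinaryChecksum: make([]byte, 32), // placeholder digest of the right length
		BinaryVersion:  "1.4.2",
		NATSummary:     json.RawMessage(`{"kind":"example"}`),
	})
	if err != nil {
		panic(err)
	}
	return body
}

// decodeHeartbeat parses a request body into the envelope type.
func decodeHeartbeat(body []byte) (HeartbeatRequest, error) {
	var req HeartbeatRequest
	err := json.Unmarshal(body, &req)
	return req, err
}

func main() {
	req, err := decodeHeartbeat(sampleBody())
	fmt.Println(len(req.BinaryChecksum), req.BinaryVersion, string(req.NATSummary), err)

	out, _ := json.Marshal(HeartbeatResponse{AcceptedAt: req.ClientNow})
	fmt.Println(string(out))
}
```

Typing nat_summary as json.RawMessage keeps the blob byte-for-byte intact through decoding, which matches the "forwarded verbatim" behaviour described in the table.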

The four refusal codes the handler emits (Problem-Details code field):

| Code | HTTP | Trigger | Source file |
| --- | --- | --- | --- |
| clock_skew | 400 | \|client_now − server_now\| > MaxHeartbeatSkew (60s) | heartbeat.go step 4 |
| node_id_mismatch | 403 | URL path id ≠ resolved NSK Node id (defense-in-depth; the NSK middleware also runs this gate) | heartbeat.go step 2 |
| nsk_revoked | 401 | NSK middleware resolves to a revoked NSK row (resolver responsibility) | internal/identity/authn/middleware/nsk.go |
| binary_checksum_empty | 400 | len(binary_checksum) ≠ 32 (also triggered by an empty / nil slice) | heartbeat.go step 5 (ErrHeartbeatChecksumInvalid translation) |

An additional binary_version_empty (400) variant fires for empty or whitespace-only binary_version values. It is grouped under the same well-formed-envelope requirement as binary_checksum_empty but is a distinct Problem code.
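
The envelope-level refusals can be pictured as a small pure function. The sketch below is a simplified illustration, not the handler: it covers only the clock-skew and well-formed-envelope gates, the refusal type and the checkEnvelope and codeFor helpers are hypothetical names, and the relative order of the checksum and version checks is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

const maxHeartbeatSkew = 60 * time.Second

// refusal pairs an HTTP status with a Problem-Details code.
type refusal struct {
	Status int
	Code   string
}

// checkEnvelope returns nil when the envelope is admitted. Checking the
// checksum before the version is an assumption of this sketch.
func checkEnvelope(clientNow, serverNow time.Time, checksum []byte, version string) *refusal {
	early, late := clientNow, serverNow
	if late.Before(early) {
		early, late = late, early
	}
	if late.Sub(early) > maxHeartbeatSkew {
		return &refusal{Status: 400, Code: "clock_skew"}
	}
	if len(checksum) != 32 {
		return &refusal{Status: 400, Code: "binary_checksum_empty"}
	}
	if strings.TrimSpace(version) == "" {
		return &refusal{Status: 400, Code: "binary_version_empty"}
	}
	return nil
}

// codeFor is a demo helper: it reports the refusal code ("" when admitted)
// for a client clock that is skewSeconds ahead of the server.
func codeFor(skewSeconds, checksumLen int, version string) string {
	server := time.Date(2024, 5, 1, 12, 0, 0, 0, time.UTC)
	client := server.Add(time.Duration(skewSeconds) * time.Second)
	if r := checkEnvelope(client, server, make([]byte, checksumLen), version); r != nil {
		return r.Code
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", codeFor(60, 32, "1.4.2"))  // boundary skew, well-formed
	fmt.Printf("%q\n", codeFor(-61, 32, "1.4.2")) // client 61s behind
	fmt.Printf("%q\n", codeFor(0, 0, "1.4.2"))    // empty checksum
	fmt.Printf("%q\n", codeFor(0, 32, "   "))     // whitespace-only version
}
```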

Wire shape — GET /v1/nodes/{id}/reachability

The success response envelope is the ReachabilityResponse value type defined in internal/transport/http/v1/handlers/reachability.go, mirrored on the OpenAPI spec under components.schemas.Reachability.

| Field | Type | Meaning |
| --- | --- | --- |
| state | string ("healthy" / "stale" / "unreachable" / "") | The server's current verdict. The empty-string variant is bootstrap noise: the schema default before the first evaluator sweep. |
| last_heartbeat_at | RFC 3339 timestamp (UTC) | The most recent successful ingestion. Zero means "never observed"; pair (state="", last_heartbeat_at=0) is the bootstrap state. |
| changed_at | RFC 3339 timestamp (UTC) | The most recent state transition instant. Always present; matches last_heartbeat_at for a never-transitioned Node. |

The handler emits four refusal arms: 401 unauthorized (no Principal), 403 insufficient_relation (no node-agent relation on node:<id>), 404 node_not_found (the addressed Node id does not resolve to a row), and 501 reachability_not_provisioned (one or more of ReachabilityReader, RelationChecker, NodeRepo is nil at composition time — the reachability_dispatch.go gate fires).

State machine

The Apply function in internal/mesh/reachability/state.go is the SINGLE exported transition rule. Every consumer (the heartbeat handler's downstream observability, the evaluator's per-row loop, the reconciliation pull's reachability block) funnels through it rather than branching on elapsed time itself, so the matrix has exactly one home.

```mermaid
stateDiagram-v2
    [*] --> Healthy : schema default (`reachability_state = 'healthy'`)
    Healthy --> Stale : evaluator: elapsed >= StaleAfter && elapsed < UnreachableAfter
    Stale --> Unreachable : evaluator: elapsed >= UnreachableAfter
    Healthy --> Unreachable : evaluator: elapsed >= UnreachableAfter (skipped Stale)
    Stale --> Healthy : admitted heartbeat: elapsed < StaleAfter
    Unreachable --> Healthy : admitted heartbeat: elapsed < StaleAfter
    Unreachable --> Stale : admitted heartbeat: elapsed >= StaleAfter && elapsed < UnreachableAfter
```

The transition rules Apply encodes (in transitionFor):

| Elapsed (now − lastHeartbeatAt) | Result |
| --- | --- |
| elapsed >= UnreachableAfter | StateUnreachable |
| StaleAfter <= elapsed < UnreachableAfter | StateStale |
| elapsed < StaleAfter | StateHealthy |

Apply returns (next, changed) where changed = (next != prev), so a no-op transition (e.g. Healthy × elapsed < StaleAfter, Stale × elapsed between thresholds) returns (prev, false) and the caller short-circuits the eventing pipeline. The fuzz harness at internal/mesh/reachability/state_fuzz_test.go pins the contract that an invalid prev MUST NOT panic — the function may return any State for invalid input, but never crashes the caller.
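
A minimal sketch of the rule described above, assuming the parameter order implied by the evaluator's call site later in this document. It is an illustration of the threshold matrix, not the code in state.go; the applyAfter helper and its 90s / 300s thresholds exist only for the demo.

```go
package main

import (
	"fmt"
	"time"
)

type State string

const (
	StateHealthy     State = "healthy"
	StateStale       State = "stale"
	StateUnreachable State = "unreachable"
)

// transitionFor maps elapsed time onto a verdict using the two thresholds.
func transitionFor(elapsed, staleAfter, unreachableAfter time.Duration) State {
	switch {
	case elapsed >= unreachableAfter:
		return StateUnreachable
	case elapsed >= staleAfter:
		return StateStale
	default:
		return StateHealthy
	}
}

// Apply never inspects prev beyond the equality check, so an invalid prev
// cannot panic; it simply compares unequal to the computed verdict.
func Apply(prev State, now, lastHeartbeatAt time.Time, staleAfter, unreachableAfter time.Duration) (State, bool) {
	next := transitionFor(now.Sub(lastHeartbeatAt), staleAfter, unreachableAfter)
	return next, next != prev
}

// applyAfter is a demo helper using the default 90s / 300s thresholds.
func applyAfter(prev State, elapsedSeconds int) (State, bool) {
	now := time.Date(2024, 5, 1, 12, 0, 0, 0, time.UTC)
	last := now.Add(-time.Duration(elapsedSeconds) * time.Second)
	return Apply(prev, now, last, 90*time.Second, 300*time.Second)
}

func main() {
	fmt.Println(applyAfter(StateHealthy, 30))      // fresh heartbeat, no change
	fmt.Println(applyAfter(StateHealthy, 90))      // exactly at StaleAfter
	fmt.Println(applyAfter(StateStale, 300))       // exactly at UnreachableAfter
	fmt.Println(applyAfter(StateUnreachable, 10))  // recovery edge
	fmt.Println(applyAfter(State("garbage"), 10)) // invalid prev, still no panic
}
```

Because both thresholds use >=, a Node sitting exactly on a boundary takes the more severe verdict, which agrees with the table above.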

The two transition origins:

  • Recovery edges (Stale → Healthy, Unreachable → Healthy, Unreachable → Stale) are driven by an admitted heartbeat resetting the elapsed clock back below the relevant threshold. The heartbeat handler stamps last_heartbeat_at and the next evaluator sweep observes elapsed < StaleAfter (or between thresholds) and applies the transition. The handler itself does NOT write reachability_state; see the read-path-rejector DECISION below.
  • Advance edges (Healthy → Stale, Stale → Unreachable, Healthy → Unreachable) are driven solely by the periodic evaluator. The handler is a thin admit-and-write that updates last_heartbeat_at and the NAT/binary advisories; the state-machine vocabulary lives entirely in the evaluator.

TTL / threshold defaults & per-Domain knobs

The per-Domain ReachabilityPolicy value object on the Tenancy aggregate (internal/identity/tenancy/domain.go) carries three thresholds. Defaults and invariant gates:

| Threshold | Default | Floor | Ceiling | Invariant |
| --- | --- | --- | --- | --- |
| HeartbeatInterval | 30s | ≥ 10s | ≤ 1h | (none beyond floor / ceiling) |
| StaleAfter | 90s | ≥ 3 × HeartbeatInterval | ≤ 1h | A single dropped beat must not trip the verdict |
| UnreachableAfter | 300s | ≥ 2 × StaleAfter | ≤ 1h | Avoids flapping near the Stale boundary |

The defaults are pinned by DefaultReachabilityPolicy in domain.go:

```go
var DefaultReachabilityPolicy = ReachabilityPolicy{
    HeartbeatInterval: 30 * time.Second,
    StaleAfter:        90 * time.Second,
    UnreachableAfter:  300 * time.Second,
}
```

A fully-zero ReachabilityPolicy{} on NewDomain / Hydrate is treated as the DEFAULT signal and replaced with DefaultReachabilityPolicy at build time. A partial fill (e.g. only HeartbeatInterval set) is REJECTED with errInvariant rather than silently extended, so a caller who meant to set a custom StaleAfter but misspelled the field gets an immediate error. Each gate returns errInvariant("REQ-006", "Domain", <field>, <bound>) so a grep on the requirement id surfaces every refusal site.
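
The gates can be sketched as a single normalisation step. This is a hedged illustration of the rules in the table, not the Tenancy aggregate's code: normalizePolicy, errPolicyInvariant, and policyOK are hypothetical names, and the plain wrapped error stands in for the real errInvariant("REQ-006", ...) constructor.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type ReachabilityPolicy struct {
	HeartbeatInterval time.Duration
	StaleAfter        time.Duration
	UnreachableAfter  time.Duration
}

var DefaultReachabilityPolicy = ReachabilityPolicy{
	HeartbeatInterval: 30 * time.Second,
	StaleAfter:        90 * time.Second,
	UnreachableAfter:  300 * time.Second,
}

// errPolicyInvariant is a hypothetical stand-in for errInvariant.
var errPolicyInvariant = errors.New("reachability policy invariant violated")

// normalizePolicy swaps a fully-zero policy for the default and rejects
// anything that breaks a floor, ceiling, or multiplier rule. A partial fill
// fails naturally: a zero StaleAfter is below 3 x HeartbeatInterval.
func normalizePolicy(p ReachabilityPolicy) (ReachabilityPolicy, error) {
	if p == (ReachabilityPolicy{}) {
		return DefaultReachabilityPolicy, nil
	}
	const floor, ceiling = 10 * time.Second, time.Hour
	switch {
	case p.HeartbeatInterval < floor || p.HeartbeatInterval > ceiling:
		return ReachabilityPolicy{}, fmt.Errorf("%w: HeartbeatInterval outside [10s, 1h]", errPolicyInvariant)
	case p.StaleAfter < 3*p.HeartbeatInterval || p.StaleAfter > ceiling:
		return ReachabilityPolicy{}, fmt.Errorf("%w: StaleAfter outside [3 x HeartbeatInterval, 1h]", errPolicyInvariant)
	case p.UnreachableAfter < 2*p.StaleAfter || p.UnreachableAfter > ceiling:
		return ReachabilityPolicy{}, fmt.Errorf("%w: UnreachableAfter outside [2 x StaleAfter, 1h]", errPolicyInvariant)
	}
	return p, nil
}

// policyOK is a demo helper taking the three thresholds in seconds.
func policyOK(hb, stale, unreachable int) bool {
	_, err := normalizePolicy(ReachabilityPolicy{
		HeartbeatInterval: time.Duration(hb) * time.Second,
		StaleAfter:        time.Duration(stale) * time.Second,
		UnreachableAfter:  time.Duration(unreachable) * time.Second,
	})
	return err == nil
}

func main() {
	fmt.Println(policyOK(0, 0, 0))      // fully zero: replaced by the default
	fmt.Println(policyOK(30, 0, 0))     // partial fill: rejected
	fmt.Println(policyOK(30, 60, 300))  // StaleAfter below 3 x interval
	fmt.Println(policyOK(30, 90, 120))  // UnreachableAfter below 2 x StaleAfter
	fmt.Println(policyOK(600, 1800, 3600))
}
```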

The persistence layer mirrors the value object on plexsphere.domains as three interval columns (reach_heartbeat_interval, reach_stale_after, reach_unreachable_after), defaulted to the same (30s, 90s, 300s) triple at the schema level (0010_node_reachability.sql). The Tenancy aggregate's Hydrate boundary re-validates the multipliers on load so a hand-edited Domain row that violates the invariants fails fast.

Evaluator scheduling

The reachability evaluator (internal/mesh/reachability/evaluator.go) is the per-Domain background ticker that drives advance-edge transitions. The composition root in cmd/plexsphere/evaluator_factory_prod.go wires it on the same PLEXSPHERE_DSN env-var as the other production-wired surfaces; an empty DSN leaves the evaluator unwired (no goroutine, no sweep, no transitions).

Tick loop

Run(ctx, tick) drives Tick(ctx) on a fixed-cadence ticker. The production tick is read from PLEXSPHERE_REACH_EVAL_TICK with a 5 * time.Second default (DefaultEvaluatorTick). Five seconds is the scaffold balance between responsiveness (a Stale → Unreachable transition is observable within one full cycle) and Postgres pressure (a 1024-row LoadStaleNodes page is a single index scan, so 5s ticks land < 1k SELECT/min on a quiet Domain). Cancelling the context terminates the sweep.
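
The loop shape described above can be sketched with a time.Ticker and a cancellable context. This is an assumed form, not the evaluator's source: here Run takes the tick callback as a parameter so the example stays self-contained, and runThreeTicks is a demo helper.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Run invokes tick on a fixed cadence until ctx is cancelled.
func Run(ctx context.Context, every time.Duration, tick func(context.Context)) error {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-t.C:
			// Re-check cancellation: select picks at random when both
			// channels are ready, and a cancelled sweep must not start.
			if ctx.Err() != nil {
				return ctx.Err()
			}
			tick(ctx)
		}
	}
}

// runThreeTicks cancels the context from inside the third tick and returns
// how many ticks actually ran.
func runThreeTicks() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	n := 0
	_ = Run(ctx, time.Millisecond, func(context.Context) {
		n++
		if n == 3 {
			cancel()
		}
	})
	return n
}

func main() {
	fmt.Println("ticks before cancellation:", runThreeTicks())
}
```

A time.Ticker drops ticks for slow receivers rather than queueing them, so a sweep that overruns the cadence delays the next one instead of stacking sweeps.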

Per-tick procedure

Each Tick(ctx) call:

  1. Reads now = clock.Now().UTC() via the injected Clock port (a virtual clock in tests so a Tick() call advances "now" without sleeping the test goroutine).
  2. Calls Lister.ListStale(ctx, now, limit, cursor) — paginates the nodes_stale_idx partial index by (reachability_changed_at, id). The cursor's zero value bootstraps the first page; a short page (len < limit) drains the candidate set for this tick.
  3. For each returned tenancy.Node row:
     • Resolves the Node's Domain ReachabilityPolicy via Policies.LookupPolicy(ctx, node.DomainID()). A non-nil error short-circuits the loop on this Node and surfaces as a WARN log; the evaluator CONTINUES to the next Node, because one misconfigured Domain must not stall every other Domain's sweep.
     • Runs Apply(prev, now, node.Reachability.LastHeartbeatAt, policy.StaleAfter, policy.UnreachableAfter). If changed == false, skips the row, so the eventing pipeline short-circuits.
     • Calls Writer.Update(ctx, nodeID, string(next), now), a single UPDATE … RETURNING with a reachability_state <> $2 guard for idempotency. Returns (transitioned bool, err). A (false, nil) arm means another writer (or an earlier sweep) already landed this transition; the evaluator suppresses event + audit so only the row that actually transitioned is the single emitter.
     • On a successful transition, calls Events.Emit(ctx, ev) to emit the tenancy.NodeReachabilityChanged event (the outbox row is durable today; the SSE relay drain is still deferred, see the Status section), then Audit.Record(ctx, entry) to emit the audit row.
  4. Advances the cursor to the last row of the page; loops if the page was full, returns otherwise.
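
The per-tick procedure can be sketched end to end with in-memory stand-ins for the ports. Everything here is a simplification under stated assumptions: an integer offset replaces the (reachability_changed_at, id) cursor, a map replaces Policies.LookupPolicy, the update function imitates the reachability_state <> $2 guard, and the returned strings stand in for the event and audit emissions. None of these names are the evaluator's real types.

```go
package main

import (
	"fmt"
	"time"
)

type State string

const (
	Healthy     State = "healthy"
	Stale       State = "stale"
	Unreachable State = "unreachable"
)

type Policy struct{ StaleAfter, UnreachableAfter time.Duration }

type Node struct {
	ID, Domain      string
	State           State
	LastHeartbeatAt time.Time
}

// verdict is the threshold rule from the State machine section.
func verdict(elapsed time.Duration, p Policy) State {
	switch {
	case elapsed >= p.UnreachableAfter:
		return Unreachable
	case elapsed >= p.StaleAfter:
		return Stale
	}
	return Healthy
}

// page stands in for Lister.ListStale (offset instead of a keyset cursor).
func page(nodes []*Node, offset, limit int) []*Node {
	if offset >= len(nodes) {
		return nil
	}
	end := offset + limit
	if end > len(nodes) {
		end = len(nodes)
	}
	return nodes[offset:end]
}

// update stands in for Writer.Update: a repeated write reports false.
func update(n *Node, next State) bool {
	if n.State == next {
		return false
	}
	n.State = next
	return true
}

// tick runs one sweep and returns the transitions it would emit and audit.
func tick(nodes []*Node, policies map[string]Policy, now time.Time, limit int) []string {
	var emitted []string
	offset := 0
	for {
		batch := page(nodes, offset, limit)
		for _, n := range batch {
			p, ok := policies[n.Domain]
			if !ok {
				continue // policy lookup failed: skip this Node, keep sweeping
			}
			prev, next := n.State, verdict(now.Sub(n.LastHeartbeatAt), p)
			if next == prev {
				continue // no-op: nothing to write, emit, or audit
			}
			if update(n, next) {
				emitted = append(emitted, fmt.Sprintf("%s:%s->%s", n.ID, prev, next))
			}
		}
		offset += len(batch)
		if len(batch) < limit {
			return emitted // short page drains the candidate set
		}
	}
}

// demoTick sweeps four made-up Nodes with a page limit of 2.
func demoTick() []string {
	now := time.Date(2024, 5, 1, 12, 0, 0, 0, time.UTC)
	ago := func(s int) time.Time { return now.Add(-time.Duration(s) * time.Second) }
	nodes := []*Node{
		{ID: "a", Domain: "d1", State: Healthy, LastHeartbeatAt: ago(10)},
		{ID: "b", Domain: "d1", State: Healthy, LastHeartbeatAt: ago(120)},
		{ID: "c", Domain: "d1", State: Stale, LastHeartbeatAt: ago(400)},
		{ID: "d", Domain: "missing", State: Healthy, LastHeartbeatAt: ago(999)},
	}
	policies := map[string]Policy{"d1": {StaleAfter: 90 * time.Second, UnreachableAfter: 300 * time.Second}}
	return tick(nodes, policies, now, 2)
}

func main() { fmt.Println(demoTick()) }
```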

Partial-index paging

The query that paginates the candidate set rides the nodes_stale_idx partial b-tree on (reachability_changed_at) WHERE reachability_state IN ('healthy','stale'). Excluding Unreachable rows from the index keeps the working set small: an Unreachable Node has no further advance transition, so indexing it would only inflate the scan width.

The defaultStalePageLimit constant pins the per-tick fan-out at 1024 rows. The number is large enough to drain a small Domain in one tick and small enough to keep the worst-case tx duration bounded under heavy load. A future per-Domain knob slots into the EvaluatorDeps.Limit field without touching the evaluator.

Read-path-rejector posture

The DECISION block at the top of evaluator.go records why the evaluator is the SINGLE WRITER of reachability_state — the heartbeat handler does NOT flip the row to Healthy on every ingestion. The trade-off:

  • Alternative considered: drive transitions from the heartbeat handler itself. Every successful ingestion would flip the row's state to Healthy in the same transaction.
  • Rejected because (a) the handler would need to know the Domain's StaleAfter and UnreachableAfter to decide whether the ingestion implies a transition; (b) the state-machine vocabulary would leak into a hot HTTP path that is supposed to remain a thin admit-and-write; (c) two writers on the same row would force pessimistic locking on every heartbeat.
  • Resolution: the evaluator owns the read path. It consults the Node row, runs Apply, and writes the transition (with idempotency guarded by the SQL-side reachability_state <> $2 predicate in UpdateReachabilityState). The handler stamps last_heartbeat_at and the evaluator picks up the next sweep — clean separation of concerns, one writer per row.

Clock-skew rules

The heartbeat handler admits a heartbeat only when |server_now − client_now| ≤ MaxHeartbeatSkew (60s). The constant is pinned in internal/platform/clock/skew.go against a golden literal (the test file pins the value at 60 * time.Second so an accidental adjustment surfaces in code review). Widening the bound is a security regression, not a tuning knob — see the file-level DECISION block.

The WithinSkew(t1, t2) helper is symmetric (callers may pass server and client time in either order) and inclusive at the boundary (delta == MaxHeartbeatSkew admits). The implementation branches on time.Time.Before to keep the subtraction in the non-negative half of the duration range. A naive t1.Sub(t2) followed by a sign-flip is unsafe: Sub saturates at the minimum int64 Duration when the gap is out of range, negating that minimum yields the same negative value, and a zero time.Time compared against a real timestamp would then silently pass the gate.
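
The difference between the two approaches is easy to reproduce. The sketch below is an assumed reconstruction from the description above, not the contents of skew.go; naiveWithinSkew and skewDemo exist only to show the failure mode.

```go
package main

import (
	"fmt"
	"time"
)

const MaxHeartbeatSkew = 60 * time.Second

// WithinSkew orders the operands first, so Sub can only return a
// non-negative Duration (saturating at the maximum on overflow).
func WithinSkew(t1, t2 time.Time) bool {
	if t1.Before(t2) {
		t1, t2 = t2, t1
	}
	return t1.Sub(t2) <= MaxHeartbeatSkew
}

// naiveWithinSkew shows the pitfall: when Sub saturates at the minimum
// Duration, negating it wraps back to the same negative value, which then
// compares as "within skew".
func naiveWithinSkew(t1, t2 time.Time) bool {
	d := t1.Sub(t2)
	if d < 0 {
		d = -d
	}
	return d <= MaxHeartbeatSkew
}

// skewDemo evaluates the boundary cases and the zero-time case.
func skewDemo() (atBoundary, pastBoundary, zeroSafe, zeroNaive bool) {
	server := time.Date(2024, 5, 1, 12, 0, 0, 0, time.UTC)
	atBoundary = WithinSkew(server, server.Add(MaxHeartbeatSkew))
	pastBoundary = WithinSkew(server.Add(MaxHeartbeatSkew+time.Nanosecond), server)
	zeroSafe = WithinSkew(time.Time{}, server)
	zeroNaive = naiveWithinSkew(time.Time{}, server)
	return
}

func main() {
	atBoundary, pastBoundary, zeroSafe, zeroNaive := skewDemo()
	fmt.Println("exactly 60s apart admitted:", atBoundary)
	fmt.Println("60s + 1ns apart admitted:", pastBoundary)
	fmt.Println("zero time admitted (ordered subtraction):", zeroSafe)
	fmt.Println("zero time admitted (naive sign-flip):", zeroNaive)
}
```

The ordered version rejects the zero timestamp, while the naive version admits it.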

The DECISION block in internal/mesh/reachability/state.go records why client-side timestamps NEVER drive a reachability transition. The only inputs Apply consumes are the server's now and the lastHeartbeatAt value the server itself stamped at admit time. A forged or drifted client clock could otherwise:

  • Mask a genuinely absent Node by claiming to have heartbeat in the future.
  • Trigger a spurious Unreachable verdict by claiming to have heartbeat in the deep past.
  • Trigger a spurious recovery by claiming to have heartbeat after a long quiet period.

All three failure modes would erode the state machine's trustworthiness as the source of operator alerts. The MaxHeartbeatSkew admission gate runs at the handler boundary; once that gate has passed, the server has no remaining reason to re-introduce client-clock dependence into the verdict logic.

Audit contract

Every decision branch in the heartbeat surface emits exactly one audit.Entry per layer. The heartbeat surface has TWO layers — the NSK middleware (admission) and the handler (ingestion) — so a granted heartbeat produces TWO rows from the operator's perspective: one with relation = node_heartbeat.authenticate, outcome = granted from the middleware (NSKAuditSink) and one with relation = node_heartbeat.record, outcome = granted from the handler (handlers.AuditSink). Audit consumers therefore SHOULD NOT expect a 1:1 mapping between heartbeat requests and audit rows; they SHOULD filter by relation to scope to the layer they care about (admission vs ingestion). A nil AuditSink is tolerated — the slog lines stand in for observability — but the security gates still execute.

The relation strings live in internal/transport/http/v1/handlers/heartbeat_deps.go:

| Constant | Value | Layer / origin |
| --- | --- | --- |
| AuditRelationNodeHeartbeatAuthenticate | node_heartbeat.authenticate | NSK middleware admission gate (internal/identity/authn/middleware/nsk.go). |
| AuditRelationNodeHeartbeat | node_heartbeat.record | Handler ingestion phase: clock-skew, malformed body, invariant violation, recorder failure, granted. |
| AuditRelationNodeHeartbeatPathGate | node_heartbeat.path_gate | Handler defense-in-depth path-id gate. Distinct relation so dashboards can alert on "middleware was bypassed but handler caught it" without conflating it with ingestion-phase entries. |
| AuditRelationNodeReachabilityRead | node_reachability.read | Read-side handler. |
| node_reachability.transition | (literal in evaluator) | Per-Domain reachability evaluator (slogReachabilityAuditSink.Record at cmd/plexsphere/evaluator_factory_prod.go). |

The closed AuditOutcome set is enumerated in internal/transport/http/v1/handlers/events_deps.go. The heartbeat surface uses six distinct values so audit consumers triage clock-spoofing from malformed-body from invariant-violation from middleware-bypass from internal-error WITHOUT parsing free-form reason strings:

| Value | Emitter / cause |
| --- | --- |
| granted | Every layer on success. |
| node_id_mismatch | Handler path-id gate (defense-in-depth). |
| clock_skew | Handler clock-skew gate (client_now outside MaxHeartbeatSkew). |
| malformed_request | Handler JSON-decode arm. |
| invariant_violation | Recorder rejection on the 32-byte checksum or non-empty version invariant. |
| internal_error | Recorder returned a non-sentinel error (Postgres outage, adapter bug). |
| insufficient_relation | NSK middleware admission denial; reachability read-side ReBAC denial. |
| signature_failure | (SSE bus only.) |

| Branch | Source | Relation | Outcome | Reason |
| --- | --- | --- | --- | --- |
| NSK middleware admission (granted) | nsk.go middleware | node_heartbeat.authenticate | granted | nsk_authenticated |
| NSK middleware refusal (revoked / invalid) | nsk.go middleware | node_heartbeat.authenticate | insufficient_relation | nsk_invalid / nsk_revoked |
| Heartbeat admitted | heartbeat.go step 6 | node_heartbeat.record | granted | heartbeat ingested |
| Clock-skew refusal | heartbeat.go step 4 | node_heartbeat.record | clock_skew | client_now outside MaxHeartbeatSkew window |
| Path-id mismatch (defense-in-depth) | heartbeat.go step 2 | node_heartbeat.path_gate | node_id_mismatch | resolved Node id does not match URL path id (defense-in-depth) |
| Malformed body | heartbeat.go step 3 | node_heartbeat.record | malformed_request | malformed heartbeat envelope: <reason> |
| Checksum invariant violation | heartbeat.go handleHeartbeatRecordError | node_heartbeat.record | invariant_violation | binary checksum invariant violated |
| Version invariant violation | heartbeat.go handleHeartbeatRecordError | node_heartbeat.record | invariant_violation | binary version invariant violated |
| Recorder internal error | heartbeat.go handleHeartbeatRecordError | node_heartbeat.record | internal_error | heartbeat persistence failed: <reason> |
| Reachability projection read (granted) | reachability.go step 3 | node_reachability.read | granted | reachability projection read |
| Reachability projection read (refused) | reachability.go step 2 | node_reachability.read | insufficient_relation | principal lacks node-agent relation |
| Transition Healthy → Stale | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat overdue (stale threshold exceeded) |
| Transition Stale → Unreachable | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat absent (unreachable threshold exceeded) |
| Transition Healthy → Unreachable | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat absent (skipped stale, hit unreachable) |
| Transition Stale → Healthy | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat resumed (back to healthy) |
| Transition Unreachable → Healthy | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat resumed (recovered from unreachable) |
| Transition Unreachable → Stale | evaluator evaluateNode | node_reachability.transition | granted | evaluator: heartbeat resumed (partial recovery to stale) |

The full reason-string table lives in transitionReason in evaluator.go so audit consumers can switch on the exact strings without parsing free-form text. Idempotent no-op writes ((false, nil) from Writer.Update) suppress the audit + event arms — only the row that actually transitioned is the single emitter.
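
To make the transition rows concrete, the sketch below builds an audit entry for one transition. The reason strings, the relation, and the [from, to, reason] path are taken from this document; the Entry struct layout, the map-based lookup, and the transitionEntry helper are assumptions and may not match internal/audit or evaluator.go.

```go
package main

import "fmt"

// Entry is a pared-down stand-in for the audit entry type.
type Entry struct {
	Relation     string
	Object       string
	Outcome      string
	RelationPath []string
}

// transitionReason mirrors the six transition rows of the audit table,
// keyed by (from, to).
var transitionReason = map[[2]string]string{
	{"healthy", "stale"}:       "evaluator: heartbeat overdue (stale threshold exceeded)",
	{"stale", "unreachable"}:   "evaluator: heartbeat absent (unreachable threshold exceeded)",
	{"healthy", "unreachable"}: "evaluator: heartbeat absent (skipped stale, hit unreachable)",
	{"stale", "healthy"}:       "evaluator: heartbeat resumed (back to healthy)",
	{"unreachable", "healthy"}: "evaluator: heartbeat resumed (recovered from unreachable)",
	{"unreachable", "stale"}:   "evaluator: heartbeat resumed (partial recovery to stale)",
}

// transitionEntry returns false for a pair that is not a real transition,
// such as a no-op healthy -> healthy.
func transitionEntry(domainHex, from, to string) (Entry, bool) {
	reason, ok := transitionReason[[2]string{from, to}]
	if !ok {
		return Entry{}, false
	}
	return Entry{
		Relation:     "node_reachability.transition",
		Object:       "domain:" + domainHex,
		Outcome:      "granted",
		RelationPath: []string{from, to, reason},
	}, true
}

func main() {
	// The domain id below is a made-up 16-hex-character placeholder.
	e, ok := transitionEntry("0123456789abcdef", "healthy", "stale")
	fmt.Printf("%+v %v\n", e, ok)
}
```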

OpenAPI surface reference

The wire surface is pinned in api/openapi/plexsphere-v1.yaml under the mesh tag. The relevant operationIds and schemas:

| OpenAPI artefact | Where in the spec | Go counterpart |
| --- | --- | --- |
| PostNodeHeartbeat operation | paths./v1/nodes/{id}/heartbeat.post | Handlers.PostNodeHeartbeat in heartbeat_dispatch.go (gated entry) and postNodeHeartbeat in heartbeat.go (body). |
| GetNodeReachability operation | paths./v1/nodes/{id}/reachability.get | Handlers.GetNodeReachability in reachability_dispatch.go (gated entry) and getNodeReachability in reachability.go (body). |
| HeartbeatRequest schema | components.schemas.HeartbeatRequest | handlers.HeartbeatRequest in heartbeat.go. |
| HeartbeatResponse schema | components.schemas.HeartbeatResponse | handlers.HeartbeatResponse in heartbeat.go. |
| Reachability schema | components.schemas.Reachability | handlers.ReachabilityResponse in reachability.go; mirrored on the bounded-context value object tenancy.Reachability. |
| NodeStateSnapshot.reachability field | components.schemas.NodeStateSnapshot.reachability ($ref: Reachability) | The reconciliation-pull surface (./reconciliation-pull.md) carries the same Reachability projection on the cold-start envelope so plexd recovers the verdict without a second round-trip. |
| Problem schema | components.schemas.Problem (shared) | The four heartbeat refusal codes are clock_skew, node_id_mismatch, binary_checksum_empty, binary_version_empty; the read-side handler emits unauthorized, insufficient_relation, node_not_found; both surfaces emit heartbeat_not_provisioned / reachability_not_provisioned on the 501 fail-closed stub. |

The OpenAPI byte-equality drift gate at tests/workspace/openapi_drift_test.go asserts that the source spec and the embedded mirror under internal/transport/http/v1/handlers/plexsphere-v1.yaml are byte-identical, so a spec drift surfaces at go test time, not at runtime.

Threat model

The heartbeat surface mitigates four classes of attack along the plexd → plexsphere ingestion path. Each mitigation is implemented in a single, named place so a reader chasing a security claim does not have to assemble it from multiple files.

  1. Clock spoofing. A compromised plexd with a forged client clock could mask absence by stamping client_now in the future, or trigger a spurious Unreachable verdict by stamping it in the distant past, or trigger a spurious recovery by stamping it after a long quiet period. Mitigation: the handler admits the heartbeat only when |client_now − server_now| ≤ MaxHeartbeatSkew (60s), pinned at internal/platform/clock/skew.go, and the state machine then drives transitions purely from the server's now and the server-stamped last_heartbeat_at. Client-claimed timestamps NEVER reach Apply's input.
  2. Cross-Node impersonation. A Node with a valid NSK could attempt to heartbeat on behalf of another Node by POSTing to a sibling's /v1/nodes/{otherId}/heartbeat. Mitigation: the NSKAuthenticator middleware in internal/identity/authn/middleware/nsk.go resolves the bearer to a tenancy.Node and enforces a path-id-equals-resolved-Node-id gate; a mismatch surfaces as 403 node_id_mismatch. The handler runs the same check as defense-in-depth (heartbeat.go step 2) so a misconfigured router that mounts the handler without the middleware still rejects. The per-handler-path mount in cmd/plexsphere/heartbeat_factory_prod.go pins the middleware on /v1/nodes/{id}/heartbeat only; see the file-level DECISION block on why a global NSK gate would either reject every cookie-authenticated dashboard call or admit an NSK envelope where cookie/JWT was expected.
  3. Checksum strip / replay. An attacker with a captured heartbeat could attempt to replay it to mask a paused or compromised agent, or strip the binary_checksum field to bypass downstream integrity checks. Mitigation: the aggregate's RecordHeartbeat invariant rejects empty / wrong-length checksums with 400 binary_checksum_empty, and the persistence-layer CHECK constraint on plexsphere.nodes.binary_checksum is the schema-level backstop. The drifted-checksum case (a heartbeat whose checksum no longer matches the canonical artefact registry) is NOT refused here — that refusal lives in the artifact registry context once the canonical checksum store is in place; this context's invariant is "well-formed shape", not "matches the canonical artefact".
  4. Stale or revoked NSK admission. A stolen NSK that the operator has rotated could attempt to refresh a Node's liveness verdict after revocation. Mitigation: the NSK middleware's Resolver port refuses to admit a heartbeat whose NSK row is in the revoked set; the production binding will surface 401 nsk_invalid (mapped to the nsk_revoked audit reason) once the real plaintext-resolver lands. Today the failClosedNSKResolver returns ErrNSKResolverNotProvisioned for every Resolve call so EVERY NSK envelope is refused — fail-closed by construction.
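
The admission posture in items 2 and 4 can be illustrated with a small resolver port. This is a sketch under assumptions: the Resolve signature, the staticResolver test double, the admit helper, and the sample token and Node ids are all hypothetical. Only the failClosedNSKResolver and ErrNSKResolverNotProvisioned names, and the 401 / 403 codes, come from the text above.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

var ErrNSKResolverNotProvisioned = errors.New("nsk resolver not provisioned")

// Resolver maps a bearer credential to a Node id (assumed signature).
type Resolver interface {
	Resolve(ctx context.Context, bearer string) (nodeID string, err error)
}

// failClosedNSKResolver refuses every credential.
type failClosedNSKResolver struct{}

func (failClosedNSKResolver) Resolve(context.Context, string) (string, error) {
	return "", ErrNSKResolverNotProvisioned
}

// staticResolver is a test double: bearer -> Node id.
type staticResolver map[string]string

func (s staticResolver) Resolve(_ context.Context, bearer string) (string, error) {
	id, ok := s[bearer]
	if !ok {
		return "", errors.New("unknown nsk")
	}
	return id, nil
}

// admit applies the two gates: credential resolution, then path-id equality.
func admit(r Resolver, bearer, pathID string) (int, string) {
	nodeID, err := r.Resolve(context.Background(), bearer)
	if err != nil {
		return 401, "nsk_invalid"
	}
	if nodeID != pathID {
		return 403, "node_id_mismatch"
	}
	return 200, ""
}

func main() {
	known := staticResolver{"nsk_test_example": "node-1"}
	fmt.Println(admit(failClosedNSKResolver{}, "nsk_test_example", "node-1")) // refused regardless of input
	fmt.Println(admit(known, "nsk_test_example", "node-2"))                   // sibling path id
	fmt.Println(admit(known, "nsk_test_example", "node-1"))                   // admitted
}
```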

What this context is NOT

This context ships the heartbeat ingestion + per-Node liveness-tracking surface. The following adjacent concerns are explicit non-goals — each is owned by a downstream context:

  • Hash-chained durable audit log. The audit entries this context emits flow through a slog-backed sink. The durable hash-chained sink, the audit relation registry, and the audit-log query API land in the Platform Audit Log context.
  • Node deregistration / lifecycle. A Node that goes Unreachable is NOT removed from the Domain by this surface. The deregistration workflow, the per-Node lifecycle state machine, and the operator-driven force_deregister action are owned by the Key & Peer Manager context — the lifecycle and deregistration sibling for the per-Node liveness surface. The reachability state remains Unreachable indefinitely until either a recovery heartbeat arrives or an operator action removes the Node.
  • Relay-fallback reachability heuristics. The bridge / relay orchestrator's choice of fallback bridge per peer pair MAY consult the reachability projection, but the heuristics themselves (latency, load, churn) are owned by the relay-fallback assignment context. The per-Node reachability projection is one of several inputs that consumer takes; this surface simply exposes the read.
  • Node state service reports. The state and reports blocks on NodeStateSnapshot (the reconciliation-pull envelope) ship as JSON null placeholders today and become populated value objects in the node state service context. The reachability projection on the same envelope is the only block this context populates; the rolled-up state reports are the node state service's responsibility.
  • Operator-tunable MaxHeartbeatSkew. The 60-second bound is a security invariant, not a tuning knob — the file-level DECISION block in internal/platform/clock/skew.go rejects the configuration alternative because widening would silently erode the replay-resistance the threat model relies on. A future per-Domain knob is NOT planned.
  • Rate limiting heartbeats below HeartbeatInterval. A plexd that POSTs faster than the per-Domain HeartbeatInterval is NOT rate-limited here — the surface admits every well-formed heartbeat. Per-Node rate limiting is a future surface; the brief explicitly lists it as a deferred edge case.
  • Driving the Reconcile / RotateKeys hint flags. The HeartbeatResponse envelope carries the two boolean hints today but the handler always emits them as false. Reconcile becomes load-bearing once snapshot-divergence detection lands as a follow-up to the reconciliation-pull surface; RotateKeys becomes load-bearing once the key rotation workflow lands. The wire shape is stable today so plexd's reconcile loop can pin against it.