Reconciliation pull — GET /v1/nodes/{id}/state
This document is the authoritative bounded-context reference for the reconciliation-pull surface that ships under internal/mesh/state and is exposed through GET /v1/nodes/{id}/state. It covers the ubiquitous language, the snapshot envelope shape, the four-block convergence with the SSE event taxonomy, the security-gate ordering, the OpenAPI cross-reference, and the deferred-wiring posture.
The pull is the plexd → plexsphere fallback channel: a single authoritative cold-start view that plexd consumes when it first comes up, when its Signed SSE Event Bus connection has been disconnected for longer than the replay window, or when an out-of-band request arrives to re-derive the desired state. Anything outside that surface (the per-Node SSE push at GET /v1/nodes/{id}/events, the underlying outbox relay, the JetStream broker, the policy compiler, the bridge orchestrator) is a collaborator the pull surface projects from, not a concern of this document. This document is the implementation-side reference for the README's Node State Service specification; the README pins the reconciliation contract and this document pins the in-process composition pipeline that produces it.
Status — partial delivery
This story ships the wire envelope, the handler body, the snapshot composer, and the production composition root of the reconciliation-pull surface. The handler is fail-closed by construction: the gate in internal/transport/http/v1/handlers/state_dispatch.go returns 501 Not Implemented with code signed_event_bus_not_provisioned whenever any of the three load-bearing security ports (SnapshotProvider, RelationChecker, NodeRepo) is nil. Three control-plane gates that the README and this document name as part of the pull's contract are deferred to follow-up work and are NOT live in the production binary today:
- Production wiring of the RelationChecker port: the handler refuses to mount without it, so the addressed Node's node-agent ReBAC relation cannot be enforced today. The composition root in cmd/plexsphere/state_factory_prod.go populates SnapshotProvider and NodeRepo only; until the upstream ReBAC authorizer wiring lands a *authz.Authorizer-backed adapter on handlers.Deps.RelationChecker, every production request continues to fall through the state_dispatch.go gate to 501. The 403 path documented in the OpenAPI spec is therefore unreachable from the production binary even after PLEXSPHERE_DSN is set.
- Production wiring of the AuditSink port: the handler emits audit rows for the granted and insufficient_relation outcomes through handlers.Deps.AuditSink when one is supplied, but the composition root does not yet plumb a *audit.Sink-backed adapter through. The control flow is unaffected (audit emission is best-effort and a nil sink simply suppresses the row), but operators scraping relation=node_state.pull outcomes will see no rows until the sink is wired.
- Production wiring of the SnapshotProvider and NodeRepo ports on the upstream binary path: BuildProductionStateFactory returns a non-nil bundle ONLY when PLEXSPHERE_DSN is set and the factory closure is invoked from cmd/plexsphere/main.go. The factory is the wiring receipt that prevents the earlier deferred-wiring lesson from recurring on this surface (the chainsaw E2E fixture and the source-substring drift gate at ../../../tests/workspace/state_factory_wiring_receipt_test.go pin the regression). With PLEXSPHERE_DSN unset the binary keeps the 501 stub, by design.
The chainsaw E2E fixture at ../../../tests/e2e/mesh/reconciliation-pull/chainsaw-test.yaml stands the production plexsphere binary up against a Postgres + SpiceDB stack inside a kind cluster, registers a Domain and N Nodes through the production POST /v1/domains/{id}/nodes surface, and asserts that the peers projection is correct. It carries skip: true until the SSE producer wiring lands because the fixture's terminal convergence gate (step 4 in its file-level DECISION block) issues an SSE peer_registered event and asserts the (node_id, mesh_ip, public_key) tuples agree byte-for-byte between the SSE channel and the pull response. That gate depends on cmd/plexsphere/sse_factory_prod.go's EventStream / EventPublisher seams being non-nil, which the SSE roadmap documents as deferred. The integration-tier equivalent at ../../../tests/integration/state_pull_sse_equivalence_test.go exercises the same convergence story without the chainsaw round-trip and is GREEN today.
What does ship and is load-bearing today:
- The wire envelope (peers, policy, bridge, state, reports), pinned in the OpenAPI spec at ../../../api/openapi/plexsphere-v1.yaml and in the canonical Go value type state.NodeStateSnapshot. The four placeholder blocks (policy, bridge, state, reports) ship as JSON null until the policy fan-out, bridge orchestrator, and node-state report stories populate them; the wire shape is stable across those rollouts so plexd's reconcile loop can pin against it today.
- The snapshot composer (internal/mesh/state/snapshot_composer.go) with its PeerSource seam, single-SQL-round-trip contract, per-row invariant filtering (zero MeshIP, empty PublicKey, uuid.Nil NodeID), and WARN-on-drop structured logging.
- The handler body (internal/transport/http/v1/handlers/state.go) with the four-step gate ordering documented in the Security-gate ordering section below.
- The production composition root (cmd/plexsphere/state_factory_prod.go) with its peerSourceAdapter (the sole place in the tree where the internal/mesh/state package brushes against pgx), its BUILD-time validation gate, and its empty-DSN early return.
- The 501 fail-closed stub (internal/transport/http/v1/handlers/state_dispatch.go) that prevents a half-wired posture from admitting traffic past gates that have not been provisioned.
Downstream stories that depend on the reconciliation-pull surface (the policy fan-out, the bridge orchestrator, and the node-state reports) MUST treat the placeholder blocks as null until their own event-types land — the wire field is present from day one so a diff-by-presence reconcile loop in plexd does not need a second deployment to start observing it.
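The diff-by-presence idea can be sketched from the consumer side. This is a minimal illustration that assumes only the envelope field names above; the blockStatus helper and its three return labels are hypothetical, and plexd's actual reconcile loop is not described in this document.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// blockStatus reports whether the named block is absent from the envelope,
// present as JSON null (a placeholder), or present with a value.
// Hypothetical helper for illustration; not part of the documented surface.
func blockStatus(body []byte, name string) (string, error) {
	var envelope map[string]json.RawMessage
	if err := json.Unmarshal(body, &envelope); err != nil {
		return "", fmt.Errorf("decode envelope: %w", err)
	}
	raw, ok := envelope[name]
	switch {
	case !ok:
		return "absent", nil
	case bytes.Equal(bytes.TrimSpace(raw), []byte("null")):
		return "placeholder", nil
	default:
		return "populated", nil
	}
}

func main() {
	body := []byte(`{"peers":[],"policy":null,"bridge":null,"state":null,"reports":null}`)
	for _, name := range []string{"peers", "policy", "reports"} {
		status, err := blockStatus(body, name)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s\n", name, status)
	}
}
```

Decoding into json.RawMessage keeps the literal null bytes, so a consumer can distinguish "field present but not yet populated" from "field missing", which is the distinction the always-present wire field is meant to preserve.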
Cross-references
- ../../../README.md#node-state-service: top-level specification of the Node State Service, the node_state_updated SSE event, and the state block in GET /v1/nodes/{id}/state.
- ./sse.md: the push-channel sibling. The pull and the push share the node-agent ReBAC relation, the audit-row vocabulary (AuditRelationNodeStatePull mirrors AuditRelationNodeEventsSubscribe), and the signed_event_bus_not_provisioned problem code so a single alert rule scrapes both surfaces in the deferred-wiring posture.
- ../../architecture/mesh-event-bus-roadmap.md: the deferred-wiring tracker. The reconciliation pull's chainsaw equivalence E2E carries skip: true until the items in this roadmap close.
- ../../contributing/layout.md: bounded-context map row enumerating internal/state and internal/mesh as the co-owners of the reconciliation-pull surface.
- ../../../api/openapi/plexsphere-v1.yaml: OpenAPI spec for GET /v1/nodes/{id}/state, including the NodeStateSnapshot, NodeStatePeer, NodeStatePolicy, NodeStateBridge, and NodeStateReports schemas and the registered Problem-Details codes.
- ../../../internal/mesh/state/snapshot.go: canonical Go value type that mirrors the OpenAPI schema and the SnapshotProvider / PeerSource ports.
- ../../../internal/transport/http/v1/handlers/state.go: handler body and gate ordering.
- ../../../cmd/plexsphere/state_factory_prod.go: production composition root, the peerSourceAdapter, and the fail-fast BUILD-time validation.
- ../../../tests/e2e/mesh/reconciliation-pull/chainsaw-test.yaml: chainsaw E2E fixture, currently skip: true per the deferred-wiring DECISION block at the top of the file.
Ubiquitous language
The terms below travel together across the Go code, the OpenAPI contract, the audit log, the structured-log attributes, and operator-facing tooling. Names are preserved verbatim in error messages and audit row vocabulary so a reader chasing a string from a log line finds it in the source without translation.
| Term | Definition |
|---|---|
| Reconciliation Pull | The HTTP-shaped fallback channel through which plexd re-derives a Node's desired state on cold start, after a replay-window-out-of-bounds disconnect, or on an explicit operator request. Served at GET /v1/nodes/{id}/state. |
| NodeStateSnapshot | The canonical reconciliation-pull envelope (internal/mesh/state/snapshot.go) with five wire blocks: peers, policy, bridge, state, reports. The OpenAPI schema of the same name (NodeStateSnapshot) is the wire mirror. |
| Addressed Node | The Node whose id is on the request path. The snapshot is composed for this Node — the peers projection excludes it explicitly so plexd does not program a self-peer. |
| Domain Peer | Any other Node in the addressed Node's Domain. Each domain peer projects into one entry in NodeStateSnapshot.Peers carrying the (NodeID, MeshIP, PublicKey) triple plexd needs to program a WireGuard tunnel. |
| SnapshotProvider | The transport-side port (internal/transport/http/v1/handlers/state_deps.go) the handler calls to compose the envelope. The production concrete is *state.SnapshotComposer; tests substitute a recording fake. The same signature is mirrored on the bounded-context port state.SnapshotProvider so the concrete satisfies both interfaces simultaneously. |
| PeerSource | The persistence-side seam (internal/mesh/state/snapshot.go) the composer reads peer rows through. Production wiring is the peerSourceAdapter in cmd/plexsphere/state_factory_prod.go; tests substitute a recording fake without pgx machinery. The seam exists so internal/mesh/state stays free of direct pgx imports — the no-direct-persistence-from-contexts depguard rule denies them. |
| PeerRow | The per-row shape PeerSource yields to the composer. Byte-shape-compatible with sqlcgen.SnapshotPeersForDomainRow projected into domain types. A row with a corrupt MeshIP / empty PublicKey reaches the composer for filtering — the source MUST NOT silently drop. |
| Per-peer invariants | The trio enforced inside the composer: NodeID != uuid.Nil, MeshIP.IsValid() && !MeshIP.IsUnspecified(), len(PublicKey) > 0. A row that fails any check is dropped with a single WARN-level slog line and the snapshot composes from the survivors. |
| Placeholder Block | A wire field (policy, bridge, state, reports) that ships as JSON null today and will carry a populated value object once its owning downstream story lands. The field itself is always present on the wire so plexd's reconcile loop can diff by presence rather than absence. |
| AuditRelationNodeStatePull | The canonical Relation string the state handler stamps on every audit row: node_state.pull. Mirrors AuditRelationNodeEventsSubscribe so the push and pull surfaces emit consistent audit shapes. |
| node-agent relation | The ReBAC relation enforced on both GET /v1/nodes/{id}/state and GET /v1/nodes/{id}/events. Any caller authorised to subscribe to a Node's SSE event stream is also authorised to issue a reconciliation pull. |
| signed_event_bus_not_provisioned | The Problem-Details code returned by the 501 fail-closed stub. Shared with the SSE peer endpoint so a single alert rule scrapes both surfaces — the two unblock together when the composition root wires the deferred ports. |
| Wiring Receipt | The lesson the earlier SSE wiring left behind: a production binary whose composition root quietly leaves a load-bearing port nil silently 501s, and the gap is invisible until the first request lands in production. BuildProductionStateFactory is the wiring receipt for this surface. |
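The per-peer invariants in the table can be sketched as a single predicate. This is an illustration only: nodeID stands in for uuid.UUID so the block needs nothing beyond the standard library, peerRow and newRow are hypothetical, and the order in which the checks run is an assumption. The outcome labels are the ones the WARN-on-drop log lines use.

```go
package main

import (
	"fmt"
	"net/netip"
)

// nodeID stands in for uuid.UUID so the sketch needs only the standard library.
type nodeID [16]byte

// peerRow is a hypothetical stand-in for the PeerRow shape the composer filters.
type peerRow struct {
	NodeID    nodeID
	MeshIP    netip.Addr
	PublicKey []byte
}

// newRow builds an example row; an unparsable address yields the zero netip.Addr.
func newRow(idByte byte, meshIP string, key []byte) peerRow {
	addr, _ := netip.ParseAddr(meshIP)
	return peerRow{NodeID: nodeID{idByte}, MeshIP: addr, PublicKey: key}
}

// dropReason returns "" when the row satisfies all three invariants,
// otherwise the outcome label for the first invariant that fails.
func dropReason(r peerRow) string {
	var zero nodeID
	switch {
	case r.NodeID == zero:
		return "invalid_node_id"
	case !r.MeshIP.IsValid() || r.MeshIP.IsUnspecified():
		return "invalid_mesh_ip"
	case len(r.PublicKey) == 0:
		return "empty_public_key"
	default:
		return ""
	}
}

func main() {
	rows := []peerRow{
		newRow(1, "10.42.0.7", []byte("key")), // passes all three checks
		newRow(0, "10.42.0.8", []byte("key")), // zero NodeID
		newRow(2, "0.0.0.0", []byte("key")),   // unspecified MeshIP
		newRow(3, "10.42.0.9", nil),           // empty PublicKey
	}
	for _, r := range rows {
		fmt.Printf("%q\n", dropReason(r))
	}
}
```

A row with a non-empty reason would be dropped and logged, and the snapshot would be composed from the rows that return the empty string.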
Snapshot envelope diagram
The wire envelope is the state.NodeStateSnapshot value type defined in internal/mesh/state/snapshot.go. The OpenAPI schema of the same name is its byte-mirror; the transport layer projects the Go value onto the schema directly without an intermediate DTO.
text
┌────────────────────────────────────────────────────────────┐
│  GET /v1/nodes/{addressed-id}/state                        │
│  Accept: application/json                                  │
│  Authorization: Bearer <node-agent JWT>                    │
└─────────────────────────┬──────────────────────────────────┘
                          │
                          ▼
┌────────────────────────────────────────────────────────────┐
│  NodeStateSnapshot                                         │
├────────────────────────────────────────────────────────────┤
│  peers         [ NodeStatePeer, NodeStatePeer, … ]         │
│                ── one entry per OTHER Node in the Domain   │
│                ── ordered by node_id ASC (REQ-005)         │
│                ── per-peer invariants enforced (REQ-013)   │
│                                                            │
│  reachability  { state, last_heartbeat_at, changed_at }    │
│                ── always present (PX-0019, REQ-009)        │
│                                                            │
│  policy        null (placeholder until S028)               │
│  bridge        null (placeholder until S037)               │
│  state         null (placeholder until S055)               │
│  reports       null (placeholder until S055)               │
└────────────────────────────────────────────────────────────┘

Each NodeStatePeer carries the minimum plexd needs to program a
WireGuard tunnel — the triple is byte-compatible with the
RegisterPeer payload so the bootstrap and the reconciliation
pull converge on a single wire contract (PX-0018, REQ-001;
PX-0015, REQ-001):

┌────────────────────────────────────────────────────────────┐
│  NodeStatePeer                                             │
├────────────────────────────────────────────────────────────┤
│  node_id     uuid    (UUIDv7)                              │
│  mesh_ip     string  ("10.42.0.7")                         │
│  public_key  string  (base64-padded, 44-char)              │
└────────────────────────────────────────────────────────────┘

The peers array is always present: empty [] when the addressed Node has no peers, never null. The four placeholder blocks may be null until their owning story populates them. Both positions are pinned in the OpenAPI schema's required keyword and covered by the integration-tier happy-path test at ../../../tests/integration/state_pull_happy_path_test.go.
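The "empty [] but never null" rule and the 44-character key encoding can be illustrated with the standard encoding/json and encoding/base64 packages. This is a reduced sketch, not the generated server types: wirePeer, wireSnapshot, encodeKey, and encodeSnapshot are hypothetical names, and the reachability block is omitted for brevity.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// wirePeer mirrors the three NodeStatePeer fields from the diagram.
type wirePeer struct {
	NodeID    string `json:"node_id"`
	MeshIP    string `json:"mesh_ip"`
	PublicKey string `json:"public_key"`
}

// wireSnapshot is a reduced envelope: peers plus the four nullable blocks.
type wireSnapshot struct {
	Peers   []wirePeer       `json:"peers"`
	Policy  *json.RawMessage `json:"policy"`
	Bridge  *json.RawMessage `json:"bridge"`
	State   *json.RawMessage `json:"state"`
	Reports *json.RawMessage `json:"reports"`
}

// encodeKey renders a raw key with standard, padded base64.
func encodeKey(key []byte) string {
	return base64.StdEncoding.EncodeToString(key)
}

// encodeSnapshot marshals the envelope, normalising a nil peer slice to an
// empty one because encoding/json renders a nil slice as null.
func encodeSnapshot(peers []wirePeer) ([]byte, error) {
	if peers == nil {
		peers = []wirePeer{}
	}
	return json.Marshal(wireSnapshot{Peers: peers})
}

func main() {
	body, err := encodeSnapshot(nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
	// A 32-byte key encodes to 44 characters with padding.
	fmt.Println(len(encodeKey(make([]byte, 32))))
}
```

The nil-to-empty normalisation is the detail worth noticing: without it, a Domain with a single Node would serialise peers as null and break a consumer that iterates the array unconditionally.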
The reachability block is always present — never null — and projects the addressed Node's heartbeat-driven liveness verdict so an SSE subscriber and a follow-up GET /v1/nodes/{id}/state observe the same to_state after a transition. The Go counterpart is state.Reachability in ../../../internal/mesh/state/snapshot.go, with the shape {State: string, LastHeartbeatAt: time.Time, ChangedAt: time.Time}. The state field is one of healthy, stale, unreachable, or "" ("not yet evaluated"); last_heartbeat_at is the wall-clock time of the last accepted heartbeat (zero when none has been received yet); changed_at is the wall-clock time the state field last transitioned. The block is projected from the corresponding reachability_state, last_heartbeat_at, and changed_at columns on plexsphere.nodes (added by migration ../../../internal/platform/db/migrations/0010_node_reachability.sql) through the ReachabilitySource SQL seam declared alongside PeerSource. See ./reachability.md for the reachability state-machine semantics.
Four-block convergence with the SSE event taxonomy
Every block in NodeStateSnapshot has a counterpart event-type on the Signed SSE Event Bus. The pull is the cold-start fallback for the same state the bus delivers as incremental deltas; the table below pins which event-type populates each block, which downstream story owns the block, and where the block currently sits in the rollout.
| Snapshot block | SSE event-type counterpart | Owning story | Status today |
|---|---|---|---|
| peers | peer_added, peer_removed, peer_key_rotated, peer_endpoint_changed | this surface; the pull is the cold-start fallback for the SSE peer-delta stream | Live. The composer projects the per-Domain peer set excluding the addressed Node, ordered by node_id ASC, with per-peer invariants enforced. Byte-equality with the SSE peer projection is pinned by ../../../tests/integration/state_pull_sse_equivalence_test.go. |
| policy | policy_updated | Policy Engine fan-out | Live. The composer reads the per-(Node, Policy) plexsphere.policy_compiled_ruleset rows through the PolicySource port and projects the merged (revision_id, fingerprint, rules) shape onto NodeStateSnapshot.Policy; a Node with no matched Policy still sees JSON null. The wire-side policy_updated fan-out runs from the compile-service arm with byte-equal fingerprint, pinned by ../../../tests/integration/policy_updated_wire_fanout_test.go. |
| bridge | bridge_config_updated | Bridge Orchestrator | Live. The composer reads the four bridge aggregate sets plus the per-peer relay-assignment rows through the BridgeSource port and projects the effective.EffectiveConfigBuilder output onto NodeStateSnapshot.Bridge; a Node hosting no bridge Resource still sees JSON null. The wire-side bridge_config_updated fan-out runs from the bridge application services with byte-equal effective_config, pinned by the parity suite at tests/integration/bridge_config_updated_pull_parity_test.go. The README's "same payload, two channels" contract holds: the SSE effective_config bytes equal the pull bridge block for the same Node and bridge Resource. See ../bridge/events.md for the publisher-side dispatch table and the per-Node fan-out algorithm. |
| state (and reports) | node_state_updated | Node State Service | Live. The composer reads the addressed Node's node-state entries through the EntriesSource port and fans them by kind into the three NodeStateReports buckets (platform-owned metadata, platform-owned data, and upstream reports), ordered by key ascending; each bucket is [] (never null) when the Node carries no entry of that kind. reports mirrors state (the composer points both at the same value object) so a future split between "live state" and "rolled-up reports" lands without an OpenAPI break. Only platform-owned metadata/data writes fan out a node_state_updated event; an upstream report write does not. The SSE/pull equivalence is pinned by ../../../tests/integration/nodestate_snapshot_convergence_test.go. See ../state.md for the Node State Service model, the report ACL, and the closed event set. |
The convergence rule the table encodes is the README's reconciliation contract: every state delivered as an SSE event-type MUST also be recoverable from the pull response. Today only the peers block carries that guarantee end-to-end; the placeholder blocks make the contract structurally explicit so a downstream caller is not surprised by a future schema growth.
Security-gate ordering
The handler in internal/transport/http/v1/handlers/state.go runs four gates in a deliberate, fail-closed order. The order is security-driven, not performance-driven: each step rejects a strict superset of the requests the next step would have rejected, so re-ordering any pair admits a leak or burns work an earlier reject would have skipped.
The same ordering is used by the SSE peer endpoint (./sse.md) — both surfaces share the contract so a single mental model covers GET /v1/nodes/{id}/events and GET /v1/nodes/{id}/state.
| Step | Gate | Reject behaviour | Why this position |
|---|---|---|---|
| 1 | Authn — authn.FromContext(ctx) returns a non-KindUnknown Principal. | 401 Unauthorized with code unauthorized. | Reject unsigned callers BEFORE we burn a SpiceDB round-trip. |
| 2 | Authz (ReBAC) — RelationChecker.Check(ctx, principal, "node-agent", "node:<id>") returns (true, nil). | 403 Forbidden with code insufficient_relation; audit row outcome=insufficient_relation is emitted when a sink is wired. | Reject the unauthorised caller BEFORE we leak Node existence via the 404/403 timing differential. |
| 3 | Lookup — NodeRepo.GetByID(ctx, id) returns (node, nil); an errors.Is(err, ErrNodeNotFound) miss renders 404. | 404 Not Found with code node_not_found on the not-found arm; 500 Internal Server Error on any other error. | Reject the missing Node BEFORE we burn a snapshot composition. Done AFTER authz so an unauthorised caller cannot probe Node existence. The single GetByID call replaces the previous Exists-then-GetByID double round-trip — the lookup doubles as the 404 gate AND the DomainID resolver for the next step. |
| 4 | Snapshot composition — SnapshotProvider.SnapshotForNode(ctx, id, domainID) returns (snapshot, nil). | 500 Internal Server Error with no public code; the underlying error is logged with structured fields. | Compose the wire body LAST. The composer's contract guarantees a single SQL round-trip and ascending peer order so the response is deterministic and byte-stable across consecutive pulls. |
A successful pass through all four gates emits an audit row with relation=node_state.pull, outcome=granted, subject=<principal>, object=node:<id> (when an AuditSink is wired) and writes the NodeStateSnapshot JSON body with 200 OK. A nil AuditSink suppresses the audit row but does not change the security control flow: the gate at the top of state_dispatch.go deliberately excludes AuditSink so a missing sink does not over-fire the 501 stub.
The 501 fail-closed stub at internal/transport/http/v1/handlers/state_dispatch.go sits in front of every gate above. It refuses to dispatch to the body when ANY of the three load-bearing security ports is nil, returning 501 Not Implemented with code signed_event_bus_not_provisioned. The code is shared with the SSE peer endpoint because the two surfaces unblock together when the composition root wires the deferred ports — a single alert rule on the code suffices to catch either surface in the deferred posture.
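The ordering of the entry stub and the four gates can be summarised as a pure decision function. This is a sketch of the ordering only, under the assumption that each gate reduces to a boolean; the request struct and decide function are hypothetical and do not reflect the handler's real signature.

```go
package main

import (
	"fmt"
	"net/http"
)

// request captures the facts each gate inspects. All field names are
// hypothetical; the real handler reads them from ports and the context.
type request struct {
	portsWired    bool // SnapshotProvider, RelationChecker, NodeRepo all non-nil
	authenticated bool // step 1: a known Principal is on the context
	allowed       bool // step 2: node-agent relation check passed
	nodeExists    bool // step 3: lookup found the addressed Node
	composeOK     bool // step 4: snapshot composition succeeded
}

// decide returns the status and Problem-Details code in gate order.
func decide(r request) (int, string) {
	switch {
	case !r.portsWired:
		return http.StatusNotImplemented, "signed_event_bus_not_provisioned"
	case !r.authenticated:
		return http.StatusUnauthorized, "unauthorized"
	case !r.allowed:
		return http.StatusForbidden, "insufficient_relation"
	case !r.nodeExists:
		return http.StatusNotFound, "node_not_found"
	case !r.composeOK:
		return http.StatusInternalServerError, "" // no public code
	default:
		return http.StatusOK, ""
	}
}

func main() {
	// An authenticated but unauthorised caller probing a missing Node
	// sees 403, not 404, because authz runs before the lookup.
	probe := request{portsWired: true, authenticated: true}
	fmt.Println(decide(probe)) // 403 insufficient_relation
}
```

Placing the authz case ahead of the lookup case is what keeps Node existence hidden from an unauthorised caller, which is the property the table's "Why this position" column argues for.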
OpenAPI cross-reference
The wire surface is pinned in ../../../api/openapi/plexsphere-v1.yaml under the mesh tag. The relevant operationIds and schemas:
| OpenAPI artefact | Where in the spec | Go counterpart |
|---|---|---|
| GetNodeState operation | paths./v1/nodes/{id}/state.get | Handlers.GetNodeState in internal/transport/http/v1/handlers/state_dispatch.go (gated entry) and getNodeState in internal/transport/http/v1/handlers/state.go (body). |
| NodeStateSnapshot schema | components.schemas.NodeStateSnapshot | state.NodeStateSnapshot in internal/mesh/state/snapshot.go; projected onto server.NodeStateSnapshot (oapi-codegen output) by buildNodeStateResponse in state.go. |
| NodeStatePeer schema | components.schemas.NodeStatePeer | state.Peer in snapshot.go; projected onto server.NodeStatePeer with the PublicKey field base64-encoded with standard padding to mirror RegisterPeer's convention. |
| NodeStatePolicy schema | components.schemas.NodeStatePolicy | state.Policy (placeholder until the Policy Engine fan-out lands). |
| NodeStateBridge schema | components.schemas.NodeStateBridge | state.Bridge (placeholder until the Bridge Orchestrator lands). |
| NodeStateReports schema | components.schemas.NodeStateReports | state.Reports: the composer fans the addressed Node's entries into the Metadata / Data / Reports buckets and points both the state and reports blocks at the same value (the mirror). See ../state.md. |
| Problem schema | components.schemas.Problem (shared) | problem.Problem in the transport layer. The state handler emits codes unauthorized, insufficient_relation, node_not_found, signed_event_bus_not_provisioned. |
The OpenAPI byte-equality drift gate at ../../../tests/integration/state_pull_openapi_drift_test.go asserts that the source spec at ../../../api/openapi/plexsphere-v1.yaml and the embedded mirror under ../../../internal/transport/http/v1/handlers/ are byte-identical, AND that the generated ServerInterface.GetNodeState signature pins the (http.ResponseWriter, *http.Request, openapi_types.UUID) shape via reflect.Type inspection. A spec drift therefore surfaces at go test time, not at runtime.
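Both halves of such a drift gate can be sketched with the standard library. The block below is an illustration, not the test itself: serverInterface is a hypothetical stand-in for the generated ServerInterface, nodeUUID stands in for openapi_types.UUID, and the spec bytes are in-memory literals rather than files read from disk.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
	"reflect"
)

// nodeUUID stands in for openapi_types.UUID in this sketch.
type nodeUUID [16]byte

// serverInterface is a hypothetical stand-in for the generated interface.
type serverInterface interface {
	GetNodeState(w http.ResponseWriter, r *http.Request, id nodeUUID)
}

// pinnedFor reports whether the named method exists on serverInterface with
// exactly the (http.ResponseWriter, *http.Request, nodeUUID) parameter list.
func pinnedFor(method string) bool {
	iface := reflect.TypeOf((*serverInterface)(nil)).Elem()
	m, ok := iface.MethodByName(method)
	if !ok {
		return false
	}
	want := []reflect.Type{
		reflect.TypeOf((*http.ResponseWriter)(nil)).Elem(),
		reflect.TypeOf((*http.Request)(nil)),
		reflect.TypeOf(nodeUUID{}),
	}
	// For an interface type, m.Type carries no receiver parameter.
	if m.Type.NumIn() != len(want) || m.Type.NumOut() != 0 {
		return false
	}
	for i, t := range want {
		if m.Type.In(i) != t {
			return false
		}
	}
	return true
}

// specsIdentical is the byte-equality half of the gate.
func specsIdentical(source, mirror []byte) bool {
	return bytes.Equal(source, mirror)
}

func main() {
	fmt.Println(pinnedFor("GetNodeState"))
	fmt.Println(specsIdentical([]byte("openapi: 3.0.3\n"), []byte("openapi: 3.0.3\n")))
}
```

Checking the signature through reflect.Type rather than by calling the method means a regenerated interface with a changed parameter list fails the test even if no caller has been updated yet.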
Composition flow
The composition flow runs strictly in the order shown — each step's output is the next step's input, and no step is skipped on the happy path:
text
┌────────────────────────────────────────────────────────────┐
│  HTTP request: GET /v1/nodes/{id}/state                    │
└────┬───────────────────────────────────────────────────────┘
     │ 1. authn middleware attaches Principal to ctx
     v
┌────────────────────────────┐
│  state_dispatch.go : gate  │  if SnapshotProvider, RelationChecker,
└────┬───────────────────────┘  or NodeRepo is nil → 501
     │
     v
┌────────────────────────────┐
│  state.go : getNodeState   │
└────┬───────────────────────┘
     │ 2. authn check      ────────► 401 if no Principal
     │ 3. RelationChecker  ────────► 403 if not allowed
     │                               (audit: insufficient_relation)
     │ 4. NodeRepo.GetByID ────────► 404 if ErrNodeNotFound
     │                               resolve DomainID otherwise
     v
┌────────────────────────────┐
│  SnapshotComposer          │  PeersForDomain (single SQL round-trip)
│    .SnapshotForNode        │    ── ORDER BY node_id ASC
└────┬───────────────────────┘    ── id <> $2 (exclude addressed)
     │ 5. drop rows that fail per-peer invariants
     │    (WARN-log each drop with structured fields)
     │ 6. defensive copy of PublicKey slice per row
     v
┌────────────────────────────┐
│  buildNodeStateResponse    │  base64-encode PublicKey
│  (state.go)                │  Policy/Bridge/State/Reports = nil
└────┬───────────────────────┘
     │ 7. emit audit row outcome=granted
     v
┌────────────────────────────┐
│  writeJSON 200             │
└────────────────────────────┘

Failure modes are typed: errZeroNodeID and errZeroDomainID short-circuit at step 4 with a 500 (the errors.Is(err, ErrNodeNotFound) arm renders 404 instead); a PeersForDomain error short-circuits at step 5 with a 500 wrapping the underlying error; and any nil port short-circuits at the entry stub with a 501. The composer NEVER returns a 200 with a partial peer set on an error path; partial success is preserved only on the per-peer invariant filter, which is a row-level WARN-and-continue path required by the per-peer invariant contract.
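The difference between the two failure shapes (a source error yields nothing, an invalid row is skipped) and the defensive key copy can be sketched as follows. The peer, peerSource, errSource, and compose names are hypothetical simplifications: the real composer takes a context and IDs, checks more invariants, and logs each drop.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// peer is a reduced, hypothetical row shape for this sketch.
type peer struct {
	NodeID    string
	PublicKey []byte
}

// peerSource is a hypothetical stand-in for the PeerSource seam.
type peerSource func() ([]peer, error)

// errSource simulates a failing source in the example.
var errSource = errors.New("source unavailable")

// compose returns either a complete filtered peer set or an error,
// never a partial set alongside an error.
func compose(src peerSource) ([]peer, error) {
	rows, err := src()
	if err != nil {
		return nil, fmt.Errorf("peers for domain: %w", err)
	}
	out := make([]peer, 0, len(rows))
	for _, r := range rows {
		if r.NodeID == "" || len(r.PublicKey) == 0 {
			continue // row-level drop; the real composer WARN-logs here
		}
		// Clone the key so later mutation of the source row cannot
		// change the composed snapshot.
		out = append(out, peer{NodeID: r.NodeID, PublicKey: bytes.Clone(r.PublicKey)})
	}
	return out, nil
}

func main() {
	rows := []peer{
		{NodeID: "a", PublicKey: []byte("k1")},
		{NodeID: "", PublicKey: []byte("k2")}, // fails an invariant
		{NodeID: "c", PublicKey: []byte("k3")},
	}
	out, err := compose(func() ([]peer, error) { return rows, nil })
	if err != nil {
		panic(err)
	}
	rows[0].PublicKey[0] = 'X' // does not affect the composed copy
	fmt.Println(len(out), string(out[0].PublicKey))

	_, err = compose(func() ([]peer, error) { return rows, errSource })
	fmt.Println(errors.Is(err, errSource))
}
```

The sketch preserves the order the source yields, on the assumption that ordering is the SQL query's job rather than the composer's.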
Composition root and the production wiring receipt
cmd/plexsphere/state_factory_prod.go is the wiring receipt for this surface. It validates productionStateConfig at BUILD time (not inside the returned closure), so a misconfigured operator sees the failure before /readyz lights up green. An empty PLEXSPHERE_DSN returns (nil, nil) deliberately — the caller in cmd/plexsphere/main.go falls back to the 501 stub path so an operator who has not opted into production wiring sees the gap during boot, not during the first pull.
The peerSourceAdapter at the bottom of the file is the SOLE place in the tree where the internal/mesh/state package brushes against pgx. The trade-off — a single uuid.UUID → pgtype.UUID translation — is the price of keeping the no-direct-persistence-from-contexts depguard rule clean. The adapter performs exactly one SQL round-trip per call (the SnapshotPeersForDomain query in ../../../internal/platform/db/queries/10_tenancy.sql) and the composer's bench (../../../internal/mesh/state/snapshot_bench_test.go) asserts that contract on a 64-Node Domain.
The wiring-receipt drift gate at ../../../tests/workspace/state_factory_wiring_receipt_test.go reads cmd/plexsphere/main.go from disk and asserts both the BuildProductionStateFactory( call and the StateFactory: Config assignment are present — the test exists specifically to catch the earlier wiring-receipt regression: a production binary whose composition root quietly leaves a load-bearing port nil silently 501s, and the gap is invisible until the first real request lands. The end-to-end factory coverage proper lives at the integration tier in ../../../tests/integration/state_pull_happy_path_test.go and its peers, which boot BuildProductionStateFactory against a testcontainers Postgres and assert a non-error response.
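A source-substring gate of this kind is small enough to sketch in full. The version below is an assumption-laden illustration: it takes the source text as a string instead of reading cmd/plexsphere/main.go from disk, and missingWiring is a hypothetical name. The two required substrings are the ones named above.

```go
package main

import (
	"fmt"
	"strings"
)

// missingWiring returns the required substrings that do not appear in the
// given source text; an empty result means the wiring receipt is intact.
func missingWiring(mainSource string) []string {
	required := []string{
		"BuildProductionStateFactory(", // the factory is called
		"StateFactory:",                // and its result is assigned into Config
	}
	var missing []string
	for _, s := range required {
		if !strings.Contains(mainSource, s) {
			missing = append(missing, s)
		}
	}
	return missing
}

func main() {
	wired := "factory, err := BuildProductionStateFactory(cfg)\n" +
		"app := Config{StateFactory: factory}\n"
	fmt.Println(len(missingWiring(wired)))            // 0
	fmt.Println(len(missingWiring("package main\n"))) // 2
}
```

A substring check is deliberately crude: it cannot prove the wiring is correct, only that nobody has deleted it, which is the regression the gate is described as catching.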
Operator runbook (post-wiring)
Until the deferred items in ../../architecture/mesh-event-bus-roadmap.md close and RelationChecker + AuditSink are wired through the composition root, the production binary always returns 501 on this endpoint and there is nothing to inspect operationally. The post-wiring runbook below lists the entry points an operator will chase once the surface is hot.
- Inspect a single Node's snapshot. The pull is idempotent; an operator can issue GET /v1/nodes/{id}/state with a node-agent scoped JWT and diff the response against kubectl get plexdnodestate on the addressed Node. The peers array is sorted by node_id ASC so two consecutive pulls against the same ledger snapshot are byte-equal; a non-empty diff is a real change, not non-determinism.
- Inspect a Domain's peer projection. The SQL query backing the pull lives at ../../../internal/platform/db/queries/10_tenancy.sql under the :many SnapshotPeersForDomain name. An operator can issue the same projection by hand: SELECT node_id, mesh_ip, public_key FROM plexsphere.tenancy_nodes WHERE domain_id = $1 AND id <> $2 ORDER BY node_id ASC; and compare against the pull response.
- Watch the relation=node_state.pull audit stream. Once the AuditSink is wired the handler emits one audit row per request with outcome=granted (success) or outcome=insufficient_relation (403). A spike of insufficient_relation correlates with a misconfigured ReBAC tuple set; a sustained absence of any rows correlates with the deferred wiring still being in place, so check the 501 rate on /v1/nodes/{id}/state first.
- Look for WARN-on-drop log lines. The composer emits a single WARN-level slog line per dropped peer row carrying outcome=invalid_node_id / invalid_mesh_ip / empty_public_key, plus the domain_id, addressed_node_id, and peer_node_id fields. A non-zero rate is the operator's signal that the ledger holds a corrupt row that should be reconciled; the snapshot itself remains correct because the composer drops the offending row from the response.
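For operators building a log query, the shape of a WARN-on-drop line can be approximated with log/slog. This is a sketch under assumptions: the attribute keys are the ones listed above, but the message text, the JSON handler, and the dropFields helper are hypothetical, and the production logger may be configured differently.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log/slog"
)

// dropFields emits one WARN-level line through a JSON slog handler and
// returns the decoded fields so they can be inspected.
func dropFields(outcome, domainID, addressedID, peerID string) (map[string]any, error) {
	var buf bytes.Buffer
	logger := slog.New(slog.NewJSONHandler(&buf, nil))
	logger.Warn("peer row dropped", // message text is hypothetical
		slog.String("outcome", outcome),
		slog.String("domain_id", domainID),
		slog.String("addressed_node_id", addressedID),
		slog.String("peer_node_id", peerID),
	)
	var fields map[string]any
	if err := json.Unmarshal(buf.Bytes(), &fields); err != nil {
		return nil, fmt.Errorf("decode log line: %w", err)
	}
	return fields, nil
}

func main() {
	fields, err := dropFields("empty_public_key", "domain-1", "node-1", "node-2")
	if err != nil {
		panic(err)
	}
	fmt.Println(fields["level"], fields["outcome"], fields["peer_node_id"])
}
```

With a JSON handler, an alert can key on level=WARN together with the outcome attribute, and the peer_node_id field points directly at the row that needs reconciling.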