Node State Service — per-Node metadata, data, and reports
This document is the authoritative bounded-context reference for the Node State Service, the per-Node key/value sub-context of the mesh context that ships under ../../internal/mesh/nodestate/. It owns the small, addressable facts a Node carries — the State Entry aggregate — and the two surfaces that read and write them: the platform-owned write path that fans a node_state_updated change out over the Signed Event Bus, and the upstream, NSK-authenticated report channel a workload on the Node pushes telemetry through.
The Node State Service is the producer and owner of node-state entries. The mechanisms by which a Node converges on a delivered change (the per-Node SSE push at GET /v1/nodes/{id}/events, the JetStream relay, and the cold-start reconciliation pull at GET /v1/nodes/{id}/state) are collaborators this context emits into and projects through, not concerns it owns. This page is the implementation-side reference for the Node State Service specification in the top-level README.
Ubiquitous language
The terms below travel together across the Go code, the OpenAPI contract, the persistence schema, and operator-facing tooling. Names are preserved verbatim so a reader chasing a string from a log line or a wire field finds it in the source without translation.
| Term | Definition |
|---|---|
| State Entry | One addressable key/value fact a Node carries, keyed by the composite (Node, Kind, Key). The value object lives at ../../internal/mesh/nodestate/entry.go; Node identity is supplied by the service scope and is deliberately not a field on the entry. |
| Kind | The closed set of three entry kinds — metadata, data, report — that partitions the per-Node key namespace into channels and drives the event-emission rule. |
| Platform-owned | A metadata or data entry an operator (or a control-plane workflow on the operator's behalf) sets. A platform-owned write fans out a node_state_updated event. |
| Upstream / report | A report entry a workload running on the Node authors and pushes back up to the control plane. It is the only kind a Node-side workload may write, and a report write does not fan out an event. |
| Workload tag | An optional, free-form label on an entry identifying the workload that produced it. Carried opaquely; empty when the entry is not attributed to a workload. |
| node_state_updated | The single wire event the context emits, for platform-owned changes only. Its payload carries the routing fields domain_id and node_id plus the changed key and value, and never secrets. |
| Report quota | The per-Node cap on the number of report entries a single Node may accumulate, which keeps an over-chatty workload from exhausting the table. A re-report of an existing key never grows the count; only a new key past the cap is refused. |
| State Entry coordinate | The (node_id, kind, key) primary key of plexsphere.node_state_entry. A repeated report of the same coordinate is an upsert, never a duplicate row. |
The three entry kinds
A State Entry is one of three kinds, and the kind decides both who may write it and whether the write notifies downstream consumers. The domain pins the set as a closed enum in ../../internal/mesh/nodestate/entry.go; an unrecognised kind is rejected at the value-object boundary before any persistence write.
- metadata: platform-owned. An operator sets it to describe the Node from the platform's point of view. A write fans out a node_state_updated event.
- data: platform-owned. Written from the platform side, like metadata. The metadata/data split is a namespacing convention the domain experts use to separate descriptive labels from operational payload, not a difference in ownership. A write fans out a node_state_updated event.
- report: upstream. Written by a workload running on the Node, reporting facts back up to the control plane. It is the only kind a Node-side workload may author, and a report write fans out no event.
The single predicate EntryKind.PlatformOwned() (true for metadata and data, false for report) is the one place the "platform-owned changes only" event rule lives, so the rule cannot drift between the write paths.
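A minimal Go sketch of the closed kind set and its predicate, assuming a string-backed EntryKind. Only EntryKind and PlatformOwned are named on this page; the constant names, ParseEntryKind, and ErrInvalidKind are hypothetical illustrations, not the contents of entry.go.

```go
package main

import (
	"errors"
	"fmt"
)

// EntryKind models the closed three-kind set (hypothetical constant names).
type EntryKind string

const (
	KindMetadata EntryKind = "metadata"
	KindData     EntryKind = "data"
	KindReport   EntryKind = "report"
)

// ErrInvalidKind is a hypothetical sentinel for a kind outside the set.
var ErrInvalidKind = errors.New("nodestate: invalid entry kind")

// ParseEntryKind admits exactly the three declared kinds and rejects the rest.
func ParseEntryKind(s string) (EntryKind, error) {
	switch k := EntryKind(s); k {
	case KindMetadata, KindData, KindReport:
		return k, nil
	default:
		return "", fmt.Errorf("%w: %q", ErrInvalidKind, s)
	}
}

// PlatformOwned is the single predicate behind the event rule: metadata and
// data writes fan out node_state_updated, report writes do not.
func (k EntryKind) PlatformOwned() bool {
	return k == KindMetadata || k == KindData
}

func main() {
	for _, s := range []string{"metadata", "data", "report", "secret"} {
		k, err := ParseEntryKind(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s: platform-owned=%t\n", k, k.PlatformOwned())
	}
}
```

Because every write path asks the kind itself rather than repeating a switch, adding a fourth kind would force a deliberate edit in one place.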
The State Entry value object and its invariants
NewEntry validates a candidate entry against the same limits the persistence layer's SQL CHECK constraints enforce, so a malformed entry is rejected at the application boundary rather than at the INSERT. The constructor rejects:
- a kind outside {metadata, data, report}, with a dedicated invalid-kind sentinel distinct from the key sentinel so the transport layer can branch to different Problem codes;
- a key that does not match the grammar ^[a-z][a-z0-9._-]{0,127}$: a lowercase leading letter, then up to 127 characters from [a-z0-9._-], for a total length in [1, 128];
- a value longer than 4096 bytes.
The workload tag carries no constructor bound: the persistence layer places no CHECK on the workload_tag column, so adding a value-object bound would invent an invariant the SQL layer does not share and risk the two drifting apart. A bound can be added in lockstep with a column CHECK if a later requirement calls for one.
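The constructor rules above can be sketched as follows. NewEntry is the name this page uses, but its signature here, the Entry field names, and the three sentinels are assumptions made for illustration; the value cap is checked with len, which counts bytes exactly as octet_length does.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// EntryKind and its constants are hypothetical stand-ins for the real enum.
type EntryKind string

const (
	KindMetadata EntryKind = "metadata"
	KindData     EntryKind = "data"
	KindReport   EntryKind = "report"
)

// maxValueBytes matches the octet_length(value) <= 4096 column CHECK.
const maxValueBytes = 4096

// keyPattern is the documented key grammar. Go's RE2 treats $ as end of
// text, so a trailing newline cannot slip through.
var keyPattern = regexp.MustCompile(`^[a-z][a-z0-9._-]{0,127}$`)

// Hypothetical sentinels: distinct errors let the transport map each one to
// its own Problem code.
var (
	ErrInvalidKind  = errors.New("nodestate: invalid entry kind")
	ErrInvalidKey   = errors.New("nodestate: invalid entry key")
	ErrValueTooLong = errors.New("nodestate: value exceeds 4096 bytes")
)

// Entry carries no Node field: Node identity comes from the service scope.
type Entry struct {
	Kind        EntryKind
	Key         string
	Value       string
	WorkloadTag string // deliberately unbounded, like the workload_tag column
}

// NewEntry validates against the same limits the SQL CHECKs enforce.
func NewEntry(kind EntryKind, key, value, workloadTag string) (Entry, error) {
	switch kind {
	case KindMetadata, KindData, KindReport:
	default:
		return Entry{}, fmt.Errorf("%w: %q", ErrInvalidKind, kind)
	}
	if !keyPattern.MatchString(key) {
		return Entry{}, fmt.Errorf("%w: %q", ErrInvalidKey, key)
	}
	// len counts bytes, not runes, which is what octet_length measures.
	if len(value) > maxValueBytes {
		return Entry{}, ErrValueTooLong
	}
	return Entry{Kind: kind, Key: key, Value: value, WorkloadTag: workloadTag}, nil
}

func main() {
	_, err := NewEntry(KindReport, "Disk.free", "42", "")
	fmt.Println(errors.Is(err, ErrInvalidKey)) // true: keys must start lowercase
	e, err := NewEntry(KindReport, "disk.free", "42", "agent")
	fmt.Println(e.Key, err) // disk.free <nil>
}
```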
The node_state_updated event and its closed set
The context emits exactly one wire event, declared in ../../internal/mesh/nodestate/events/events.go:
| Field | Meaning |
|---|---|
| event_id | A fresh UUIDv7 minted at construction. |
| occurred_at | The change time, normalised to UTC so payloads are timezone-stable. |
| domain_id | The owning Domain; the relay extracts it to route the change. |
| node_id | The affected Node; the relay extracts it to fan the change to the right Node. |
| key | The changed entry key. |
| value | The changed value. An empty value is the agreed removal marker a delete emits. |
Two governance rules bound the event:
- Platform-owned changes only. SetPlatformEntry and DeletePlatformEntry append a node_state_updated outbox row inside the same transaction as the entry write, so the entry and its event commit atomically. A report write emits no event: emitting one would invert the direction of the notification and wake platform consumers on every workload heartbeat. A delete reuses node_state_updated carrying an empty value rather than minting a separate "removed" event type; a downstream consumer converges on the (key, empty-value) signal the same way it converges on any other value change.
- The set is closed and pinned. The node-state event set has exactly one member. A workspace AST gate parses the events package for the declared constant set and walks the context's emit sites, so a silently-added second event type fails the build rather than reaching the wire.
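The first governance rule can be sketched against a hypothetical transaction interface. The Tx shape, setEntry, deletePlatformEntry, and the in-memory recordingTx are illustrative assumptions; in particular, setEntry folds the platform and report write paths into one helper purely to show where the PlatformOwned predicate gates the outbox append.

```go
package main

import "fmt"

// EntryKind and PlatformOwned are restated so the sketch compiles alone.
type EntryKind string

const (
	KindMetadata EntryKind = "metadata"
	KindData     EntryKind = "data"
	KindReport   EntryKind = "report"
)

func (k EntryKind) PlatformOwned() bool { return k == KindMetadata || k == KindData }

// Tx is a hypothetical unit of work: every call on one Tx lands in the same
// database transaction, so the entry row and the outbox row commit together
// or roll back together.
type Tx interface {
	UpsertEntry(nodeID string, kind EntryKind, key, value string) error
	DeleteEntry(nodeID string, kind EntryKind, key string) error
	AppendOutbox(eventType, nodeID, key, value string) error
}

const eventNodeStateUpdated = "node_state_updated"

// setEntry writes one entry and, for platform-owned kinds only, appends the
// event in the same transaction. A report write returns without an event.
func setEntry(tx Tx, nodeID string, kind EntryKind, key, value string) error {
	if err := tx.UpsertEntry(nodeID, kind, key, value); err != nil {
		return err // the caller rolls back; nothing is half-committed
	}
	if !kind.PlatformOwned() {
		return nil
	}
	return tx.AppendOutbox(eventNodeStateUpdated, nodeID, key, value)
}

// deletePlatformEntry reuses node_state_updated with an empty value as the
// removal marker instead of a separate "removed" event type.
func deletePlatformEntry(tx Tx, nodeID string, kind EntryKind, key string) error {
	if !kind.PlatformOwned() {
		return fmt.Errorf("nodestate: %q is not a platform-owned kind", kind)
	}
	if err := tx.DeleteEntry(nodeID, kind, key); err != nil {
		return err
	}
	return tx.AppendOutbox(eventNodeStateUpdated, nodeID, key, "")
}

// recordingTx is an in-memory stand-in that records what a real Tx would write.
type recordingTx struct {
	entries map[string]string
	outbox  []string
}

func newRecordingTx() *recordingTx { return &recordingTx{entries: map[string]string{}} }

func coord(nodeID string, kind EntryKind, key string) string {
	return nodeID + "/" + string(kind) + "/" + key
}

func (t *recordingTx) UpsertEntry(nodeID string, kind EntryKind, key, value string) error {
	t.entries[coord(nodeID, kind, key)] = value
	return nil
}

func (t *recordingTx) DeleteEntry(nodeID string, kind EntryKind, key string) error {
	delete(t.entries, coord(nodeID, kind, key))
	return nil
}

func (t *recordingTx) AppendOutbox(eventType, nodeID, key, value string) error {
	t.outbox = append(t.outbox, fmt.Sprintf("%s node=%s key=%s value=%q", eventType, nodeID, key, value))
	return nil
}

func main() {
	tx := newRecordingTx()
	_ = setEntry(tx, "n1", KindMetadata, "region", "eu-west")
	_ = setEntry(tx, "n1", KindReport, "disk.free", "42") // no outbox row
	_ = deletePlatformEntry(tx, "n1", KindMetadata, "region")
	for _, row := range tx.outbox {
		fmt.Println(row)
	}
}
```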
The payload never carries secrets — node-state values are the small, addressable facts a Node reads and writes, not credentials. After the transaction commits, the change is also pushed onto an optional immediate wire publisher when one is wired; a nil publisher leaves the service on the outbox-only posture (the event rides the outbox relay alone), and a post-commit publish failure is logged and swallowed because the event has already committed durably and the relay path recovers it.
Report writes — the NSK-authenticated upstream channel
A workload on a Node records and removes its reports through two HTTP operations served by ../../internal/transport/http/v1/reports/handler.go:
- PUT /v1/nodes/{id}/state/reports/{key}: record (or re-report) one upstream report entry. The entry is stored with kind = report and fans out no event. The body carries the value (capped at 4096 bytes) and an optional workload_tag; the response echoes the stored key and the server accepted_at timestamp.
- DELETE /v1/nodes/{id}/state/reports/{key}: remove the report the Node recorded under the addressed key. A delete for a key under which the Node carries no report is answered with 404; a delete that lands is 204.
Authentication and the report ACL. The caller authenticates with its per-Node Node Secret Key (NSK) in the Authorization: Bearer header. The handler asserts the resolved NSK Node matches the path id, so a Node may only write or delete its own reports. The gate order mirrors the Secret Store fetch handler: a missing, malformed, or revoked credential surfaces as 401 before any work; an NSK that authenticates but resolves to a different Node than the path id surfaces as 403; only then does the body decode and the service call run. The active-DSN binary routes the report paths through the NSK middleware so a request without a valid NSK never reaches the handler body.
The report quota. Report entries are upstream, workload-authored telemetry, so the service bounds how many a single Node may hold (a deliberate per-deployment-overridable default). The quota check and the upsert run inside one transaction so the count the check reads and the row it writes stay consistent. Re-reporting an existing report key is always admitted because it does not grow the count; only a new report key that would push the Node past its quota is refused.
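The admission rule reduces to two inputs read in the same transaction: how many reports the Node holds and whether the key already exists. In this sketch, admitReport and memReports are hypothetical, and the mutex merely stands in for the transaction boundary the page describes.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrReportQuotaExceeded is a hypothetical sentinel behind report_quota_exceeded.
var ErrReportQuotaExceeded = errors.New("nodestate: report quota exceeded")

// admitReport is the admission rule: a re-report of an existing key is always
// admitted, and a new key is admitted only while the Node is below quota.
func admitReport(currentCount int, keyExists bool, quota int) error {
	if keyExists {
		return nil // upsert in place: the count does not grow
	}
	if currentCount >= quota {
		return ErrReportQuotaExceeded
	}
	return nil
}

// memReports holds one Node's report entries. The mutex stands in for the
// single transaction that wraps the count read and the upsert.
type memReports struct {
	mu    sync.Mutex
	quota int
	rows  map[string]string // report key -> value
}

func (m *memReports) Put(key, value string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	_, exists := m.rows[key]
	if err := admitReport(len(m.rows), exists, m.quota); err != nil {
		return err
	}
	m.rows[key] = value
	return nil
}

func main() {
	m := &memReports{quota: 2, rows: map[string]string{}}
	fmt.Println(m.Put("cpu", "0.4"), m.Put("mem", "512")) // <nil> <nil>
	fmt.Println(m.Put("cpu", "0.7"))                      // <nil>: re-report at the cap
	fmt.Println(m.Put("disk", "90"))                      // nodestate: report quota exceeded
}
```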
Problem codes. The surface emits a closed taxonomy pinned in ../../internal/transport/http/v1/reports/errors.go, each on an RFC 9457 application/problem+json body with Cache-Control: no-store:
| Status | code | Cause |
|---|---|---|
| 400 | invalid_report | The value exceeds the 4096-byte cap, or the key violates the entry grammar. |
| 400 | report_quota_exceeded | A new report key would breach the per-Node report quota. |
| 401 | unauthorized | The NSK is missing, malformed, or revoked. |
| 403 | node_id_mismatch | The NSK resolves to a different Node than the path id. |
| 404 | node_not_found | No Node resolves for the addressed id. |
| 404 | report_not_found | The Node exists but carries no report under the addressed key (delete path). |
| 413 | payload_too_large | The request body exceeds the handler's defensive size cap. |
| 501 | reports_not_provisioned | The node-state service is not wired into this build. |
Persistence
The context persists a single table, created by migration ../../internal/platform/db/migrations/0048_node_state.sql:
- plexsphere.node_state_entry: one row per (node_id, kind, key) coordinate. node_id references plexsphere.nodes ON DELETE CASCADE, so the table self-prunes when a Node is deleted (a state entry is a derived report with no life once its Node is gone). kind carries a CHECK pinning the closed three-channel set; key carries a CHECK enforcing the same grammar the value object does; and value carries a CHECK capping the payload at octet_length(value) <= 4096. The redundant CHECKs are deliberate defence in depth: the domain validates at the application boundary, and the constraints make a malformed key or oversized value structurally impossible even if a future raw-SQL writer bypasses the aggregate.
The composite (node_id, kind, key) primary key means a (kind, key) coordinate addresses exactly one entry per Node and a repeated report is an upsert, not a duplicate row. The migration's downgrade refuses the rollback (raising feature_not_supported): dropping the table would discard every reported entry, and re-running the upgrade would restore only an empty table.
The sqlc-annotated queries backing the repository live at ../../internal/platform/db/queries/S0_node_state.sql: an upsert keyed by the coordinate, an idempotent delete, and a list ordered by (kind, key). The list order is part of the contract — the pull path projects these rows into a snapshot, and two consecutive pulls of an unchanged Node must be byte-equal, which an unordered scan cannot guarantee.
There is no internal/mesh/nodestate/repo package: the no-direct-persistence-from-contexts rule denies pgx imports from anything under internal/mesh/, so the pgx Repository adapter lives at the composition root in ../../cmd/plexsphere/nodestate_factory_prod.go rather than in a context-owned subpackage — the same posture the reconciliation-pull peerSourceAdapter follows.
Snapshot inclusion — the pull as a cold-start fallback
Every state the bus delivers as an incremental node_state_updated delta must also be recoverable from the reconciliation pull. The pull's snapshot composer reads the addressed Node's entries through the EntriesSource seam declared in ../../internal/mesh/state/snapshot.go and fans them by kind into three buckets:
- metadata: the platform-owned descriptive entries;
- data: the platform-owned operational entries;
- reports: the upstream, workload-authored entries.
Each bucket is a flat, key-ascending list of (key, value, workload_tag) triples, and an empty bucket serialises as [], never null. The snapshot envelope carries the projection in both its state and reports blocks: the composer points both at the same value object (the mirror), so a future split between "live state" and "rolled-up reports" can land without an OpenAPI break. The byte-stable ordering and the SSE/pull equivalence are pinned by the integration suite at ../../tests/integration/nodestate_snapshot_convergence_test.go.
OpenAPI surface
The wire surface is pinned in ../../api/openapi/plexsphere-v1.yaml under the mesh tag:
| OpenAPI artefact | Where in the spec | Notes |
|---|---|---|
| PutNodeStateReport operation | paths./v1/nodes/{id}/state/reports/{key}.put | Records one report entry; 200 / 400 / 401 / 403 / 404 / 413 / 501. |
| DeleteNodeStateReport operation | paths./v1/nodes/{id}/state/reports/{key}.delete | Deletes one report entry; 204 / 401 / 403 / 404 / 501. |
| NodeStateReportRequest schema | components.schemas.NodeStateReportRequest | Required value (≤ 4096 bytes), optional workload_tag. |
| NodeStateReportResponse schema | components.schemas.NodeStateReportResponse | Required accepted_at, key. |
| NodeStateReports schema | components.schemas.NodeStateReports | The three required metadata / data / reports arrays of StateEntry; backs both the state and reports blocks of NodeStateSnapshot. |
| StateEntry schema | components.schemas.StateEntry | The (key, value, workload_tag) triple within a bucket. |
The {key} path parameter carries the same ^[a-z][a-z0-9._-]{0,127}$ pattern the domain enforces, so a malformed key is rejected at the wire boundary before the handler body runs.
Composition root and the deferred-wiring posture
../../cmd/plexsphere/nodestate_factory_prod.go is the production wiring receipt for the report-write surface. It keys its opt-in switch on the shared PLEXSPHERE_DSN — the same pool the reconciliation-pull surface activates on — because a report write is a pure Postgres mutation plus an outbox append and needs no external backend. An empty DSN returns a nil bundle so the report handlers stay on their 501 reports_not_provisioned stub and the binary still boots; this fail-fast posture surfaces a half-wired gap during boot rather than on the first report write after /readyz has gone green. When the DSN is set, the factory opens the pool, assembles the pgx Repository adapter behind the node-state application service, wraps that service onto the reports transport Service port, and hands back the transport Deps that flip the report handlers off their stub.
The immediate-wire EventPublisher is not wired here: the service runs the outbox-only posture, so the platform-owned writes append a node_state_updated outbox row the SSE relay routes onto the wire, and an immediate push is a latency optimisation rather than a correctness requirement. This mirrors the Secret Store factory, which leaves its publisher unwired against the same mesh-event-bus roadmap.
What the Node State Service is not
- Not the SSE bus or the relay. The context appends a node_state_updated outbox row; the Signed Event Bus and its relay own the routing of that row onto the wire. The publisher's wireTypeFor dispatch maps the nodestate.NodeStateUpdated outbox literal onto the node_state_updated wire type.
- Not the reconciliation-pull surface. This context is the source from which the pull projects the state/reports blocks; the pull's composer, handler, and security gates are owned there.
- Not a secret channel. Node-state values are small, addressable, non-secret facts. Secret material is the Secret Store's concern; the node_state_updated payload never carries it.
- Not an audited write path yet. The context declares an AuditSink port as part of its outbound contract, but the current write paths carry no audit requirement and do not consume it; the seam exists so the composition root has one place to bind an audit adapter when an audited node-state operation lands.
Cross-references
- ../../internal/mesh/nodestate/doc.go: the domain root that pins the ubiquitous language and the closed event-type governance.
- ../../internal/mesh/nodestate/service.go: the application service, covering the platform-owned write paths and the upstream report-write paths.
- ../../internal/mesh/nodestate/ports.go: the outbound ports (Repository, EventPublisher, AuditSink, Clock) and the required-vs-nil-tolerated split.
- ../mesh/reconciliation-pull.md: the cold-start fallback channel that projects the state/reports blocks from this context's entries.
- ../mesh/sse.md: the push-channel sibling that routes the node_state_updated outbox row onto the wire.
- ../../api/openapi/plexsphere-v1.yaml: the OpenAPI spec for the report operations and the node-state schemas.
- ../../README.md#node-state-service: the top-level specification of the Node State Service.
- ../audit/index.md: the Platform Audit Log an audited node-state operation would write to once the sink is wired.