
Capability Manifest Ingest — agent-side capability snapshots, persisted, diffed, and emitted

This document is the authoritative bounded-context reference for the capability manifest ingest surface — the seam between a plexd agent's per-Node snapshot of "what binary am I running, what hooks do I advertise, what host key am I presenting" and the durable projection downstream consumers (the integrity correlator and the dashboard projector named in the Out of scope section) observe through the NodeCapabilitiesUpdated outbox event. It covers the ubiquitous-language pins, the on-the-wire request and response shape, the exhaustive catalogue of Problem codes the handler emits, the NSK authentication model the transport inherits from the heartbeat sibling, the diff and event contract that translates a per-PUT snapshot into an outbox emission, the Postgres persistence shape migration 0035_node_capabilities.sql defines, and the four downstream consumers this slice deliberately leaves out of scope.

The capability ingest surface is a snapshot ingestor and a diff emitter — and only that. The agent publishes a self-describing manifest on every meaningful change; the control plane persists the canonical form, computes the diff against the prior persisted row, and on a non-empty diff emits a single outbox event carrying the field-level transition list and the security-load-bearing host-key flag. The ingest surface deliberately does NOT correlate against attestation, does NOT compare against the Artifact Registry, and does NOT participate in CRD-style hook scheduling; those concerns live in named follow-up stories.

Ubiquitous language

Five terms travel verbatim across the tenancy.CapabilityManifest value object, the PUT /v1/nodes/{id}/capabilities handler, the plexsphere.node_capability_manifest table, and the tenancy.NodeCapabilitiesUpdated outbox event. Internal code never paraphrases them; documentation and Problem detail strings adopt the exact spelling below.

| Term | Definition | Code anchor |
| --- | --- | --- |
| CapabilityManifest | The value-object snapshot every plexd agent publishes through PUT /v1/nodes/{id}/capabilities. Carries the agent binary version and checksum, an optional SSH host-key fingerprint, and a list of DeclaredHook entries. Immutable: the receiver-side Node.RecordCapabilities returns a fresh Node rather than mutating the prior one. | internal/identity/tenancy/capability_manifest.go (CapabilityManifest) |
| DeclaredHook | A single (name, sha256_checksum) tuple the agent advertises. The constructor rejects empty names and non-32-byte checksums; the parent CapabilityManifest rejects duplicate names case-sensitively and caps the list at CapabilitiesMaxDeclaredHooks (128). | internal/identity/tenancy/capability_manifest.go (DeclaredHook) |
| ManifestDiff | The pure-function output of DiffManifests(prev, next). Carries FieldsChanged (alphabetised name list with no before/after values, honouring the audit pseudonym contract) and HostKeyChanged (the dedicated SSH-fingerprint-moved flag downstream consumers branch on without parsing FieldsChanged). | internal/identity/tenancy/capability_manifest.go (ManifestDiff, DiffManifests) |
| NodeCapabilitiesUpdated | The closed outbox event emitted exactly once per non-empty diff. Carries the (Node, Resource, Project, Domain) ID quartet (denormalised so downstream consumers route per-Domain without an extra lookup), the FieldsChanged name list, and the HostKeyChanged flag. Stable event-type discriminator tenancy.NodeCapabilitiesUpdated. | internal/identity/tenancy/events/events.go (TypeNodeCapabilitiesUpdated, NodeCapabilitiesUpdated) |
| NSK (Node Secret Key) | The per-Node bearer token the agent presents on every authenticated request. The NSK middleware resolves the bearer plaintext to a Node, asserts the resolved Node id equals the URL path id, and attaches the Node to the request context. The capability ingest handler reads the attached Node out of the context and never re-validates the NSK itself. | internal/identity/authn/middleware/ |

The translation is one-directional: the agent emits a manifest, the handler emits a ManifestDiff on the response, and the repository emits a NodeCapabilitiesUpdated outbox event on a non-empty diff. The agent never reads the diff back through the ingest surface; the integrity correlator and the dashboard projector named in the Out of scope section consume the outbox event downstream.

Wire shape

The transport surface is one HTTP operation: PUT /v1/nodes/{id}/capabilities. The path id is a UUIDv7 Node id; the NSK presented in Authorization: Bearer MUST resolve to that same Node id (a cross-Node replay surfaces as 403 node_id_mismatch).

Request

The handler decodes the body with DisallowUnknownFields after capping the body at 32 KiB through http.MaxBytesReader. The decoded envelope is then canonicalised through tenancy.NewCapabilityManifest; the constructor is the single seam that enforces every manifest invariant.

```jsonc
{
  "binary_version":             "plexd-v0.4.2-ge5f3a1c",
  "binary_checksum":            "<base64 of 32-byte SHA-256>",
  "ssh_host_key_fingerprint":   "SHA256:<base64>",   // optional
  "declared_hooks": [
    { "name": "post-install", "checksum": "<base64 of 32 bytes>" }
  ]
}
```
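The decode step described above can be sketched in Go roughly as follows. This is a minimal illustration, not the handler in capabilities.go: the type names manifestRequest and hookRequest, the helper decodeString, and the exact mapping from decode errors to Problem codes are assumptions. Only http.MaxBytesReader, DisallowUnknownFields, and the 32 KiB ceiling come from the text.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBodyBytes mirrors the 32 KiB ceiling described above.
const maxBodyBytes = 32 << 10

// hookRequest and manifestRequest are hypothetical envelope types. The JSON
// tags follow the request example; the Go names are assumptions.
type hookRequest struct {
	Name     string `json:"name"`
	Checksum []byte `json:"checksum"` // encoding/json base64-decodes into []byte
}

type manifestRequest struct {
	BinaryVersion         string        `json:"binary_version"`
	BinaryChecksum        []byte        `json:"binary_checksum"`
	SSHHostKeyFingerprint string        `json:"ssh_host_key_fingerprint"`
	DeclaredHooks         []hookRequest `json:"declared_hooks"`
}

// decodeManifest caps the body, then decodes strictly. The second return
// value is the Problem code for the failure, or "" on success.
func decodeManifest(w http.ResponseWriter, r *http.Request) (manifestRequest, string) {
	r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)
	dec := json.NewDecoder(r.Body)
	dec.DisallowUnknownFields()

	var req manifestRequest
	if err := dec.Decode(&req); err != nil {
		var tooBig *http.MaxBytesError
		if errors.As(err, &tooBig) {
			return manifestRequest{}, "capabilities_body_too_large"
		}
		return manifestRequest{}, "malformed_capabilities_request"
	}
	return req, ""
}

// decodeString is an illustrative helper that feeds a literal body through
// decodeManifest using an in-memory request.
func decodeString(body string) (manifestRequest, string) {
	r := httptest.NewRequest(http.MethodPut, "/v1/nodes/example/capabilities", strings.NewReader(body))
	return decodeManifest(httptest.NewRecorder(), r)
}

func main() {
	_, code := decodeString(`{"binary_version":"v1","surprise":true}`)
	fmt.Println(code)
}
```

Length and format invariants are intentionally absent from this sketch; in the design described here they belong to the manifest constructor, not to the decoder.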

Field rules:

  • binary_version (string, required): the agent's version string, as in the example above. Empty and whitespace-only values are rejected with 400 binary_version_empty.
  • binary_checksum (string, base64, required): the raw SHA-256 digest of the running agent binary. Decodes to exactly 32 bytes through encoding/json's transparent base64 handling; any other length is rejected with 400 binary_checksum_invalid.
  • ssh_host_key_fingerprint (string, optional): the OpenSSH SHA-256 host-key fingerprint in the canonical SHA256:<base64> form. Empty / absent values are accepted (a Node legitimately may not have a host key configured); a non-empty value not matching the pattern is rejected with 400 ssh_host_key_fingerprint_invalid.
  • declared_hooks (array, optional): at most CapabilitiesMaxDeclaredHooks (128) entries, no duplicate name values, every entry's checksum decoding to exactly 32 bytes. Violations route to 400 declared_hooks_too_many, 400 declared_hook_duplicate, or 400 declared_hook_invalid respectively.
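The field rules above could be enforced by a constructor along these lines. This is a hedged sketch rather than tenancy.NewCapabilityManifest itself: the lowercase type and error names are local stand-ins, the order in which invariants are checked is an assumption, and hostKeyRE is an assumed pattern (OpenSSH prints the unpadded base64 of a 32-byte digest, which is 43 characters).

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

const maxDeclaredHooks = 128 // mirrors CapabilitiesMaxDeclaredHooks

// Local stand-ins for the tenancy.ErrCapabilityManifest* sentinels. Each
// message is the Problem code the catalogue pairs with the invariant.
var (
	errVersion       = errors.New("binary_version_empty")
	errChecksum      = errors.New("binary_checksum_invalid")
	errHostKey       = errors.New("ssh_host_key_fingerprint_invalid")
	errHookInvalid   = errors.New("declared_hook_invalid")
	errHookDuplicate = errors.New("declared_hook_duplicate")
	errHooksTooMany  = errors.New("declared_hooks_too_many")
)

// hostKeyRE is an assumed pattern for the SHA256:<base64> form.
var hostKeyRE = regexp.MustCompile(`^SHA256:[A-Za-z0-9+/]{43}$`)

type hook struct {
	Name     string
	Checksum []byte
}

type manifest struct {
	BinaryVersion  string
	BinaryChecksum []byte
	HostKey        string
	Hooks          []hook
}

// newManifest validates every invariant in one place and returns the first
// violation it finds. The check order is an assumption of this sketch.
func newManifest(version string, checksum []byte, hostKey string, hooks []hook) (manifest, error) {
	if strings.TrimSpace(version) == "" {
		return manifest{}, errVersion
	}
	if len(checksum) != 32 {
		return manifest{}, errChecksum
	}
	if hostKey != "" && !hostKeyRE.MatchString(hostKey) {
		return manifest{}, errHostKey
	}
	if len(hooks) > maxDeclaredHooks {
		return manifest{}, errHooksTooMany
	}
	seen := make(map[string]struct{}, len(hooks))
	for _, h := range hooks {
		if h.Name == "" || len(h.Checksum) != 32 {
			return manifest{}, errHookInvalid
		}
		// Map keys compare byte-for-byte, so "a" and "A" are distinct names.
		if _, dup := seen[h.Name]; dup {
			return manifest{}, errHookDuplicate
		}
		seen[h.Name] = struct{}{}
	}
	return manifest{BinaryVersion: version, BinaryChecksum: checksum, HostKey: hostKey, Hooks: hooks}, nil
}

func main() {
	sum := make([]byte, 32)
	_, err := newManifest("v1", sum, "", []hook{{"post-install", sum}, {"post-install", sum}})
	fmt.Println(err)
}
```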

Response

```jsonc
{
  "accepted_at":      "2026-04-27T10:15:30.123Z",
  "fields_changed":   ["binary_checksum", "binary_version"],
  "host_key_changed": false
}
```

accepted_at is the server commit timestamp the handler stamped on the request (UTC, RFC 3339). fields_changed is the alphabetised list of manifest fields that transitioned versus the prior persisted snapshot — on an idempotent PUT (no diff) the field is the explicit empty array [] and host_key_changed is false. The JSON encoder is forced to emit [] rather than null so downstream consumers can rely on the field being present on every 200.
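One way to guarantee the [] encoding in Go is to normalise a nil slice before marshalling, because encoding/json renders a nil slice as null. The sketch below assumes a response struct named capabilitiesResponse and a helper encodeResponse; neither name is taken from the handler source.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// capabilitiesResponse is a hypothetical mirror of the 200 response body.
type capabilitiesResponse struct {
	AcceptedAt     time.Time `json:"accepted_at"`
	FieldsChanged  []string  `json:"fields_changed"`
	HostKeyChanged bool      `json:"host_key_changed"`
}

// encodeResponse replaces a nil FieldsChanged with an empty slice so the
// encoder emits [] instead of null.
func encodeResponse(acceptedAt time.Time, fields []string, hostKeyChanged bool) (string, error) {
	if fields == nil {
		fields = []string{}
	}
	out, err := json.Marshal(capabilitiesResponse{
		AcceptedAt:     acceptedAt.UTC(),
		FieldsChanged:  fields,
		HostKeyChanged: hostKeyChanged,
	})
	if err != nil {
		return "", err
	}
	return string(out), nil
}

// sampleAcceptedAt matches the timestamp in the response example.
var sampleAcceptedAt = time.Date(2026, time.April, 27, 10, 15, 30, 123_000_000, time.UTC)

func main() {
	out, err := encodeResponse(sampleAcceptedAt, nil, false)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```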

The valid fields_changed enum values are the four snake_case column-equivalent identifiers:

  • binary_version
  • binary_checksum
  • ssh_host_key_fingerprint
  • declared_hooks

host_key_changed is a dedicated boolean rather than a derived predicate over fields_changed because SSH host-key transitions are security-load-bearing — downstream consumers (the integrity correlator named in the Out of scope section) switch on the explicit flag without parsing the string list.

Error code catalog

Every reachable Problem code from capabilities.go is enumerated below; generated TypeScript / Go clients can exhaustively switch on code without a fall-through arm. The handler's gate ordering (authn → path-id → body cap → body decode → manifest invariants → aggregate write) determines which arm fires; earlier gates are cheaper than later ones so the cheapest rejection path always short-circuits the more expensive ones.

| HTTP status | code literal | Trigger |
| --- | --- | --- |
| 400 | binary_version_empty | binary_version is missing or empty after trimming whitespace (tenancy.ErrCapabilityManifestVersion). |
| 400 | binary_checksum_invalid | binary_checksum is missing, empty, or not exactly 32 bytes after base64 decoding (tenancy.ErrCapabilityManifestChecksum). Distinct from heartbeat's binary_checksum_empty because the manifest invariant covers both empty AND wrong-length values in one branch. |
| 400 | ssh_host_key_fingerprint_invalid | ssh_host_key_fingerprint is non-empty and does not match SHA256:<base64> (tenancy.ErrCapabilityManifestHostKey). |
| 400 | declared_hook_invalid | A declared_hooks entry violates a per-entry invariant: empty name or non-32-byte checksum (tenancy.ErrCapabilityManifestHookInvalid). |
| 400 | declared_hook_duplicate | Two declared_hooks entries carry the same name (tenancy.ErrCapabilityManifestHookDuplicate). |
| 400 | declared_hooks_too_many | declared_hooks exceeds CapabilitiesMaxDeclaredHooks (128) (tenancy.ErrCapabilityManifestHooksTooMany). |
| 400 | malformed_capabilities_request | Body is not a valid CapabilityManifestRequest envelope: invalid JSON, unknown field (the decoder runs DisallowUnknownFields), or missing required field. |
| 401 | unauthorized | The NSK middleware did not attach a Node to the request context. The handler's defensive arm: in production the middleware runs upstream and a missing Node is unreachable on a correctly mounted route. |
| 403 | node_id_mismatch | The defense-in-depth path-id gate: the resolved NSK Node id does not equal the URL path id. The NSK middleware runs the same check upstream; this handler-side double-check protects against a misconfigured router that mounts the handler without the middleware. Audit row stamps the dedicated node_capabilities.path_gate relation so dashboards can split admission versus ingestion outcomes. |
| 404 | capabilities_node_not_found | The recorder reported ErrNotFound: the Node row was concurrently deleted between the NSK middleware's admission and the aggregate write. |
| 413 | capabilities_body_too_large | Body exceeds the 32 KiB CapabilitiesMaxBodyBytes ceiling. http.MaxBytesReader caps the bytes before the JSON decoder is ever invoked. |
| 501 | capabilities_not_provisioned | The deferred-wiring posture: one or more of CapabilitiesRecorder, NSKResolver, or NodeRepo is nil on the Handlers struct. The dispatch shim in capabilities_dispatch.go fails closed so log scrapers can alert on the deferred-wiring state. |
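The error-driven rows of the catalogue lend themselves to a table lookup with errors.Is. The sketch below is an illustration only: the sentinel variables are local stand-ins for the tenancy errors named in the catalogue, and problemFor is a hypothetical helper, not a function from capabilities.go. Unmapped errors are reported through the boolean instead of an invented fallback code.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Local stand-ins for the sentinels the catalogue names.
var (
	errVersion       = errors.New("capability manifest: version")
	errChecksum      = errors.New("capability manifest: checksum")
	errHostKey       = errors.New("capability manifest: host key")
	errHookInvalid   = errors.New("capability manifest: hook invalid")
	errHookDuplicate = errors.New("capability manifest: hook duplicate")
	errHooksTooMany  = errors.New("capability manifest: too many hooks")
	errNotFound      = errors.New("node not found")
)

type problem struct {
	status int
	code   string
}

// mapping pairs each sentinel with the status and code literal from the table.
var mapping = []struct {
	err error
	p   problem
}{
	{errVersion, problem{http.StatusBadRequest, "binary_version_empty"}},
	{errChecksum, problem{http.StatusBadRequest, "binary_checksum_invalid"}},
	{errHostKey, problem{http.StatusBadRequest, "ssh_host_key_fingerprint_invalid"}},
	{errHookInvalid, problem{http.StatusBadRequest, "declared_hook_invalid"}},
	{errHookDuplicate, problem{http.StatusBadRequest, "declared_hook_duplicate"}},
	{errHooksTooMany, problem{http.StatusBadRequest, "declared_hooks_too_many"}},
	{errNotFound, problem{http.StatusNotFound, "capabilities_node_not_found"}},
}

// problemFor resolves an error, wrapped or not, to its catalogue entry.
// The boolean is false when the error has no entry.
func problemFor(err error) (problem, bool) {
	for _, m := range mapping {
		if errors.Is(err, m.err) {
			return m.p, true
		}
	}
	return problem{}, false
}

func main() {
	wrapped := fmt.Errorf("record capabilities: %w", errNotFound)
	p, ok := problemFor(wrapped)
	fmt.Println(p.status, p.code, ok)
}
```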

Every reachable 4xx and 5xx arm emits an audit row through the shared AuditSink. Ingestion-phase entries (malformed body, invariant violation, recorder failure, granted) stamp the node_capabilities.record relation; the defense-in-depth path-id gate stamps node_capabilities.path_gate. Audit dashboards filter on the relation to detect "middleware was bypassed but the handler caught it" without conflating it with ingestion-phase entries.

NSK authentication model

The capability ingest surface inherits the NSK authentication seam from its heartbeat sibling — the same per-Node bearer credential governs both endpoints. There is no ReBAC participation on the ingest path: the operator-facing identity-and-relation authorisation surface lives in ../../../internal/authz/ and is documented under ../identity/rebac.md; the agent-facing capability ingest is a per-Node credential surface keyed on the NSK plaintext alone.

The authentication contract is structural and two-step:

  1. Middleware admission. The NSK middleware (../../../internal/identity/authn/middleware/) resolves the Authorization: Bearer plaintext to a Node, asserts the resolved Node id equals the URL path id, and attaches the Node to the request context. Missing, malformed, or revoked credentials surface as 401 nsk_revoked; a cross-Node bearer surfaces as 403 node_id_mismatch. The capability handler never re-validates the NSK itself.
  2. Handler-side double-check. The handler reads the attached Node off the context with middleware.FromContextNode and runs a defense-in-depth path-id comparison through middleware.MatchesPathID. A mismatch fires the node_capabilities.path_gate audit relation and refuses with 403 node_id_mismatch. This arm is unreachable when the middleware is mounted correctly; it protects against misconfigured routes that bypass the middleware.
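The handler-side half of this contract might look like the following sketch. The helpers fromContextNode and matchesPathID are simplified local stand-ins for middleware.FromContextNode and middleware.MatchesPathID (whose real signatures are not shown in this document), and the node type, the context key, and the plain string comparison of ids are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

// node is a minimal stand-in for the Node the middleware attaches.
type node struct{ ID string }

// nodeKey is an unexported context key type, the usual Go idiom for
// avoiding collisions between packages.
type nodeKey struct{}

func withNode(ctx context.Context, n node) context.Context {
	return context.WithValue(ctx, nodeKey{}, n)
}

func fromContextNode(ctx context.Context) (node, bool) {
	n, ok := ctx.Value(nodeKey{}).(node)
	return n, ok
}

func matchesPathID(n node, pathID string) bool {
	return n.ID == pathID
}

// admit runs the two defensive arms in gate order: a missing Node is 401,
// a Node whose id differs from the path id is 403.
func admit(ctx context.Context, pathID string) (int, string) {
	n, ok := fromContextNode(ctx)
	if !ok {
		return http.StatusUnauthorized, "unauthorized"
	}
	if !matchesPathID(n, pathID) {
		return http.StatusForbidden, "node_id_mismatch"
	}
	return http.StatusOK, ""
}

// background is exposed so callers of this sketch need no extra imports.
var background = context.Background()

func main() {
	status, code := admit(withNode(background, node{ID: "node-a"}), "node-b")
	fmt.Println(status, code)
}
```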

The capability ingest surface is wired through the shared NSK middleware instance installed by the composition root for both the heartbeat and the capabilities paths — see nskAuthenticatedPathRE in ../../../cmd/plexsphere/app.go. A future Node-facing endpoint that requires NSK admission appends its path suffix to the same regex; there is no per-handler NSK middleware instance.
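To illustrate the single-regex routing idea, a path matcher could take the shape below. The pattern is purely illustrative and is not the one in app.go; the heartbeat path suffix in particular is an assumption, as is the helper name requiresNSK.

```go
package main

import (
	"fmt"
	"regexp"
)

// nskAuthenticatedPathRE here is a hypothetical pattern. A new NSK-admitted
// endpoint would add its suffix to the alternation.
var nskAuthenticatedPathRE = regexp.MustCompile(`^/v1/nodes/[^/]+/(heartbeat|capabilities)$`)

// requiresNSK reports whether a request path should pass through the shared
// NSK middleware instance.
func requiresNSK(path string) bool {
	return nskAuthenticatedPathRE.MatchString(path)
}

func main() {
	fmt.Println(requiresNSK("/v1/nodes/0190-example/capabilities"))
	fmt.Println(requiresNSK("/v1/nodes/0190-example/unrelated"))
}
```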

Diff & event contract

The repository write is the single seam where a per-PUT snapshot becomes (a) a row in plexsphere.node_capability_manifest and (b), when the diff is non-empty, one outbox event in plexsphere.outbox_events. The write is one transaction; the diff, the UPSERT, and the outbox append all commit or all roll back together.

Single-transaction ingest

NodeRepo.RecordCapabilities (../../../internal/identity/tenancy/repo/node_capability_repo.go) runs the following inside runInTx:

  1. SelectCapabilityManifest(node_id) — hydrate the prior manifest (returns the zero value when no row exists yet).
  2. DiffManifests(prev, next) — pure function; the constructor on the input has already canonicalised the next manifest.
  3. UpsertCapabilityManifest(...) — INSERT ... ON CONFLICT (node_id) DO UPDATE; the table is keyed on node_id so the UPSERT replaces the prior row in place.
  4. If the diff is non-empty, appendOutbox(...) writes one tenancy.NodeCapabilitiesUpdated event to plexsphere.outbox_events. The event payload is the JSON form the NodeCapabilitiesUpdated.Marshal method produces, carrying the (Node, Resource, Project, Domain) ID quartet, the alphabetised FieldsChanged name list, and the HostKeyChanged flag.

The DECISION block on RecordCapabilities pins the diff-inside-the-transaction posture: two concurrent PUTs against the same Node would each compute a diff against an already-stale row if the SELECT happened at the handler boundary; the loser's diff would not reflect the winner's UPSERT and the two outbox rows would disagree on which fields actually moved. Folding the SELECT, the diff, the UPSERT, and the outbox append into one transaction guarantees the emitted FieldsChanged matches the row that landed.
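The select, diff, upsert, append sequence can be modelled with an in-memory store, as in the sketch below. Everything here is a simplification for illustration: memStore and memTx are hypothetical, the manifest is reduced to a map of field name to canonical value, and a mutex stands in for the database transaction. How the real runInTx serialises concurrent writers (isolation level, row locks) is not specified in this document.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// manifest is simplified to field name -> canonical value.
type manifest map[string]string

type memStore struct {
	mu     sync.Mutex
	rows   map[string]manifest
	outbox []string
}

// memTx stages writes so they commit together or not at all.
type memTx struct {
	nodeID string
	row    manifest
	events []string
}

func newMemStore() *memStore { return &memStore{rows: map[string]manifest{}} }

// runInTx holds the lock for the whole callback and applies staged writes
// only when the callback succeeds.
func (s *memStore) runInTx(fn func(*memTx) error) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	t := &memTx{}
	if err := fn(t); err != nil {
		return err // staged writes are dropped: rollback
	}
	if t.row != nil {
		s.rows[t.nodeID] = t.row
	}
	s.outbox = append(s.outbox, t.events...)
	return nil
}

// diffFields returns the sorted names of fields whose values differ.
func diffFields(prev, next manifest) []string {
	changed := []string{}
	seen := map[string]bool{}
	for _, m := range []manifest{prev, next} {
		for k := range m {
			if !seen[k] && prev[k] != next[k] {
				changed = append(changed, k)
			}
			seen[k] = true
		}
	}
	sort.Strings(changed)
	return changed
}

func recordCapabilities(s *memStore, nodeID string, next manifest) ([]string, error) {
	var changed []string
	err := s.runInTx(func(t *memTx) error {
		prev := s.rows[nodeID]           // 1. prior row, nil when absent
		changed = diffFields(prev, next) // 2. pure diff
		t.nodeID, t.row = nodeID, next   // 3. stage the upsert
		if len(changed) > 0 {            // 4. stage one event per non-empty diff
			t.events = append(t.events, "tenancy.NodeCapabilitiesUpdated:"+strings.Join(changed, ","))
		}
		return nil
	})
	return changed, err
}

func main() {
	s := newMemStore()
	changed, _ := recordCapabilities(s, "n1", manifest{"binary_version": "v1"})
	fmt.Println(changed, len(s.outbox))
}
```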

Empty diff semantics

An idempotent PUT (the agent re-publishes a manifest that exactly matches the persisted row) yields an empty ManifestDiff. The repository UPSERTs the row anyway (so the updated_at trigger fires) but skips the outbox append — downstream consumers see no event because nothing observable moved. The handler still returns 200 with fields_changed: [] and host_key_changed: false so the agent observes a consistent successful response shape.

DiffManifests semantics

The pure-function comparison treats declared_hooks as a map from name to checksum, so a re-order does NOT register as a diff. The constructor already rejects duplicate names, which makes the keying unambiguous, and a re-order should not emit a spurious outbox row that the dashboard projector named in the Out of scope section would render as an "agent capabilities changed" alert. The DECISION block on DiffManifests pins the rationale.

A zero-value prev (first PUT, no prior row) is treated as "every field at its zero value". FieldsChanged therefore names every non-zero field on the first manifest a Node publishes — matching the canonical "fields_changed lists every field that moved from zero-value" contract pinned by the event constructor in internal/identity/tenancy/events/events.go.

FieldsChanged is sorted alphabetically so the on-the-wire response and the outbox event payload are byte-stable across runs. HostKeyChanged is a separate flag because the SSH host-key transition is the single field downstream consumers need to branch on without parsing the string list.
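The semantics in this section can be captured in a short pure function. The sketch below reuses the document's type names for readability but is not the code in capability_manifest.go: the field layout, the use of [32]byte arrays for checksums, and the helper sameHooks are assumptions.

```go
package main

import "fmt"

type DeclaredHook struct {
	Name     string
	Checksum [32]byte // fixed-size array so values compare with ==
}

type CapabilityManifest struct {
	BinaryVersion         string
	BinaryChecksum        [32]byte
	SSHHostKeyFingerprint string
	DeclaredHooks         []DeclaredHook
}

type ManifestDiff struct {
	FieldsChanged  []string
	HostKeyChanged bool
}

// sameHooks compares two hook lists as name -> checksum maps, ignoring
// order. It relies on names being unique within each list.
func sameHooks(a, b []DeclaredHook) bool {
	if len(a) != len(b) {
		return false
	}
	byName := make(map[string][32]byte, len(a))
	for _, h := range a {
		byName[h.Name] = h.Checksum
	}
	for _, h := range b {
		sum, ok := byName[h.Name]
		if !ok || sum != h.Checksum {
			return false
		}
	}
	return true
}

// DiffManifests appends field names in alphabetical order, so the result is
// already sorted. A zero-value prev models the first PUT.
func DiffManifests(prev, next CapabilityManifest) ManifestDiff {
	d := ManifestDiff{FieldsChanged: []string{}}
	if prev.BinaryChecksum != next.BinaryChecksum {
		d.FieldsChanged = append(d.FieldsChanged, "binary_checksum")
	}
	if prev.BinaryVersion != next.BinaryVersion {
		d.FieldsChanged = append(d.FieldsChanged, "binary_version")
	}
	if !sameHooks(prev.DeclaredHooks, next.DeclaredHooks) {
		d.FieldsChanged = append(d.FieldsChanged, "declared_hooks")
	}
	if prev.SSHHostKeyFingerprint != next.SSHHostKeyFingerprint {
		d.FieldsChanged = append(d.FieldsChanged, "ssh_host_key_fingerprint")
		d.HostKeyChanged = true
	}
	return d
}

func main() {
	first := CapabilityManifest{BinaryVersion: "v1", BinaryChecksum: [32]byte{1}}
	d := DiffManifests(CapabilityManifest{}, first)
	fmt.Println(d.FieldsChanged, d.HostKeyChanged)
}
```

Appending in a fixed alphabetical order avoids a sort call; an implementation that collects names in another order would need to sort before returning to keep the output byte-stable.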

Persistence

The schema lives in migration internal/platform/db/migrations/0035_node_capabilities.sql. A single table plexsphere.node_capability_manifest carries one row per Node; the row is overwritten in place when the agent re-publishes.

Table shape

| Column | Type | Notes |
| --- | --- | --- |
| node_id | uuid | PRIMARY KEY and FOREIGN KEY to plexsphere.node(id) ON DELETE CASCADE. The aggregate boundary's "one CapabilityManifest per Node" invariant is enforced by the PK, not a synthetic id + UNIQUE constraint. |
| binary_version | text NOT NULL | CHECK (trim(binary_version) <> ''), the SQL-side mirror of the value-object invariant so a half-formed row cannot survive Hydrate. |
| binary_checksum | bytea NOT NULL | CHECK (length(binary_checksum) = 32), the same 32-byte raw-SHA-256 length the constructor enforces. |
| ssh_host_key_fingerprint | text NULL | NULLable because a Node legitimately may not have a host key configured. The downstream event flags transitions on this column separately because SSH host-key changes are security-load-bearing. |
| declared_hooks | jsonb NOT NULL DEFAULT '[]' | Array of {"name": "<name>", "checksum_base64": "<b64>"} objects. The application layer is the authoritative parser; SQL treats the blob as opaque. |
| created_at | timestamptz NOT NULL DEFAULT now() | Set on first INSERT; never modified. |
| updated_at | timestamptz NOT NULL DEFAULT now() | Refreshed by an updated_at trigger function on every UPDATE. |

Down policy

The migration's Down arm DROPs the table. The DECISION block under the Down section pins the rationale: unlike the reachability and heartbeat tables (migrations 0010 and 0033) that hydrate ongoing state machines, the capability manifest is reconstructible from the next agent PUT — the agent re-publishes its current state on the next meaningful change and the table re-populates from a single round of ingest. A destructive Down is therefore safe and keeps the rollback semantics symmetric with the Up arm.

sqlc queries

The handler reaches Postgres through three sqlc-generated queries declared in internal/platform/db/queries/L0_node_capabilities.sql:

  • UpsertCapabilityManifest — INSERT ... ON CONFLICT (node_id) DO UPDATE. Returns the persisted row.
  • SelectCapabilityManifest — SELECT the prior row by node_id. Returns pgx.ErrNoRows when no row exists yet; the repository translates this into the zero-value CapabilityManifest so DiffManifests can treat the first PUT uniformly.
  • LoadNodeWithDomainIDs — hydrates the (Resource, Project, Domain) IDs the outbox payload needs alongside the Node id, so the outbox row is emitted without a second round-trip.
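The no-row translation described for SelectCapabilityManifest can be sketched as below. To keep the example free of third-party imports, errNoRows is a local stand-in for pgx.ErrNoRows, and manifestRow, loadPrev, and the query callback are hypothetical names, not the sqlc-generated API.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoRows stands in for pgx.ErrNoRows in this self-contained sketch.
var errNoRows = errors.New("no rows in result set")

// errConn is an example of an unrelated failure that must not be swallowed.
var errConn = errors.New("connection reset")

// manifestRow is a one-field simplification of the persisted row.
type manifestRow struct {
	BinaryVersion string
}

// loadPrev translates "no row yet" into the zero-value row so the diff can
// treat a first PUT like any other. All other errors are wrapped and returned.
func loadPrev(query func(nodeID string) (manifestRow, error), nodeID string) (manifestRow, error) {
	row, err := query(nodeID)
	if errors.Is(err, errNoRows) {
		return manifestRow{}, nil
	}
	if err != nil {
		return manifestRow{}, fmt.Errorf("select capability manifest: %w", err)
	}
	return row, nil
}

func main() {
	row, err := loadPrev(func(string) (manifestRow, error) {
		return manifestRow{}, errNoRows
	}, "n1")
	fmt.Printf("%+v %v\n", row, err)
}
```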

Out of scope

The capability ingest surface deliberately does NOT correlate, does NOT compare, and does NOT schedule. The downstream concerns below are owned by named follow-up stories; this slice is the producer side of the data their consumer arms will read.

  • Integrity correlator — consumes the NodeCapabilitiesUpdated outbox event and correlates the host_key_changed: true arm against attestation evidence to raise an integrity alert on a suspicious SSH host-key transition. The correlator owns the alerter and the correlation rules; the ingest surface is responsible only for emitting the flag.
  • Artifact Registry — owns the canonical catalogue of agent-binary checksums and hook-payload checksums. A future cross-check arm will compare a freshly-PUT manifest's binary_checksum and per-hook checksum against the registry and raise a divergence alert. The capability ingest surface intentionally does NOT perform this lookup at PUT time — the registry is a separate aggregate with its own lifecycle and the cross-check belongs in the registry's consumer arm.
  • CRD hooks — owns the Kubernetes-native scheduling and execution semantics of the declared_hooks advertised through this surface. The capability ingest surface persists the declarations and emits diff events; it does NOT plan, schedule, or execute any hook. The DeclaredHook value object is a metadata record only. Separately, the discovery-only PlexdHook projection in ./hooks.md projects discovered Kubernetes PlexdHook custom resources onto the distinct plexd_hooks manifest field; it too stays strictly read-only — scheduling and execution remain out of scope.
  • Operator-facing read surface — owns the dashboard view of "what is each Node advertising right now?". A read API and the dashboard projection live in the operator-facing read surface; the capability ingest surface is intentionally write-only on the agent path. The dashboard projector consumes the same NodeCapabilitiesUpdated outbox event the integrity correlator reads.

Cross-references