Action Orchestrator — dispatch, per-Node invocations, the two-tier output contract

This document is the authoritative bounded-context reference for the Action Orchestrator that ships under ../../../internal/actions/. It covers the ubiquitous language, the single Execution aggregate and its per-Node ActionInvocation entities, the schema reference for the three Postgres tables, the value objects, the closed per-Node invocation state machine, the two-tier output contract, the dispatch fan-out and survivor-precedence rules, the per-Domain live-execution cap, the ReBAC act matrix, the per-transition audit contract, the closed Problem-code catalogue, the Labels SelectorPort consumer story, and the downstream stories this context deliberately leaves to follow-ups.

The Action Orchestrator is the dispatch, tracking, and history surface for one named action invoked across a cohort of Nodes — and only that. It admits a dispatch (resolving a target cohort either from an opaque label selector or a single explicit Node), fans it out to one ActionInvocation per Node, mints the per-target callback contract, records every per-Node status transition, collects the action output, and settles the Execution when its targets reach a terminal state. It stores operator intent and execution evidence; it does not run the action, hold the Node's identity, or evaluate step-up policy. The follow-ups that build on this context are named below where they become relevant.

This is NOT an action runtime, an NSK custodian, or a step-up gate. The action itself executes on the Node agent, never in the control plane; the per-Node callback the Node reports back with is authenticated by the Node Secret Key at the transport boundary, not re-checked by this context's application services; and the sensitive-operation step-up approval that gates whether a dispatch is permitted is a deferred story (see What this context is not).

For the bounded-context siblings and upstream references see:

  • ../../../api/openapi/plexsphere-v1.yaml — the OpenAPI source that pins the four HTTP operations (DispatchExecution, ListExecutions, GetExecution, PostNodeExecutionCallback), their wire shapes, status codes, and the Problem-code catalogue this document mirrors.
  • ../../../internal/actions/ — the domain code; every type named in this document has a matching symbol there.
  • ./events.md — the events surface: the single actions.ActionDispatched outbox literal, its action_request payload field table, and the per-Node SSE wire fan-out.
  • ../../../internal/platform/db/migrations/0043_actions.sql — the migration that creates the three tables and pins every CHECK, FK behaviour, and index this document describes.
  • ../labels/index.md — the Label Registry whose SelectorPort seam this context consumes to expand a selector into a concrete Node cohort.
  • ../../contributing/layout.md — the bounded-context map row enumerating the consumed and exposed ports this document pins.

Ubiquitous language

The vocabulary travels verbatim across the Go code, the SQL schema, the OpenAPI surface, the outbox event payload, and the audit trail. Internal code never paraphrases the terms; documentation and error prose adopt the exact spelling below. The package godoc at internal/actions/doc.go is the ubiquitous-language pin; this document is its prose expansion.

| Term | Definition | Code anchor |
| --- | --- | --- |
| Execution | The aggregate root: one dispatch of a named action to a cohort of target Nodes. It is keyed on an app-minted UUIDv7 ExecutionID, scoped to a (DomainID, ProjectID), and fans out to one ActionInvocation per Node. It owns the collected output reference and is the consistency boundary: one transaction modifies the aggregate as a whole. | internal/actions/execution.go |
| ActionInvocation | The per-target-Node entity within the Execution aggregate. Its identity is the (Execution, NodeID) pair, and its Status advances independently along the closed invocation state machine (one Node may have succeeded while another is still pending). Carries the reported exit code, error text, lifecycle timestamps, and the hash of the per-target callback token. | ActionInvocation in internal/actions/execution.go |
| CallbackURL | The absolute URL a Node reports an action result back to, built per the template {base}/v1/nodes/{node_id}/executions/{execution_id}. A value object whose absolute-URL invariant (scheme + host) surfaces at construction. | CallbackURL in internal/actions/types.go |
| InlineOutput | The bounded raw-bytes form of an action's collected output that fits inside the control plane (≤ MaxInlineOutputBytes, 16 KiB). A value object whose size invariant wraps ErrInlineOutputTooLarge. | InlineOutput in internal/actions/types.go |
| OutputRef | The tagged union persisted in the Execution's output_ref jsonb column. The inline variant carries the bounded InlineOutput bytes; the object_store variant (the large-output reference) carries the (bucket, key, sha256) tuple addressing the object the Node uploaded over a presigned PUT. The kind discriminator is unexported so an OutputRef can only be built through its constructors. | OutputRef in internal/actions/types.go |
| ActionKind | The closed two-member discriminator {builtin, hook}: a built-in action shipped with the Node agent, or a user-declared hook. A value object so an invalid kind surfaces at construction. | ActionKind in internal/actions/types.go |
| Status | The per-Node ActionInvocation lifecycle status; the closed set {pending, ack, started, succeeded, failed, cancelled, timeout}. Legal advances are enforced by Status.Transition. | Status in internal/actions/types.go |
| ExecutionID / DomainID / ProjectID / NodeID / ActorID | Opaque [16]byte UUID value objects. ExecutionID is the aggregate's app-minted UUIDv7 surrogate key; the other four are external references to the identity context. The composition root adapts them from the tenancy identifiers so the actions context never imports internal/identity. | internal/actions/types.go |
| Parameters | The opaque JSON parameter document a dispatch carries (≤ MaxParametersBytes, 64 KiB). A value object whose JSON-validity and size invariants surface at construction; an empty input is the distinct zero value (no parameters), not a JSON null. | Parameters in internal/actions/types.go |
| Timeout | The per-dispatch execution timeout: strictly positive and ≤ MaxTimeout (24 h). The Execution's expiresAt is requestedAt + Timeout; the background reconciler times out an invocation whose Node never reports within the window. | Timeout in internal/actions/types.go |

Naming note: the OpenAPI wire shape and the Node-facing callback envelope speak of an inline output and the object-store coordinates of an over-ceiling output; the domain models both arms as the single OutputRef tagged union. There is no separate LargeOutputRef type — the large-output reference is the object_store variant of OutputRef.
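A minimal sketch of how a tagged union with an unexported discriminator can be laid out in Go. The constructor names, the `Kind` accessor, and the discriminator member names are assumptions made for illustration; only `OutputRef`, `MaxInlineOutputBytes`, `ErrInlineOutputTooLarge`, and the two variant names come from this document.

```go
package main

import (
	"errors"
	"fmt"
)

// MaxInlineOutputBytes mirrors the 16 KiB inline ceiling from the text.
const MaxInlineOutputBytes = 16 * 1024

// ErrInlineOutputTooLarge mirrors the sentinel named in the text.
var ErrInlineOutputTooLarge = errors.New("inline output too large")

// outputKind is the unexported discriminator. The member names are assumptions.
type outputKind uint8

const (
	kindNone        outputKind = iota // zero OutputRef: no collected output
	kindInline                        // bounded bytes held in the control plane
	kindObjectStore                   // (bucket, key, sha256) of an uploaded object
)

// OutputRef is the tagged union. Every field is unexported, so code outside
// the package can only obtain a non-zero value through the constructors.
type OutputRef struct {
	kind   outputKind
	inline []byte
	bucket string
	key    string
	sha256 string
}

// NewInlineOutputRef builds the inline variant, enforcing the size ceiling.
func NewInlineOutputRef(b []byte) (OutputRef, error) {
	if len(b) > MaxInlineOutputBytes {
		return OutputRef{}, fmt.Errorf("%d bytes: %w", len(b), ErrInlineOutputTooLarge)
	}
	return OutputRef{kind: kindInline, inline: append([]byte(nil), b...)}, nil
}

// NewObjectStoreOutputRef builds the object_store variant; all three
// coordinates must be non-empty.
func NewObjectStoreOutputRef(bucket, key, sha256 string) (OutputRef, error) {
	if bucket == "" || key == "" || sha256 == "" {
		return OutputRef{}, errors.New("object_store ref needs bucket, key, and sha256")
	}
	return OutputRef{kind: kindObjectStore, bucket: bucket, key: key, sha256: sha256}, nil
}

// Kind reports the wire name of the variant ("" for the zero value).
func (r OutputRef) Kind() string {
	switch r.kind {
	case kindInline:
		return "inline"
	case kindObjectStore:
		return "object_store"
	default:
		return ""
	}
}

func main() {
	small, _ := NewInlineOutputRef([]byte("ok"))
	_, err := NewInlineOutputRef(make([]byte, MaxInlineOutputBytes+1))
	fmt.Println(small.Kind(), err)
}
```

Because the discriminator cannot be set from outside the package, a half-populated value (for example an object_store ref without a sha256) cannot be constructed by a caller.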

Why one aggregate with per-Node entities

The Action Orchestrator is modelled as one Execution aggregate whose per-Node status lives on its ActionInvocation entities, not as a fan of independent per-Node aggregates nor as a single root-level status. The DECISION block above the entity declaration in internal/actions/execution.go records the choice; the reasoning is load-bearing enough to restate.

One dispatch fans out to N target Nodes, each of which advances through the lifecycle independently — one Node may have succeeded while another is still pending. A single status on the Execution root could not represent that mixed state, so the status state machine lives on each ActionInvocation. The Execution remains the consistency boundary: it owns its invocations and the collected output ref, and a single transaction modifies the aggregate as a whole. The aggregate refuses a zero-invocation Execution at construction, so a zero-target dispatch can never be persisted through any path — independently of the application layer's own empty-cohort refusal.

A constructor (New) mints a fresh UUIDv7 ExecutionID, computes expiresAt as requestedAt + Timeout, and creates one pending ActionInvocation per target Node; a hydrator (Hydrate / HydrateInvocation) rebuilds the aggregate from persisted state, rejecting structurally impossible state (zero ids, zero timestamps, a status outside the closed set) so a corrupt row never round-trips silently.

Schema reference

The Action Orchestrator persists to three tables in the plexsphere schema. The migration that creates them is internal/platform/db/migrations/0043_actions.sql. Every CHECK constraint, FK behaviour, and index below is sourced from that file.

| Table | Aggregate part | Notable constraints |
| --- | --- | --- |
| plexsphere.action_execution | Execution header | App-minted uuid PRIMARY KEY (no DB DEFAULT); FK domain_id/project_id REFERENCES plexsphere.domains(id)/plexsphere.projects(id) ON DELETE RESTRICT; CHECK action_kind IN ('builtin','hook'); CHECK timeout_seconds > 0; terminal_status NULLable with CHECK over the closed terminal set {succeeded, failed, cancelled, timeout} (NULL = live); requested_by carries no FK (the requester may be an API-token service identity absent from a users table). |
| plexsphere.action_execution_target | ActionInvocation | Composite PRIMARY KEY (execution_id, node_id), so at most one invocation per Node per Execution is a structural fact; FK execution_id … ON DELETE CASCADE (a target belongs to its Execution); FK node_id REFERENCES plexsphere.nodes(id) ON DELETE RESTRICT; CHECK status IN ('pending','ack','started','succeeded','failed','cancelled','timeout'); callback_token_hash bytea NOT NULL. |
| plexsphere.action_execution_event | Transition log | Append-only; bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY gives the monotonic insertion order the timeline reads back; FK execution_id … ON DELETE CASCADE; rows are never updated or deleted. |

Three indexes back the read paths: action_execution_project_keyset_idx on (project_id, created_at DESC, id DESC) matches the List query's ORDER BY so the keyset scan reads the index without a sort; the partial index action_execution_live_by_domain_idx on (domain_id) WHERE terminal_status IS NULL backs the per-Domain live-execution count without scanning settled history; action_execution_event_execution_idx makes the per-Execution timeline a single range scan.

Why the surrogate id is app-minted

action_execution.id carries an app-minted UUIDv7 with no database DEFAULT. The application mints the id so the row key is time-ordered and the same id can ride the actions.ActionDispatched domain event the orchestrator emits in the same transaction the row is written. A DEFAULT gen_random_uuid() would mint a v4 id the application never sees, splitting the aggregate id from its outbox event. The target's composite (execution_id, node_id) natural key makes "at most one invocation per Node per Execution" a structural fact the SQL layer cannot violate, so a target needs no surrogate id at all.
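To illustrate what "app-minted UUIDv7" means in practice, the sketch below mints one with the standard library only, following the RFC 9562 layout (48-bit Unix-millisecond timestamp, version nibble 7, variant bits 10). The function names are hypothetical and the real constructor in internal/actions may differ.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// ExecutionID is an opaque 16-byte UUID value object.
type ExecutionID [16]byte

// newExecutionIDAt mints a UUIDv7 for the given Unix-millisecond timestamp.
// It is split out (hypothetically) so the timestamp can be pinned in tests.
func newExecutionIDAt(unixMilli uint64) (ExecutionID, error) {
	var id ExecutionID
	// Fill everything with randomness first, then overwrite the fixed fields.
	if _, err := rand.Read(id[:]); err != nil {
		return ExecutionID{}, err
	}
	// Bytes 0..5: 48-bit big-endian millisecond timestamp (time-ordered prefix).
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], unixMilli<<16)
	copy(id[0:6], ts[0:6])
	id[6] = (id[6] & 0x0f) | 0x70 // version 7 in the high nibble
	id[8] = (id[8] & 0x3f) | 0x80 // RFC variant (binary 10) in the top two bits
	return id, nil
}

// NewExecutionID mints a UUIDv7 for the current time.
func NewExecutionID() (ExecutionID, error) {
	return newExecutionIDAt(uint64(time.Now().UnixMilli()))
}

// String renders the canonical 8-4-4-4-12 textual form.
func (id ExecutionID) String() string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", id[0:4], id[4:6], id[6:8], id[8:10], id[10:16])
}

func main() {
	id, err := NewExecutionID()
	if err != nil {
		panic(err)
	}
	// The same value can be written to the row and carried on the outbox event.
	fmt.Println(id)
}
```

Minting in the application is what lets one value serve as both the row key and the event's execution id inside a single transaction.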

Why the FK split — RESTRICT on scope, CASCADE on parts

The header FKs domain_id, project_id (and the target's node_id) RESTRICT against their tenancy parents — an Execution carrying live or settled history cannot have its scoping Domain, Project, or a targeted Node deleted out from under it, because that history is operator-authored evidence reconstructible from nowhere else. The target's execution_id and the event log's execution_id CASCADE from the header — a target and an event are parts of the Execution aggregate, not independently meaningful rows, so they are pruned with their root. The DECISION blocks in the migration record the rejected alternatives.

Down-refusal — SQLSTATE 0A000

The migration's downgrade block refuses the rollback and raises SQL exception code 0A000 (feature_not_supported). The three tables hold the operator-authored record of which actions ran across which Nodes, with which outcomes, plus the append-only transition log that is the evidence trail of how each Execution settled; that history is not reconstructible from anywhere else, so dropping the tables would silently discard live and settled operator evidence. The stance mirrors 0042_bridge_orchestrator.sql and the prior evidence-bearing migrations.

Value objects and error sentinels

The context's value objects live in internal/actions/types.go, its constants in internal/actions/constants.go, and its error sentinels in internal/actions/errors.go.

| Value object / constant | Shape | Notes |
| --- | --- | --- |
| ExecutionID | UUIDv7 wrapper over [16]byte | Aggregate surrogate key and event id; time-ordered; canonical 8-4-4-4-12 textual form; zero value is "not yet assigned". |
| DomainID, ProjectID, NodeID, ActorID | [16]byte | Opaque external references to the identity context; the composition root adapts them so the actions context stays free of an internal/identity import. |
| Slug-free | | Unlike the bridge context, actions carries no slug value object; an Execution is addressed by id, not by a human-facing handle. |
| Parameters | json.RawMessage ≤ 64 KiB | Validated for JSON validity and size at construction; empty input is the zero value, not JSON null. |
| Timeout | time.Duration, 0 < d ≤ 24h | Positive-and-bounded invariant at construction; Seconds() is what the wire carries. |
| CallbackURL | absolute URL string | Scheme + host required at construction. |
| InlineOutput | []byte ≤ 16 KiB | Size invariant wraps ErrInlineOutputTooLarge. |
| OutputRef | tagged union {inline, object_store} | The persisted output; the object_store variant requires non-empty (bucket, key, sha256). |
| MaxInlineOutputBytes | 16 * 1024 | The inline-output ceiling; output beyond it rides a presigned object-store PUT. |
| DefaultLiveExecutionsCap | 1000 | The default per-Domain live-execution ceiling; it is operator-tunable, so the constant carries the Default… prefix to read honestly at the call site. |
| MaxParametersBytes | 64 * 1024 | The dispatch parameter-document size bound. |
| MaxTimeout | 24 * time.Hour | The upper bound on a dispatch timeout, so a stuck dispatch cannot hold a live-execution slot indefinitely. |
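As an illustration of "invariants surface at construction", here is a sketch of the Parameters and Timeout value objects. The constructor names and the three error values are assumptions, since this document does not name them; the bounds come from the table above.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

const (
	MaxParametersBytes = 64 * 1024
	MaxTimeout         = 24 * time.Hour
)

// Hypothetical error values; the document does not name these.
var (
	errParametersTooLarge = errors.New("parameters exceed MaxParametersBytes")
	errParametersInvalid  = errors.New("parameters are not valid JSON")
	errTimeoutOutOfRange  = errors.New("timeout must be > 0 and <= MaxTimeout")
)

// Parameters wraps the opaque JSON document a dispatch carries.
type Parameters struct{ raw json.RawMessage }

// NewParameters validates size and JSON validity. Empty input yields the
// zero value (no parameters), which is distinct from a JSON null.
func NewParameters(raw []byte) (Parameters, error) {
	if len(raw) == 0 {
		return Parameters{}, nil
	}
	if len(raw) > MaxParametersBytes {
		return Parameters{}, errParametersTooLarge
	}
	if !json.Valid(raw) {
		return Parameters{}, errParametersInvalid
	}
	return Parameters{raw: append(json.RawMessage(nil), raw...)}, nil
}

// IsZero reports whether the dispatch carried no parameters at all.
func (p Parameters) IsZero() bool { return len(p.raw) == 0 }

// Timeout is a strictly positive duration bounded by MaxTimeout.
type Timeout struct{ d time.Duration }

// NewTimeout enforces 0 < d <= MaxTimeout.
func NewTimeout(d time.Duration) (Timeout, error) {
	if d <= 0 || d > MaxTimeout {
		return Timeout{}, fmt.Errorf("timeout %s: %w", d, errTimeoutOutOfRange)
	}
	return Timeout{d: d}, nil
}

// Seconds is the whole-second value the wire carries.
func (t Timeout) Seconds() int64 { return int64(t.d / time.Second) }

// ExpiresAt applies the expiresAt = requestedAt + Timeout rule.
func (t Timeout) ExpiresAt(requestedAt time.Time) time.Time { return requestedAt.Add(t.d) }

func main() {
	to, err := NewTimeout(90 * time.Second)
	fmt.Println(to.Seconds(), err)
	_, err = NewParameters([]byte(`{"unit":`))
	fmt.Println(err)
}
```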

The nine domain error sentinels (each mapped onto a transport Problem code under ReBAC and Problem codes):

| Sentinel | Meaning |
| --- | --- |
| ErrInvalidStateTransition | An invocation status was asked to advance along an edge the closed state machine does not permit (an illegal jump or a non-terminal self-edge). |
| ErrExecutionAlreadyTerminal | An invocation already in a terminal status was asked to advance to a different status (a late Node report onto a settled invocation). |
| ErrSelectorEmptyCohort | A dispatch's node selector resolved to zero target Nodes (or every candidate was dropped by the fan-out filter). |
| ErrActionNotDeclared | The requested action is not declared by any Node in the resolved cohort. |
| ErrInlineOutputTooLarge | An inline action output exceeds MaxInlineOutputBytes. |
| ErrOutputUploadFailed | The presigned object-store PUT backing an over-ceiling output could not be minted or failed. |
| ErrCapacityExceeded | Admitting one more Execution would push the per-Domain live-execution count past its cap. |
| ErrHookIntegrityViolation | A dispatch target's hook failed its trust-on-first-use integrity check: the declared hook digest drifted from the pinned known-good baseline, or no baseline was ever catalogued (the gate fails closed). |
| ErrPermissionDenied | The requesting actor lacks the ReBAC act relation required to dispatch the action. |

Mapping note: the HTTP status mapping is owned entirely by the transport layer in internal/transport/http/v1/actions/errors.go; the domain sentinels in errors.go carry no HTTP status. The binding mapping is the table under ReBAC and Problem codes: an empty cohort is 422 (selector_empty_cohort) and a capacity refusal is 429 (capacity_exceeded), conforming to the Action Orchestrator acceptance criteria (the capacity criterion is a SHALL).

Invocation state machine

Each ActionInvocation advances along a closed per-Node state machine enforced by Status.Transition against the legalTransitions roster in internal/actions/types.go. The three non-terminal statuses (pending, ack, started) carry outgoing edges; the four terminal statuses (succeeded, failed, cancelled, timeout) own no outgoing edge except an idempotent self-edge.

```mermaid
stateDiagram-v2
    [*] --> pending
    pending --> ack
    ack --> started
    started --> succeeded
    started --> failed
    started --> cancelled
    pending --> timeout
    ack --> timeout
    started --> timeout
    succeeded --> [*]
    failed --> [*]
    cancelled --> [*]
    timeout --> [*]
```

Two properties of this machine are load-bearing:

  • timeout is reachable from every non-terminal status: pending, ack, AND started. A background reconciler times out invocations whose Node never reports, and an execution can expire before the Node ever acks; limiting the timeout edge to started-only would strand a never-acked invocation with no terminal status to settle on. The DECISION block on legalTransitions records the rejected started-only alternative.
  • A terminal status accepts only an idempotent self-edge. A re-posted callback reporting the same terminal status the invocation already rests in is a no-op success; any other advance off a terminal status returns ErrExecutionAlreadyTerminal. A non-terminal self-edge (e.g. pending → pending) is an illegal advance returning ErrInvalidStateTransition.

The callback path drives this machine server-side through a compare-and-set (see Callback path); the reconciler drives the → timeout edge through the same CAS with a hard-coded {pending, ack, started} predecessor set.
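The rules above can be sketched as a small table-driven transition function. The edge roster is copied from the diagram; the method signature and the map shape are assumptions, not the actual contents of internal/actions/types.go.

```go
package main

import (
	"errors"
	"fmt"
)

type Status string

const (
	Pending   Status = "pending"
	Ack       Status = "ack"
	Started   Status = "started"
	Succeeded Status = "succeeded"
	Failed    Status = "failed"
	Cancelled Status = "cancelled"
	Timeout   Status = "timeout"
)

var (
	ErrInvalidStateTransition   = errors.New("invalid state transition")
	ErrExecutionAlreadyTerminal = errors.New("execution already terminal")
)

// legalTransitions lists the outgoing edges of the three non-terminal
// statuses. Terminal statuses have no entry: they own no outgoing edge.
var legalTransitions = map[Status][]Status{
	Pending: {Ack, Timeout},
	Ack:     {Started, Timeout},
	Started: {Succeeded, Failed, Cancelled, Timeout},
}

// IsTerminal reports membership in the closed terminal set.
func (s Status) IsTerminal() bool {
	switch s {
	case Succeeded, Failed, Cancelled, Timeout:
		return true
	}
	return false
}

// Transition returns nil when s -> next is a legal advance.
func (s Status) Transition(next Status) error {
	if s.IsTerminal() {
		if next == s {
			return nil // idempotent terminal self-edge: a no-op success
		}
		return ErrExecutionAlreadyTerminal
	}
	for _, allowed := range legalTransitions[s] {
		if allowed == next {
			return nil
		}
	}
	// Covers illegal jumps and non-terminal self-edges such as pending -> pending.
	return ErrInvalidStateTransition
}

func main() {
	fmt.Println(Pending.Transition(Timeout))  // legal: timeout before any ack
	fmt.Println(Pending.Transition(Pending))  // non-terminal self-edge
	fmt.Println(Succeeded.Transition(Failed)) // late report onto a settled invocation
}
```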

Dispatch path

The dispatch application service (internal/actions/services/dispatch.go) admits a named action and fans it out to a cohort of target Nodes via two public methods.

DispatchBySelector resolves an opaque label selector to a Node cohort (step 1 below), then applies five per-node gates in order (steps 2 to 6), dropping (not failing) each rejected node:

  1. Cohort resolution — the selector is resolved within the (Domain, Project) scope through the LabelSelector port, paging until the cursor is exhausted, folding every node-kind ObjectRef into a target NodeID. A cohort that resolves to zero Nodes is refused with ErrSelectorEmptyCohort.
  2. Domain isolation — a node whose Domain differs from the dispatch's is dropped (a foreign-tenant node can never be a target).
  3. Project isolation — a same-Domain node that belongs to a different Project is dropped, even when the selector matched it, so horizontal isolation holds per-target and the domain gate cannot mask a sibling-Project leak. This is the gate the "Cross-project isolation" acceptance scenario pins.
  4. ReBAC act — a node the subject cannot act on is dropped, emitting one denial audit row with the insufficient_relation marker.
  5. Capability — a node that does not declare the requested action is dropped.
  6. Hook integrity — a node whose target hook fails its trust-on-first-use integrity check (a drifted or uncatalogued hook) is dropped, emitting one denial audit row with the hook_integrity_violation relation. A builtin action carries no node-declared hook and skips this gate. The known-good baseline, the byte-for-byte Verified / Drift / Uncatalogued verdict, and the drift integrity_alert are owned by the hook-integrity surface; this context only consults the verdict through its CapabilityReader port and refuses a non-Verified hook.

DispatchByNode names a single explicit target Node and applies the same gates but with a hard refusal rather than a silent drop: a node outside the dispatch's (Domain, Project) scope is refused with ErrSelectorEmptyCohort (an empty single-node cohort — horizontal isolation, including per-target Project isolation, holds independently of the ReBAC graph), a ReBAC denial is a hard ErrPermissionDenied (a 403), a node that does not declare the action yields ErrActionNotDeclared, and a node whose target hook fails its integrity check yields a hard ErrHookIntegrityViolation (a 409) with nothing persisted.

Survivor precedence

When no node survives the DispatchBySelector fan-out filter, the refusal distinguishes the operator's remedy in a fixed precedence (the DECISION block on DispatchBySelector records the rule):

  • If any node was dropped because its target hook failed its trust-on-first-use integrity check, surface ErrHookIntegrityViolation (a 409) — an integrity failure outranks the two refusals below, so an operator whose whole cohort tripped the gate sees the integrity refusal rather than a capability or selector error.
  • Else if at least one node reached the capability gate and was dropped solely because it did not declare the action, surface ErrActionNotDeclared — a capability problem (deploy the action to the nodes).
  • Otherwise (all foreign-domain, all denied, or a mix) surface ErrSelectorEmptyCohort — a selector or permission problem (widen the selector, grant the relation).
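The precedence above reduces to a small pure function over the recorded drop reasons. This is a sketch only: the `dropReason` type and its members are hypothetical names, while the three returned sentinels are the ones this document defines.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrHookIntegrityViolation = errors.New("hook integrity violation")
	ErrActionNotDeclared      = errors.New("action not declared")
	ErrSelectorEmptyCohort    = errors.New("selector resolved an empty cohort")
)

// dropReason records why the fan-out filter dropped one candidate node.
type dropReason int

const (
	dropForeignDomain  dropReason = iota // Domain isolation gate
	dropForeignProject                   // Project isolation gate
	dropDenied                           // ReBAC act gate
	dropNotDeclared                      // capability gate
	dropHookIntegrity                    // hook integrity gate
)

// refusalFor picks the error to surface when no node survived the filter.
// Integrity outranks capability, which outranks the selector/permission refusal.
func refusalFor(drops []dropReason) error {
	var integrity, notDeclared bool
	for _, d := range drops {
		switch d {
		case dropHookIntegrity:
			integrity = true
		case dropNotDeclared:
			notDeclared = true
		}
	}
	switch {
	case integrity:
		return ErrHookIntegrityViolation
	case notDeclared:
		return ErrActionNotDeclared
	default:
		return ErrSelectorEmptyCohort
	}
}

func main() {
	fmt.Println(refusalFor([]dropReason{dropDenied, dropNotDeclared}))
	fmt.Println(refusalFor([]dropReason{dropNotDeclared, dropHookIntegrity}))
	fmt.Println(refusalFor([]dropReason{dropForeignDomain, dropDenied}))
}
```

An empty slice (the selector matched nothing at all) falls through to ErrSelectorEmptyCohort, which matches the cohort-resolution refusal.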

Capacity and atomic persist

Both paths funnel through a shared admit tail: a capacity check (admitting one more live Execution must not push the per-Domain live count past liveExecutionsCap, else ErrCapacityExceeded), aggregate + per-target callback-token construction, then an atomic persist in one RunInTx closure that writes the Execution header, the N targets, and the N actions.ActionDispatched outbox rows together — each CreateTarget interleaved with its AppendOutboxEvent so the dispatch commits as a whole or not at all. Each target carries a fresh 32-byte callback token whose sha256 hash is persisted (callback_token_hash is NOT NULL); the plaintext token's delivery to the Node rides the callback NSK auth path.
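A sketch of the per-target token mint described above: 32 random bytes, with only the sha256 digest kept for persistence. The function names are hypothetical, and the constant-time verification helper is an assumption about how a presented token could be checked against the stored hash; this document does not describe the verification code.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// callbackTokenBytes is the plaintext token length named in the text.
const callbackTokenBytes = 32

// mintCallbackToken returns a fresh random token and its sha256 digest.
// Only the digest would be written to callback_token_hash.
func mintCallbackToken() (token []byte, hash [sha256.Size]byte, err error) {
	token = make([]byte, callbackTokenBytes)
	if _, err = rand.Read(token); err != nil {
		return nil, hash, err
	}
	return token, sha256.Sum256(token), nil
}

// tokenMatches hashes the presented token and compares it to the stored
// digest in constant time (an assumed verification approach). An unset or
// wrong-length stored hash never matches.
func tokenMatches(presented, storedHash []byte) bool {
	sum := sha256.Sum256(presented)
	return subtle.ConstantTimeCompare(sum[:], storedHash) == 1
}

func main() {
	token, hash, err := mintCallbackToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("persisted hash:", hex.EncodeToString(hash[:]))
	fmt.Println("matches:", tokenMatches(token, hash[:]))
}
```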

Only after the transaction commits do the post-commit side effects fire: the live-executions gauge and the dispatch counter advance, one granted audit row is emitted, and — when wired — the WirePublisher fans the dispatch onto the mesh (see ./events.md).

Callback path

The callback application service (internal/actions/services/callback.go) drives one Node-reported status advance through the closed state machine. A target Node reports a status advance for its per-Node invocation via the NSK-authenticated POST /v1/nodes/{id}/executions/{exec_id} surface; the service drives the advance through a server-side compare-and-set, logs the transition, and on a terminal status collects the output and settles the Execution header.

The flow is, in order:

  1. Over-ceiling output handling first — an inline output longer than MaxInlineOutputBytes is a hard refusal (ErrInlineOutputTooLarge); a declared size over the ceiling mints a presigned object-store PUT URL (once) and returns it to the Node before the CAS runs. The presign mint stamps a distinct actions.output_presign audit row.
  2. CAS — the target advances to the reported status only when its current status is a legal predecessor (derived by probing every member of the closed Status set against Status.Transition). The CAS, the transition-log AppendEvent, and — on a terminal status — the header settle all run in one RunInTx so they commit atomically.
  3. On an accepted transition — exactly one granted callback audit row is emitted; on a terminal status the collected OutputRef is persisted and the live gauge decremented.
  4. On a CAS miss — the service re-reads and classifies: an idempotent re-post (current == reported) is a no-op success with no new event or audit; a terminal current status is ErrExecutionAlreadyTerminal; anything else is ErrInvalidStateTransition.
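Steps 2 and 4 can be sketched with an in-memory stand-in for the target table. Everything here is illustrative: `targetStore`, `legal`, and `predecessors` are hypothetical names, the mutex-guarded map stands in for a single conditional SQL UPDATE, and the edge roster is taken from the state diagram in this document.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type Status string

const (
	Pending   Status = "pending"
	Ack       Status = "ack"
	Started   Status = "started"
	Succeeded Status = "succeeded"
	Failed    Status = "failed"
	Cancelled Status = "cancelled"
	Timeout   Status = "timeout"
)

var allStatuses = []Status{Pending, Ack, Started, Succeeded, Failed, Cancelled, Timeout}

var (
	ErrInvalidStateTransition   = errors.New("invalid state transition")
	ErrExecutionAlreadyTerminal = errors.New("execution already terminal")
)

func isTerminal(s Status) bool {
	return s == Succeeded || s == Failed || s == Cancelled || s == Timeout
}

// legal is a compact stand-in for Status.Transition covering forward edges only.
func legal(from, to Status) bool {
	switch from {
	case Pending:
		return to == Ack || to == Timeout
	case Ack:
		return to == Started || to == Timeout
	case Started:
		return isTerminal(to)
	}
	return false
}

// predecessors derives the legal predecessor set by probing the closed set.
func predecessors(to Status) []Status {
	var preds []Status
	for _, from := range allStatuses {
		if legal(from, to) {
			preds = append(preds, from)
		}
	}
	return preds
}

// targetStore maps a target key to its current status.
type targetStore struct {
	mu     sync.Mutex
	status map[string]Status
}

// compareAndSet advances the target only if its current status is in preds.
func (s *targetStore) compareAndSet(key string, preds []Status, next Status) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	for _, p := range preds {
		if s.status[key] == p {
			s.status[key] = next
			return true
		}
	}
	return false
}

// Apply runs the CAS and, on a miss, re-reads and classifies as in step 4.
func (s *targetStore) Apply(key string, reported Status) (advanced bool, err error) {
	if s.compareAndSet(key, predecessors(reported), reported) {
		return true, nil
	}
	s.mu.Lock()
	current := s.status[key]
	s.mu.Unlock()
	switch {
	case current == reported:
		return false, nil // idempotent re-post: no new event or audit
	case isTerminal(current):
		return false, ErrExecutionAlreadyTerminal
	default:
		return false, ErrInvalidStateTransition
	}
}

func main() {
	s := &targetStore{status: map[string]Status{"node-a": Pending}}
	for _, st := range []Status{Ack, Ack, Succeeded, Started, Succeeded, Failed} {
		advanced, err := s.Apply("node-a", st)
		fmt.Println(st, advanced, err)
	}
}
```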

The callback service carries no Authorizer: the calling Node's identity is authenticated at the transport / NSK layer before Apply is invoked, so re-checking a ReBAC relation here would gate the result path on a graph the Node never participates in. The DECISION block on CallbackService records this.

Timeout reconciler

The background reconciler (internal/actions/services/reconciler.go) sweeps the live Executions whose expires_at has passed (a partial-index scan), times out every still-live target through the same CAS (with the {pending, ack, started} predecessor set), settles the header to timeout once all its targets are terminal, and decrements the live gauge. A poisoned row is logged and skipped so one bad row never aborts the whole sweep; the header settle is idempotent (SetExecutionTerminal guards WHERE terminal_status IS NULL), so a concurrent callback that already settled the header makes the timeout settle a no-op.

A timed-out target emits only its transition-log event, not a new typed outbox event — the actions event-type set is closed to actions.ActionDispatched by the AST gate (see ./events.md), and admitting a timeout discriminator would break that closed set. A tracked TODO in the reconciler records emitting a typed timeout outbox event once the closed set is widened in lockstep with the SSE wire-type table.

Two-tier output contract

An action's collected output rides one of two tiers, keyed on the 16 KiB MaxInlineOutputBytes ceiling:

  • Inline (≤ 16 KiB) — the bounded output bytes are stored in the control plane inside the inline variant of OutputRef. An inline body the Node sends over the ceiling is refused with ErrInlineOutputTooLarge (HTTP 413).
  • Object-store (> 16 KiB) — when the Node declares an over-ceiling output (declared_output_bytes past the ceiling), the first callback's 200 response carries an output_upload_url: a presigned object-store PUT URL the Node uploads the full body to before sending the terminal callback. The object-store key is deterministic per (exec, node) (<exec>/<node>/output) so a retried over-ceiling callback addresses the same object. The terminal settle then persists the object_store variant of OutputRef carrying (bucket, key, sha256).

When both an object-store key and inline bytes are present on the terminal callback, the object-store ref wins — a Node that already uploaded an over-ceiling body has its authoritative output there, not in any residual inline field. When neither is set, the terminal settle carries the zero OutputRef (a terminal with no collected output, e.g. a failed action that produced nothing).
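The tier selection and the precedence rule can be sketched as follows. The function names and the exported-field `OutputRef` shape are simplifications chosen for this example (the domain type keeps its discriminator unexported), and the size check is placed first to mirror the callback flow's "over-ceiling output handling first" step.

```go
package main

import (
	"errors"
	"fmt"
)

const MaxInlineOutputBytes = 16 * 1024

var ErrInlineOutputTooLarge = errors.New("inline output too large")

// OutputRef is a simplified, exported-field stand-in for the domain type.
type OutputRef struct {
	Kind   string // "inline", "object_store", or "" for the zero ref
	Inline []byte
	Bucket string
	Key    string
	SHA256 string
}

// outputKey builds the deterministic per-(exec, node) object key, so a
// retried over-ceiling callback addresses the same object.
func outputKey(execID, nodeID string) string {
	return execID + "/" + nodeID + "/output"
}

// needsUpload reports whether a declared size must ride the object-store tier.
func needsUpload(declaredOutputBytes int64) bool {
	return declaredOutputBytes > MaxInlineOutputBytes
}

// collectOutput picks the OutputRef a terminal settle would persist.
func collectOutput(inline []byte, bucket, key, sha256 string) (OutputRef, error) {
	switch {
	case len(inline) > MaxInlineOutputBytes:
		return OutputRef{}, ErrInlineOutputTooLarge
	case key != "":
		// The object-store ref wins over any residual inline bytes.
		return OutputRef{Kind: "object_store", Bucket: bucket, Key: key, SHA256: sha256}, nil
	case len(inline) > 0:
		return OutputRef{Kind: "inline", Inline: inline}, nil
	default:
		return OutputRef{}, nil // terminal with no collected output
	}
}

func main() {
	key := outputKey("exec-1", "node-a")
	ref, err := collectOutput([]byte("residual"), "outputs", key, "abc123")
	fmt.Println(ref.Kind, ref.Key, err)
	fmt.Println(needsUpload(16*1024), needsUpload(16*1024+1))
}
```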

The presign mint is required for the over-ceiling path: a callback service constructed without an OutputPresigner surfaces ErrOutputUploadFailed (HTTP 502) on the first over-ceiling callback. The presigner is an injected port declared locally in internal/actions/ports.go so the context stays free of an internal/platform/blobstore import; the composition root wires the concrete blobstore-backed presigner behind it.

ReBAC and Problem codes

The act permission gates every dispatch and read. The split between the transport gate and the per-node service gate is deliberate:

  • The transport layer gates act on the owning project:<project_id> for all three operator-facing operations (DispatchExecution, ListExecutions, GetExecution), resolved directly from the path parameter before any persistence write or body decode, so an unauthorised caller never exercises the JSON decoder and never mints a dispatch. The gate is declared in internal/transport/http/v1/actions/wiring.go.
  • The dispatch service additionally gates act on each cohort member resource:<node_id> during the fan-out — a selector dispatch drops a node the subject cannot act on; a single-node dispatch refuses with a hard 403. The per-node object string is resource:<node-uuid>, reusing the bridge convention so the authz object model stays uniform across contexts.
  • The Node callback is not ReBAC-gated. It is authenticated by the Node Secret Key at the transport boundary, with a defence-in-depth execution-target gate: the resolved NSK Node must be a member of the Execution's target cohort, else 403 nsk_node_mismatch — a leaked NSK from Node A cannot post callbacks against an Execution that targeted only Node B. The gate runs before the body is read.

| Operation | ReBAC object | Permission |
| --- | --- | --- |
| DispatchExecution / ListExecutions / GetExecution (transport gate) | project:<project_id> | act |
| Per-cohort-member dispatch fan-out (service gate) | resource:<node_id> | act |
| PostNodeExecutionCallback | none (NSK auth + execution-target membership) | |

The typed sentinels map onto the closed Problem-code taxonomy at the transport boundary. The operator-facing dispatch / list / get codes are pinned in internal/transport/http/v1/actions/errors.go; the Node callback codes are pinned in internal/transport/http/v1/handlers/executions_callback.go.

| Sentinel / condition | HTTP status | Problem code |
| --- | --- | --- |
| Malformed label selector | 400 | malformed_selector |
| Action not declared for the resolved target | 400 | action_not_declared |
| Both-or-neither / zero node_id target | 400 | invalid_target |
| Malformed dispatch body | 400 | invalid_body |
| Malformed execution_id / project_id / limit / cursor | 400 | invalid_execution_id / invalid_project_id / invalid_limit / invalid_cursor |
| Selector resolved an empty cohort | 422 | selector_empty_cohort |
| Per-Domain capacity exceeded | 429 | capacity_exceeded |
| Dispatch targets a hook failing its integrity check | 409 | hook_integrity_violation |
| Project not found / not visible | 404 | project_not_found |
| Execution not found | 404 | execution_not_found |
| Dispatch body over the cap | 413 | request_body_too_large |
| ReBAC denial (transport or service) | 403 | rendered as a PermissionDenied body |
| Dispatch/read surface unprovisioned | 501 | actions_dispatch_not_provisioned |
| Callback: reported status not a legal advance | 409 | invalid_state_transition |
| Callback: invocation already terminal | 409 | execution_already_terminal |
| Callback: inline output over the 16 KiB ceiling | 413 | inline_output_too_large |
| Callback: NSK belongs to a different Node | 403 | nsk_node_mismatch |
| Callback surface unprovisioned | 501 | execution_callback_not_provisioned |

The 500 path never interpolates the underlying error text into the wire body — raw driver messages can carry SQL fragments or constraint names a caller has no right to see; the detail is logged internally and the body stays generic.
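A sketch of how a transport layer might map the sentinels onto the table above while keeping the 500 body generic. The `problem` struct, `problemFor`, `withContext`, and the `internal_error` code for the fallback are assumptions for illustration; the status and code pairs for the listed sentinels are taken from the table.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

var (
	ErrSelectorEmptyCohort      = errors.New("selector resolved an empty cohort")
	ErrCapacityExceeded         = errors.New("capacity exceeded")
	ErrHookIntegrityViolation   = errors.New("hook integrity violation")
	ErrActionNotDeclared        = errors.New("action not declared")
	ErrInvalidStateTransition   = errors.New("invalid state transition")
	ErrExecutionAlreadyTerminal = errors.New("execution already terminal")
	ErrInlineOutputTooLarge     = errors.New("inline output too large")

	// errDriver stands in for a raw driver error carrying internal detail.
	errDriver = errors.New(`pq: duplicate key violates "action_execution_pkey"`)
)

// problem is a simplified Problem body; the field set is an assumption.
type problem struct {
	Status int
	Code   string
	Detail string
}

// withContext wraps an error the way a service layer typically would.
func withContext(err error) error { return fmt.Errorf("dispatch: %w", err) }

// problemFor maps a (possibly wrapped) sentinel to its Problem code.
func problemFor(err error) problem {
	switch {
	case errors.Is(err, ErrSelectorEmptyCohort):
		return problem{http.StatusUnprocessableEntity, "selector_empty_cohort", ""}
	case errors.Is(err, ErrCapacityExceeded):
		return problem{http.StatusTooManyRequests, "capacity_exceeded", ""}
	case errors.Is(err, ErrHookIntegrityViolation):
		return problem{http.StatusConflict, "hook_integrity_violation", ""}
	case errors.Is(err, ErrActionNotDeclared):
		return problem{http.StatusBadRequest, "action_not_declared", ""}
	case errors.Is(err, ErrInvalidStateTransition):
		return problem{http.StatusConflict, "invalid_state_transition", ""}
	case errors.Is(err, ErrExecutionAlreadyTerminal):
		return problem{http.StatusConflict, "execution_already_terminal", ""}
	case errors.Is(err, ErrInlineOutputTooLarge):
		return problem{http.StatusRequestEntityTooLarge, "inline_output_too_large", ""}
	default:
		// Never echo err.Error() here: it may carry SQL fragments or
		// constraint names. The detail belongs in the internal log only.
		return problem{http.StatusInternalServerError, "internal_error", "internal error"}
	}
}

func main() {
	fmt.Println(problemFor(withContext(ErrCapacityExceeded)))
	fmt.Println(problemFor(withContext(errDriver)))
}
```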

Audit contract

Every transition stamps a verb-style audit relation onto its audit row — an operation namespace, distinct from the ReBAC act permission and from the outbox event literal. The orchestrator emits one audit row per mutation and one per denial, distinguished by the row's Outcome. Denial and invariant-rejection rows are written before the response is flushed so a flaky audit backend cannot land a silent denial; a sink error is made loud via slog rather than swallowed.

| Surface | Audit relations | Where pinned |
| --- | --- | --- |
| Dispatch service | actions.dispatch (granted), insufficient_relation (per-node ReBAC denial drop), hook_integrity_violation (per-node integrity denial drop) | dispatch.go |
| Callback service | actions.callback (accepted transition), actions.output_presign (over-ceiling presign mint) | callback.go |
| Reconciler | actions.timeout (per timed-out invocation, subject system) | reconciler.go |
| Transport (dispatch/list/get) | execution.dispatch, execution.list, execution.get | wiring.go |
| Transport (Node callback) | node_execution_callback.apply, node_execution_callback.path_gate, node_execution_callback.node_gate | executions_callback.go |

The Outcome values are granted, permission_denied, invariant_violation, and internal_error on the transport surface; the service surface uses granted / denied.

Observability

The application services share a Prometheus collector bundle (internal/actions/services/metrics.go), both collectors carrying exactly the {domain_id} label:

  • plexsphere_actions_live_executions — a GaugeVec of the live (not-yet-terminal) Executions per Domain. The dispatch service increments it after a dispatch commits; the callback and reconciler decrement it when an Execution settles. The three services share one registry (newMetrics is AlreadyRegistered-tolerant) so a callback's decrement settles the same series the dispatch's increment raised.
  • plexsphere_actions_executions_total — a CounterVec of admitted dispatches per Domain.

A subject / resource / execution-id label would be unbounded (one new series per principal, target, or dispatch) and is deliberately rejected: that high-cardinality identity belongs in the structured slog line and the audit trail, never in a metric label. The bundle is nil-tolerant — a service constructed without WithMetrics keeps a zero-cost no-op bundle.

Labels SelectorPort consumer story

The Action Orchestrator is one of the four sanctioned consumers of the Label Registry's SelectorPort seam (alongside the Policy Engine, the Provisioning Broker, and Observability — the allow-list is pinned in the internal/labels layout row and in ../labels/index.md). The orchestrator expands "dispatch to every Node matching <selector>" into the concrete Node set at action_request time.

The crossing is kept narrow. The dispatch service declares a context-local LabelSelector port and its value types (ObjectRef, SelectorScope, ListCursor, ObjectRefPage) in internal/actions/ports.go — actions-local mirrors of the labels shapes, not the labels types themselves — because the no-cross-context-imports-actions depguard rule denies the actions context an internal/labels import. The only place the two contexts meet is the composition-root adapter, which parses the opaque selector string and translates to labels.SelectorPort behind this interface. The dispatch service folds the resolved node-kind ObjectRefs into target NodeIDs, paging until the cursor is exhausted.

Deferred downstream

Three concerns build on or around this context and are out of scope here:

  • The plaintext callback-token delivery. The dispatch service mints a per-target callback token and persists only its sha256 hash; the plaintext's delivery to the Node rides the callback NSK auth path. The durable invariant dispatch guarantees is that every target row carries a unique, unguessable hash from creation so a callback can never match an unset hash.
  • A typed timeout outbox event. The reconciler emits only the per-target transition-log row on a timeout, not a typed outbox event, because the closed actions event-type set admits only actions.ActionDispatched. Widening the set (and the SSE wire-type table in lockstep) is the tracked follow-up.
  • The consumer-side SSE wiring. The producer side of the action_request wire fan-out is wired (the dispatch service calls the WirePublisher after it commits); the consumer-side /v1/nodes/{id}/events HTTP plumbing is owned by the Signed Event Bus epic and stays on its 501 stub. See ./events.md.

What this context is not

To keep the boundary sharp, the Action Orchestrator is deliberately NOT:

  • An action runtime. It dispatches a named action and tracks its per-Node lifecycle; the action itself executes on the Node agent, never in the control plane. This context stores intent and evidence.
  • An NSK custodian. The per-Node callback is authenticated by the Node Secret Key at the transport boundary; this context's callback service trusts its already-authenticated caller and runs an execution-target membership gate, but it neither issues nor validates the NSK.
  • A step-up approval gate. The sensitive-operation step-up approval that gates whether a dispatch is permitted is a separate story; this context gates only the act ReBAC permission.
  • A hook integrity authority. This context gates a hook dispatch on the trust-on-first-use integrity verdict (see Dispatch path), but it neither pins the known-good baseline nor compares digests itself — the baseline, the byte-for-byte Verified / Drift / Uncatalogued classification, and the drift integrity_alert are owned by the hook-integrity surface. This context consults the verdict through its CapabilityReader port and refuses a non-Verified hook.
  • A dashboard. The operator-facing Actions dashboard view is a separate frontend story; this context exposes only the /v1 HTTP surface and the read projections.

Cross-references