Action Orchestrator — dispatch, per-Node invocations, the two-tier output contract

This document is the authoritative bounded-context reference for the Action Orchestrator that ships under ../../../internal/actions/. It covers the ubiquitous language, the single Execution aggregate and its per-Node ActionInvocation entities, the schema reference for the three Postgres tables, the value objects, the closed per-Node invocation state machine, the two-tier output contract, the dispatch fan-out and survivor-precedence rules, the per-Domain live-execution cap, the ReBAC act matrix, the per-transition audit contract, the closed Problem-code catalogue, the Labels SelectorPort consumer story, and the downstream stories this context deliberately leaves to follow-ups.

The Action Orchestrator is the dispatch, tracking, and history surface for one named action invoked across a cohort of Nodes — and only that. It admits a dispatch (resolving a target cohort either from an opaque label selector or a single explicit Node), fans it out to one ActionInvocation per Node, mints the per-target callback contract, records every per-Node status transition, collects the action output, and settles the Execution when its targets reach a terminal state. It stores operator intent and execution evidence; it does not run the action, hold the Node's identity, or evaluate step-up policy. The follow-ups that build on this context are named below where they become relevant.

This is NOT an action runtime, an NSK custodian, or a step-up gate. The action itself executes on the Node agent, never in the control plane; the per-Node callback the Node reports back with is authenticated by the Node Secret Key at the transport boundary, not re-checked by this context's application services; and the sensitive-operation step-up approval that gates whether a dispatch is permitted is a deferred story (see What this context is not).

For the bounded-context siblings and upstream references see:

../../../api/openapi/plexsphere-v1.yaml — the OpenAPI source that pins the four HTTP operations (DispatchExecution, ListExecutions, GetExecution, PostNodeExecutionCallback), their wire shapes, status codes, and the Problem-code catalogue this document mirrors.
../../../internal/actions/ — the domain code; every type named in this document has a matching symbol there.
./events.md — the events surface: the single actions.ActionDispatched outbox literal, its action_request payload field table, and the per-Node SSE wire fan-out.
../../../internal/platform/db/migrations/0043_actions.sql — the migration that creates the three tables and pins every CHECK, FK behaviour, and index this document describes.
../labels/index.md — the Label Registry whose SelectorPort seam this context consumes to expand a selector into a concrete Node cohort.
../../contributing/layout.md — the bounded-context map row enumerating the consumed and exposed ports this document pins.

Ubiquitous language

The vocabulary travels verbatim across the Go code, the SQL schema, the OpenAPI surface, the outbox event payload, and the audit trail. Internal code never paraphrases the terms; documentation and error prose adopt the exact spelling below. The package godoc at internal/actions/doc.go is the ubiquitous-language pin; this document is its prose expansion.

Term	Definition	Code anchor
Execution	The aggregate root: one dispatch of a named action to a cohort of target Nodes. It is keyed on an app-minted UUIDv7 `ExecutionID`, scoped to a `(DomainID, ProjectID)`, and fans out to one ActionInvocation per Node. It owns the collected output reference and is the consistency boundary — one transaction modifies the aggregate as a whole.	`internal/actions/execution.go`
ActionInvocation	The per-target-Node entity within the Execution aggregate. Its identity is the `(Execution, NodeID)` pair, and its `Status` advances independently along the closed invocation state machine — one Node may have succeeded while another is still pending. Carries the reported exit code, error text, lifecycle timestamps, and the hash of the per-target callback token.	`ActionInvocation` in `internal/actions/execution.go`
CallbackURL	The absolute URL a Node reports an action result back to, built per the template `{base}/v1/nodes/{node_id}/executions/{execution_id}`. A value object whose absolute-URL invariant (scheme + host) surfaces at construction.	`CallbackURL` in `internal/actions/types.go`
InlineOutput	The bounded raw-bytes form of an action's collected output that fits inside the control plane (≤ `MaxInlineOutputBytes`, 16 KiB). A value object whose size invariant wraps `ErrInlineOutputTooLarge`.	`InlineOutput` in `internal/actions/types.go`
OutputRef	The tagged union persisted in the Execution's `output_ref` jsonb column. The `inline` variant carries the bounded `InlineOutput` bytes; the `object_store` variant (the large-output reference) carries the `(bucket, key, sha256)` tuple addressing the object the Node uploaded over a presigned PUT. The kind discriminator is unexported so an OutputRef can only be built through its constructors.	`OutputRef` in `internal/actions/types.go`
ActionKind	The closed two-member discriminator `{builtin, hook}`: a built-in action shipped with the Node agent, or a user-declared hook. A value object so an invalid kind surfaces at construction.	`ActionKind` in `internal/actions/types.go`
Status	The per-Node ActionInvocation lifecycle status; the closed set `{pending, ack, started, succeeded, failed, cancelled, timeout}`. Legal advances are enforced by `Status.Transition`.	`Status` in `internal/actions/types.go`
ExecutionID / DomainID / ProjectID / NodeID / ActorID	Opaque `[16]byte` UUID value objects. `ExecutionID` is the aggregate's app-minted UUIDv7 surrogate key; the other four are external references to the identity context. The composition root adapts them from the tenancy identifiers so the actions context never imports `internal/identity`.	`internal/actions/types.go`
Parameters	The opaque JSON parameter document a dispatch carries (≤ `MaxParametersBytes`, 64 KiB). A value object whose JSON-validity and size invariants surface at construction; an empty input is the distinct zero value (no parameters), not a JSON `null`.	`Parameters` in `internal/actions/types.go`
Timeout	The per-dispatch execution timeout — strictly positive and ≤ `MaxTimeout` (24 h). The Execution's `expiresAt` is `requestedAt + Timeout`; the background reconciler times out an invocation whose Node never reports within the window.	`Timeout` in `internal/actions/types.go`

Naming note: the OpenAPI wire shape and the Node-facing callback envelope speak of an inline output and the object-store coordinates of an over-ceiling output; the domain models both arms as the single OutputRef tagged union. There is no separate LargeOutputRef type — the large-output reference is the object_store variant of OutputRef.
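The single-union shape in the naming note can be pictured with a minimal sketch. The field layout, the accessor names, and the `kindNone` zero variant are assumptions for illustration, not the symbols in `internal/actions/types.go`; only the two variants, the 16 KiB ceiling, the unexported discriminator, and the `ErrInlineOutputTooLarge` wrap come from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

// MaxInlineOutputBytes mirrors the 16 KiB inline ceiling named in the text.
const MaxInlineOutputBytes = 16 * 1024

// ErrInlineOutputTooLarge stands in for the domain sentinel of the same name.
var ErrInlineOutputTooLarge = errors.New("inline output too large")

// outputKind is unexported, so code outside the package cannot pick a variant
// directly and has to go through the constructors below.
type outputKind uint8

const (
	kindNone        outputKind = iota // zero value: no collected output
	kindInline                        // bounded bytes held in the control plane
	kindObjectStore                   // reference to an object the Node uploaded
)

// OutputRef is an illustrative tagged union; the field layout is assumed.
type OutputRef struct {
	kind   outputKind
	inline []byte
	bucket string
	key    string
	sha256 string
}

// NewInlineOutputRef enforces the size invariant and copies the bytes.
func NewInlineOutputRef(b []byte) (OutputRef, error) {
	if len(b) > MaxInlineOutputBytes {
		return OutputRef{}, fmt.Errorf("%d bytes: %w", len(b), ErrInlineOutputTooLarge)
	}
	return OutputRef{kind: kindInline, inline: append([]byte(nil), b...)}, nil
}

// NewObjectStoreOutputRef requires the full (bucket, key, sha256) tuple.
func NewObjectStoreOutputRef(bucket, key, sha256 string) (OutputRef, error) {
	if bucket == "" || key == "" || sha256 == "" {
		return OutputRef{}, errors.New("object_store ref needs bucket, key, and sha256")
	}
	return OutputRef{kind: kindObjectStore, bucket: bucket, key: key, sha256: sha256}, nil
}

func (r OutputRef) IsZero() bool        { return r.kind == kindNone }
func (r OutputRef) IsInline() bool      { return r.kind == kindInline }
func (r OutputRef) IsObjectStore() bool { return r.kind == kindObjectStore }

func main() {
	ref, err := NewInlineOutputRef([]byte("ok"))
	fmt.Println(ref.IsInline(), err)
	_, err = NewInlineOutputRef(make([]byte, MaxInlineOutputBytes+1))
	fmt.Println(errors.Is(err, ErrInlineOutputTooLarge))
}
```

Because the discriminator is unexported, a caller outside the package can hold only the zero value or a value a constructor has validated.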

Why one aggregate with per-Node entities

The Action Orchestrator is modelled as one Execution aggregate whose per-Node status lives on its ActionInvocation entities, rather than as a fan of independent per-Node aggregates or as a single root-level status. The DECISION block above the entity declaration in internal/actions/execution.go records the choice; the reasoning is load-bearing enough to restate here.

One dispatch fans out to N target Nodes, each of which advances through the lifecycle independently — one Node may have succeeded while another is still pending. A single status on the Execution root could not represent that mixed state, so the status state machine lives on each ActionInvocation. The Execution remains the consistency boundary: it owns its invocations and the collected output ref, and a single transaction modifies the aggregate as a whole. The aggregate refuses a zero-invocation Execution at construction, so a zero-target dispatch can never be persisted through any path — independently of the application layer's own empty-cohort refusal.

A constructor (New) mints a fresh UUIDv7 ExecutionID, computes expiresAt as requestedAt + Timeout, and creates one pending ActionInvocation per target Node; a hydrator (Hydrate / HydrateInvocation) rebuilds the aggregate from persisted state, rejecting structurally impossible state (zero ids, zero timestamps, a status outside the closed set) so a corrupt row never round-trips silently.
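A minimal sketch of the constructor's invariants, assuming simplified types: it refuses a zero-target Execution, derives `ExpiresAt` from `requestedAt + timeout`, and creates one `pending` invocation per target. The `errNoTargets` name and the struct fields are hypothetical, and the `ExecutionID` mint is left out to keep the sketch short.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

type NodeID [16]byte

type Status string

const StatusPending Status = "pending"

// ActionInvocation is reduced to the two fields this sketch needs.
type ActionInvocation struct {
	NodeID NodeID
	Status Status
}

// Execution is a simplified stand-in for the aggregate root.
type Execution struct {
	RequestedAt time.Time
	ExpiresAt   time.Time
	Invocations []ActionInvocation
}

// errNoTargets is a hypothetical name for the zero-invocation refusal.
var errNoTargets = errors.New("execution needs at least one target node")

// New builds a fresh Execution with one pending invocation per target.
func New(requestedAt time.Time, timeout time.Duration, targets []NodeID) (*Execution, error) {
	if len(targets) == 0 {
		return nil, errNoTargets
	}
	invocations := make([]ActionInvocation, 0, len(targets))
	for _, node := range targets {
		invocations = append(invocations, ActionInvocation{NodeID: node, Status: StatusPending})
	}
	return &Execution{
		RequestedAt: requestedAt,
		ExpiresAt:   requestedAt.Add(timeout),
		Invocations: invocations,
	}, nil
}

func main() {
	e, err := New(time.Now(), 30*time.Second, []NodeID{{1}, {2}})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(e.Invocations), e.ExpiresAt.Sub(e.RequestedAt))
}
```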

Schema reference

The Action Orchestrator persists to three tables in the plexsphere schema. The migration that creates them is internal/platform/db/migrations/0043_actions.sql. Every CHECK constraint, FK behaviour, and index below is sourced from that file.

Table	Aggregate part	Notable constraints
`plexsphere.action_execution`	Execution header	App-minted `uuid` PRIMARY KEY (no DB DEFAULT); FK `domain_id`/`project_id REFERENCES plexsphere.domains(id)`/`plexsphere.projects(id) ON DELETE RESTRICT`; CHECK `action_kind IN ('builtin','hook')`; CHECK `timeout_seconds > 0`; `terminal_status` NULLable with CHECK over the closed terminal set `{succeeded, failed, cancelled, timeout}` (NULL = live); `requested_by` carries no FK (the requester may be an API-token service identity absent from a users table).
`plexsphere.action_execution_target`	ActionInvocation	Composite PRIMARY KEY `(execution_id, node_id)` — at most one invocation per Node per Execution is a structural fact; FK `execution_id … ON DELETE CASCADE` (a target belongs to its Execution); FK `node_id REFERENCES plexsphere.nodes(id) ON DELETE RESTRICT`; CHECK `status IN ('pending','ack','started','succeeded','failed','cancelled','timeout')`; `callback_token_hash bytea NOT NULL`.
`plexsphere.action_execution_event`	Transition log	Append-only; `bigint GENERATED ALWAYS AS IDENTITY` PRIMARY KEY gives the monotonic insertion order the timeline reads back; FK `execution_id … ON DELETE CASCADE`; rows are never updated or deleted.

Three indexes back the read paths: action_execution_project_keyset_idx on (project_id, created_at DESC, id DESC) matches the List query's ORDER BY so the keyset scan reads the index without a sort; the partial index action_execution_live_by_domain_idx on (domain_id) WHERE terminal_status IS NULL backs the per-Domain live-execution count without scanning settled history; action_execution_event_execution_idx makes the per-Execution timeline a single range scan.

Why the surrogate id is app-minted

action_execution.id carries an app-minted UUIDv7 with no database DEFAULT. The application mints the id so the row key is time-ordered and the same id can ride the actions.ActionDispatched domain event the orchestrator emits in the same transaction the row is written. A DEFAULT gen_random_uuid() would mint a v4 id the application never sees, splitting the aggregate id from its outbox event. The target's composite (execution_id, node_id) natural key makes "at most one invocation per Node per Execution" a structural fact the SQL layer cannot violate, so a target needs no surrogate id at all.
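To make the time-ordering property concrete, here is a hedged sketch of minting a UUIDv7 in the application. It follows the RFC 9562 layout (48-bit Unix-millisecond timestamp, version nibble 7, variant bits `10`); the function names are hypothetical and the real minting code may use a library instead.

```go
package main

import (
	"crypto/rand"
	"encoding/binary"
	"fmt"
	"time"
)

// ExecutionID is the opaque [16]byte surrogate key.
type ExecutionID [16]byte

// newExecutionIDAt builds a UUIDv7 for the given Unix-millisecond timestamp.
func newExecutionIDAt(unixMilli int64) (ExecutionID, error) {
	var id ExecutionID
	// Fill everything with randomness first, then overwrite the fixed fields.
	if _, err := rand.Read(id[:]); err != nil {
		return ExecutionID{}, err
	}
	// Bytes 0..5 carry the 48-bit big-endian millisecond timestamp.
	var ts [8]byte
	binary.BigEndian.PutUint64(ts[:], uint64(unixMilli)<<16)
	copy(id[:6], ts[:6])
	id[6] = (id[6] & 0x0f) | 0x70 // version 7
	id[8] = (id[8] & 0x3f) | 0x80 // RFC 9562 variant
	return id, nil
}

// NewExecutionID mints an id for the current instant.
func NewExecutionID() (ExecutionID, error) {
	return newExecutionIDAt(time.Now().UnixMilli())
}

// String renders the canonical 8-4-4-4-12 textual form.
func (id ExecutionID) String() string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", id[0:4], id[4:6], id[6:8], id[8:10], id[10:16])
}

func main() {
	id, err := NewExecutionID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

Since the application holds the id before the insert, the same value can be written to the header row and to the outbox event inside one transaction.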

Why the FK split — RESTRICT on scope, CASCADE on parts

The header FKs domain_id, project_id (and the target's node_id) RESTRICT against their tenancy parents — an Execution carrying live or settled history cannot have its scoping Domain, Project, or a targeted Node deleted out from under it, because that history is operator-authored evidence reconstructible from nowhere else. The target's execution_id and the event log's execution_id CASCADE from the header — a target and an event are parts of the Execution aggregate, not independently meaningful rows, so they are pruned with their root. The DECISION blocks in the migration record the rejected alternatives.

Down-refusal — SQLSTATE `0A000`

The migration's downgrade block refuses the rollback and raises SQLSTATE 0A000 (feature_not_supported). The three tables hold the operator-authored record of which actions ran across which Nodes, with which outcomes, plus the append-only transition log that is the evidence trail of how each Execution settled; that history is not reconstructible from anywhere else, so dropping the tables would silently discard live and settled operator evidence. The stance mirrors 0042_bridge_orchestrator.sql and the prior evidence-bearing migrations.

Value objects and error sentinels

The context's value objects live in internal/actions/types.go, its constants in internal/actions/constants.go, and its error sentinels in internal/actions/errors.go.

Value object / constant	Shape	Notes
`ExecutionID`	UUIDv7 wrapper over `[16]byte`	Aggregate surrogate key and event id; time-ordered; canonical 8-4-4-4-12 textual form; zero value is "not yet assigned".
`DomainID`, `ProjectID`, `NodeID`, `ActorID`	`[16]byte`	Opaque external references to the identity context; the composition root adapts them so the actions context stays free of an `internal/identity` import.
`Slug`-free	—	Unlike the bridge context, actions carries no slug value object; an Execution is addressed by id, not by a human-facing handle.
`Parameters`	`json.RawMessage` ≤ 64 KiB	Validated for JSON validity and size at construction; empty input is the zero value, not JSON `null`.
`Timeout`	`time.Duration`, `0 < d ≤ 24h`	Positive-and-bounded invariant at construction; `Seconds()` is what the wire carries.
`CallbackURL`	absolute URL string	Scheme + host required at construction.
`InlineOutput`	`[]byte` ≤ 16 KiB	Size invariant wraps `ErrInlineOutputTooLarge`.
`OutputRef`	tagged union `{inline, object_store}`	The persisted output; the `object_store` variant requires non-empty `(bucket, key, sha256)`.
`MaxInlineOutputBytes`	`16 * 1024`	The inline-output ceiling; output beyond it rides a presigned object-store PUT.
`DefaultLiveExecutionsCap`	`1000`	The default per-Domain live-execution ceiling; it is operator-tunable, so the constant carries the `Default…` prefix to read honestly at the call site.
`MaxParametersBytes`	`64 * 1024`	The dispatch parameter-document size bound.
`MaxTimeout`	`24 * time.Hour`	The upper bound on a dispatch timeout, so a stuck dispatch cannot hold a live-execution slot indefinitely.
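The construction-time invariants in the table can be sketched as follows. The constructor names and error texts are assumptions; the bounds, the zero-value rule for empty parameters, and the callback URL template are taken from this document.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/url"
	"strings"
	"time"
)

const (
	MaxParametersBytes = 64 * 1024
	MaxTimeout         = 24 * time.Hour
)

// Timeout is strictly positive and at most MaxTimeout.
type Timeout struct{ d time.Duration }

func NewTimeout(d time.Duration) (Timeout, error) {
	if d <= 0 || d > MaxTimeout {
		return Timeout{}, fmt.Errorf("timeout %s outside (0, %s]", d, MaxTimeout)
	}
	return Timeout{d: d}, nil
}

// Seconds is the form the wire carries.
func (t Timeout) Seconds() int64 { return int64(t.d / time.Second) }

// Parameters is an opaque JSON document; empty input is the zero value.
type Parameters struct{ raw json.RawMessage }

func NewParameters(in []byte) (Parameters, error) {
	if len(in) == 0 {
		return Parameters{}, nil // no parameters, distinct from JSON null
	}
	if len(in) > MaxParametersBytes {
		return Parameters{}, errors.New("parameters exceed MaxParametersBytes")
	}
	if !json.Valid(in) {
		return Parameters{}, errors.New("parameters are not valid JSON")
	}
	return Parameters{raw: append(json.RawMessage(nil), in...)}, nil
}

func (p Parameters) IsZero() bool { return len(p.raw) == 0 }

// CallbackURL must be absolute: scheme and host are both required.
type CallbackURL string

func NewCallbackURL(base, nodeID, executionID string) (CallbackURL, error) {
	raw := strings.TrimRight(base, "/") +
		"/v1/nodes/" + url.PathEscape(nodeID) +
		"/executions/" + url.PathEscape(executionID)
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", errors.New("callback URL must carry a scheme and a host")
	}
	return CallbackURL(u.String()), nil
}

func main() {
	t, _ := NewTimeout(90 * time.Second)
	cb, err := NewCallbackURL("https://cp.example", "n1", "e1")
	fmt.Println(t.Seconds(), cb, err)
}
```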

The nine domain error sentinels (each mapped onto a transport Problem code under ReBAC and Problem codes):

Sentinel	Meaning
`ErrInvalidStateTransition`	An invocation status was asked to advance along an edge the closed state machine does not permit (an illegal jump or a non-terminal self-edge).
`ErrExecutionAlreadyTerminal`	An invocation already in a terminal status was asked to advance to a different status (a late Node report onto a settled invocation).
`ErrSelectorEmptyCohort`	A dispatch's node selector resolved to zero target Nodes (or every candidate was dropped by the fan-out filter).
`ErrActionNotDeclared`	The requested action is not declared by any Node in the resolved cohort.
`ErrInlineOutputTooLarge`	An inline action output exceeds `MaxInlineOutputBytes`.
`ErrOutputUploadFailed`	The presigned object-store PUT backing an over-ceiling output could not be minted or failed.
`ErrCapacityExceeded`	Admitting one more Execution would push the per-Domain live-execution count past its cap.
`ErrHookIntegrityViolation`	A dispatch target's hook failed its trust-on-first-use integrity check — the declared hook digest drifted from the pinned known-good baseline, or no baseline was ever catalogued (the gate fails closed).
`ErrPermissionDenied`	The requesting actor lacks the ReBAC `act` relation required to dispatch the action.

Mapping note: the HTTP status mapping is owned entirely by the transport layer in internal/transport/http/v1/actions/errors.go; the domain sentinels in errors.go carry no HTTP status. The binding mapping is the table under ReBAC and Problem codes: an empty cohort is 422 (selector_empty_cohort) and a capacity refusal is 429 (capacity_exceeded), conforming to the Action Orchestrator acceptance criteria (the capacity criterion is a SHALL).

Invocation state machine

Each ActionInvocation advances along a closed per-Node state machine enforced by Status.Transition against the legalTransitions roster in internal/actions/types.go. The three non-terminal statuses (pending, ack, started) carry outgoing edges; the four terminal statuses (succeeded, failed, cancelled, timeout) own no outgoing edge except an idempotent self-edge.

```mermaid
stateDiagram-v2
    [*] --> pending
    pending --> ack
    ack --> started
    started --> succeeded
    started --> failed
    started --> cancelled
    pending --> timeout
    ack --> timeout
    started --> timeout
    succeeded --> [*]
    failed --> [*]
    cancelled --> [*]
    timeout --> [*]
```

Two properties of this machine are load-bearing:

timeout is reachable from every non-terminal status — pending, ack, AND started. A background reconciler times out invocations whose Node never reports, and an execution can expire before the Node ever acks; limiting the timeout edge to started-only would strand a never-acked invocation with no terminal status to settle on. The DECISION block on legalTransitions records the rejected started-only alternative.
A terminal status accepts only an idempotent self-edge. A re-posted callback reporting the same terminal status the invocation already rests in is a no-op success; any other advance off a terminal status returns ErrExecutionAlreadyTerminal. A non-terminal self-edge (e.g. pending → pending) is an illegal advance returning ErrInvalidStateTransition.

The callback path drives this machine server-side through a compare-and-set (see Callback path); the reconciler drives the → timeout edge through the same CAS with a hard-coded {pending, ack, started} predecessor set.
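The two load-bearing properties can be checked against a small sketch of the transition rule. The constant names and the map layout are assumptions; the edges and the two sentinels are the ones described above, and the real `Status.Transition` may have a different signature.

```go
package main

import (
	"errors"
	"fmt"
)

type Status string

const (
	StatusPending   Status = "pending"
	StatusAck       Status = "ack"
	StatusStarted   Status = "started"
	StatusSucceeded Status = "succeeded"
	StatusFailed    Status = "failed"
	StatusCancelled Status = "cancelled"
	StatusTimeout   Status = "timeout"
)

var (
	ErrInvalidStateTransition   = errors.New("invalid state transition")
	ErrExecutionAlreadyTerminal = errors.New("execution already terminal")
)

// legalTransitions lists the outgoing edges of the non-terminal statuses.
// timeout is reachable from all three.
var legalTransitions = map[Status][]Status{
	StatusPending: {StatusAck, StatusTimeout},
	StatusAck:     {StatusStarted, StatusTimeout},
	StatusStarted: {StatusSucceeded, StatusFailed, StatusCancelled, StatusTimeout},
}

func (s Status) IsTerminal() bool {
	switch s {
	case StatusSucceeded, StatusFailed, StatusCancelled, StatusTimeout:
		return true
	}
	return false
}

// Transition reports whether s may advance to next.
func (s Status) Transition(next Status) error {
	if s.IsTerminal() {
		if next == s {
			return nil // idempotent self-edge on a terminal status
		}
		return ErrExecutionAlreadyTerminal
	}
	for _, allowed := range legalTransitions[s] {
		if allowed == next {
			return nil
		}
	}
	return ErrInvalidStateTransition // illegal jump or non-terminal self-edge
}

func main() {
	fmt.Println(StatusPending.Transition(StatusTimeout))     // <nil>
	fmt.Println(StatusPending.Transition(StatusPending))     // invalid state transition
	fmt.Println(StatusSucceeded.Transition(StatusSucceeded)) // <nil>
	fmt.Println(StatusSucceeded.Transition(StatusFailed))    // execution already terminal
}
```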

Dispatch path

The dispatch application service (internal/actions/services/dispatch.go) admits a named action and fans it out to a cohort of target Nodes via two public methods.

DispatchBySelector resolves an opaque label selector to a Node cohort, then applies five per-node gates in order, dropping (not failing) each rejected node. The list below gives the resolution step followed by the five gates:

Cohort resolution — the selector is resolved within the (Domain, Project) scope through the LabelSelector port, paging until the cursor is exhausted, folding every node-kind ObjectRef into a target NodeID. A cohort that resolves to zero Nodes is refused with ErrSelectorEmptyCohort.
Domain isolation — a node whose Domain differs from the dispatch's is dropped (a foreign-tenant node can never be a target).
Project isolation — a same-Domain node that belongs to a different Project is dropped, even when the selector matched it, so horizontal isolation holds per-target and the domain gate cannot mask a sibling-Project leak. This is the gate the "Cross-project isolation" acceptance scenario pins.
ReBAC act — a node the subject cannot act on is dropped, emitting one denial audit row with the insufficient_relation marker.
Capability — a node that does not declare the requested action is dropped.
Hook integrity — a node whose target hook fails its trust-on-first-use integrity check (a drifted or uncatalogued hook) is dropped, emitting one denial audit row with the hook_integrity_violation relation. A builtin action carries no node-declared hook and skips this gate. The known-good baseline, the byte-for-byte Verified / Drift / Uncatalogued verdict, and the drift integrity_alert are owned by the hook-integrity surface; this context only consults the verdict through its CapabilityReader port and refuses a non-Verified hook.

DispatchByNode names a single explicit target Node and applies the same gates, but with a hard refusal rather than a silent drop. A node outside the dispatch's (Domain, Project) scope is refused with ErrSelectorEmptyCohort: it is an empty single-node cohort, so horizontal isolation (including per-target Project isolation) holds independently of the ReBAC graph. A ReBAC denial is a hard ErrPermissionDenied (a 403), a node that does not declare the action yields ErrActionNotDeclared, and a node whose target hook fails its integrity check yields a hard ErrHookIntegrityViolation (a 409) with nothing persisted.
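The difference between the two methods is only in how a gate rejection is surfaced, which a short sketch makes visible. The `candidate` and `dispatch` structs are hypothetical pre-resolved views (the real service consults ports for each answer), and the denial audit rows are omitted.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrSelectorEmptyCohort    = errors.New("selector empty cohort")
	ErrPermissionDenied       = errors.New("permission denied")
	ErrActionNotDeclared      = errors.New("action not declared")
	ErrHookIntegrityViolation = errors.New("hook integrity violation")
)

// candidate is a hypothetical flattened view of one node's gate inputs.
type candidate struct {
	ID             string
	Domain         string
	Project        string
	CanAct         bool // ReBAC act on the node
	DeclaresAction bool // capability gate
	HookVerified   bool // trust-on-first-use verdict is Verified
}

// dispatch carries the scope and the action kind.
type dispatch struct {
	Domain  string
	Project string
	Builtin bool // a builtin action skips the hook-integrity gate
}

// gate applies the five per-node gates in order and names the first failure.
func gate(d dispatch, c candidate) error {
	switch {
	case c.Domain != d.Domain:
		return ErrSelectorEmptyCohort // domain isolation
	case c.Project != d.Project:
		return ErrSelectorEmptyCohort // project isolation
	case !c.CanAct:
		return ErrPermissionDenied
	case !c.DeclaresAction:
		return ErrActionNotDeclared
	case !d.Builtin && !c.HookVerified:
		return ErrHookIntegrityViolation
	}
	return nil
}

// bySelector drops rejected nodes and returns the survivors.
func bySelector(d dispatch, cohort []candidate) []string {
	var survivors []string
	for _, c := range cohort {
		if gate(d, c) == nil {
			survivors = append(survivors, c.ID)
		}
	}
	return survivors
}

// byNode surfaces the same rejection as a hard refusal.
func byNode(d dispatch, c candidate) error { return gate(d, c) }

func main() {
	d := dispatch{Domain: "d1", Project: "p1"}
	ok := candidate{ID: "n1", Domain: "d1", Project: "p1", CanAct: true, DeclaresAction: true, HookVerified: true}
	sibling := candidate{ID: "n2", Domain: "d1", Project: "p2", CanAct: true, DeclaresAction: true, HookVerified: true}
	fmt.Println(bySelector(d, []candidate{ok, sibling})) // [n1]
	fmt.Println(byNode(d, sibling))                      // selector empty cohort
}
```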

Survivor precedence

When no node survives the DispatchBySelector fan-out filter, the refusal distinguishes the operator's remedy in a fixed precedence (the DECISION block on DispatchBySelector records the rule):

If any node was dropped because its target hook failed its trust-on-first-use integrity check, surface ErrHookIntegrityViolation (a 409) — an integrity failure outranks the two refusals below, so an operator whose whole cohort tripped the gate sees the integrity refusal rather than a capability or selector error.
Else if at least one node reached the capability gate and was dropped solely because it did not declare the action, surface ErrActionNotDeclared — a capability problem (deploy the action to the nodes).
Otherwise (all foreign-Domain, all foreign-Project, all denied, or a mix) surface ErrSelectorEmptyCohort: a selector or permission problem (widen the selector, grant the relation).
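The precedence reduces to a three-way switch over what the fan-out filter dropped. In this sketch the `dropCounts` tally is a hypothetical shape, and the function is assumed to be called only when no node survived.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrHookIntegrityViolation = errors.New("hook integrity violation")
	ErrActionNotDeclared      = errors.New("action not declared")
	ErrSelectorEmptyCohort    = errors.New("selector empty cohort")
)

// dropCounts tallies why nodes were dropped during a selector fan-out.
type dropCounts struct {
	HookIntegrity int // failed the trust-on-first-use check
	NotDeclared   int // reached the capability gate and did not declare the action
	Other         int // out of scope or ReBAC-denied
}

// emptyCohortRefusal picks the refusal for a fan-out with zero survivors.
func emptyCohortRefusal(c dropCounts) error {
	switch {
	case c.HookIntegrity > 0:
		return ErrHookIntegrityViolation // outranks the two refusals below
	case c.NotDeclared > 0:
		return ErrActionNotDeclared // remedy: deploy the action to the nodes
	default:
		return ErrSelectorEmptyCohort // remedy: widen the selector or grant the relation
	}
}

func main() {
	fmt.Println(emptyCohortRefusal(dropCounts{HookIntegrity: 1, NotDeclared: 4})) // hook integrity violation
	fmt.Println(emptyCohortRefusal(dropCounts{NotDeclared: 2, Other: 3}))         // action not declared
	fmt.Println(emptyCohortRefusal(dropCounts{Other: 5}))                         // selector empty cohort
}
```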

Capacity and atomic persist

Both paths funnel through a shared admit tail: a capacity check (admitting one more live Execution must not push the per-Domain live count past liveExecutionsCap, else ErrCapacityExceeded), aggregate + per-target callback-token construction, then an atomic persist in one RunInTx closure that writes the Execution header, the N targets, and the N actions.ActionDispatched outbox rows together — each CreateTarget interleaved with its AppendOutboxEvent so the dispatch commits as a whole or not at all. Each target carries a fresh 32-byte callback token whose sha256 hash is persisted (callback_token_hash is NOT NULL); the plaintext token's delivery to the Node rides the callback NSK auth path.
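Two pieces of the admit tail are easy to sketch in isolation: the capacity check and the per-target token mint. The function names are hypothetical, and `tokenMatches` is an assumed verification helper added only to show how a stored hash could be compared; the transaction, the outbox append, and the token's delivery are out of scope here.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"errors"
	"fmt"
)

var ErrCapacityExceeded = errors.New("capacity exceeded")

// checkCapacity refuses when one more live Execution would pass the cap.
func checkCapacity(live, limit int) error {
	if live+1 > limit {
		return ErrCapacityExceeded
	}
	return nil
}

// mintCallbackToken returns a fresh 32-byte token and the sha256 hash that
// would be persisted in callback_token_hash. Only the hash is stored.
func mintCallbackToken() (plaintext []byte, hash [sha256.Size]byte, err error) {
	plaintext = make([]byte, 32)
	if _, err = rand.Read(plaintext); err != nil {
		return nil, hash, err
	}
	return plaintext, sha256.Sum256(plaintext), nil
}

// tokenMatches compares a presented token against a stored hash in constant
// time. This helper is an assumption, not a documented part of the context.
func tokenMatches(presented []byte, stored [sha256.Size]byte) bool {
	sum := sha256.Sum256(presented)
	return subtle.ConstantTimeCompare(sum[:], stored[:]) == 1
}

func main() {
	token, hash, err := mintCallbackToken()
	if err != nil {
		panic(err)
	}
	fmt.Println(len(token), tokenMatches(token, hash)) // 32 true
	fmt.Println(checkCapacity(1000, 1000))             // capacity exceeded
}
```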

Only after the transaction commits do the post-commit side effects fire: the live-executions gauge and the dispatch counter advance, one granted audit row is emitted, and — when wired — the WirePublisher fans the dispatch onto the mesh (see ./events.md).

Callback path

The callback application service (internal/actions/services/callback.go) drives one Node-reported status advance through the closed state machine. A target Node reports a status advance for its per-Node invocation via the NSK-authenticated POST /v1/nodes/{id}/executions/{exec_id} surface; the service drives the advance through a server-side compare-and-set, logs the transition, and on a terminal status collects the output and settles the Execution header.

The flow is, in order:

Over-ceiling output handling first — an inline output longer than MaxInlineOutputBytes is a hard refusal (ErrInlineOutputTooLarge); a declared size over the ceiling mints a presigned object-store PUT URL (once) and returns it to the Node before the CAS runs. The presign mint stamps a distinct actions.output_presign audit row.
CAS — the target advances to the reported status only when its current status is a legal predecessor (derived by probing every member of the closed Status set against Status.Transition). The CAS, the transition-log AppendEvent, and — on a terminal status — the header settle all run in one RunInTx so they commit atomically.
On an accepted transition — exactly one granted callback audit row is emitted; on a terminal status the collected OutputRef is persisted and the live gauge decremented.
On a CAS miss — the service re-reads and classifies: an idempotent re-post (current == reported) is a no-op success with no new event or audit; a terminal current status is ErrExecutionAlreadyTerminal; anything else is ErrInvalidStateTransition.
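The CAS and its miss classification can be sketched against an in-memory store. A mutex-guarded map stands in for the transactional compare-and-set, predecessors are derived from an assumed edge table (so the terminal self-edge is left to the miss path, matching the no-op rule above), and the transition log, audit row, and header settle are omitted.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type Status string

const (
	Pending   Status = "pending"
	Ack       Status = "ack"
	Started   Status = "started"
	Succeeded Status = "succeeded"
	Failed    Status = "failed"
	Cancelled Status = "cancelled"
	TimedOut  Status = "timeout"
)

var (
	ErrInvalidStateTransition   = errors.New("invalid state transition")
	ErrExecutionAlreadyTerminal = errors.New("execution already terminal")
)

var closedSet = []Status{Pending, Ack, Started, Succeeded, Failed, Cancelled, TimedOut}

var edges = map[Status][]Status{
	Pending: {Ack, TimedOut},
	Ack:     {Started, TimedOut},
	Started: {Succeeded, Failed, Cancelled, TimedOut},
}

func terminal(s Status) bool {
	return s == Succeeded || s == Failed || s == Cancelled || s == TimedOut
}

// predecessors probes every member of the closed set for an edge into to.
func predecessors(to Status) []Status {
	var preds []Status
	for _, from := range closedSet {
		for _, next := range edges[from] {
			if next == to {
				preds = append(preds, from)
			}
		}
	}
	return preds
}

// targets is a hypothetical in-memory stand-in for the target table.
type targets struct {
	mu     sync.Mutex
	status map[string]Status
}

// apply advances node to reported when the current status is a legal
// predecessor, and classifies the miss otherwise.
func (t *targets) apply(node string, reported Status) (advanced bool, err error) {
	t.mu.Lock()
	defer t.mu.Unlock()
	cur := t.status[node]
	for _, p := range predecessors(reported) {
		if p == cur {
			t.status[node] = reported
			return true, nil
		}
	}
	switch {
	case cur == reported:
		return false, nil // idempotent re-post: no new event or audit
	case terminal(cur):
		return false, ErrExecutionAlreadyTerminal
	default:
		return false, ErrInvalidStateTransition
	}
}

func main() {
	tg := &targets{status: map[string]Status{"n1": Pending}}
	fmt.Println(tg.apply("n1", Ack))       // true <nil>
	fmt.Println(tg.apply("n1", Ack))       // false <nil>
	fmt.Println(tg.apply("n1", Succeeded)) // false invalid state transition
}
```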

The callback service carries no Authorizer: the calling Node's identity is authenticated at the transport / NSK layer before Apply is invoked, so re-checking a ReBAC relation here would gate the result path on a graph the Node never participates in. The DECISION block on CallbackService records this.

Timeout reconciler

The background reconciler (internal/actions/services/reconciler.go) sweeps the live Executions whose expires_at has passed (a partial-index scan), times out every still-live target through the same CAS (with the {pending, ack, started} predecessor set), settles the header to timeout once all its targets are terminal, and decrements the live gauge. A poisoned row is logged and skipped so one bad row never aborts the whole sweep; the header settle is idempotent (SetExecutionTerminal guards WHERE terminal_status IS NULL), so a concurrent callback that already settled the header makes the timeout settle a no-op.

A timed-out target emits only its transition-log event, not a new typed outbox event — the actions event-type set is closed to actions.ActionDispatched by the AST gate (see ./events.md), and admitting a timeout discriminator would break that closed set. A tracked TODO in the reconciler records emitting a typed timeout outbox event once the closed set is widened in lockstep with the SSE wire-type table.
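The sweep's three behaviours (time out only still-live targets, skip a poisoned row, settle idempotently) can be sketched over in-memory rows. Every name here is hypothetical, "a row with no targets" is just one assumed example of a poisoned row, and the gauge, the audit row, and the CAS itself are left out.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// execution is a hypothetical in-memory row: Terminal == "" means live.
type execution struct {
	ID        string
	ExpiresAt time.Time
	Terminal  string
	Targets   map[string]string // node id -> status
}

var errPoisoned = errors.New("execution has no targets")

func at(sec int64) time.Time { return time.Unix(sec, 0) }

func stillLive(status string) bool {
	return status == "pending" || status == "ack" || status == "started"
}

// setTerminal mirrors a settle guarded by "terminal_status IS NULL":
// it reports false when the header was already settled.
func setTerminal(e *execution, status string) bool {
	if e.Terminal != "" {
		return false
	}
	e.Terminal = status
	return true
}

// timeOut moves every still-live target to timeout, then settles the header.
func timeOut(e *execution) (bool, error) {
	if len(e.Targets) == 0 {
		return false, errPoisoned
	}
	for node, status := range e.Targets {
		if stillLive(status) {
			e.Targets[node] = "timeout"
		}
	}
	return setTerminal(e, "timeout"), nil
}

// sweep processes expired live executions and returns how many it settled.
func sweep(now time.Time, execs []*execution, logf func(string, ...any)) int {
	settled := 0
	for _, e := range execs {
		if e.Terminal != "" || now.Before(e.ExpiresAt) {
			continue // settled already, or not yet expired
		}
		ok, err := timeOut(e)
		if err != nil {
			logf("skipping execution %s: %v", e.ID, err)
			continue // one bad row never aborts the sweep
		}
		if ok {
			settled++
		}
	}
	return settled
}

func main() {
	execs := []*execution{
		{ID: "a", ExpiresAt: at(10), Targets: map[string]string{"n1": "pending", "n2": "succeeded"}},
		{ID: "bad", ExpiresAt: at(10)},
	}
	fmt.Println(sweep(at(50), execs, log.Printf), execs[0].Targets["n1"]) // 1 timeout
}
```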

Two-tier output contract

An action's collected output rides one of two tiers, keyed on the 16 KiB MaxInlineOutputBytes ceiling:

Inline (≤ 16 KiB) — the bounded output bytes are stored in the control plane inside the inline variant of OutputRef. An inline body the Node sends over the ceiling is refused with ErrInlineOutputTooLarge (HTTP 413).
Object-store (> 16 KiB) — when the Node declares an over-ceiling output (declared_output_bytes past the ceiling), the first callback's 200 response carries an output_upload_url: a presigned object-store PUT URL the Node uploads the full body to before sending the terminal callback. The object-store key is deterministic per (exec, node) (<exec>/<node>/output) so a retried over-ceiling callback addresses the same object. The terminal settle then persists the object_store variant of OutputRef carrying (bucket, key, sha256).

When both an object-store key and inline bytes are present on the terminal callback, the object-store ref wins — a Node that already uploaded an over-ceiling body has its authoritative output there, not in any residual inline field. When neither is set, the terminal settle carries the zero OutputRef (a terminal with no collected output, e.g. a failed action that produced nothing).
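The tier selection on a terminal callback can be sketched as below. The `terminalReport` and `outputRef` shapes are hypothetical, and checking the inline ceiling before anything else follows the flow order given under Callback path; the key template and the "object-store wins" rule are from the text above.

```go
package main

import (
	"errors"
	"fmt"
)

const MaxInlineOutputBytes = 16 * 1024

var ErrInlineOutputTooLarge = errors.New("inline output too large")

// terminalReport is a hypothetical view of a terminal callback's output fields.
type terminalReport struct {
	Inline []byte
	Bucket string
	Key    string
	SHA256 string
}

// outputRef is a flattened stand-in for OutputRef; Kind "" is the zero ref.
type outputRef struct {
	Kind   string
	Inline []byte
	Bucket string
	Key    string
	SHA256 string
}

// outputKey is deterministic per (exec, node), so a retry hits the same object.
func outputKey(execID, nodeID string) string {
	return execID + "/" + nodeID + "/output"
}

// needsUpload reports whether a declared size has to ride the object store.
func needsUpload(declaredBytes int64) bool {
	return declaredBytes > MaxInlineOutputBytes
}

// collect picks the tier for a terminal settle.
func collect(r terminalReport) (outputRef, error) {
	switch {
	case len(r.Inline) > MaxInlineOutputBytes:
		return outputRef{}, ErrInlineOutputTooLarge
	case r.Key != "":
		// The object-store ref wins over any residual inline bytes.
		return outputRef{Kind: "object_store", Bucket: r.Bucket, Key: r.Key, SHA256: r.SHA256}, nil
	case len(r.Inline) > 0:
		return outputRef{Kind: "inline", Inline: r.Inline}, nil
	default:
		return outputRef{}, nil // terminal with no collected output
	}
}

func main() {
	ref, err := collect(terminalReport{Inline: []byte("x"), Bucket: "b", Key: outputKey("e1", "n1"), SHA256: "h"})
	fmt.Println(ref.Kind, ref.Key, err) // object_store e1/n1/output <nil>
	fmt.Println(needsUpload(16*1024), needsUpload(16*1024+1))
}
```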

The presign mint is required for the over-ceiling path: a callback service constructed without an OutputPresigner surfaces ErrOutputUploadFailed (HTTP 502) on the first over-ceiling callback. The presigner is an injected port declared locally in internal/actions/ports.go so the context stays free of an internal/platform/blobstore import; the composition root wires the concrete blobstore-backed presigner behind it.

ReBAC and Problem codes

The act permission gates every dispatch and read. The split between the transport gate and the per-node service gate is deliberate:

The transport layer gates act on the owning project:<project_id> for all three operator-facing operations (DispatchExecution, ListExecutions, GetExecution), resolved directly from the path parameter before any persistence write or body decode, so an unauthorised caller never exercises the JSON decoder and never mints a dispatch. The gate is declared in internal/transport/http/v1/actions/wiring.go.
The dispatch service additionally gates act on each cohort member resource:<node_id> during the fan-out — a selector dispatch drops a node the subject cannot act on; a single-node dispatch refuses with a hard 403. The per-node object string is resource:<node-uuid>, reusing the bridge convention so the authz object model stays uniform across contexts.
The Node callback is not ReBAC-gated. It is authenticated by the Node Secret Key at the transport boundary, with a defence-in-depth execution-target gate: the resolved NSK Node must be a member of the Execution's target cohort, else 403 nsk_node_mismatch — a leaked NSK from Node A cannot post callbacks against an Execution that targeted only Node B. The gate runs before the body is read.

Operation	ReBAC object	Permission
`DispatchExecution` / `ListExecutions` / `GetExecution` (transport gate)	`project:<project_id>`	`act`
Per-cohort-member dispatch fan-out (service gate)	`resource:<node_id>`	`act`
`PostNodeExecutionCallback`	— (NSK auth + execution-target membership)	—

The typed sentinels map onto the closed Problem-code taxonomy at the transport boundary. The operator-facing dispatch / list / get codes are pinned in internal/transport/http/v1/actions/errors.go; the Node callback codes are pinned in internal/transport/http/v1/handlers/executions_callback.go.

Sentinel / condition	HTTP status	Problem code
Malformed label selector	400	`malformed_selector`
Action not declared for the resolved target	400	`action_not_declared`
Both-or-neither / zero `node_id` target	400	`invalid_target`
Malformed dispatch body	400	`invalid_body`
Malformed `execution_id` / `project_id` / `limit` / `cursor`	400	`invalid_execution_id` / `invalid_project_id` / `invalid_limit` / `invalid_cursor`
Selector resolved an empty cohort	422	`selector_empty_cohort`
Per-Domain capacity exceeded	429	`capacity_exceeded`
Dispatch targets a hook failing its integrity check	409	`hook_integrity_violation`
Project not found / not visible	404	`project_not_found`
Execution not found	404	`execution_not_found`
Dispatch body over the cap	413	`request_body_too_large`
ReBAC denial (transport or service)	403	rendered as a `PermissionDenied` body
Dispatch/read surface unprovisioned	501	`actions_dispatch_not_provisioned`
Callback: reported status not a legal advance	409	`invalid_state_transition`
Callback: invocation already terminal	409	`execution_already_terminal`
Callback: inline output over the 16 KiB ceiling	413	`inline_output_too_large`
Callback: NSK belongs to a different Node	403	`nsk_node_mismatch`
Callback surface unprovisioned	501	`execution_callback_not_provisioned`

The 500 path never interpolates the underlying error text into the wire body — raw driver messages can carry SQL fragments or constraint names a caller has no right to see; the detail is logged internally and the body stays generic.
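A hedged sketch of the sentinel-to-Problem mapping, folding the dispatch and callback rows into one function for brevity (the document pins them in two separate files). The `problem` struct, the generic 500 detail text, and the `internal_error` code are assumptions. The 403 and 502 rows carry no code here because the table above names none for them.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

var (
	ErrInvalidStateTransition   = errors.New("invalid state transition")
	ErrExecutionAlreadyTerminal = errors.New("execution already terminal")
	ErrSelectorEmptyCohort      = errors.New("selector empty cohort")
	ErrActionNotDeclared        = errors.New("action not declared")
	ErrInlineOutputTooLarge     = errors.New("inline output too large")
	ErrOutputUploadFailed       = errors.New("output upload failed")
	ErrCapacityExceeded         = errors.New("capacity exceeded")
	ErrHookIntegrityViolation   = errors.New("hook integrity violation")
	ErrPermissionDenied         = errors.New("permission denied")
)

// errDriver imitates a raw driver error that must never reach the wire.
var errDriver = errors.New(`pq: duplicate key violates "action_execution_pkey"`)

type problem struct {
	Status int
	Code   string
	Detail string
}

// toProblem maps a domain error onto a wire status and Problem code.
func toProblem(err error) problem {
	switch {
	case errors.Is(err, ErrActionNotDeclared):
		return problem{Status: http.StatusBadRequest, Code: "action_not_declared"}
	case errors.Is(err, ErrSelectorEmptyCohort):
		return problem{Status: http.StatusUnprocessableEntity, Code: "selector_empty_cohort"}
	case errors.Is(err, ErrCapacityExceeded):
		return problem{Status: http.StatusTooManyRequests, Code: "capacity_exceeded"}
	case errors.Is(err, ErrHookIntegrityViolation):
		return problem{Status: http.StatusConflict, Code: "hook_integrity_violation"}
	case errors.Is(err, ErrInvalidStateTransition):
		return problem{Status: http.StatusConflict, Code: "invalid_state_transition"}
	case errors.Is(err, ErrExecutionAlreadyTerminal):
		return problem{Status: http.StatusConflict, Code: "execution_already_terminal"}
	case errors.Is(err, ErrInlineOutputTooLarge):
		return problem{Status: http.StatusRequestEntityTooLarge, Code: "inline_output_too_large"}
	case errors.Is(err, ErrPermissionDenied):
		return problem{Status: http.StatusForbidden} // rendered as a PermissionDenied body
	case errors.Is(err, ErrOutputUploadFailed):
		return problem{Status: http.StatusBadGateway}
	default:
		// The underlying text is logged elsewhere and never interpolated here.
		return problem{Status: http.StatusInternalServerError, Code: "internal_error", Detail: "internal error"}
	}
}

func main() {
	wrapped := fmt.Errorf("dispatch: %w", ErrCapacityExceeded)
	fmt.Println(toProblem(wrapped).Status, toProblem(wrapped).Code) // 429 capacity_exceeded
	fmt.Println(toProblem(errDriver).Detail)                        // internal error
}
```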

Audit contract

Every transition stamps a verb-style audit relation onto its audit row — an operation namespace, distinct from the ReBAC act permission and from the outbox event literal. The orchestrator emits one audit row per mutation and one per denial, distinguished by the row's Outcome. Denial and invariant-rejection rows are written before the response is flushed so a flaky audit backend cannot land a silent denial; a sink error is made loud via slog rather than swallowed.

Surface	Audit relations	Where pinned
Dispatch service	`actions.dispatch` (granted), `insufficient_relation` (per-node ReBAC denial drop), `hook_integrity_violation` (per-node integrity denial drop)	`dispatch.go`
Callback service	`actions.callback` (accepted transition), `actions.output_presign` (over-ceiling presign mint)	`callback.go`
Reconciler	`actions.timeout` (per timed-out invocation, subject `system`)	`reconciler.go`
Transport (dispatch/list/get)	`execution.dispatch`, `execution.list`, `execution.get`	`wiring.go`
Transport (Node callback)	`node_execution_callback.apply`, `node_execution_callback.path_gate`, `node_execution_callback.node_gate`	`executions_callback.go`

The Outcome values are granted, permission_denied, invariant_violation, and internal_error on the transport surface; the service surface uses granted / denied.

Observability

The application services share a Prometheus collector bundle (internal/actions/services/metrics.go), both collectors carrying exactly the {domain_id} label:

plexsphere_actions_live_executions — a GaugeVec of the live (not-yet-terminal) Executions per Domain. The dispatch service increments it after a dispatch commits; the callback and reconciler decrement it when an Execution settles. The three services share one registry (newMetrics is AlreadyRegistered-tolerant) so a callback's decrement settles the same series the dispatch's increment raised.
plexsphere_actions_executions_total — a CounterVec of admitted dispatches per Domain.

A subject / resource / execution-id label would be unbounded (one new series per principal, target, or dispatch) and is deliberately rejected: that high-cardinality identity belongs in the structured slog line and the audit trail, never in a metric label. The bundle is nil-tolerant — a service constructed without WithMetrics keeps a zero-cost no-op bundle.

Labels SelectorPort consumer story

The Action Orchestrator is one of the four sanctioned consumers of the Label Registry's SelectorPort seam (alongside the Policy Engine, the Provisioning Broker, and Observability — the allow-list is pinned in the internal/labels layout row and in ../labels/index.md). The orchestrator expands "dispatch to every Node matching <selector>" into the concrete Node set at action_request time.

The crossing is kept narrow. The dispatch service declares a context-local LabelSelector port and its value types (ObjectRef, SelectorScope, ListCursor, ObjectRefPage) in internal/actions/ports.go — actions-local mirrors of the labels shapes, not the labels types themselves — because the no-cross-context-imports-actions depguard rule denies the actions context an internal/labels import. The only place the two contexts meet is the composition-root adapter, which parses the opaque selector string and translates to labels.SelectorPort behind this interface. The dispatch service folds the resolved node-kind ObjectRefs into target NodeIDs, paging until the cursor is exhausted.

Deferred downstream

Three concerns build on or around this context and are out of scope here:

The plaintext callback-token delivery. The dispatch service mints a per-target callback token and persists only its sha256 hash; the plaintext's delivery to the Node rides the callback NSK auth path. The durable invariant dispatch guarantees is that every target row carries a unique, unguessable hash from creation so a callback can never match an unset hash.
A typed timeout outbox event. The reconciler emits only the per-target transition-log row on a timeout, not a typed outbox event, because the closed actions event-type set admits only actions.ActionDispatched. Widening the set (and the SSE wire-type table in lockstep) is the tracked follow-up.
The consumer-side SSE wiring. The producer side of the action_request wire fan-out is wired (the dispatch service calls the WirePublisher after it commits); the consumer-side /v1/nodes/{id}/events HTTP plumbing is owned by the Signed Event Bus epic and stays on its 501 stub. See ./events.md.
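
The hash-only persistence of the callback token, described in the first item above, might look roughly like this. The token length, the hex encoding, and both function names are assumptions; only the use of a sha256 hash and the rule that an unset hash never matches come from the text.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
)

// mintCallbackToken returns a random plaintext token and the sha256 hash
// that would be persisted on the target row. Only the hash is stored.
// Assumption: 32 random bytes, hex-encoded.
func mintCallbackToken() (plaintext, hash string, err error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", "", err
	}
	plaintext = hex.EncodeToString(buf)
	sum := sha256.Sum256([]byte(plaintext))
	return plaintext, hex.EncodeToString(sum[:]), nil
}

// callbackTokenMatches reports whether a presented plaintext hashes to the
// stored value. An empty stored hash never matches, which mirrors the
// invariant that a callback cannot match an unset hash.
func callbackTokenMatches(storedHash, presented string) bool {
	if storedHash == "" {
		return false
	}
	sum := sha256.Sum256([]byte(presented))
	got := hex.EncodeToString(sum[:])
	return subtle.ConstantTimeCompare([]byte(got), []byte(storedHash)) == 1
}
```

Comparing in constant time is a defensive choice in this sketch rather than something the source states.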

What this context is not

To keep the boundary sharp, the Action Orchestrator is deliberately NOT:

An action runtime. It dispatches a named action and tracks its per-Node lifecycle; the action itself executes on the Node agent, never in the control plane. This context stores intent and evidence.
An NSK custodian. The per-Node callback is authenticated by the Node Secret Key at the transport boundary; this context's callback service trusts its already-authenticated caller and runs an execution-target membership gate, but it neither issues nor validates the NSK.
A step-up approval gate. The sensitive-operation step-up approval that gates whether a dispatch is permitted is a separate story; this context gates only the act ReBAC permission.
A hook integrity authority. This context gates a hook dispatch on the trust-on-first-use integrity verdict (see Dispatch path), but it neither pins the known-good baseline nor compares digests itself — the baseline, the byte-for-byte Verified / Drift / Uncatalogued classification, and the drift integrity_alert are owned by the hook-integrity surface. This context consults the verdict through its CapabilityReader port and refuses a non-Verified hook.
A dashboard. The operator-facing Actions dashboard view is a separate frontend story; this context exposes only the /v1 HTTP surface and the read projections.

Cross-references

./events.md — the events surface: the single actions.ActionDispatched outbox literal, the action_request payload, and the per-Node SSE wire fan-out.
../../../api/openapi/plexsphere-v1.yaml — the OpenAPI source for the four HTTP operations and the Problem-code catalogue.
../../../internal/actions/ — the domain code; every type named here has a matching symbol.
../../../internal/actions/doc.go — the package godoc and the ubiquitous-language pin this document expands.
../../../internal/platform/db/migrations/0043_actions.sql — the three-table migration.
../labels/index.md — the Label Registry and the SelectorPort consumer allow-list this context routes through.
../../contributing/layout.md — the bounded-context map row enumerating the consumed and exposed ports.
