
Node Registration Bounded Context

This document is the authoritative bounded-context reference for the Node Registration application service that ships under internal/identity/nodes. It covers the ubiquitous language, the pending → enrolling → ready → deregistered state machine, the atomic in-tx SQL sequence, the per-Domain advisory-lock allocator strategy, the NSK issuance contract, the public-key validation gate, the OpenAPI surface, the tenancy.NodeRegistered outbox payload, the audit and metrics contracts, the threat model, and the boundary between this story and the consumer stories that build on it.

The entrypoint is POST /v1/register — the unauthenticated bootstrap seam where a freshly-installed plexd agent armed with only a BootstrapToken plaintext and its substrate's WireGuard public key exchanges that one-shot credential for a complete Node identity. The exchange is atomic: a single Postgres transaction transitions the agent from "holds a BootstrapToken plaintext" to "holds a (NodeID, mesh_ip, NSK plaintext, signing_public_key + kid, peer_snapshot, domain_mesh_cidr) tuple" or it leaves no observable state behind.

Cross-references

  • ../../contributing/layout.md — the bounded-context map row that locates internal/identity/nodes inside the codebase and names the upstream BootstrapToken context plus the downstream consumers (the signed SSE event bus and the reconciliation pull endpoint).
  • ../../how-to/enrolment/register-a-node.md — operator runbook for redeeming a BootstrapToken and bringing a plexd agent up against POST /v1/register.
  • ../../architecture/overview.md — high-level component diagram showing the Operator → Issuer → plexd → Validator → Registration enrolment data flow.
  • ../../../api/openapi/plexsphere-v1.yaml — OpenAPI 3.1 spec; the PostRegister operation, the RegisterRequest / RegisterResponse schemas, and the x-plexsphere-once: true marker on the nsk field are all pinned there.
  • ../../../internal/platform/db/migrations/0008_node_secret_keys.sql — schema migration for plexsphere.node_secret_key and plexsphere.domain_nsk_wrap_key, including the per-Domain wrap-key ledger that backs active | rotating | retired rotation state.
  • bootstrap-tokens.md — the upstream bounded-context reference. The Validator port this context invokes (bootstraptokens.Validator.Consume), the precedence ordering, and the (project_id, consumed_nonce) partial UNIQUE that backs ErrNonceCollision are all defined there.

Ubiquitous language

The terms below travel together across the Go code, the audit log, the OpenAPI spec, the SQL migration, and the operator-facing tooling. Names are preserved verbatim in error messages and structured log attributes.

| Term | Definition | Code anchor |
| --- | --- | --- |
| Registration | The atomic flow that consumes a BootstrapToken plaintext and provisions a Node identity. The orchestration is "seven steps" at the application layer (Validate → ParsePlaintext → Resolve → Consume → Commit → SigningKey → PeerSnapshot) and "five SQL steps" at the persistence layer (Consume → Allocate → InsertNode → InsertIPAllocation → InsertNodeSecretKey + AppendOutbox). | internal/identity/nodes/doc.go, registration/service.go |
| MeshIP | The allocator-assigned address from the Domain's mesh CIDR. Returned as the canonical netip.Addr.String() form (100.64.0.1, never zero-padded) on both the response body and the outbox event payload. | MeshIP field on registration/result.go, tenancy/allocator/doc.go |
| PublicKey | The 32-byte X25519 (WireGuard) public half plexd submits at registration time. RFC 7748 §5 fixes the encoding to a 32-byte little-endian point. Validated FIRST: wrong length surfaces as ErrPublicKeyInvalid; the all-zero degenerate value surfaces as ErrPublicKeyAllZero. Both rejections happen BEFORE any BootstrapToken consumption attempt so a malformed key cannot also waste a token. | RegisterCommand.Validate in registration/command.go, ErrPublicKeyInvalid / ErrPublicKeyAllZero in nodes/errors.go |
| NodeSecretKey (NSK) | The per-Node AES-256 secret-delivery key issued at registration. Returned to plexd in plaintext exactly once over TLS; persisted only in wrapped form via the per-Domain wrap-key ledger. The Plaintext value object owns the destruction contract: Destroy() zero-fills the backing slice in place; the type is non-comparable by construction so it cannot be used as a map key. | Plaintext, Wrapped, Issuer in nsk/issuer.go |
| WrapKey | The per-Domain symmetric key that wraps every NSK at rest. Each row in plexsphere.domain_nsk_wrap_key carries a monotonically increasing version and a lifecycle state ∈ {active, rotating, retired}; the partial unique index domain_nsk_wrap_key_active_uq enforces "at most one active version per Domain" at the SQL layer. | 0008_node_secret_keys.sql |
| kid | The wrap-key version identifier the NSK Issuer stamps onto the wrapping. Mirrors the JOSE kid convention and matches the regex ^[A-Za-z0-9._:-]+$ enforced by the SQL CHECK on node_secret_key.kid. Opaque to the registration handler; the wrap-key adapter owns its interpretation. | Issue in nsk/issuer.go, node_secret_key.kid in 0008_node_secret_keys.sql |
| RegistrationTx | The single-Postgres-tx commit path the application service depends on. Reusable across goroutines (zero per-request state); composes the per-Domain advisory lock, the BootstrapToken consume, the allocator sweep, the Node + IP-allocation INSERTs, the NSK Issuer call, the node_secret_key INSERT, and the tenancy.NodeRegistered outbox append into one transaction. | internal/identity/nodes/repo/repository_pg.go |
| Service | The DDD application service that orchestrates the seven-step flow. Stateless and reusable across goroutines; depends only on five narrow ports (BootstrapTokenValidator, TenancyResolver, Committer, SigningKeyResolver, PeerSnapshotProvider) plus optional AuditSink / Registerer collaborators. | registration/service.go |
| Plaintext (NSK) | The in-memory NSK plaintext value object. Owns its backing slice, exposes Destroy() to zero the secret bytes once the response has been written, and is non-comparable by construction (an embedded [0]func() array makes the struct contain a non-comparable field). | Plaintext.Destroy, the [0]func() sentinel field in nsk/issuer.go |

State machine

A Node aggregate transitions through four logical states from the operator's vantage. The pending and enrolling states are in-transaction states the registration commit passes through; from the operator's HTTP-response vantage a successful POST /v1/register returns a Node already in ready. The deregistered terminal state is a forward-looking placeholder owned by the Node-deregistration handler.

```mermaid
stateDiagram-v2
    [*] --> pending: BootstrapToken consume + advisory lock acquired
    pending --> enrolling: allocator + Node + IP + NSK rows pre-commit
    enrolling --> ready: tx COMMIT (Node visible to consumers)
    ready --> deregistered: Node deregistration handler
    deregistered --> [*]
    note right of pending
      In-tx pre-commit only. A failure between
      pending and enrolling rolls back the
      BootstrapToken consume; the operator can
      retry with the same plaintext token.
    end note
    note right of ready
      First externally-observable state. The
      tenancy.NodeRegistered outbox row is
      visible to the signed SSE consumer at this
      point and the response carries the NSK
      plaintext to plexd.
    end note
```

Each transition is bound to a specific code seam:

  • [*] → pending — the registration tx's consumeBootstrapToken helper UPDATEs the BootstrapToken row inside the registration pgx.Tx (Step 2 of the in-tx sequence below). The application service runs validator.Identify (read-only) BEFORE the registration tx opens — that probe verifies the plaintext and identifies the matching TokenID, then hands the id to the committer for the in-tx UPDATE. Bound by registration/service.go step 4 and the Step 2 SQL statement in repo/repository_pg.go.
  • pending → enrolling — allocator + Node + IP + NSK rows inserted under the per-Domain advisory lock (steps 3–8 of the RegistrationTx).
  • enrolling → ready — pgx tx COMMIT. Releases the advisory lock automatically and makes every row of the registration visible at once.
  • ready → deregistered — out of scope for this context; the Node-deregistration handler lands alongside the dashboard rotation UI.
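The four states and their forward-only edges can be sketched as a small transition table. This is an illustration only: the State type, the Transition function, and the ErrIllegalTransition sentinel are hypothetical names, not identifiers taken from internal/identity/nodes.

```go
package main

import (
	"errors"
	"fmt"
)

// State is a hypothetical enum for the four logical Node states.
type State string

const (
	StatePending      State = "pending"
	StateEnrolling    State = "enrolling"
	StateReady        State = "ready"
	StateDeregistered State = "deregistered"
)

// ErrIllegalTransition is an illustrative sentinel, not part of the codebase.
var ErrIllegalTransition = errors.New("illegal node state transition")

// next lists the single forward edge out of each non-terminal state.
var next = map[State]State{
	StatePending:   StateEnrolling,
	StateEnrolling: StateReady,
	StateReady:     StateDeregistered,
}

// Transition returns nil only when from -> to is an edge of the diagram.
func Transition(from, to State) error {
	if want, ok := next[from]; ok && want == to {
		return nil
	}
	return fmt.Errorf("%w: %s -> %s", ErrIllegalTransition, from, to)
}

func main() {
	fmt.Println(Transition(StatePending, StateEnrolling)) // legal edge
	fmt.Println(Transition(StatePending, StateReady))     // skips enrolling
}
```

Because pending and enrolling exist only inside the registration tx, a real implementation would likely never persist them; the table is useful mainly for reasoning about which edges a handler may drive.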

Identify-then-commit application flow

The application service performs the BootstrapToken match as a read-only Identify pass OUTSIDE the registration tx; the actual UPDATE bootstrap_token happens INSIDE the registration tx via the shared bootrepo.ConsumeInTx helper that the committer's consumeBootstrapToken step delegates to. The split is intentional:

  • The Argon2id verify pass (~150 ms CPU per candidate) holds no SQL resources, so running it outside the tx avoids holding the per-Domain advisory lock across a CPU-heavy phase.
  • The actual SQL UPDATE shares the registration tx's commit/rollback fate so a partial-failure rollback on Node insert / NSK issue / outbox append flips consumed_at back to NULL — the registration atomicity guarantee.

The cost is one extra round-trip to identify the row before the tx opens; the benefit is that the consume + Node + IP + NSK + outbox write all share one MVCC snapshot AND one rollback fate.

Resource adoption on register

The registration flow resolves the request's resource_id handle to a (Resource, Project, Domain) projection via TenancyResolver.ResolveByResourceHandle. An unknown handle surfaces as nodes.ErrResourceNotFound. By default that miss is fatal — the service classifies it as resource_not_found and the handler maps it onto 404. The adoption branch turns that miss into a "synthesise a new Resource and proceed" outcome instead.

Service.shouldAdopt decides whether the miss is adoption-shaped. All three preconditions must hold:

  1. The resolve error is exactly nodes.ErrResourceNotFound — any other error (project mismatch, infrastructure failure) is never treated as adoption.
  2. The RegisterCommand carries a non-empty RequestedResourceID — the operator's explicit intent to create.
  3. The service was constructed with a non-nil ProjectScopeResolver — the deployment opted into adoption.

If any precondition fails the service falls back to the historical 404 resource_not_found behaviour unchanged.
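The three preconditions reduce to a single boolean predicate. The sketch below is a guess at the shape, not the shipped code: the ResolveProjectScope signature, the scope field name, and the string-typed ids are assumptions, and ErrResourceNotFound stands in for nodes.ErrResourceNotFound.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrResourceNotFound stands in for nodes.ErrResourceNotFound.
var ErrResourceNotFound = errors.New("resource not found")

// errInfra models any other resolver failure (illustrative only).
var errInfra = errors.New("database unavailable")

// ProjectScopeResolver mirrors the port named in the text; the method
// signature is an assumption made for this sketch.
type ProjectScopeResolver interface {
	ResolveProjectScope(projectID string) (domainID, meshCIDR string, err error)
}

// RegisterCommand carries only the field the predicate inspects.
type RegisterCommand struct {
	RequestedResourceID string
}

// Service holds the optional adoption collaborator; nil means "not opted in".
type Service struct {
	scope ProjectScopeResolver
}

// shouldAdopt is true only when all three documented preconditions hold.
func (s *Service) shouldAdopt(resolveErr error, cmd RegisterCommand) bool {
	return errors.Is(resolveErr, ErrResourceNotFound) && // 1. exactly a not-found miss
		cmd.RequestedResourceID != "" && // 2. explicit operator intent
		s.scope != nil // 3. deployment wired the resolver
}

// staticScope is a toy resolver used only to make the demo runnable.
type staticScope struct{}

func (staticScope) ResolveProjectScope(string) (string, string, error) {
	return "domain-1", "100.64.0.0/10", nil
}

func main() {
	svc := &Service{scope: staticScope{}}
	cmd := RegisterCommand{RequestedResourceID: "edge-router-01"}
	fmt.Println(svc.shouldAdopt(ErrResourceNotFound, cmd))               // all three hold
	fmt.Println(svc.shouldAdopt(errInfra, cmd))                          // wrong error
	fmt.Println(svc.shouldAdopt(ErrResourceNotFound, RegisterCommand{})) // no intent
}
```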

When all three hold, Service.buildAdoptionView runs:

  • ProjectScopeResolver.ResolveProjectScope looks up the Project's DomainID and DomainMeshCIDR — the scope the rest of the flow needs and would normally have taken from the resolved Resource.
  • A fresh Resource aggregate id is minted (tenancy.NewID).
  • A synthetic TenancyView (the new ResourceID plus the resolved scope) and a CreateResourceIntent are assembled. The intent carries Origin = tenancy.OriginAdopted, ExternalRef = RequestedResourceID, and the resolved ProjectID / DomainID.

The CreateResourceIntent rides on CommitInput.CreateResourceIntent so the registration transaction INSERTs the new plexsphere.resources row (stamped origin = 'Adopted') in the same pgx.Tx as the Node insert — the new Resource and the Node commit or roll back together. The Origin discriminator is the tenancy-side record of this path; see the Resource Origin section of the tenancy reference.

On the adoption branch the success audit row's Object field carries resource:<resourceID> (rather than the canonical node:<nodeID>:<outcome> form) so a forensic search by the freshly-minted Resource id surfaces the registration event.

Atomic in-tx SQL sequence

The RegistrationTx.Commit body runs the following sequence under one pgx.Tx. Mid-flow failure rolls back every preceding step including the BootstrapToken consume, so the operator can retry with the same plaintext. The step numbering below tracks the source — see internal/identity/nodes/repo/repository_pg.go for the canonical implementation.

  1. q.AcquireDomainAllocatorLock(domain_id) → pg_advisory_xact_lock(hash(domain_id)). Releases automatically at COMMIT/ROLLBACK; serialises concurrent registrations within the same Domain.
  2. bootrepo.ConsumeInTx: the shared helper performs UPDATE bootstrap_token … WHERE consumed_at IS NULL AND revoked_at IS NULL AND expires_at > now() RETURNING … against the registration tx's *sqlcgen.Queries. Single-statement single-use enforcement; the (project_id, consumed_nonce) partial UNIQUE rejects duplicate nonces as bootstraptokens.ErrNonceCollision. The disambiguation re-read (revoked / consumed / expired) lives in this helper too; see the BootstrapToken reference's validator-precedence section.
  3. allocator.AllocateForNode — in-memory sweep over the live IP ledger (rebuilt under the lock). Returns the first usable host inside the Domain CIDR (or the Project sub-range when a reservation applies). See tenancy/allocator/doc.go.
  4. tenancy.NewNode — aggregate construction re-asserts the WireGuard public-key length and the mesh-IP non-zero invariants inside the tx so any failure rolls back every preceding step.
  5. q.RegisterNode — INSERT INTO node. Trips the UNIQUE (resource_id) constraint as ErrConflict if the Resource already has a Node.
  6. q.InsertIPAllocation — appends one row to domain_mesh_ip_allocations so the next allocator restore observes the decision.
  7. r.issuer.Issue + q.InsertNodeSecretKey — issues the NSK and persists the wrapped material with its kid and wrap_key_version. The Issue call runs OUTSIDE pgx (it is a pure-crypto operation) but INSIDE the advisory-lock window so the wrap binds to the freshly-issued NodeID. A wrap failure surfaces as nodes.ErrNSKIssuanceFailed which the handler maps onto HTTP 500.
  8. appendOutbox tenancy.NodeRegistered — appends the NodeRegistered outbox row inside the same tx so a downstream relay observes the new Node IFF the registration committed.

Rollback property. Any error returned from any step inside the runInTx closure rolls back the whole tx. The advisory lock releases automatically on COMMIT/ROLLBACK; the BootstrapToken's consumed_at reverts to NULL; the operator can retry with the same plaintext token without re-issuing it. The service-tier integration test tests/integration/register_consume_in_tx_test.go pins this contract by driving bootrepo.ConsumeInTx inside an outer pgx.Tx the test owns, rolling back, and asserting consumed_at IS NULL afterwards.
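The rollback property can be illustrated without a database. The sketch below swaps Postgres for a copy-on-commit in-memory store, so it models the all-or-nothing contract rather than pgx itself; store, runInTx, and register are hypothetical names chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// store is a toy stand-in for the database: one token row plus a node table.
type store struct {
	tokenConsumed bool
	nodes         []string
}

var (
	errTokenConsumed     = errors.New("token already consumed")
	errNSKIssuanceFailed = errors.New("nsk issuance failed")
)

// runInTx runs fn against a private copy and publishes it only on success,
// which models COMMIT (publish) versus ROLLBACK (discard the copy).
func runInTx(s *store, fn func(tx *store) error) error {
	tx := &store{
		tokenConsumed: s.tokenConsumed,
		nodes:         append([]string(nil), s.nodes...),
	}
	if err := fn(tx); err != nil {
		return err // rollback: no write from fn is visible
	}
	*s = *tx // commit: every write becomes visible at once
	return nil
}

// register consumes the token, inserts the node, then issues the NSK.
// A failure in any step leaves the store exactly as it was.
func register(s *store, nodeID string, issueNSK func() error) error {
	return runInTx(s, func(tx *store) error {
		if tx.tokenConsumed {
			return errTokenConsumed
		}
		tx.tokenConsumed = true
		tx.nodes = append(tx.nodes, nodeID)
		return issueNSK()
	})
}

func main() {
	s := &store{}
	err := register(s, "node-1", func() error { return errNSKIssuanceFailed })
	fmt.Println(err, s.tokenConsumed, len(s.nodes)) // failure leaves no state

	err = register(s, "node-1", func() error { return nil })
	fmt.Println(err, s.tokenConsumed, len(s.nodes)) // retry with same token works
}
```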

Allocator hybrid strategy + per-Domain advisory lock

The mesh-IP allocator is a pure in-memory domain service; the repository layer hydrates it once per registration tx, calls AllocateForNode, and persists the new ledger row. The strategy is "hybrid" — the same allocator serves SaaS-style flat-Domain pools and self-hosted Project-sub-range reservations without branching at the call site (tenancy/allocator/doc.go).

  • Flat-Domain pool — the default. The Node's mesh-IP is drawn from the Domain CIDR minus every active sub-range reservation.
  • Project sub-range reservation — applied when the registering Node's Project carries a row in project_mesh_ip_reservations. The Node is allocated from the reservation and never from outside it; the SQL GIST exclusion guarantees reservations inside one Domain never overlap.
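A minimal sketch of a first-free sweep over a pool, assuming the allocator skips the network and broadcast addresses and treats reservations as prefixes to avoid. Both assumptions, and the names allocate, inAny, and firstFree, are illustrative; tenancy/allocator/doc.go remains the authority on the real rules. For the sub-range case a caller would pass the reservation itself as the pool.

```go
package main

import (
	"fmt"
	"net/netip"
)

// allocate returns the first address in pool that is neither used nor inside
// a reserved prefix. It skips the network address and stops before the last
// address of the pool (assumed to be the broadcast address).
func allocate(pool netip.Prefix, reserved []netip.Prefix, used map[netip.Addr]bool) (netip.Addr, bool) {
	pool = pool.Masked()
	for a := pool.Addr().Next(); pool.Contains(a) && pool.Contains(a.Next()); a = a.Next() {
		if used[a] || inAny(a, reserved) {
			continue
		}
		return a, true
	}
	return netip.Addr{}, false // pool exhausted
}

// inAny reports whether a falls inside any of the given prefixes.
func inAny(a netip.Addr, prefixes []netip.Prefix) bool {
	for _, p := range prefixes {
		if p.Contains(a) {
			return true
		}
	}
	return false
}

// firstFree is a string-typed convenience wrapper; it returns "" when the
// pool is exhausted and panics on malformed input (demo use only).
func firstFree(pool string, reserved []string, used ...string) string {
	var res []netip.Prefix
	for _, r := range reserved {
		res = append(res, netip.MustParsePrefix(r))
	}
	taken := map[netip.Addr]bool{}
	for _, u := range used {
		taken[netip.MustParseAddr(u)] = true
	}
	addr, ok := allocate(netip.MustParsePrefix(pool), res, taken)
	if !ok {
		return ""
	}
	return addr.String()
}

func main() {
	// Flat-Domain pool with nothing allocated yet.
	fmt.Println(firstFree("100.64.0.0/29", nil))
	// Same pool minus a reserved sub-range and one used address.
	fmt.Println(firstFree("100.64.0.0/29", []string{"100.64.0.0/30"}, "100.64.0.4"))
}
```

The linear sweep keeps the sketch short; a production allocator would likely track free ranges instead of probing each address.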

Serialisation is the repository's responsibility, not the allocator's. RegistrationTx.Commit calls q.AcquireDomainAllocatorLock(domain_id) as the first SQL statement of the tx — the same precedent established by internal/identity/tenancy/repo/node_repo.go for direct Node creation. The lock key is pg_advisory_xact_lock(hash(domain_id)) so concurrent registrations in different Domains do not contend; concurrent registrations in the same Domain serialise so the resulting set of allocated IPs has N distinct values for N concurrent attempts.

The per-Domain pool-exhaustion counter plexsphere_register_pool_exhausted_total is labelled with domain_id and scope so a dashboard can pivot per-Domain capacity events; scope takes one of two values (registration/metrics.go):

  • scope="domain" — the flat-Domain pool was exhausted. Pinned as poolScopeDomain in the same file.
  • scope="project_subrange" — a strict-mode Project sub-range was exhausted. The label value is reserved for the strict-mode wiring tracked under TODO(perf, PX-0015) and lands when tenancy/allocator/rules.go exposes the strict-mode toggle.

NSK issuance contract

The NSK Issuer port is the seam between the registration tx and the wrap-key store (internal/identity/nodes/nsk/issuer.go). Implementations MUST:

  1. Generate exactly PlaintextSize = 32 bytes of cryptographic random material (RFC 5116 §5.2: AEAD_AES_256_GCM keys are 32 bytes).
  2. Wrap the plaintext under the Domain's active wrap key.
  3. Return (plaintext *Plaintext, wrapped Wrapped, kid string, err error). The kid matches ^[A-Za-z0-9._:-]+$; the wrapped byte length is bounded by the SQL CHECK to [28, 4096].
  4. Surface failure as a wrapped error the caller can match with errors.Is; the registration service wraps it with nodes.ErrNSKIssuanceFailed so the handler maps it onto HTTP 500.

The Plaintext value object owns the destruction contract: pointer methods only, embedded [0]func() sentinel for non-comparability, Destroy() zero-fills in place and re-slices the header to length 0 so any previously-captured slice sees zero bytes. The handler defers Destroy() on the success path immediately after writing the JSON response.
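The destruction contract can be sketched as follows. This is a minimal illustration under stated assumptions: the NewPlaintext constructor, the Bytes accessor, and the field names are hypothetical; only Destroy(), PlaintextSize, and the [0]func() sentinel are named by the text above.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// PlaintextSize matches the 32-byte NSK length stated in the contract.
const PlaintextSize = 32

// Plaintext owns the secret bytes. The zero-size [0]func() field costs no
// memory but makes the struct non-comparable, so using == on two Plaintext
// values or using Plaintext as a map key fails to compile.
type Plaintext struct {
	_ [0]func()
	b []byte
}

// NewPlaintext is a hypothetical constructor that fills the backing slice
// from the operating system's CSPRNG.
func NewPlaintext() (*Plaintext, error) {
	b := make([]byte, PlaintextSize)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return &Plaintext{b: b}, nil
}

// Bytes exposes the backing slice (hypothetical accessor); callers share the
// same array, which is what lets Destroy scrub previously captured slices.
func (p *Plaintext) Bytes() []byte { return p.b }

// Destroy zero-fills the backing array in place, then re-slices to length 0.
func (p *Plaintext) Destroy() {
	for i := range p.b {
		p.b[i] = 0
	}
	p.b = p.b[:0]
}

func main() {
	p, err := NewPlaintext()
	if err != nil {
		panic(err)
	}
	captured := p.Bytes() // e.g. the slice handed to the JSON encoder
	p.Destroy()
	fmt.Println(len(captured), len(p.Bytes()))
}
```

A handler following the text above would call defer p.Destroy() right after writing the response, so the secret lives in memory only for the duration of the request.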

The dev-mode software adapter ships in this story (internal/identity/nodes/nsk/software/). Its livez warning constant software.WarningFlagValue = "nsk-software-provider-dev-only" is concatenated into the /livez HTTP body whenever the software adapter is wired as the registration service's NSK Issuer; production liveness probes branch on this exact token to fail closed when the dev-only adapter is detected. The production OpenBao-backed adapter ships in a separate context.

Public-key validation contract

RegisterCommand.Validate (registration/command.go) runs the WireGuard public-key gate FIRST, BEFORE any scope-field check or BootstrapToken consume attempt:

  1. Length — exactly publicKeyLen = 32 bytes after base64 decode. Wrong length surfaces as nodes.ErrPublicKeyInvalid with the offending length in the message text.
  2. All-zero rejection: the all-zero X25519 public key is a small-order point, so any shared secret derived from it is all-zero and therefore predictable regardless of the peer's private key (RFC 7748 §6.1). isAllZero rejects it with nodes.ErrPublicKeyAllZero.

The ordering is deliberate (length precedes all-zero) so a zero-length payload reports as ErrPublicKeyInvalid rather than matching the all-zero branch by coincidence. Both rejections fire BEFORE the service touches the BootstrapToken plaintext — a malformed-key attempt cannot also waste a token.
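A minimal sketch of the gate, assuming the key arrives as standard padded base64 and that a base64 decode failure is also reported as ErrPublicKeyInvalid (the text does not say how decode errors are classified). validatePublicKey and encodeKey are hypothetical helper names; the sentinels stand in for the ones in nodes/errors.go.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

const publicKeyLen = 32

// Stand-ins for nodes.ErrPublicKeyInvalid and nodes.ErrPublicKeyAllZero.
var (
	ErrPublicKeyInvalid = errors.New("public key invalid")
	ErrPublicKeyAllZero = errors.New("public key is all-zero")
)

// isAllZero ORs every byte together so the loop does not exit early.
func isAllZero(b []byte) bool {
	var acc byte
	for _, v := range b {
		acc |= v
	}
	return acc == 0
}

// validatePublicKey checks length first, then the all-zero value, so an
// empty payload reports as invalid rather than as all-zero.
func validatePublicKey(encoded string) error {
	raw, err := base64.StdEncoding.DecodeString(encoded)
	if err != nil {
		return fmt.Errorf("%w: %v", ErrPublicKeyInvalid, err)
	}
	if len(raw) != publicKeyLen {
		return fmt.Errorf("%w: got %d bytes, want %d", ErrPublicKeyInvalid, len(raw), publicKeyLen)
	}
	if isAllZero(raw) {
		return ErrPublicKeyAllZero
	}
	return nil
}

// encodeKey is a demo helper producing the wire form of a raw key.
func encodeKey(raw []byte) string {
	return base64.StdEncoding.EncodeToString(raw)
}

func main() {
	good := make([]byte, publicKeyLen)
	good[0] = 1
	fmt.Println(validatePublicKey(encodeKey(good)))                       // accepted
	fmt.Println(validatePublicKey(encodeKey(make([]byte, publicKeyLen)))) // all-zero
	fmt.Println(validatePublicKey(""))                                    // wrong length
}
```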

OpenAPI surface reference

The single operation lives on the bootstrap-tokens tag in api/openapi/plexsphere-v1.yaml:

| Operation | Path | Auth | Notes |
| --- | --- | --- | --- |
| PostRegister | POST /v1/register | None: the unauthenticated bootstrap seam. The BootstrapToken plaintext IS the credential the redeeming substrate presents; the caller does NOT send an Authorization header. | Returns 200 on success with the RegisterResponse body; 400 on public-key shape failure (token NOT consumed); 403 on Validator denial; 404 on Resource lookup miss when the adoption branch does not apply (token NOT consumed; see Resource adoption on register); 422 on application-boundary invariant failure; 503 on pool exhaustion / allocator contention (token NOT consumed); 500 on NSK issuance / outbox-append failure (token NOT consumed because the tx rolls back). |

RegisterRequest schema

| Field | Type | Notes |
| --- | --- | --- |
| project_id | UUID | Project the redeeming substrate is enrolling into. Must match the project_id segment of the plaintext bootstrap_token; mismatch surfaces as 403 with reason=project_mismatch. |
| resource_id | string (minLength: 1) | Human-readable Resource handle (e.g. edge-router-01). An unknown handle is resolved one of two ways: if the request carries a requested_resource_id and the deployment wired the adoption path, the service synthesises a new Resource (see Resource adoption on register); otherwise the miss surfaces as 404 with code=resource_not_found. |
| bootstrap_token | string (^psb_[a-z]+_[a-z2-7]+_(node\|bridge)_[a-z2-7]{20,}$) | Plaintext BootstrapToken in the documented format. The Validator parses the prefix to discover the env + project-id + kind triple before running the candidate scan. |
| nonce | string (minLength: 1) | Request-side replay-protection nonce; backed by the partial UNIQUE on (project_id, consumed_nonce). |
| public_key | base64 (pattern: ^[A-Za-z0-9+/]{43}=$, length: 44) | 32-byte X25519 public key, standard base64 with padding. Validated for length and all-zero degeneracy at the application boundary BEFORE any BootstrapToken consume. |
| requested_resource_id | string, optional | The adoption trigger. When resource_id resolves no existing Resource, a non-empty requested_resource_id is the operator's explicit signal "synthesise a new Resource for this substrate rather than fail with 404". The value is carried into the new Resource as its ExternalRef. Empty / omitted means "no adoption": an unknown handle then surfaces as 404 resource_not_found. The adoption branch additionally requires the deployment to have wired a ProjectScopeResolver; see Resource adoption on register. |

RegisterResponse schema

| Field | Type | Notes |
| --- | --- | --- |
| node_id | UUID (UUIDv7) | Freshly-allocated Node aggregate id. |
| mesh_ip | string (canonical netip.Addr.String()) | Allocator-assigned mesh address inside the Domain's mesh CIDR (100.64.0.1, never zero-padded). |
| signing_public_key | base64 | Domain's active signing public key. plexd pins this and uses it to verify signed SSE events delivered by the signed event bus. |
| signing_key_id | string | The kid the active signing key is indexed under. Carried so plexd can still verify SSE events whose header references a non-current kid during a rotation grace window. |
| nsk | base64, x-plexsphere-once: true | Per-Node NSK plaintext (32 bytes). Returned exactly once. No API surface ever returns the plaintext again because the persistence layer stores only the wrapped form. The Spectral rule plexsphere-write-once-post-must-be-issue-response gates this marker to the PostRegister response (and Issue* operations) so the plaintext cannot accidentally appear on any other operation. |
| peer_snapshot | array of {node_id, mesh_ip, public_key} | Initial WireGuard peer set the registering Node needs to bring its table up. May be empty for the first Node enrolled into a Domain. |
| domain_mesh_cidr | string (netip.Prefix) | The Domain's mesh CIDR (e.g. 100.64.0.0/10). Carried so plexd can program its routing table without a follow-up Domain lookup. |

tenancy.NodeRegistered outbox payload

The registration tx appends one outbox row of type tenancy.NodeRegistered (internal/identity/tenancy/events/events.go) inside the same SQL transaction that inserts the Node. Payload mirrors the response by-field; consumers (the signed SSE event bus) read the row through the outbox dispatcher.

| Field | Source | Notes |
| --- | --- | --- |
| event_id | UUIDv7 (tenancy.NewID) | Outbox-row identifier. |
| occurred_at | UTC timestamp (ConsumedAt) | Same timestamp as the BootstrapToken's consumed_at so audit replay reads consistently across the two tables. |
| node_id | the freshly-issued Node id | Same value as the response node_id. |
| resource_id | the resolved Resource id | |
| project_id | the resolved Project id | |
| domain_id | the resolved Domain id | Denormalised so downstream consumers can route without an extra lookup. |
| mesh_ip | the allocator-assigned address, netip.Addr.String() | Canonical string form so JSON round-tripping is lossless (100.64.0.1, never zero-padded). |

The outbox append happens in step 8 of the RegistrationTx — same tx as every other write — so a consumer that reads NodeRegistered is guaranteed to find the matching nodes and domain_mesh_ip_allocations rows already committed.
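The payload fields above map naturally onto a Go struct. The sketch below is an assumption about shape, not the definition in internal/identity/tenancy/events/events.go: ids are plain strings here for brevity, and roundTripMeshIP is a hypothetical helper that shows why the canonical netip form round-trips losslessly through JSON.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
	"time"
)

// NodeRegistered mirrors the documented payload fields. The Go field names
// and the string-typed ids are illustrative assumptions.
type NodeRegistered struct {
	EventID    string     `json:"event_id"`
	OccurredAt time.Time  `json:"occurred_at"`
	NodeID     string     `json:"node_id"`
	ResourceID string     `json:"resource_id"`
	ProjectID  string     `json:"project_id"`
	DomainID   string     `json:"domain_id"`
	MeshIP     netip.Addr `json:"mesh_ip"` // marshals as its canonical string
}

// roundTripMeshIP parses s, encodes an event carrying it, decodes the event
// again, and returns the decoded address in canonical form.
func roundTripMeshIP(s string) (string, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "", err
	}
	raw, err := json.Marshal(NodeRegistered{
		EventID:    "evt-1", // placeholder ids, not real UUIDv7 values
		OccurredAt: time.Now().UTC(),
		NodeID:     "node-1",
		ResourceID: "res-1",
		ProjectID:  "proj-1",
		DomainID:   "dom-1",
		MeshIP:     addr,
	})
	if err != nil {
		return "", err
	}
	var back NodeRegistered
	if err := json.Unmarshal(raw, &back); err != nil {
		return "", err
	}
	return back.MeshIP.String(), nil
}

func main() {
	fmt.Println(roundTripMeshIP("100.64.0.1"))
	// netip.ParseAddr rejects zero-padded octets, so this form never
	// reaches the payload.
	fmt.Println(roundTripMeshIP("100.064.000.001"))
}
```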

Audit contract

The registration service emits exactly two audit-row outcomes through the AuditSink port (registration/service.go):

| Trigger | Subject | Relation | Outcome string | When |
| --- | --- | --- | --- | --- |
| Service.Register success | service:registration | register | register_complete | Last action on the success path, after the response value passes RegistrationResult.Validate. |
| RegisterCommand.Validate rejected the public key | service:registration | register | register_invalid_public_key | Boundary failure BEFORE any BootstrapToken Identify attempt. The Object field carries the literal unknown for the node-id segment because no Node has been allocated yet. |

The exact wire-form is Subject="service:registration", Relation="register", Outcome ∈ {"register_complete", "register_invalid_public_key"} (underscore-separated, NOT dot-separated). Dashboards should filter on subject="service:registration" AND relation="register" to surface every registration-tier audit row regardless of outcome.

Other failure paths (BootstrapToken precedence denials, pool exhaustion, allocator contention, NSK issuance failures) deliberately do NOT emit a registration-layer audit row — the BootstrapToken identifier middleware (bootaudit.IdentifyMiddleware) already records one entry per Identify attempt with Relation="identify" and a double emission at the registration layer would corrupt the audit stream.

Operator note — identify granted rows on rolled-back registrations. Identify runs OUTSIDE the registration tx (see Identify-then-commit application flow) and bootaudit.IdentifyMiddleware emits its row as soon as Identify returns. If the registration tx then rolls back (NSK issuance failure, allocator pool exhaustion, outbox INSERT failure, etc.) the BootstrapToken's consumed_at reverts to NULL but the identify audit row stays. When triaging, correlate every relation="identify" row with outcome="granted" against the matching relation="register" row in the same correlation window — a granted Identify with no register_complete companion is the signature of a rolled-back registration the operator can safely retry with the same plaintext.

The AuditEntry shape mirrors bootstraptokens/audit.Entry:

| Field | Value |
| --- | --- |
| Subject | service:registration (fixed; see AuditSubject). Mirrors the bootstraptokens convention service:bootstrap-tokens so operators have one stable subject per bounded context to filter on. |
| Relation | register (fixed; see AuditRelationRegister). |
| Object | node:<node-id-or-"unknown">:<outcome>; a single object LIKE 'node:%' filter catches every entry this bounded context emits except adoption-branch success rows, which carry resource:<resourceID> instead (see Resource adoption on register). |
| Outcome | One of the two stable strings register_complete / register_invalid_public_key. |
| Timestamp | The injected clock() at decision time, always UTC. |

The recordAudit helper swallows Sink errors deliberately: the AuditSink is downstream infrastructure (a hash-chained sink, a Kafka producer, etc.); a failure there must not abort a successful registration. The Sink implementation is expected to log its own failure for operator visibility — mirrors the bootstraptokens/audit.recordOrLog policy.

Metrics

Two Prometheus collectors surface from the registration service (registration/metrics.go):

  • plexsphere_register_total{outcome} — counter incremented on every Service.Register attempt. The outcome label takes one of the values below; operators alert on rate() of specific failure outcomes.
  • plexsphere_register_pool_exhausted_total{domain_id, scope} — counter incremented when the mesh-IP allocator returns nodes.ErrPoolExhausted. Distinct from register_total{outcome="pool_exhausted"} so dashboards can pivot per-Domain without re-aggregating buckets. scope ∈ {domain, project_subrange}.

Outcome alphabet:

| Label value | Triggered by |
| --- | --- |
| complete | Successful Service.Register. |
| invalid_public_key | nodes.ErrPublicKeyInvalid or nodes.ErrPublicKeyAllZero. |
| bootstrap_token_invalid | bootstraptokens.ParsePlaintext failed. |
| project_mismatch | Plaintext-encoded project_id disagrees with the request's project_id, OR consumeRes.ProjectID disagrees with the resolved view.ProjectID. |
| kind_mismatch | bootstraptokens.ErrKindMismatch. |
| token_consumed | bootstraptokens.ErrTokenConsumed. |
| token_expired | bootstraptokens.ErrTokenExpired. |
| token_revoked | bootstraptokens.ErrTokenRevoked. |
| nonce_collision | bootstraptokens.ErrNonceCollision. |
| token_not_found | bootstraptokens.ErrTokenNotFound. |
| resource_not_found | nodes.ErrResourceNotFound from the TenancyResolver. |
| pool_exhausted | nodes.ErrPoolExhausted from the allocator. |
| allocator_contention | nodes.ErrAllocatorContention from the per-Domain advisory lock. |
| nsk_issuance_failed | nodes.ErrNSKIssuanceFailed from the NSK Issuer port. |
| internal | Default fallthrough: any error not matched by the classifier above. |

Both collectors tolerate prometheus.AlreadyRegisteredError so multiple Service instances in one test share the same series; a nil registerer makes every observation a no-op.
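The outcome alphabet implies a classifier that maps an error onto one label, with internal as the fallthrough. The sketch below covers only a subset of the table and assumes errors.Is matching over a fixed-order list; classify, the outcomes slice, and the local sentinels are hypothetical stand-ins for the real nodes errors.

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-ins for a subset of the nodes.* sentinels listed in the table.
var (
	ErrResourceNotFound    = errors.New("resource not found")
	ErrPoolExhausted       = errors.New("pool exhausted")
	ErrAllocatorContention = errors.New("allocator contention")
	ErrNSKIssuanceFailed   = errors.New("nsk issuance failed")
)

// Demo errors: one wrapped sentinel and one error no rule matches.
var (
	errWrappedNSK = fmt.Errorf("commit: %w", ErrNSKIssuanceFailed)
	errUnmatched  = errors.New("disk full")
)

// outcomes pairs each sentinel with its metric label; order only matters if
// an error could match more than one sentinel.
var outcomes = []struct {
	err   error
	label string
}{
	{ErrResourceNotFound, "resource_not_found"},
	{ErrPoolExhausted, "pool_exhausted"},
	{ErrAllocatorContention, "allocator_contention"},
	{ErrNSKIssuanceFailed, "nsk_issuance_failed"},
}

// classify returns the outcome label for err, "complete" for nil, and
// "internal" when nothing in the table matches.
func classify(err error) string {
	if err == nil {
		return "complete"
	}
	for _, o := range outcomes {
		if errors.Is(err, o.err) {
			return o.label
		}
	}
	return "internal"
}

func main() {
	fmt.Println(classify(nil))
	fmt.Println(classify(errWrappedNSK)) // wrapping does not hide the sentinel
	fmt.Println(classify(errUnmatched))
}
```

Keeping the label set closed like this bounds the cardinality of the outcome label, which is what makes alerting on rate() of a specific outcome practical.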

Threat model

The threat model below names the attacker shapes this context defends against, the invariant that holds the line, and the integration test that proves the invariant. Walk this table before approving any change to the validation, allocation, NSK issuance, or persistence path.

| Threat | Defence | Test anchor |
| --- | --- | --- |
| Replay: same plaintext + same nonce presented twice (or in concurrent races). | Three layers: (a) validator.Consume is a single UPDATE … WHERE consumed_at IS NULL … RETURNING statement; (b) the (project_id, consumed_nonce) partial UNIQUE rejects the second envelope as ErrNonceCollision; (c) when 32 goroutines race the same plaintext, exactly one succeeds and 31 see ErrTokenConsumed. | tests/integration/register_concurrent_replay_test.go |
| Cross-Project token use: token issued for Project A presented at Project B. | Project ID is encoded in the plaintext; the Validator rejects mismatch before the candidate scan and the registration service additionally cross-checks consumeRes.ProjectID == view.ProjectID after tenancy.ResolveByResourceHandle so a schema-drift bug in the resolver still trips the gate. | tests/integration/register_authz_test.go |
| Cross-kind confusion: Bridge token presented at the Node-kind register seam (or vice-versa). | Kind is encoded in the plaintext; ParsePlaintext extracts it and the registration service forwards it as validator.ConsumeParams.ExpectedKind so kind mismatch surfaces as ErrKindMismatch before the candidate hash compare runs. | tests/integration/register_authz_test.go |
| Atomicity violation: partial registration leaves token consumed but Node missing (or vice-versa). | Every step of the in-tx sequence, the outbox append included, runs inside one pgx.Tx. Any error in steps 1–8 rolls back the whole tx, including the BootstrapToken consume; the operator can retry with the same plaintext token. | tests/integration/register_atomicity_test.go |
| Pool-exhaustion DoS: attacker racing registrations to drain the Domain mesh CIDR or burn BootstrapTokens. | The per-Domain register_pool_exhausted_total counter pivots capacity events without bucket re-aggregation; nodes.ErrPoolExhausted does NOT consume the token (the registration tx rolls back); the per-Domain advisory lock prevents two Domains saturating one allocator goroutine. | tests/integration/register_pool_exhaustion_test.go, tests/integration/register_concurrent_distinct_domains_test.go |
| Weak public key: caller presents an all-zero X25519 public key (a small-order point; see RFC 7748 §6.1) or a wrong-length payload. | RegisterCommand.Validate enforces length=32 BEFORE all-zero, both BEFORE any BootstrapToken consume; the rejected attempt does NOT waste a token; register_invalid_public_key is the audit and metric outcome on rejection. | tests/integration/register_publickey_validation_test.go |

What this context is NOT

This context ships the application service, persistence tx, HTTP handler, OpenAPI surface, audit + metrics emission, and the dev-mode software wrap-key adapter for Node Registration. Adjacent capabilities live in dedicated contexts so this context's surface stays minimal:

  • NOT the SSE signed event bus. This context emits the tenancy.NodeRegistered outbox row inside the registration tx; the consumer that reads the outbox, signs the event, and pushes it down the SSE channel is the signed SSE event bus. The outbox row is the seam the two contexts share.
  • NOT the reconciliation pull GET /v1/nodes/{id}/state. plexd uses a long-poll reconciliation endpoint to discover peer-set drift after registration; that endpoint is the Node-state reconciliation pull. This context returns the initial peer snapshot in the registration response; subsequent drift is the reconciliation pull endpoint's responsibility.
  • NOT the dashboard signing-key / NSK rotation UI. Operators trigger signing-key rotation and NSK re-wrap from the dashboard; that surface is the dashboard signing-key rotation track. This context ships the wrap-key ledger schema (plexsphere.domain_nsk_wrap_key with active | rotating | retired) so the rotation UI lands without a fresh migration.
  • NOT the OpenBao / Vault wrap-key store. This context ships the nsk.Issuer abstraction and the dev-mode software adapter only; the production OpenBao-backed adapter ships in a separate context. The /livez warning constant nsk-software-provider-dev-only lets production liveness probes fail closed when the dev adapter is detected.
  • NOT the deregistration handler. Removing a Node from a Domain, releasing its mesh-IP back to the allocator, and emitting the tenancy.NodeDeregistered outbox row is owned by the Node-deregistration handler. This context issues identities; the deregistration handler retires them.