Node Registration Bounded Context
This document is the authoritative bounded-context reference for the Node Registration application service that ships under internal/identity/nodes. It covers the ubiquitous language, the pending → enrolling → ready → deregistered state machine, the atomic in-tx SQL sequence, the per-Domain advisory-lock allocator strategy, the NSK issuance contract, the public-key validation gate, the OpenAPI surface, the tenancy.NodeRegistered outbox payload, the audit and metrics contracts, the threat model, and the boundary between this story and the consumer stories that build on it.
The entrypoint is POST /v1/register — the unauthenticated bootstrap seam where a freshly-installed plexd agent armed with only a BootstrapToken plaintext and its substrate's WireGuard public key exchanges that one-shot credential for a complete Node identity. The exchange is atomic: a single Postgres transaction transitions the agent from "holds a BootstrapToken plaintext" to "holds a (NodeID, mesh_ip, NSK plaintext, signing_public_key + kid, peer_snapshot, domain_mesh_cidr) tuple" or it leaves no observable state behind.
Cross-references
- `../../contributing/layout.md`: the bounded-context map row that locates `internal/identity/nodes` inside the codebase and names the upstream BootstrapToken context plus the downstream consumers (the signed SSE event bus and the reconciliation pull endpoint).
- `../../how-to/enrolment/register-a-node.md`: operator runbook for redeeming a BootstrapToken and bringing a plexd agent up against `POST /v1/register`.
- `../../architecture/overview.md`: high-level component diagram showing the `Operator → Issuer → plexd → Validator → Registration` enrolment data flow.
- `../../../api/openapi/plexsphere-v1.yaml`: OpenAPI 3.1 spec; the `PostRegister` operation, the `RegisterRequest` / `RegisterResponse` schemas, and the `x-plexsphere-once: true` marker on the `nsk` field are all pinned there.
- `../../../internal/platform/db/migrations/0008_node_secret_keys.sql`: schema migration for `plexsphere.node_secret_key` and `plexsphere.domain_nsk_wrap_key`, including the per-Domain wrap-key ledger that backs `active | rotating | retired` rotation state.
- `bootstrap-tokens.md`: the upstream bounded-context reference. The Validator port this context invokes (`bootstraptokens.Validator.Consume`), the precedence ordering, and the `(project_id, consumed_nonce)` partial UNIQUE that backs `ErrNonceCollision` are all defined there.
Ubiquitous language
The terms below travel together across the Go code, the audit log, the OpenAPI spec, the SQL migration, and the operator-facing tooling. Names are preserved verbatim in error messages and structured log attributes.
| Term | Definition | Code anchor |
|---|---|---|
| Registration | The atomic flow that consumes a BootstrapToken plaintext and provisions a Node identity. The orchestration is "seven steps" at the application layer (Validate → ParsePlaintext → Resolve → Consume → Commit → SigningKey → PeerSnapshot) and "five SQL steps" at the persistence layer (Consume → Allocate → InsertNode → InsertIPAllocation → InsertNodeSecretKey + AppendOutbox). | internal/identity/nodes/doc.go, registration/service.go |
| MeshIP | The allocator-assigned address from the Domain's mesh CIDR. Returned as the canonical netip.Addr.String() form (100.64.0.1, never zero-padded) on both the response body and the outbox event payload. | MeshIP field on registration/result.go, tenancy/allocator/doc.go |
| PublicKey | The 32-byte X25519 (WireGuard) public half plexd submits at registration time. RFC 7748 §5 fixes the encoding to a 32-byte little-endian point. Validated FIRST: wrong length surfaces as ErrPublicKeyInvalid; the all-zero degenerate value surfaces as ErrPublicKeyAllZero. Both rejections happen BEFORE any BootstrapToken consumption attempt so a malformed key cannot also waste a token. | RegisterCommand.Validate in registration/command.go, ErrPublicKeyInvalid / ErrPublicKeyAllZero in nodes/errors.go |
| NodeSecretKey (NSK) | The per-Node AES-256 secret-delivery key issued at registration. Returned to plexd in plaintext exactly once over TLS; persisted only in wrapped form via the per-Domain wrap-key ledger. The Plaintext value object owns the destruction contract — Destroy() zero-fills the backing slice in place; the type is non-comparable by construction so it cannot be used as a map key. | Plaintext, Wrapped, Issuer in nsk/issuer.go |
| WrapKey | The per-Domain symmetric key that wraps every NSK at rest. Each row in plexsphere.domain_nsk_wrap_key carries a monotonically increasing version and a lifecycle state ∈ {active, rotating, retired}; the partial unique index domain_nsk_wrap_key_active_uq enforces "at most one active version per Domain" at the SQL layer. | 0008_node_secret_keys.sql |
| kid | The wrap-key version identifier the NSK Issuer stamps onto the wrapping. Mirrors the JOSE kid convention and matches the regex ^[A-Za-z0-9._:-]+$ enforced by the SQL CHECK on node_secret_key.kid. Opaque to the registration handler; the wrap-key adapter owns its interpretation. | Issue in nsk/issuer.go, node_secret_key.kid in 0008_node_secret_keys.sql |
| RegistrationTx | The single-Postgres-tx commit path the application service depends on. Reusable across goroutines (zero per-request state); composes the per-Domain advisory lock, the BootstrapToken consume, the allocator sweep, the Node + IP-allocation INSERTs, the NSK Issuer call, the node_secret_key INSERT, and the tenancy.NodeRegistered outbox append into one transaction. | internal/identity/nodes/repo/repository_pg.go |
| Service | The DDD application service that orchestrates the seven-step flow. Stateless and reusable across goroutines; depends only on five narrow ports (BootstrapTokenValidator, TenancyResolver, Committer, SigningKeyResolver, PeerSnapshotProvider) plus optional AuditSink / Registerer collaborators. | registration/service.go |
| Plaintext (NSK) | The in-memory NSK plaintext value object. Owns its backing slice, exposes Destroy() to zero the secret bytes once the response has been written, and is non-comparable by construction (an embedded [0]func() array makes the struct contain a non-comparable field). | Plaintext.Destroy, the [0]func() sentinel field in nsk/issuer.go |
State machine
A Node aggregate transitions through four logical states from the operator's vantage. The pending and enrolling states are in-transaction states the registration commit passes through; from the operator's HTTP-response vantage a successful POST /v1/register returns a Node already in ready. The deregistered terminal state is a forward-looking placeholder owned by the Node-deregistration handler.
```mermaid
stateDiagram-v2
    [*] --> pending: BootstrapToken consume + advisory lock acquired
    pending --> enrolling: allocator + Node + IP + NSK rows pre-commit
    enrolling --> ready: tx COMMIT (Node visible to consumers)
    ready --> deregistered: Node deregistration handler
    deregistered --> [*]
    note right of pending
        In-tx pre-commit only. A failure between
        pending and enrolling rolls back the
        BootstrapToken consume, so the operator can
        retry with the same plaintext token.
    end note
    note right of ready
        First externally-observable state. The
        tenancy.NodeRegistered outbox row is
        visible to the signed SSE consumer at this
        point and the response carries the NSK
        plaintext to plexd.
    end note
```

Each transition is bound to a specific code seam:
- `[*] → pending`: the registration tx's `consumeBootstrapToken` helper UPDATEs the BootstrapToken row inside the registration pgx.Tx (Step 2 of the in-tx sequence below). The application service runs `validator.Identify` (read-only) BEFORE the registration tx opens. That probe verifies the plaintext and identifies the matching `TokenID`, then hands the id to the committer for the in-tx UPDATE. Bound by `registration/service.go` step 4 and the Step 2 SQL statement in `repo/repository_pg.go`.
- `pending → enrolling`: allocator + Node + IP + NSK rows inserted under the per-Domain advisory lock (steps 3–8 of the RegistrationTx).
- `enrolling → ready`: pgx tx COMMIT. Releases the advisory lock automatically and makes every row of the registration visible at once.
- `ready → deregistered`: out of scope for this context; the Node-deregistration handler lands alongside the dashboard rotation UI.
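
The transition rules above can be sketched as a small lookup table. This is an illustrative sketch only: the `State` type, the constant names, and `CanTransition` are hypothetical and are not taken from the `internal/identity/nodes` source.

```go
package main

import "fmt"

// State is a hypothetical enum for the four logical Node states.
type State string

const (
	Pending      State = "pending"
	Enrolling    State = "enrolling"
	Ready        State = "ready"
	Deregistered State = "deregistered"
)

// next records the single forward edge out of each non-terminal state,
// mirroring the state diagram. Deregistered has no outgoing edge.
var next = map[State]State{
	Pending:   Enrolling,
	Enrolling: Ready,
	Ready:     Deregistered,
}

// CanTransition reports whether from -> to is an edge of the diagram.
func CanTransition(from, to State) bool {
	want, ok := next[from]
	return ok && want == to
}

func main() {
	fmt.Println(CanTransition(Pending, Enrolling))    // edge exists
	fmt.Println(CanTransition(Pending, Ready))        // skips enrolling
	fmt.Println(CanTransition(Deregistered, Pending)) // terminal state
}
```

Because `pending` and `enrolling` exist only inside the transaction, a table like this is mostly useful in tests and documentation; external consumers only ever observe `ready` and later states.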
Identify-then-commit application flow
The application service performs the BootstrapToken match as a read-only Identify pass OUTSIDE the registration tx; the actual UPDATE bootstrap_token happens INSIDE the registration tx via the shared bootrepo.ConsumeInTx helper that the committer's consumeBootstrapToken step delegates to. The split is intentional:
- The Argon2id verify pass (~150 ms CPU per candidate) holds no SQL resources, so running it outside the tx avoids holding the per-Domain advisory lock across a CPU-heavy phase.
- The actual SQL UPDATE shares the registration tx's commit/rollback fate so a partial-failure rollback on Node insert / NSK issue / outbox append flips `consumed_at` back to NULL, which is the registration atomicity guarantee.
The cost is one extra round-trip to identify the row before the tx opens; the benefit is that the consume + Node + IP + NSK + outbox write all share one MVCC snapshot AND one rollback fate.
Resource adoption on register
The registration flow resolves the request's resource_id handle to a (Resource, Project, Domain) projection via TenancyResolver.ResolveByResourceHandle. An unknown handle surfaces as nodes.ErrResourceNotFound. By default that miss is fatal — the service classifies it as resource_not_found and the handler maps it onto 404. The adoption branch turns that miss into a "synthesise a new Resource and proceed" outcome instead.
Service.shouldAdopt decides whether the miss is adoption-shaped. All three preconditions must hold:
- The resolve error is exactly `nodes.ErrResourceNotFound`; any other error (project mismatch, infrastructure failure) is never treated as adoption.
- The `RegisterCommand` carries a non-empty `RequestedResourceID`, the operator's explicit intent to create.
- The service was constructed with a non-nil `ProjectScopeResolver`, meaning the deployment opted into adoption.
If any precondition fails the service falls back to the historical 404 resource_not_found behaviour unchanged.
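
The three preconditions can be sketched as a single predicate. This is a minimal sketch, not the actual `Service.shouldAdopt` body: the `ResolveProjectScope` signature, the struct layouts, and the `staticScope` stub are assumptions made for illustration, and `errors.Is` is used here although the real check may compare the error differently.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrResourceNotFound stands in for nodes.ErrResourceNotFound.
var ErrResourceNotFound = errors.New("resource not found")

// errInfra is a hypothetical non-adoptable failure used for the demo.
var errInfra = errors.New("infrastructure failure")

// ProjectScopeResolver uses an assumed signature; the document only
// names the ResolveProjectScope method, not its parameters.
type ProjectScopeResolver interface {
	ResolveProjectScope(projectID string) (domainID, meshCIDR string, err error)
}

type RegisterCommand struct {
	RequestedResourceID string
}

type Service struct {
	scope ProjectScopeResolver // nil means the deployment did not opt in
}

// shouldAdopt returns true only when all three preconditions hold.
func (s *Service) shouldAdopt(resolveErr error, cmd RegisterCommand) bool {
	if !errors.Is(resolveErr, ErrResourceNotFound) {
		return false // any other error is never adoption-shaped
	}
	if cmd.RequestedResourceID == "" {
		return false // no explicit operator intent to create
	}
	return s.scope != nil
}

// staticScope is a stub resolver with made-up return values.
type staticScope struct{}

func (staticScope) ResolveProjectScope(string) (string, string, error) {
	return "domain-1", "100.64.0.0/10", nil
}

func main() {
	cmd := RegisterCommand{RequestedResourceID: "edge-router-01"}
	optedIn := &Service{scope: staticScope{}}
	optedOut := &Service{}

	fmt.Println(optedIn.shouldAdopt(ErrResourceNotFound, cmd))               // all three hold
	fmt.Println(optedIn.shouldAdopt(errInfra, cmd))                          // wrong error
	fmt.Println(optedIn.shouldAdopt(ErrResourceNotFound, RegisterCommand{})) // no intent
	fmt.Println(optedOut.shouldAdopt(ErrResourceNotFound, cmd))              // not wired
}
```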
When all three hold, Service.buildAdoptionView runs:
- `ProjectScopeResolver.ResolveProjectScope` looks up the Project's `DomainID` and `DomainMeshCIDR`, the scope the rest of the flow needs and would normally have taken from the resolved Resource.
- A fresh Resource aggregate id is minted (`tenancy.NewID`).
- A synthetic `TenancyView` (the new `ResourceID` plus the resolved scope) and a `CreateResourceIntent` are assembled. The intent carries `Origin = tenancy.OriginAdopted`, `ExternalRef = RequestedResourceID`, and the resolved `ProjectID` / `DomainID`.
The CreateResourceIntent rides on CommitInput.CreateResourceIntent so the registration transaction INSERTs the new plexsphere.resources row (stamped origin = 'Adopted') in the same pgx.Tx as the Node insert — the new Resource and the Node commit or roll back together. The Origin discriminator is the tenancy-side record of this path; see the Resource Origin section of the tenancy reference.
On the adoption branch the success audit row's Object field carries resource:<resourceID> (rather than the canonical node:<nodeID>:<outcome> form) so a forensic search by the freshly-minted Resource id surfaces the registration event.
Atomic in-tx SQL sequence
The RegistrationTx.Commit body runs the following sequence under one pgx.Tx. Mid-flow failure rolls back every preceding step including the BootstrapToken consume, so the operator can retry with the same plaintext. The step numbering below tracks the source — see internal/identity/nodes/repo/repository_pg.go for the canonical implementation.
1. `q.AcquireDomainAllocatorLock(domain_id)`: `pg_advisory_xact_lock(hash(domain_id))`. Releases automatically at COMMIT/ROLLBACK; serialises concurrent registrations within the same Domain.
2. `bootrepo.ConsumeInTx`: the shared helper performs `UPDATE bootstrap_token … RETURNING WHERE consumed_at IS NULL AND revoked_at IS NULL AND expires_at > now` against the registration tx's `*sqlcgen.Queries`. Single-statement single-use enforcement; the `(project_id, consumed_nonce)` partial UNIQUE rejects duplicate nonces as `bootstraptokens.ErrNonceCollision`. The disambiguation re-read (revoked / consumed / expired) lives in this helper too; see the BootstrapToken reference's validator-precedence section.
3. `allocator.AllocateForNode`: in-memory sweep over the live IP ledger (rebuilt under the lock). Returns the first usable host inside the Domain CIDR (or the Project sub-range when a reservation applies). See `tenancy/allocator/doc.go`.
4. `tenancy.NewNode`: aggregate construction re-asserts the WireGuard public-key length and the mesh-IP non-zero invariants inside the tx so any failure rolls back every preceding step.
5. `q.RegisterNode`: INSERT INTO `node`. Trips the `UNIQUE (resource_id)` constraint as `ErrConflict` if the Resource already has a Node.
6. `q.InsertIPAllocation`: appends one row to `domain_mesh_ip_allocations` so the next allocator restore observes the decision.
7. `r.issuer.Issue` + `q.InsertNodeSecretKey`: issues the NSK and persists the wrapped material with its `kid` and `wrap_key_version`. The `Issue` call runs OUTSIDE pgx (it is a pure-crypto operation) but INSIDE the advisory-lock window so the wrap binds to the freshly-issued `NodeID`. A wrap failure surfaces as `nodes.ErrNSKIssuanceFailed` which the handler maps onto HTTP 500.
8. `appendOutbox tenancy.NodeRegistered`: appends the `NodeRegistered` outbox row inside the same tx so a downstream relay observes the new Node IFF the registration committed.
Rollback property. Any error returned from any step inside the runInTx closure rolls back the whole tx. The advisory lock releases automatically on COMMIT/ROLLBACK; the BootstrapToken's consumed_at reverts to NULL; the operator can retry with the same plaintext token without re-issuing it. The service-tier integration test tests/integration/register_consume_in_tx_test.go pins this contract by driving bootrepo.ConsumeInTx inside an outer pgx.Tx the test owns, rolling back, and asserting consumed_at IS NULL afterwards.
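
The rollback property can be illustrated without a database. The sketch below is an in-memory analogue, not the real `runInTx`: the `Tx` interface, the `fakeTx` staging store, and the `register` helper are all hypothetical, and the real implementation runs against `pgx.Tx`.

```go
package main

import (
	"errors"
	"fmt"
)

// Tx is a hypothetical stand-in for the transaction handle.
type Tx interface {
	Set(key, value string)
	Commit() error
	Rollback() error
}

// runInTx commits when fn succeeds and rolls back on any error.
func runInTx(begin func() (Tx, error), fn func(Tx) error) error {
	tx, err := begin()
	if err != nil {
		return fmt.Errorf("begin: %w", err)
	}
	if err := fn(tx); err != nil {
		if rbErr := tx.Rollback(); rbErr != nil {
			return fmt.Errorf("%w (rollback also failed: %v)", err, rbErr)
		}
		return err
	}
	return tx.Commit()
}

// fakeTx stages writes and applies them to store only on Commit.
type fakeTx struct {
	store  map[string]string
	staged map[string]string
}

func (t *fakeTx) Set(k, v string) { t.staged[k] = v }
func (t *fakeTx) Rollback() error { t.staged = nil; return nil }
func (t *fakeTx) Commit() error {
	for k, v := range t.staged {
		t.store[k] = v
	}
	return nil
}

var errNSK = errors.New("nsk issuance failed")

// register replays a shortened version of the in-tx sequence and can be
// told to fail at the NSK step.
func register(store map[string]string, failNSK bool) error {
	begin := func() (Tx, error) {
		return &fakeTx{store: store, staged: map[string]string{}}, nil
	}
	return runInTx(begin, func(tx Tx) error {
		tx.Set("bootstrap_token.consumed_at", "now")
		tx.Set("node", "inserted")
		if failNSK {
			return errNSK // everything staged so far is discarded
		}
		tx.Set("node_secret_key", "wrapped")
		return nil
	})
}

func main() {
	store := map[string]string{}
	fmt.Println(register(store, true), len(store))  // failure leaves the store empty
	fmt.Println(register(store, false), len(store)) // retry commits all three rows
}
```

The point of the sketch is the shape of the guarantee: the consume write is staged together with the later writes, so a failure after it leaves nothing observable and the same token can be retried.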
Allocator hybrid strategy + per-Domain advisory lock
The mesh-IP allocator is a pure in-memory domain service; the repository layer hydrates it once per registration tx, calls AllocateForNode, and persists the new ledger row. The strategy is "hybrid" — the same allocator serves SaaS-style flat-Domain pools and self-hosted Project-sub-range reservations without branching at the call site (tenancy/allocator/doc.go).
- Flat-Domain pool — the default. The Node's mesh-IP is drawn from the Domain CIDR minus every active sub-range reservation.
- Project sub-range reservation: applied when the registering Node's Project carries a row in `project_mesh_ip_reservations`. The Node is allocated from the reservation and never from outside it; the SQL `GIST` exclusion guarantees reservations inside one Domain never overlap.
Serialisation is the repository's responsibility, not the allocator's. RegistrationTx.Commit calls q.AcquireDomainAllocatorLock(domain_id) as the first SQL statement of the tx — the same precedent established by internal/identity/tenancy/repo/node_repo.go for direct Node creation. The lock key is pg_advisory_xact_lock(hash(domain_id)) so concurrent registrations in different Domains do not contend; concurrent registrations in the same Domain serialise so the resulting set of allocated IPs has N distinct values for N concurrent attempts.
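
The document does not say which function `hash(domain_id)` is. As a sketch under that stated uncertainty, the example below derives a 64-bit key with FNV-1a, since `pg_advisory_xact_lock` accepts a single `bigint`. The `domainLockKey` helper, the choice of FNV-1a, and the sample Domain ids are assumptions, not the repository's implementation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// acquireSQL is the statement a repository could run as the first
// statement of the tx; the lock is released at COMMIT or ROLLBACK.
const acquireSQL = "SELECT pg_advisory_xact_lock($1)"

// domainLockKey maps a Domain id onto the bigint key space.
// FNV-1a is an assumed choice made for this sketch.
func domainLockKey(domainID string) int64 {
	h := fnv.New64a()
	h.Write([]byte(domainID)) // hash.Hash.Write never returns an error
	return int64(h.Sum64())
}

func main() {
	a := domainLockKey("0190a5e2-0000-7000-8000-000000000001")
	b := domainLockKey("0190a5e2-0000-7000-8000-000000000002")

	fmt.Println(acquireSQL)
	fmt.Println(a == domainLockKey("0190a5e2-0000-7000-8000-000000000001")) // same Domain, same key
	fmt.Println(a == b)                                                     // different Domains do not share a key here
}
```

Any deterministic hash gives the same serialisation behaviour within one Domain; a collision between two Domains would only cause extra contention, not incorrect allocation.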
The per-Domain pool-exhaustion counter plexsphere_register_pool_exhausted_total is labelled with domain_id and scope so a dashboard can pivot per-Domain capacity events; scope takes one of two values (registration/metrics.go):
- `scope="domain"`: the flat-Domain pool was exhausted. Pinned as `poolScopeDomain` in the same file.
- `scope="project_subrange"`: a strict-mode Project sub-range was exhausted. The label value is reserved for the strict-mode wiring tracked under `TODO(perf, PX-0015)` and lands when `tenancy/allocator/rules.go` exposes the strict-mode toggle.
NSK issuance contract
The NSK Issuer port is the seam between the registration tx and the wrap-key store (internal/identity/nodes/nsk/issuer.go). Implementations MUST:
- Generate exactly `PlaintextSize = 32` bytes of cryptographic random material (RFC 5116 §5.2: AEAD_AES_256_GCM keys are 32 bytes).
- Wrap the plaintext under the Domain's active wrap key.
- Return `(plaintext *Plaintext, wrapped Wrapped, kid string, err error)`. The `kid` matches `^[A-Za-z0-9._:-]+$`; the `wrapped` byte length is bounded by the SQL CHECK to `[28, 4096]`.
- Surface failure as a wrapped error the caller can match with `errors.Is`; the registration service wraps it with `nodes.ErrNSKIssuanceFailed` so the handler maps it onto HTTP 500.
The Plaintext value object owns the destruction contract: pointer methods only, embedded [0]func() sentinel for non-comparability, Destroy() zero-fills in place and re-slices the header to length 0 so any previously-captured slice sees zero bytes. The handler defers Destroy() on the success path immediately after writing the JSON response.
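
A minimal sketch of such a value object follows. The field names, the `NewPlaintext` constructor, and the `Bytes` accessor are hypothetical; only the `[0]func()` sentinel, pointer receivers, and the `Destroy()` behaviour come from the description above.

```go
package main

import "fmt"

// Plaintext holds secret bytes. The zero-size [0]func() field makes the
// struct non-comparable, so it cannot be used with == or as a map key.
type Plaintext struct {
	_ [0]func()
	b []byte
}

// NewPlaintext copies the secret so the value object owns its backing slice.
func NewPlaintext(secret []byte) *Plaintext {
	b := make([]byte, len(secret))
	copy(b, secret)
	return &Plaintext{b: b}
}

// Bytes exposes the backing slice (no copy), so callers that captured it
// earlier observe the zero-fill performed by Destroy.
func (p *Plaintext) Bytes() []byte { return p.b }

// Destroy zero-fills the backing array in place and truncates the slice.
func (p *Plaintext) Destroy() {
	for i := range p.b {
		p.b[i] = 0
	}
	p.b = p.b[:0]
}

func main() {
	p := NewPlaintext([]byte{0xde, 0xad, 0xbe, 0xef})
	captured := p.Bytes() // e.g. the slice handed to the JSON encoder

	p.Destroy()

	fmt.Println(captured)       // previously-captured slice now reads as zeros
	fmt.Println(len(p.Bytes())) // header re-sliced to length 0
}
```

Zero-filling in place rather than replacing the slice matters here: a fresh allocation would leave the old backing array, and any copy of the slice header pointing at it, untouched.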
The dev-mode software adapter ships in this story (internal/identity/nodes/nsk/software/). Its livez warning constant software.WarningFlagValue = "nsk-software-provider-dev-only" is concatenated into the /livez HTTP body whenever the software adapter is wired as the registration service's NSK Issuer; production liveness probes branch on this exact token to fail closed when the dev-only adapter is detected. The production OpenBao-backed adapter ships in a separate context.
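
To make the issuance contract concrete, here is a sketch of what a dev-only software wrap could look like using AES-256-GCM from the standard library. It is not the code in `internal/identity/nodes/nsk/software/`: the `issue` and `unwrap` functions, the nonce-prefixed layout, and binding the `kid` as additional authenticated data are all assumptions of this sketch.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
	"regexp"
)

const PlaintextSize = 32

var kidPattern = regexp.MustCompile(`^[A-Za-z0-9._:-]+$`)

func newGCM(wrapKey []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(wrapKey) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// issue generates a fresh 32-byte NSK and wraps it as nonce || ciphertext || tag.
func issue(wrapKey []byte, kid string) (plaintext, wrapped []byte, err error) {
	if !kidPattern.MatchString(kid) {
		return nil, nil, fmt.Errorf("kid %q contains invalid characters", kid)
	}
	gcm, err := newGCM(wrapKey)
	if err != nil {
		return nil, nil, err
	}
	plaintext = make([]byte, PlaintextSize)
	if _, err := rand.Read(plaintext); err != nil {
		return nil, nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, nil, err
	}
	// Seal appends to its first argument, so the nonce becomes the prefix.
	wrapped = gcm.Seal(nonce, nonce, plaintext, []byte(kid))
	return plaintext, wrapped, nil
}

// unwrap reverses issue and fails if the kid or the ciphertext was altered.
func unwrap(wrapKey, wrapped []byte, kid string) ([]byte, error) {
	gcm, err := newGCM(wrapKey)
	if err != nil {
		return nil, err
	}
	if len(wrapped) < gcm.NonceSize() {
		return nil, errors.New("wrapped value too short")
	}
	nonce, ct := wrapped[:gcm.NonceSize()], wrapped[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, []byte(kid))
}

func main() {
	wrapKey := make([]byte, 32)
	if _, err := rand.Read(wrapKey); err != nil {
		panic(err)
	}
	_, wrapped, err := issue(wrapKey, "v1")
	fmt.Println(len(wrapped), err) // 12-byte nonce + 32-byte NSK + 16-byte tag

	_, err = unwrap(wrapKey, wrapped, "v2")
	fmt.Println(err != nil) // a different kid fails authentication
}
```

With this layout a wrapped NSK is 60 bytes, which sits inside the `[28, 4096]` bound; 28 bytes is what the same layout would produce for an empty plaintext.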
Public-key validation contract
RegisterCommand.Validate (registration/command.go) runs the WireGuard public-key gate FIRST, BEFORE any scope-field check or BootstrapToken consume attempt:
- Length: exactly `publicKeyLen = 32` bytes after base64 decode. Wrong length surfaces as `nodes.ErrPublicKeyInvalid` with the offending length in the message text.
- All-zero rejection: the all-zero X25519 public key is a degenerate test vector that produces a predictable shared secret (RFC 7748 §5 small-order point). `isAllZero` rejects it with `nodes.ErrPublicKeyAllZero`.
The ordering is deliberate (length precedes all-zero) so a zero-length payload reports as ErrPublicKeyInvalid rather than matching the all-zero branch by coincidence. Both rejections fire BEFORE the service touches the BootstrapToken plaintext — a malformed-key attempt cannot also waste a token.
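
The gate can be sketched as follows. The function names `validatePublicKey`, `encodeKey`, and `outcome` are hypothetical, the sentinel errors are local stand-ins for the `nodes` package values, and mapping a base64 decode failure onto `ErrPublicKeyInvalid` is an assumption of this sketch.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
)

var (
	ErrPublicKeyInvalid = errors.New("public key invalid")
	ErrPublicKeyAllZero = errors.New("public key is all zero")
)

const publicKeyLen = 32

// validatePublicKey checks length first, then the all-zero case, so an
// empty payload reports as invalid rather than as all-zero.
func validatePublicKey(b64 string) error {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return fmt.Errorf("%w: %v", ErrPublicKeyInvalid, err)
	}
	if len(raw) != publicKeyLen {
		return fmt.Errorf("%w: got %d bytes, want %d", ErrPublicKeyInvalid, len(raw), publicKeyLen)
	}
	if isAllZero(raw) {
		return ErrPublicKeyAllZero
	}
	return nil
}

// isAllZero ORs every byte together; the result is zero only if all are.
func isAllZero(b []byte) bool {
	var acc byte
	for _, v := range b {
		acc |= v
	}
	return acc == 0
}

// encodeKey is a demo helper producing padded standard base64.
func encodeKey(raw []byte) string { return base64.StdEncoding.EncodeToString(raw) }

// outcome collapses the error into a short label for the demo.
func outcome(b64 string) string {
	err := validatePublicKey(b64)
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrPublicKeyAllZero):
		return "all_zero"
	default:
		return "invalid"
	}
}

func main() {
	key := make([]byte, 32)
	fmt.Println(outcome(""))             // invalid: zero-length payload
	fmt.Println(outcome(encodeKey(key))) // all_zero
	key[0] = 9
	fmt.Println(outcome(encodeKey(key))) // ok
}
```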
OpenAPI surface reference
The single operation lives on the bootstrap-tokens tag in `api/openapi/plexsphere-v1.yaml`:
| Operation | Path | Auth | Notes |
|---|---|---|---|
| `PostRegister` | `POST /v1/register` | None: the unauthenticated bootstrap seam. The BootstrapToken plaintext IS the credential the redeeming substrate presents; the caller does NOT send an Authorization header. | Returns 200 on success with the `RegisterResponse` body; 400 on public-key shape failure (token NOT consumed); 403 on Validator denial; 404 on Resource lookup miss when the adoption branch does not apply (token NOT consumed; see Resource adoption on register); 422 on application-boundary invariant failure; 503 on pool exhaustion / allocator contention (token NOT consumed); 500 on NSK issuance / outbox-append failure (token NOT consumed because the tx rolls back). |
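
The status mapping in the Notes column can be sketched as one `errors.Is` switch. This is illustrative only: the sentinel errors are local stand-ins, `ErrValidatorDenied` and `ErrInvariant` are hypothetical names for the 403 and 422 classes, and the real handler may be structured differently.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Local stand-ins for the sentinel errors named in this document.
var (
	ErrPublicKeyInvalid    = errors.New("public key invalid")
	ErrPublicKeyAllZero    = errors.New("public key all zero")
	ErrResourceNotFound    = errors.New("resource not found")
	ErrPoolExhausted       = errors.New("pool exhausted")
	ErrAllocatorContention = errors.New("allocator contention")
	ErrNSKIssuanceFailed   = errors.New("nsk issuance failed")

	// Hypothetical umbrella errors for the 403 and 422 rows.
	ErrValidatorDenied = errors.New("validator denied")
	ErrInvariant       = errors.New("application-boundary invariant failed")
)

// statusFor maps a registration error onto the documented HTTP status.
func statusFor(err error) int {
	switch {
	case err == nil:
		return http.StatusOK
	case errors.Is(err, ErrPublicKeyInvalid), errors.Is(err, ErrPublicKeyAllZero):
		return http.StatusBadRequest
	case errors.Is(err, ErrValidatorDenied):
		return http.StatusForbidden
	case errors.Is(err, ErrResourceNotFound):
		return http.StatusNotFound
	case errors.Is(err, ErrInvariant):
		return http.StatusUnprocessableEntity
	case errors.Is(err, ErrPoolExhausted), errors.Is(err, ErrAllocatorContention):
		return http.StatusServiceUnavailable
	default:
		// NSK issuance, outbox-append, and anything unclassified.
		return http.StatusInternalServerError
	}
}

func main() {
	wrapped := fmt.Errorf("commit: %w", ErrPoolExhausted)
	fmt.Println(statusFor(nil))                  // 200
	fmt.Println(statusFor(wrapped))              // 503
	fmt.Println(statusFor(ErrNSKIssuanceFailed)) // 500
}
```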
RegisterRequest schema
| Field | Type | Notes |
|---|---|---|
| `project_id` | UUID | Project the redeeming substrate is enrolling into. Must match the `project_id` segment of the plaintext `bootstrap_token`; mismatch surfaces as 403 with `reason=project_mismatch`. |
| `resource_id` | string (`minLength: 1`) | Human-readable Resource handle (e.g. `edge-router-01`). An unknown handle is resolved one of two ways: if the request carries a `requested_resource_id` and the deployment wired the adoption path, the service synthesises a new Resource (see Resource adoption on register); otherwise the miss surfaces as 404 with `code=resource_not_found`. |
| `bootstrap_token` | string (`^psb_[a-z]+_[a-z2-7]+_(node\|bridge)_[a-z2-7]{20,}$`) | Plaintext BootstrapToken in the documented format. The Validator parses the prefix to discover the env + project-id + kind triple before running the candidate scan. |
| `nonce` | string (`minLength: 1`) | Request-side replay-protection nonce; backed by the partial UNIQUE on `(project_id, consumed_nonce)`. |
| `public_key` | base64 (pattern: `^[A-Za-z0-9+/]{43}=$`, length: 44) | 32-byte X25519 public key, standard base64 with padding. Validated for length and all-zero degeneracy at the application boundary BEFORE any BootstrapToken consume. |
| `requested_resource_id` | string, optional | The adoption trigger. When `resource_id` resolves no existing Resource, a non-empty `requested_resource_id` is the operator's explicit signal "synthesise a new Resource for this substrate rather than fail with 404". The value is carried into the new Resource as its `ExternalRef`. Empty / omitted means "no adoption": an unknown handle then surfaces as 404 `resource_not_found`. The adoption branch additionally requires the deployment to have wired a `ProjectScopeResolver`; see Resource adoption on register. |
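
The two patterns in the table can be exercised directly with Go's `regexp` package. The sketch below is a client-side shape check only; the `tokenKind` and `looksLikePublicKey` helpers are hypothetical, and the sample token is fabricated for illustration and is not a real credential.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Patterns copied from the RegisterRequest schema table.
var (
	bootstrapTokenRE = regexp.MustCompile(`^psb_[a-z]+_[a-z2-7]+_(node|bridge)_[a-z2-7]{20,}$`)
	publicKeyRE      = regexp.MustCompile(`^[A-Za-z0-9+/]{43}=$`)
)

// tokenKind returns the kind segment ("node" or "bridge") when the token
// matches the documented shape.
func tokenKind(token string) (string, bool) {
	m := bootstrapTokenRE.FindStringSubmatch(token)
	if m == nil {
		return "", false
	}
	return m[1], true
}

// looksLikePublicKey checks only the base64 shape, not the decoded bytes.
func looksLikePublicKey(s string) bool { return publicKeyRE.MatchString(s) }

func main() {
	// Fabricated sample: env "dev", a made-up project segment, 20-char secret.
	token := "psb_dev_abcdefg234567_node_abcdefghijklmnopqrst"

	kind, ok := tokenKind(token)
	fmt.Println(kind, ok)

	_, ok = tokenKind(strings.Replace(token, "_node_", "_gateway_", 1))
	fmt.Println(ok) // unknown kind segment is rejected

	fmt.Println(looksLikePublicKey(strings.Repeat("A", 43) + "=")) // shape only; all-zero is rejected later
}
```

A shape match is necessary but not sufficient: the all-zero key passes the `public_key` pattern and is only rejected by the application-boundary check.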
RegisterResponse schema
| Field | Type | Notes |
|---|---|---|
| `node_id` | UUID (UUIDv7) | Freshly-allocated Node aggregate id. |
| `mesh_ip` | string (canonical `netip.Addr.String()`) | Allocator-assigned mesh address inside the Domain's mesh CIDR (`100.64.0.1`, never zero-padded). |
| `signing_public_key` | base64 | Domain's active signing public key. Plexd pins this and uses it to verify signed SSE events delivered by the signed event bus. |
| `signing_key_id` | string | The `kid` the active signing key is indexed under. Carried so plexd can rotate-aware verify SSE events whose header references a non-current `kid` during a rotation grace window. |
| `nsk` | base64, `x-plexsphere-once: true` | Per-Node NSK plaintext (32 bytes). Returned exactly once. No API surface ever returns the plaintext again because the persistence layer stores only the wrapped form. The Spectral rule `plexsphere-write-once-post-must-be-issue-response` gates this marker to the `PostRegister` response (and `Issue*` operations) so the plaintext cannot accidentally appear on any other operation. |
| `peer_snapshot` | array of `{node_id, mesh_ip, public_key}` | Initial WireGuard peer set the registering Node needs to bring its table up. May be empty for the first Node enrolled into a Domain. |
| `domain_mesh_cidr` | string (`netip.Prefix`) | The Domain's mesh CIDR (e.g. `100.64.0.0/10`). Carried so plexd can program its routing table without a follow-up Domain lookup. |
tenancy.NodeRegistered outbox payload
The registration tx appends one outbox row of type tenancy.NodeRegistered (internal/identity/tenancy/events/events.go) inside the same SQL transaction that inserts the Node. Payload mirrors the response by-field; consumers (the signed SSE event bus) read the row through the outbox dispatcher.
| Field | Source | Notes |
|---|---|---|
| `event_id` | UUIDv7 (`tenancy.NewID`) | Outbox-row identifier. |
| `occurred_at` | UTC timestamp (`ConsumedAt`) | Same timestamp as the BootstrapToken's `consumed_at` so audit replay reads consistently across the two tables. |
| `node_id` | the freshly-issued Node id | Same value as the response `node_id`. |
| `resource_id` | the resolved Resource id | |
| `project_id` | the resolved Project id | |
| `domain_id` | the resolved Domain id | Denormalised so downstream consumers can route without an extra lookup. |
| `mesh_ip` | the allocator-assigned address, `netip.Addr.String()` | Canonical string form so JSON round-tripping is lossless (`100.64.0.1`, never zero-padded). |
The outbox append happens in step 8 of the RegistrationTx — same tx as every other write — so a consumer that reads NodeRegistered is guaranteed to find the matching nodes and domain_mesh_ip_allocations rows already committed.
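
As a sketch of the payload shape, the struct below uses the field names from the table with assumed Go types; the real definition lives in `internal/identity/tenancy/events/events.go` and may differ. The `canonicalMeshIP` and `roundTrip` helpers and the placeholder ids are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/netip"
	"time"
)

// NodeRegistered mirrors the payload table; Go types are assumptions.
type NodeRegistered struct {
	EventID    string    `json:"event_id"`
	OccurredAt time.Time `json:"occurred_at"`
	NodeID     string    `json:"node_id"`
	ResourceID string    `json:"resource_id"`
	ProjectID  string    `json:"project_id"`
	DomainID   string    `json:"domain_id"`
	MeshIP     string    `json:"mesh_ip"`
}

// canonicalMeshIP parses and re-renders an address through netip, which
// yields the canonical form and rejects zero-padded IPv4 octets.
func canonicalMeshIP(s string) (string, error) {
	addr, err := netip.ParseAddr(s)
	if err != nil {
		return "", err
	}
	return addr.String(), nil
}

// roundTrip marshals and unmarshals the event to show the string form
// survives JSON unchanged.
func roundTrip(ev NodeRegistered) (NodeRegistered, error) {
	raw, err := json.Marshal(ev)
	if err != nil {
		return NodeRegistered{}, err
	}
	var out NodeRegistered
	err = json.Unmarshal(raw, &out)
	return out, err
}

func main() {
	ip, err := canonicalMeshIP("100.64.0.1")
	if err != nil {
		panic(err)
	}
	ev := NodeRegistered{
		EventID:    "event-id-placeholder",
		OccurredAt: time.Date(2024, 1, 2, 3, 4, 5, 0, time.UTC),
		NodeID:     "node-id-placeholder",
		MeshIP:     ip,
	}
	out, err := roundTrip(ev)
	fmt.Println(out.MeshIP == ev.MeshIP, err)

	_, err = canonicalMeshIP("100.064.000.001")
	fmt.Println(err != nil) // zero-padded input is rejected by the parser
}
```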
Audit contract
The registration service emits exactly two audit-row outcomes through the AuditSink port (registration/service.go):
| Trigger | Subject | Relation | Outcome string | When |
|---|---|---|---|---|
| `Service.Register` success | `service:registration` | `register` | `register_complete` | Last action on the success path, after the response value passes `RegistrationResult.Validate`. |
| `RegisterCommand.Validate` rejected the public key | `service:registration` | `register` | `register_invalid_public_key` | Boundary failure BEFORE any BootstrapToken Identify attempt. The Object field carries the literal `unknown` for the node-id segment because no Node has been allocated yet. |
The exact wire-form is Subject="service:registration", Relation="register", Outcome ∈ {"register_complete", "register_invalid_public_key"} (underscore-separated, NOT dot-separated). Dashboards should filter on subject="service:registration" AND relation="register" to surface every registration-tier audit row regardless of outcome.
Other failure paths (BootstrapToken precedence denials, pool exhaustion, allocator contention, NSK issuance failures) deliberately do NOT emit a registration-layer audit row — the BootstrapToken identifier middleware (bootaudit.IdentifyMiddleware) already records one entry per Identify attempt with Relation="identify" and a double emission at the registration layer would corrupt the audit stream.
> Operator note: `identify granted` rows on rolled-back registrations. Identify runs OUTSIDE the registration tx (see Identify-then-commit application flow) and `bootaudit.IdentifyMiddleware` emits its row as soon as Identify returns. If the registration tx then rolls back (NSK issuance failure, allocator pool exhaustion, outbox INSERT failure, etc.) the BootstrapToken's `consumed_at` reverts to `NULL` but the `identify` audit row stays. When triaging, correlate every `relation="identify"` row with `outcome="granted"` against the matching `relation="register"` row in the same correlation window. A granted Identify with no `register_complete` companion is the signature of a rolled-back registration the operator can safely retry with the same plaintext.
The AuditEntry shape mirrors bootstraptokens/audit.Entry:
| Field | Value |
|---|---|
| `Subject` | `service:registration` (fixed; see `AuditSubject`). Mirrors the bootstraptokens convention `service:bootstrap-tokens` so operators have one stable subject per bounded context to filter on. |
| `Relation` | `register` (fixed; see `AuditRelationRegister`). |
| `Object` | `node:<node-id-or-"unknown">:<outcome>`. A single `object LIKE 'node:%'` filter catches every entry in this form; the adoption-branch success row carries `resource:<resourceID>` instead (see Resource adoption on register), so include that prefix when searching adopted registrations. |
| `Outcome` | One of the two stable strings `register_complete` / `register_invalid_public_key`. |
| `Timestamp` | The injected `clock()` at decision time, always UTC. |
The recordAudit helper swallows Sink errors deliberately: the AuditSink is downstream infrastructure (a hash-chained sink, a Kafka producer, etc.); a failure there must not abort a successful registration. The Sink implementation is expected to log its own failure for operator visibility — mirrors the bootstraptokens/audit.recordOrLog policy.
Metrics
Two Prometheus collectors surface from the registration service (registration/metrics.go):
- `plexsphere_register_total{outcome}`: counter incremented on every `Service.Register` attempt. The `outcome` label takes one of the values below; operators alert on `rate()` of specific failure outcomes.
- `plexsphere_register_pool_exhausted_total{domain_id, scope}`: counter incremented when the mesh-IP allocator returns `nodes.ErrPoolExhausted`. Distinct from `register_total{outcome="pool_exhausted"}` so dashboards can pivot per-Domain without re-aggregating buckets. `scope` ∈ `{domain, project_subrange}`.
Outcome alphabet:
| Label value | Triggered by |
|---|---|
| `complete` | Successful `Service.Register`. |
| `invalid_public_key` | `nodes.ErrPublicKeyInvalid` or `nodes.ErrPublicKeyAllZero`. |
| `bootstrap_token_invalid` | `bootstraptokens.ParsePlaintext` failed. |
| `project_mismatch` | Plaintext-encoded `project_id` disagrees with the request's `project_id`, OR `consumeRes.ProjectID` disagrees with the resolved `view.ProjectID`. |
| `kind_mismatch` | `bootstraptokens.ErrKindMismatch`. |
| `token_consumed` | `bootstraptokens.ErrTokenConsumed`. |
| `token_expired` | `bootstraptokens.ErrTokenExpired`. |
| `token_revoked` | `bootstraptokens.ErrTokenRevoked`. |
| `nonce_collision` | `bootstraptokens.ErrNonceCollision`. |
| `token_not_found` | `bootstraptokens.ErrTokenNotFound`. |
| `resource_not_found` | `nodes.ErrResourceNotFound` from the TenancyResolver. |
| `pool_exhausted` | `nodes.ErrPoolExhausted` from the allocator. |
| `allocator_contention` | `nodes.ErrAllocatorContention` from the per-Domain advisory lock. |
| `nsk_issuance_failed` | `nodes.ErrNSKIssuanceFailed` from the NSK Issuer port. |
| `internal` | Default fallthrough: any error not matched by the classifier above. |
Both collectors tolerate prometheus.AlreadyRegisteredError so multiple Service instances in one test share the same series; a nil registerer makes every observation a no-op.
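
A classifier for the outcome alphabet can be sketched with `errors.Is`. The sketch covers only a subset of the table, uses local stand-in sentinels rather than the real `nodes` and `bootstraptokens` values, and the `classify` and `wrap` names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins for a subset of the sentinels in the outcome table.
var (
	ErrPublicKeyInvalid  = errors.New("public key invalid")
	ErrPublicKeyAllZero  = errors.New("public key all zero")
	ErrTokenConsumed     = errors.New("token consumed")
	ErrResourceNotFound  = errors.New("resource not found")
	ErrPoolExhausted     = errors.New("pool exhausted")
	ErrNSKIssuanceFailed = errors.New("nsk issuance failed")
)

// classify maps an error onto the value of the outcome label. Unmatched
// errors fall through to "internal" so the label set stays bounded.
func classify(err error) string {
	switch {
	case err == nil:
		return "complete"
	case errors.Is(err, ErrPublicKeyInvalid), errors.Is(err, ErrPublicKeyAllZero):
		return "invalid_public_key"
	case errors.Is(err, ErrTokenConsumed):
		return "token_consumed"
	case errors.Is(err, ErrResourceNotFound):
		return "resource_not_found"
	case errors.Is(err, ErrPoolExhausted):
		return "pool_exhausted"
	case errors.Is(err, ErrNSKIssuanceFailed):
		return "nsk_issuance_failed"
	default:
		return "internal"
	}
}

// wrap adds context the way a repository layer might.
func wrap(err error) error { return fmt.Errorf("commit: %w", err) }

func main() {
	fmt.Println(classify(nil))
	fmt.Println(classify(wrap(ErrPoolExhausted)))
	fmt.Println(classify(errors.New("connection reset")))
}
```

Keeping the fallthrough at `internal` means a new error type never creates a new time series; it shows up as `internal` until the classifier is extended.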
Threat model
The threat model below names the attacker shapes this context defends against, the invariant that holds the line, and the integration test that proves the invariant. Walk this table before approving any change to the validation, allocation, NSK issuance, or persistence path.
| Threat | Defence | Test anchor |
|---|---|---|
| Replay — same plaintext + same nonce presented twice (or in concurrent races). | Three layers: (a) validator.Consume is a single UPDATE … RETURNING WHERE consumed_at IS NULL; (b) the (project_id, consumed_nonce) partial UNIQUE rejects the second envelope as ErrNonceCollision; (c) when 32 goroutines race the same plaintext, exactly one succeeds and 31 see ErrTokenConsumed. | tests/integration/register_concurrent_replay_test.go |
| Cross-Project token use — token issued for Project A presented at Project B. | Project ID is encoded in the plaintext; the Validator rejects mismatch before the candidate scan and the registration service additionally cross-checks consumeRes.ProjectID == view.ProjectID after tenancy.ResolveByResourceHandle so a schema-drift bug in the resolver still trips the gate. | tests/integration/register_authz_test.go |
| Cross-kind confusion — Bridge token presented at the Node-kind register seam (or vice-versa). | Kind is encoded in the plaintext; ParsePlaintext extracts it and the registration service forwards it as validator.ConsumeParams.ExpectedKind so kind mismatch surfaces as ErrKindMismatch before the candidate hash compare runs. | tests/integration/register_authz_test.go |
| Atomicity violation — partial registration leaves token consumed but Node missing (or vice-versa). | Every write of the registration (the BootstrapToken consume, the Node, IP-allocation, and NSK INSERTs, and the outbox append) runs inside one pgx.Tx. Any error in steps 1–8 rolls back the whole tx, including the BootstrapToken consume; the operator can retry with the same plaintext token. | tests/integration/register_atomicity_test.go |
| Pool-exhaustion DoS — attacker racing registrations to drain the Domain mesh CIDR or burn BootstrapTokens. | The per-Domain register_pool_exhausted_total counter pivots capacity events without bucket re-aggregation; nodes.ErrPoolExhausted does NOT consume the token (the registration tx rolls back); the per-Domain advisory lock prevents two Domains saturating one allocator goroutine. | tests/integration/register_pool_exhaustion_test.go, tests/integration/register_concurrent_distinct_domains_test.go |
| Weak public key — caller presents an all-zero X25519 public key (RFC 7748 §5 small-order point) or a wrong-length payload. | RegisterCommand.Validate enforces length=32 BEFORE all-zero, both BEFORE any BootstrapToken consume; the rejected attempt does NOT waste a token; register_invalid_public_key is the audit and metric outcome on rejection. | tests/integration/register_publickey_validation_test.go |
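
The replay row's "32 goroutines, exactly one winner" property can be illustrated with an in-memory analogue. This is not the integration test: the `token` type and `race` helper are hypothetical, and a mutex-guarded flag stands in for the single `UPDATE … WHERE consumed_at IS NULL` statement.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

var ErrTokenConsumed = errors.New("token consumed")

// token models single-use consumption: the check and the flip happen
// under one lock, like a conditional UPDATE in one statement.
type token struct {
	mu       sync.Mutex
	consumed bool
}

func (t *token) consume() error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.consumed {
		return ErrTokenConsumed
	}
	t.consumed = true
	return nil
}

// race starts n goroutines against one token and counts the outcomes.
func race(n int) (granted, denied int64) {
	var t token
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if err := t.consume(); err == nil {
				atomic.AddInt64(&granted, 1)
			} else if errors.Is(err, ErrTokenConsumed) {
				atomic.AddInt64(&denied, 1)
			}
		}()
	}
	wg.Wait()
	return granted, denied
}

func main() {
	granted, denied := race(32)
	fmt.Println(granted, denied) // 1 31
}
```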
What this context is NOT
This context ships the application service, persistence tx, HTTP handler, OpenAPI surface, audit + metrics emission, and the dev-mode software wrap-key adapter for Node Registration. Adjacent capabilities live in dedicated contexts so this context's surface stays minimal:
- NOT the SSE signed event bus. This context emits the `tenancy.NodeRegistered` outbox row inside the registration tx; the consumer that reads the outbox, signs the event, and pushes it down the SSE channel is the signed SSE event bus. The outbox row is the seam the two contexts share.
- NOT the reconciliation pull `GET /v1/nodes/{id}/state`. plexd uses a long-poll reconciliation endpoint to discover peer-set drift after registration; that endpoint is the Node-state reconciliation pull. This context returns the initial peer snapshot in the registration response; subsequent drift is the reconciliation pull endpoint's responsibility.
- NOT the dashboard signing-key / NSK rotation UI. Operators trigger signing-key rotation and NSK re-wrap from the dashboard; that surface is the dashboard signing-key rotation track. This context ships the wrap-key ledger schema (`plexsphere.domain_nsk_wrap_key` with `active | rotating | retired`) so the rotation UI lands without a fresh migration.
- NOT the OpenBao / Vault wrap-key store. This context ships the `nsk.Issuer` abstraction and the dev-mode software adapter only; the production OpenBao-backed adapter ships in a separate context. The `/livez` warning constant `nsk-software-provider-dev-only` lets production liveness probes fail closed when the dev adapter is detected.
- NOT the deregistration handler. Removing a Node from a Domain, releasing its mesh-IP back to the allocator, and emitting the `tenancy.NodeDeregistered` outbox row is owned by the Node-deregistration handler. This context issues identities; the deregistration handler retires them.