Bridge validation — cross-aggregate refusals before persist
This document is the authoritative bounded-context reference for the bridge validation pipeline: the stateless application service that the four bridge application services invoke on every Configure / Update path to enforce the cross-row invariants the per-aggregate boundary deliberately cannot see. It covers what the validator is, the four refusals it can return, the total failure-precedence order and the stages of that order each entry point runs, the three sub-validators and their precise inputs and outputs, the wire Problem-code catalogue the refusals map onto, the observability metrics, and why the validator writes no event.
The validator is a stateless, read-only cross-aggregate scan. Each bridge service runs it after the ReBAC permission check has authorised the caller and before the persist transaction opens, so the scan sits between authorisation and the write. It runs outside the persistence RunInTx transaction: the outbound ACME network probe it may perform never holds a write transaction (and its row locks) open for the duration of a network round-trip. Every entry point returns either nil — the candidate configuration is applicable — or exactly one of four cross-aggregate refusals. It validates one candidate aggregate against the sibling rows and cross-Domain state the per-aggregate boundary invariants cannot reach; it never re-validates the shape invariants each aggregate already owns.
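The ordering described above can be sketched as follows. This is a minimal illustration, not the project's code: the `Service` type, its three function fields, `ErrRefused`, and the `configureTrace` helper are all hypothetical names introduced here to make the stage order visible.

```go
package main

import "errors"

// ErrRefused stands in for any one of the four cross-aggregate refusals
// (hypothetical name, not a project identifier).
var ErrRefused = errors.New("cross-aggregate refusal")

// Service wires the three stages as plain functions. All three fields are
// hypothetical stand-ins for the real ports.
type Service struct {
	Authorise func() error                   // ReBAC permission check
	Validate  func() error                   // stateless cross-aggregate scan
	RunInTx   func(write func() error) error // opens the write transaction
}

// Configure runs authorise, then validate, then persist. Validate runs
// before RunInTx, so a network probe inside it never holds a write
// transaction (or its row locks) open.
func (s Service) Configure(write func() error) error {
	if err := s.Authorise(); err != nil {
		return err
	}
	if err := s.Validate(); err != nil {
		return err // refusal: the transaction never opens
	}
	return s.RunInTx(write)
}

// configureTrace records which stages ran for a given validator result.
func configureTrace(validateErr error) ([]string, error) {
	var trace []string
	s := Service{
		Authorise: func() error { trace = append(trace, "authorise"); return nil },
		Validate:  func() error { trace = append(trace, "validate"); return validateErr },
		RunInTx: func(write func() error) error {
			trace = append(trace, "tx-open")
			return write()
		},
	}
	err := s.Configure(func() error { trace = append(trace, "write"); return nil })
	return trace, err
}
```

With a nil validator result the trace reaches the write; with a refusal it stops after the validate stage, which is also why a refused command leaves no outbox row.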
Slug and SNI-host uniqueness are deliberately not the validator's job. Because the validator runs outside the write transaction, any uniqueness scan it ran would carry a time-of-check-to-time-of-use gap: a concurrent request could insert the same slug between the validator's read and the caller's write, so a validator-side check could only ever be advisory. The per-table unique indexes close that race authoritatively, and the repository layer classifies the resulting unique-index violation onto the slug-conflict refusal. The validator therefore scopes itself to the genuinely cross-aggregate invariants the single-table constraints cannot express — port collisions across aggregate tables, subnet overlaps as prefix containment, and ACME issuance feasibility — and leaves single-table uniqueness to the constraints that own it.
Validator taxonomy
The validator composes three sub-validators that yield four distinct refusals. Each refusal is a cross-aggregate invariant the per-table SQL constraints cannot express, mapped at the transport boundary onto a wire Problem code and an HTTP status.
| Check | Guards | Cross-aggregate invariant | Refusal (Problem code, HTTP) |
|---|---|---|---|
| Host-port conflict | Candidate relay listen port, user-access provider listen port, or public-ingress forward target | The relay singleton listen port and every user-access provider listen port share one host-port namespace per Resource; each ingress rule occupies a distinct (target node, target port) slot | bridge_port_conflict (409) |
| Subnet overlap with mesh | Candidate site-to-site tunnel allowed subnets | A tunnel's allowed subnets must not overlap the Domain mesh CIDR the bridge Resource's Domain owns | subnet_overlap_with_mesh (409) |
| Subnet overlap with sibling tunnel | Candidate site-to-site tunnel allowed subnets | A tunnel's allowed subnets must not overlap any sibling tunnel's allowed subnets within the same Resource | subnet_overlap_with_tunnel (409) |
| ACME issuance feasibility | Candidate public-ingress rule requesting ACME issuance | An ingress rule whose account reference requests ACME issuance must point at a reachable, well-formed ACME directory | acme_directory_unreachable (422) |
Failure precedence
The pipeline sits between authorisation and persistence, and when a candidate could trip more than one stage the first stage in a fixed total order wins, so the operator and the audit log always see the highest-precedence cause rather than whichever check happened to run first.
```text
ReBAC check ──► validator ──► persist (RunInTx)
(authorise)     (this doc)    (write + outbox + audit)

total order:  port ─► subnet-mesh ─► subnet-tunnel ─► acme

per entry point (the stages of the total order it can violate, in order):
  ValidateRelay      ─► port
  ValidateUserAccess ─► port
  ValidateIngress    ─► port ─► acme
  ValidateSiteToSite ─► subnet-mesh ─► subnet-tunnel
```

Not every candidate runs every stage. Each entry point runs only the stages its candidate can actually violate, in the sequence the total order fixes: a relay or user-access provider can only collide on a host port; an ingress rule can collide on a forward-target port and then, only when it requests ACME issuance, fail the feasibility probe; and a site-to-site tunnel can overlap the mesh CIDR and then a sibling tunnel.
Two ordering choices are deliberate:
- Mesh precedes sibling-tunnel. A subnet that overlaps the Domain mesh CIDR would collide with the bridge daemon's own mesh interface and break Node reachability — a structural impossibility. A subnet that overlaps only a sibling tunnel is a mere forwarding-table ambiguity. Reporting the mesh cause first surfaces the structural fault ahead of the lesser routing one.
- Port precedes ACME. The port scan is a cheap in-memory comparison; the ACME probe performs a network round-trip. Checking the port first refuses a colliding ingress slot without spending an outbound ACME round-trip, and fixes the cheaper, structural refusal as the higher-precedence cause.
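The first-refusal-wins rule for the ingress entry point can be sketched as below. The sentinel names, the `stage` type, and the `validateIngress` and `probeCalls` helpers are assumptions made for illustration; only the ordering (port, then ACME, and ACME only when an account reference is present) comes from the text above.

```go
package main

import "errors"

// Illustrative sentinels; the project's own identifiers may differ.
var (
	ErrPortConflict    = errors.New("bridge port conflict")
	ErrACMEUnreachable = errors.New("acme directory unreachable")
)

// stage is one check in the fixed order; it returns nil or a refusal.
type stage func() error

// firstRefusal runs the stages in order and returns the first non-nil
// error, so a later, more expensive stage never runs once an earlier one
// has refused.
func firstRefusal(stages ...stage) error {
	for _, s := range stages {
		if err := s(); err != nil {
			return err
		}
	}
	return nil
}

// validateIngress sketches the ingress entry point: port first, then ACME,
// and ACME only when the rule carries a non-nil account reference.
func validateIngress(portTaken bool, accountRef *string, probe func(ref string) error) error {
	return firstRefusal(
		func() error {
			if portTaken {
				return ErrPortConflict
			}
			return nil
		},
		func() error {
			if accountRef == nil {
				return nil // operator-supplied certificates: probe skipped
			}
			if err := probe(*accountRef); err != nil {
				return ErrACMEUnreachable
			}
			return nil
		},
	)
}

// probeCalls validates a candidate against a directory that is always down
// and reports how many times the probe actually ran.
func probeCalls(portTaken, wantsACME bool) (int, error) {
	calls := 0
	failing := func(string) error { calls++; return errors.New("dial timeout") }
	var ref *string
	if wantsACME {
		r := "acct-1"
		ref = &r
	}
	err := validateIngress(portTaken, ref, failing)
	return calls, err
}
```

A candidate that both collides on its port and points at a dead directory yields the port refusal with zero probe calls, which is the cost argument the second bullet makes.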
Sub-validators
Host-port conflict
The host-port scan compares a candidate's claimed port against the sibling rows that share its host-port namespace. The relay singleton listen port and every user-access provider listen port share one host-port namespace per Resource: a candidate on either side of that namespace conflicts with any sibling claiming the same port. A relay candidate is scanned against the user-access providers (the relay is a singleton, so there is no relay-versus-relay comparison and no self-exclusion); a user-access candidate is scanned against both the relay's listen port and every sibling provider's listen port.
Each public-ingress rule instead occupies a separate (target node, target port) slot. An ingress candidate conflicts only with a sibling rule forwarding to the same node on the same port; the same port on a different node is a distinct slot and is allowed. On an update the candidate carries its own surrogate id, so its own row is excluded from the scan and re-saving an unchanged aggregate does not self-conflict; on a create the surrogate id is the zero value and no row is excluded. A collision returns the host-port-conflict refusal, which names the colliding port (and, for ingress, the colliding node) so the audit reason can carry it.
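A minimal sketch of the two scans, under stated assumptions: the `Provider` and `IngressRule` shapes, the function names, and the convention that a zero relay port means "no relay configured" are all hypothetical. The sketch shows the shared listen-port namespace, the per-slot ingress comparison, and self-exclusion by surrogate id.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPortConflict is an illustrative sentinel for the host-port-conflict refusal.
var ErrPortConflict = errors.New("bridge port conflict")

// PortConflictError names the colliding port (and node, for ingress) and
// still unwraps to the sentinel.
type PortConflictError struct {
	Port uint16
	Node string // empty in the relay / user-access namespace
}

func (e *PortConflictError) Error() string {
	return fmt.Sprintf("port %d already claimed", e.Port)
}
func (e *PortConflictError) Unwrap() error { return ErrPortConflict }

// Provider and IngressRule are simplified sibling rows (hypothetical shapes).
type Provider struct {
	ID         int64 // surrogate id; zero on create
	ListenPort uint16
}

type IngressRule struct {
	ID         int64
	TargetNode string
	TargetPort uint16
}

// checkUserAccess scans a provider candidate against the relay listen port
// and every sibling provider, which share one namespace. A zero relayPort
// means no relay is configured (an assumption of this sketch).
func checkUserAccess(c Provider, relayPort uint16, siblings []Provider) error {
	if relayPort != 0 && relayPort == c.ListenPort {
		return &PortConflictError{Port: c.ListenPort}
	}
	for _, s := range siblings {
		if c.ID != 0 && s.ID == c.ID {
			continue // self-exclusion on update
		}
		if s.ListenPort == c.ListenPort {
			return &PortConflictError{Port: c.ListenPort}
		}
	}
	return nil
}

// checkIngress refuses only when a sibling forwards to the same
// (target node, target port) slot.
func checkIngress(c IngressRule, siblings []IngressRule) error {
	for _, s := range siblings {
		if c.ID != 0 && s.ID == c.ID {
			continue // self-exclusion on update
		}
		if s.TargetNode == c.TargetNode && s.TargetPort == c.TargetPort {
			return &PortConflictError{Port: s.TargetPort, Node: s.TargetNode}
		}
	}
	return nil
}

// isPortConflict reports whether err unwraps to the sentinel.
func isPortConflict(err error) bool { return errors.Is(err, ErrPortConflict) }
```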
Subnet overlap
The subnet-overlap scan applies only to a site-to-site tunnel candidate. A tunnel's allowed subnets must not overlap either of two things the per-aggregate boundary cannot see: the Domain mesh CIDR the bridge Resource's Domain owns (read through the Domain-mesh-CIDR reader), or the allowed subnets of any sibling tunnel within the same Resource. The two checks run in a deliberate order, mesh first and sibling tunnels second, so a candidate overlapping both surfaces the higher-precedence mesh cause.
Overlap is a prefix-containment test, not a string match: a subnet overlaps another when either contains the other. A default route — the /0 prefix that covers the whole address space — overlaps everything, so a tunnel that routes /0 is refused against any non-empty mesh CIDR or sibling tunnel. Disjoint subnets pass. An absent (zero) mesh prefix means the Domain has no mesh CIDR to overlap, so the mesh stage is a no-op and the scan falls through to the sibling-tunnel comparison. On an update the candidate's own tunnel row is excluded so a no-op re-save does not collide with itself; on a create no row is excluded. A mesh-overlap refusal and a sibling-tunnel-overlap refusal are distinct refusals, each naming both conflicting prefixes in its detail.
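The containment test and the mesh-first ordering can be sketched with the standard library's `net/netip`, whose `Prefix.Overlaps` reports whether two prefixes share any address. The `Tunnel` shape, the sentinel names, and the helpers are hypothetical, and for brevity the sketch returns bare sentinels rather than typed errors carrying both prefixes. Note that `Overlaps` compares within one address family, so an IPv4 /0 in this sketch does not overlap an IPv6 prefix.

```go
package main

import (
	"errors"
	"net/netip"
)

// Illustrative sentinels for the two distinct subnet refusals.
var (
	ErrOverlapMesh   = errors.New("subnet overlap with mesh")
	ErrOverlapTunnel = errors.New("subnet overlap with tunnel")
)

// Tunnel is a simplified site-to-site row (hypothetical shape).
type Tunnel struct {
	ID      int64 // surrogate id; zero on create
	Allowed []netip.Prefix
}

// checkSubnets runs the mesh stage first, then the sibling-tunnel stage.
func checkSubnets(c Tunnel, mesh netip.Prefix, siblings []Tunnel) error {
	// A zero mesh prefix is invalid: the Domain has no mesh CIDR, so the
	// mesh stage is a no-op.
	if mesh.IsValid() {
		for _, p := range c.Allowed {
			if p.Overlaps(mesh) {
				return ErrOverlapMesh
			}
		}
	}
	for _, s := range siblings {
		if c.ID != 0 && s.ID == c.ID {
			continue // self-exclusion on update
		}
		for _, p := range c.Allowed {
			for _, q := range s.Allowed {
				if p.Overlaps(q) {
					return ErrOverlapTunnel
				}
			}
		}
	}
	return nil
}

// tunnel builds a Tunnel from CIDR strings; it panics on a malformed CIDR.
func tunnel(id int64, cidrs ...string) Tunnel {
	t := Tunnel{ID: id}
	for _, c := range cidrs {
		t.Allowed = append(t.Allowed, netip.MustParsePrefix(c))
	}
	return t
}

// meshPrefix parses a mesh CIDR; the empty string yields the zero prefix.
func meshPrefix(cidr string) netip.Prefix {
	if cidr == "" {
		return netip.Prefix{}
	}
	return netip.MustParsePrefix(cidr)
}
```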
ACME feasibility
The ACME-feasibility probe applies only to a public-ingress rule, and only when the rule requests ACME issuance — signalled by a non-nil account reference. A nil account reference means the rule terminates TLS with operator-supplied certificates, so the probe is skipped entirely and the prober is never consulted.
When issuance is requested, the validator probes the account's ACME directory at the configuration boundary so a late certificate-issuance failure at the bridge daemon becomes a synchronous refusal at the operator's request. The probe is a read-only discovery round-trip against the operator-configured directory URL, bounded by the PLEXSPHERE_BRIDGE_ACME_PROBER_TIMEOUT duration (ten seconds when the knob is unset; a non-positive or unparseable value is refused at boot), that asserts the directory advertises the new-account, new-order, and new-nonce endpoints issuance needs. Every failure mode — an unreachable directory, a timeout, a DNS failure, a server error, or a malformed directory document — is first logged with structured fields (the account reference and the underlying error) so the failure is never silently swallowed, and then collapsed onto the single ACME-feasibility refusal.
The directory URL is deliberately confined to the prober adapter at the composition root; it travels only inside the probe error the validator wraps and is never surfaced to the bridge domain or its structured log record. Keeping the third-party ACME endpoint out of the bridge domain matches the prober port's own contract. The probe's wall-clock duration is recorded whether the probe ultimately passes or refuses, so the latency histogram reflects every round-trip.
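A minimal sketch of such a bounded discovery probe, using only the Go standard library. The function names and the sentinel are hypothetical; the three JSON field names (`newNonce`, `newAccount`, `newOrder`) are the directory fields defined by RFC 8555. Logging and duration recording are left out to keep the sketch short.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"time"
)

// ErrACMEUnreachable is an illustrative sentinel for the feasibility refusal.
var ErrACMEUnreachable = errors.New("acme directory unreachable")

// collapse folds any failure mode onto the single refusal, keeping the
// cause in the message only.
func collapse(cause error) error {
	return fmt.Errorf("%w: %v", ErrACMEUnreachable, cause)
}

// checkDirectory asserts the document advertises the three endpoints
// issuance needs.
func checkDirectory(body []byte) error {
	var dir struct {
		NewNonce   string `json:"newNonce"`
		NewAccount string `json:"newAccount"`
		NewOrder   string `json:"newOrder"`
	}
	if err := json.Unmarshal(body, &dir); err != nil {
		return fmt.Errorf("malformed directory: %w", err)
	}
	if dir.NewNonce == "" || dir.NewAccount == "" || dir.NewOrder == "" {
		return errors.New("directory is missing a required endpoint")
	}
	return nil
}

// probeDirectory performs one read-only GET bounded by timeout.
func probeDirectory(ctx context.Context, client *http.Client, url string, timeout time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return collapse(err)
	}
	resp, err := client.Do(req) // DNS failures and timeouts surface here
	if err != nil {
		return collapse(err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return collapse(fmt.Errorf("unexpected status %d", resp.StatusCode))
	}
	body, err := io.ReadAll(io.LimitReader(resp.Body, 1<<20))
	if err != nil {
		return collapse(err)
	}
	if err := checkDirectory(body); err != nil {
		return collapse(err)
	}
	return nil
}

// isACMERefusal reports whether err unwraps to the sentinel.
func isACMERefusal(err error) bool { return errors.Is(err, ErrACMEUnreachable) }
```

Splitting the document check from the round-trip keeps the malformed-directory case testable without a network.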
Problem-code catalogue
The four domain refusals are mapped onto wire Problem codes at the transport boundary. All four stamp the audit outcome invariant_violation — the platform declining to materialise an operator-requested configuration whose cross-row invariant the per-table SQL constraints cannot express — which is distinct from the conflict outcome the slug-conflict 409 uses.
| Problem code | HTTP | Raised by (operations) | Audit outcome | Meaning |
|---|---|---|---|---|
| bridge_port_conflict | 409 | relay PUT, user-access POST / PATCH, ingress POST / PATCH | invariant_violation | A host port is already claimed by a sibling aggregate within the bridge Resource. |
| subnet_overlap_with_mesh | 409 | site-to-site POST / PATCH | invariant_violation | A tunnel allowed subnet overlaps the Domain mesh CIDR. |
| subnet_overlap_with_tunnel | 409 | site-to-site POST / PATCH | invariant_violation | A tunnel allowed subnet overlaps a sibling tunnel within the bridge Resource. |
| acme_directory_unreachable | 422 | ingress POST / PATCH | invariant_violation | The ACME directory for the configured account reference is unreachable, errored, timed out, or advertised a malformed document. |
These four are wire codes mapped from the bridge domain sentinels at the transport boundary, not raw domain errors. For the full request and response shapes, status codes, and the complete bridge wire Problem-code catalogue see the bridge HTTP API reference at ../../reference/api/bridge.md.
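The sentinel-to-wire mapping in the table can be sketched as a single switch at the transport boundary. The sentinel names, the `Problem` struct, and the `wrap` helper are hypothetical; the codes and statuses are the ones the catalogue lists.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Illustrative domain sentinels; the project's identifiers may differ.
var (
	ErrPortConflict        = errors.New("bridge port conflict")
	ErrSubnetOverlapMesh   = errors.New("subnet overlap with mesh")
	ErrSubnetOverlapTunnel = errors.New("subnet overlap with tunnel")
	ErrACMEUnreachable     = errors.New("acme directory unreachable")
)

// Problem is a reduced wire shape: just the code and the HTTP status.
type Problem struct {
	Code   string
	Status int
}

// toProblem maps a validator refusal onto its wire code. The second result
// is false for any error that is not one of the four refusals, so the
// caller can fall back to a generic internal error.
func toProblem(err error) (Problem, bool) {
	switch {
	case errors.Is(err, ErrPortConflict):
		return Problem{"bridge_port_conflict", http.StatusConflict}, true
	case errors.Is(err, ErrSubnetOverlapMesh):
		return Problem{"subnet_overlap_with_mesh", http.StatusConflict}, true
	case errors.Is(err, ErrSubnetOverlapTunnel):
		return Problem{"subnet_overlap_with_tunnel", http.StatusConflict}, true
	case errors.Is(err, ErrACMEUnreachable):
		return Problem{"acme_directory_unreachable", http.StatusUnprocessableEntity}, true
	}
	return Problem{}, false
}

// wrap adds context the way an application service might, preserving the
// sentinel for errors.Is.
func wrap(err error) error { return fmt.Errorf("validate: %w", err) }
```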
Each refusal also stamps the conflicting values onto the audit row's reason field, so a compliance auditor can triage which axis fired and which values collided without parsing the client-facing Problem body. The validator carries the values out of the scan as a typed error (a port-conflict, subnet-overlap, or certificate-feasibility wrapper that still unwraps to its sentinel), and the transport boundary reads them back through errors.As to build the reason. The format is an axis token followed by key=value pairs:
| Axis | Audit reason shape |
|---|---|
| Host-port conflict | port_conflict host_port=<port> — or port_conflict target_port=<port> target_node=<node-id> for an ingress slot |
| Subnet overlap (mesh) | subnet_overlap_mesh subnet=<candidate> mesh_cidr=<mesh> |
| Subnet overlap (tunnel) | subnet_overlap_tunnel subnet=<candidate> sibling_subnet=<sibling> |
| ACME unreachable | acme_directory_unreachable account_ref=<ref> |
The ACME reason names only the operator-configured account reference; the directory URL the probe failed against stays confined to the prober adapter and the validator's structured log, exactly as the ACME feasibility section above describes, so the third-party endpoint never reaches the audit trail. The client-facing Problem body detail stays generic — the conflicting values live on the audit row, not on the wire response.
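Building the reason string from the typed errors can be sketched as below. The three error types and their field names are hypothetical, and the sketch omits the sentinel unwrapping to stay short; the output formats follow the table above, and the ACME case carries only the account reference.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical typed errors carrying the conflicting values out of the scan.
type PortConflictError struct {
	Port uint16
	Node string // set only for an ingress slot
}

func (e *PortConflictError) Error() string { return "port conflict" }

type SubnetOverlapError struct {
	Subnet string // candidate prefix
	Other  string // mesh CIDR or sibling prefix
	Mesh   bool   // true when Other is the mesh CIDR
}

func (e *SubnetOverlapError) Error() string { return "subnet overlap" }

type ACMEError struct {
	AccountRef string // the directory URL is deliberately not a field
}

func (e *ACMEError) Error() string { return "acme directory unreachable" }

// auditReason reads the values back through errors.As and formats an axis
// token followed by key=value pairs. It returns "" for any other error.
func auditReason(err error) string {
	var pc *PortConflictError
	var so *SubnetOverlapError
	var ac *ACMEError
	switch {
	case errors.As(err, &pc):
		if pc.Node != "" {
			return fmt.Sprintf("port_conflict target_port=%d target_node=%s", pc.Port, pc.Node)
		}
		return fmt.Sprintf("port_conflict host_port=%d", pc.Port)
	case errors.As(err, &so):
		if so.Mesh {
			return fmt.Sprintf("subnet_overlap_mesh subnet=%s mesh_cidr=%s", so.Subnet, so.Other)
		}
		return fmt.Sprintf("subnet_overlap_tunnel subnet=%s sibling_subnet=%s", so.Subnet, so.Other)
	case errors.As(err, &ac):
		return fmt.Sprintf("acme_directory_unreachable account_ref=%s", ac.AccountRef)
	}
	return ""
}

// wrap adds context while keeping the typed error reachable for errors.As.
func wrap(err error) error { return fmt.Errorf("validate: %w", err) }
```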
Observability
The validator owns two Prometheus collectors.
- A refusal counter, plexsphere_bridge_validator_refusals_total, partitioned by an axis label that names the invariant that fired: port_conflict, subnet_overlap_mesh, subnet_overlap_tunnel, or acme_unreachable. A refusal increments the counter for its axis; operators alert on the rate to catch an operator repeatedly tripping a port collision, a subnet overlap, or an unreachable ACME directory.
- An ACME probe-duration histogram, plexsphere_bridge_validator_acme_probe_duration_seconds, recording the wall-clock duration of each outbound ACME directory probe so operators can watch the upper buckets for a slow or flapping directory.
A transient read error from one of the sibling-row scanners is propagated as-is, never collapsed onto a refusal sentinel, and is not counted in the refusal counter. The counter therefore measures genuine invariant violations rather than transient read failures, and a flaky reader surfaces as a generic internal error rather than a mislabelled invariant refusal.
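The distinction between a counted refusal and an uncounted read error can be sketched with a plain map standing in for the Prometheus counter. The sentinel names, `axisOf`, and `refusalCounter` are hypothetical; the axis label values are the four listed above.

```go
package main

import "errors"

// Illustrative refusal sentinels.
var (
	ErrPortConflict        = errors.New("bridge port conflict")
	ErrSubnetOverlapMesh   = errors.New("subnet overlap with mesh")
	ErrSubnetOverlapTunnel = errors.New("subnet overlap with tunnel")
	ErrACMEUnreachable     = errors.New("acme directory unreachable")
)

// errReadFailed stands in for a transient sibling-row read error.
var errReadFailed = errors.New("sibling scan: connection reset")

// axisOf returns the axis label for a refusal, or false for anything else.
func axisOf(err error) (string, bool) {
	switch {
	case errors.Is(err, ErrPortConflict):
		return "port_conflict", true
	case errors.Is(err, ErrSubnetOverlapMesh):
		return "subnet_overlap_mesh", true
	case errors.Is(err, ErrSubnetOverlapTunnel):
		return "subnet_overlap_tunnel", true
	case errors.Is(err, ErrACMEUnreachable):
		return "acme_unreachable", true
	}
	return "", false
}

// refusalCounter is a stand-in for a labelled counter: axis -> count.
type refusalCounter map[string]int

// observe increments the axis for a refusal and ignores every other error,
// so transient read failures never inflate the refusal rate.
func (c refusalCounter) observe(err error) {
	if axis, ok := axisOf(err); ok {
		c[axis]++
	}
}
```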
Relationship to events
A validator refusal happens before persistence. Because the write transaction never opens, a refused command writes no outbox row and produces no bridge_config_updated wire event — the only path that emits an event is a committed mutation. A refusal is therefore visible to the operator as an HTTP Problem response and to the audit log as an invariant_violation row, but it leaves no trace on the per-Node SSE fan-out. For the closed outbox literals and the bridge_config_updated wire fan-out a committed change drives, see ./events.md.
Cross-references
- ./model.md: Bridge Orchestrator, the upstream authoring surface whose four aggregates the validator scans across.
- ./events.md: the bridge events surface a committed mutation drives; a refusal produces none of it.
- ../mesh/peers.md: the Key and Peer Manager, the source of the Domain mesh CIDR the subnet-overlap scan reads.
- ../../reference/api/bridge.md: the bridge HTTP API reference with the full wire Problem-code catalogue the four refusals map onto.