Cloud-init and DaemonSet bootstrap provisioning

This is the bootstrap-injection half of the Provisioning Broker: how a bootstrap token the reconcile loop mints reaches the substrate it just declared, and how the plexd agent on that substrate consumes the token to self-enrol into the mesh. broker.md owns the ProvisionedResource aggregate, the eight-phase status state machine, and the render-and-apply pipeline. This page is the render-side companion. It documents the three injection strategies the pipeline threads a token through (the two new self-enrolment channels, a VM via cloud-init user-data and a Kubernetes cluster via a plexd DaemonSet, plus the unchanged provider-secret leaf) and the apply-time invariants that keep the injected material alive across re-apply ticks.

The broker never connects inward to the substrate it provisions. It threads the token into the rendered Crossplane v2 Composite Resource, the blueprint Composition copies the field onto the provider's input, and the booting substrate pulls its own enrolment material and calls back out to the control plane's POST /v1/register. The control plane is the single inbound endpoint; the token is the only secret that crosses the boundary, and it crosses single-use and short-lived.

The renderers are pure functions in internal/provisioning/broker/render/: no cluster client, no logging, and the token plaintext never reaches a log line, an error string, or an event payload.

The two self-enrolment channels

A blueprint's InjectionStrategy names how the booting substrate receives its token. Two of the three strategies drive a self-enrolment channel where plexsphere ships no agent and opens no inbound connection; the third is the unchanged opaque-leaf strategy a provider-specific Composition consumes on its own.

| Channel | Strategy | Substrate | How the token arrives |
| --- | --- | --- | --- |
| Cloud-init VM | cloud-init-user-data | A cloud VM (Hetzner, OpenStack, AWS EC2). | The broker renders a full #cloud-config document into spec.userData; the Composition patches it onto the provider's user-data input; the cloud serves it through its instance metadata service; cloud-init runs it on first boot. |
| Kubernetes DaemonSet | helm-values (broker-rendered, plexd Helm chart) / applied bundle | A Kubernetes cluster, either one the broker provisions or one an operator already runs. | The broker threads the token plus the control-plane URL and plexd image into spec.parameters.helmValues, which the Composition feeds a plexd Helm release; or an operator applies the multi-document DaemonSet bundle DaemonSetBundle renders directly. |
| Provider secret | provider-secret | Any substrate whose Composition reads an opaque token leaf. | The broker threads the bare token into spec.parameters.providerSecret.bootstrapToken; the Composition consumes it however the provider needs. Unchanged from the earlier broker milestone. |

The rest of this page documents the cloud-init document contract, the DaemonSet bundle, the per-strategy field map the injection renderer writes, and the apply-time rules that keep all of it alive.

The user-data document contract

CloudInitUserData (render/cloudinit.go) renders the first-boot enrolment document a Provisioned VM consumes. It takes three required inputs — the control-plane base URL, the plexd binary download URL, and the minted token plaintext — and returns the full payload: the literal #cloud-config header line followed by a YAML body.

The rendered document does exactly three things on first boot:

  1. Drops the token and the environment file. A write_files entry writes the token plaintext to /etc/plexd/bootstrap-token at mode 0600, and a second entry writes PLEXD_API_URL=<api-base-url> to /etc/plexd/plexd.env at mode 0644. The token file is owner-read only; the env file carries no secret — only the public control-plane URL — so it is world-readable.
  2. Installs plexd. A runcmd step fetches the binary from the plexd download URL with curl -fsSL … -o /usr/local/bin/plexd and chmod +x it.
  3. Registers. The final runcmd step runs plexd register --bootstrap-token-file=/etc/plexd/bootstrap-token --api-url=<api-base-url>.

Two contract details are load-bearing:

  • The register step reads the token from the dropped file, never from the command line. The token already lives at /etc/plexd/bootstrap-token at mode 0600; passing --bootstrap-token=<plaintext> on the argv would copy the bearer into the process argument list and the cloud-init execution log on the VM for no gain. The --bootstrap-token-file form keeps the plaintext to exactly one on-disk location.
  • The #cloud-config header is prepended verbatim, not marshalled. cloud-init requires the literal first line #cloud-config to treat the payload as a cloud-config document. A YAML comment would not survive a marshal round-trip, so the header is never part of the typed struct; it is written ahead of the marshalled body as raw bytes.

CloudInitUserData fails closed, returning an error that wraps broker.ErrInvalidInput, when any input is empty or whitespace, or when either URL is not an absolute http(s) URL with a host. Every validation error names the offending field, never the value, so a misrender is diagnosable without the token plaintext ever reaching the error surface.

The metadata-service delivery model

The cloud-init document is not pushed to the VM by plexsphere. It rides the cloud's instance metadata service:

  1. The broker threads the rendered document into the Composite Resource's spec.userData.
  2. The blueprint Composition patches spec.userData onto the provider's user-data input (FromCompositeFieldPath: spec.userData → spec.forProvider.userData).
  3. The Crossplane provider passes that value to the cloud's instance API as the instance user-data.
  4. The cloud stores it and serves it through its per-instance metadata endpoint (for example the link-local 169.254.169.254 service).
  5. cloud-init on the booting VM reads its user-data from that endpoint on first boot and executes the write_files + runcmd program.

This is why the token's secrecy posture matters more than transport encryption would suggest. The user-data document — token and all — is readable by anything on the instance that can reach the metadata service, and it persists in the cloud's user-data store for the instance's lifetime. plexsphere does not control that store, so the token is the one secret it deliberately makes single-use and short-lived: the redemption window is bounded by the token TTL, and the first successful POST /v1/register consumes it, so a token recovered from instance metadata after first boot is already inert. See Token secrecy and TTL posture.

The DaemonSet bundle

DaemonSetBundle (render/daemonset.go) renders the multi-document YAML bundle a Kubernetes target applies to run a self-enrolling plexd DaemonSet. The bundle carries three documents:

  1. A plexsphere-system Namespace.
  2. A token-delivery object — an embedded Secret or an External-Secrets-Operator ExternalSecret (see below).
  3. The plexd DaemonSet itself.

The DaemonSet mounts the delivered Secret named plexd-bootstrap-token at /etc/plexd with defaultMode 0440, projecting the token key onto the file bootstrap-token — so the on-disk path is /etc/plexd/bootstrap-token, the exact path the register --bootstrap-token-file=… argv reads, identical to the cloud-init channel. Every pod runs hardened: non-root UID 65532, fsGroup 65532, readOnlyRootFilesystem, all Linux capabilities dropped, no privilege escalation, the RuntimeDefault seccomp profile, and CPU/memory requests and limits. The token volume is read-only on a read-only root filesystem. The fsGroup re-groups the projected Secret to the non-root runtime so the 0440 (owner + group read) token file is readable through its group bit; an owner-only 0400 would leave the Secret owned root:root and deny the 65532 UID — the agent would crash-loop on open /etc/plexd/bootstrap-token: permission denied.

DaemonSetBundle fails closed, returning an error that wraps broker.ErrInvalidInput, when the api URL is not absolute http(s), when the plexd image is untagged or floats on :latest (a node agent must pin a reproducible image), or when the token-delivery fields are inconsistent with the selected mode.

Two token-delivery modes

The TokenDelivery field is a closed enum; its zero value is invalid so a caller that forgets to set it fails closed.

| Mode | Token-delivery document | Carries plaintext? | When |
| --- | --- | --- | --- |
| secret (TokenDeliverySecret) | A Kubernetes Secret whose stringData.token is the minted plaintext, shipped in the bundle. | Yes. | plexsphere provisions the cluster, mints the token, and ships it embedded. Requires a non-empty plaintext; an empty one is rejected. |
| eso (TokenDeliveryESO) | An ExternalSecret (external-secrets.io/v1) that materialises the plexd-bootstrap-token Secret from a ClusterSecretStore at refreshInterval: 0s. | No. | An operator adopts a cluster they already run and pushes the token into their own secret store. The bundle must carry no plaintext; a non-empty one, or an empty store name or remote key, is rejected. |

The ExternalSecret is rendered as an unstructured map literal rather than a typed external-secrets Go struct: render/ already builds Crossplane objects unstructured to avoid coupling the broker to a provider module's release cadence, and the ESO resource is a small, stable shape. Its refreshInterval: 0s fetches the token once on create and never re-reconciles it — the bootstrap token is single-use, so a rotating refresh would be wasted work that risks re-materialising a value the upstream store has since rotated away.

The ESO mode and the adoption pattern

The eso mode is the programmatic emitter of the bundle shape adoption.md documents for a human operator. Both converge on one contract:

  • the Secret is named plexd-bootstrap-token in the plexsphere-system namespace;
  • the token lives under the token data key;
  • the ExternalSecret reads it from a ClusterSecretStore with refreshInterval: 0s;
  • the DaemonSet mounts the Secret at /etc/plexd, so the token is read from /etc/plexd/bootstrap-token.

A broker-rendered ESO bundle and a hand-authored adoption ExternalSecret are therefore interchangeable: the broker renders what the adoption how-to walks an operator through by hand, and plexsphere never reads from the customer's secret store in either case.

The injected-XR field map

InjectBootstrap (render/injection.go) threads a BootstrapMaterial into the rendered Composite Resource at the single site the blueprint's InjectionStrategy names. A given XR only ever carries one strategy's fields.

| Strategy | XR field(s) written | Shape | Material fields consumed |
| --- | --- | --- | --- |
| cloud-init-user-data | spec.userData | A full #cloud-config document (rendered by CloudInitUserData). | TokenPlaintext, APIBaseURL, PlexdDownloadURL |
| helm-values | spec.parameters.helmValues.bootstrapToken, ….apiUrl, ….plexdImage | Three scalar leaves. | TokenPlaintext, APIBaseURL, PlexdImage |
| provider-secret | spec.parameters.providerSecret.bootstrapToken | One scalar leaf. | TokenPlaintext |

Two design points distinguish the cloud-init strategy from the earlier broker behaviour:

  • Cloud-init writes a whole document, not a bare token. The earlier broker dropped the bare token at spec.parameters.cloudInit.bootstrapToken and left the Composition to assemble the user-data. The reshaped Cloud-Init VM XRD now exposes spec.userData as the single field the Composition patches onto the provider, and a first-boot VM needs the whole enrolment program (token file, plexd install, register invocation) — not just the token. Rendering the complete document in the broker keeps the Composition a verbatim FromCompositeFieldPath copy and makes the broker the single owner of the enrolment contract.
  • Injection is fail-closed. A nil XR, a strategy outside the closed taxonomy, or material missing a field the selected strategy consumes yields an error wrapping broker.ErrInvalidInput rather than a half-threaded Composite Resource. As with every renderer, the token plaintext is threaded only into the XR object and never into the returned error, a log line, or any other surface.

The three knobs the cloud-init and helm-values strategies consume — APIBaseURL, PlexdDownloadURL, PlexdImage — arrive from the composition root; see The three composition-root knobs.

The force-apply preservation rule

The broker reconcile loop re-applies the Composite Resource every tick under a stable Server-Side Apply field manager (plexsphere-provisioning-broker) with force-ownership. This is the normal steady state — an apply against an already-present object is an idempotent no-op write — but it interacts with bootstrap injection in a way that needs a deliberate guard.

A bootstrap token is minted exactly once per ProvisionedResource, gated by the aggregate's HasBootstrapToken. That splits the apply path into two kinds of tick:

  • Mint tick: the token was issued this tick, so the plaintext is in hand. renderObjects calls InjectBootstrap, and the rendered XR carries the strategy's bootstrap field(s).
  • Re-apply tick: HasBootstrapToken is already true, so the issuer is not called and there is no plaintext. The freshly-rendered XR omits the bootstrap field(s) entirely.

A force Server-Side Apply prunes any field the owning manager no longer declares. On a re-apply tick the rendered XR omits the bootstrap fields, so a naïve force-apply would delete the previously-injected spec.userData (or helm/provider leaf) the booting node depends on.

liveBootstrapFields (reconcile/apply.go) closes that gap. On a re-apply tick it reads the broker-owned bootstrap fields back from the live XR and re-threads them onto the freshly-rendered object before the force-apply — exactly the idiom parameterValues uses to read spec.parameters back. The set of owned paths is closed:

```text
spec.userData
spec.parameters.helmValues.bootstrapToken
spec.parameters.helmValues.apiUrl
spec.parameters.helmValues.plexdImage
spec.parameters.providerSecret.bootstrapToken
```

The read is strategy-agnostic: it copies whichever of the five paths exist on the live XR. A given XR only ever carries one strategy's fields, so reading all five copies exactly the populated one and leaves the rest absent — no empty-string writes, no re-deriving the strategy. On the first apply the XR does not exist yet (apierrors.IsNotFound), so liveBootstrapFields is a no-op: the mint path injects the fields on the tick the token is issued, and this guard only protects the re-apply ticks that follow. Re-supplying the same live values is an idempotent no-op write.
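A minimal model of this read-back, using plain maps in place of the live and rendered unstructured objects, might look like the following. The function names are hypothetical, and a nil live map stands in for the apierrors.IsNotFound case; the real liveBootstrapFields in reconcile/apply.go talks to the cluster and is not reproduced here.

```go
package main

import "fmt"

// ownedPaths is the closed set of broker-owned bootstrap fields.
var ownedPaths = [][]string{
	{"spec", "userData"},
	{"spec", "parameters", "helmValues", "bootstrapToken"},
	{"spec", "parameters", "helmValues", "apiUrl"},
	{"spec", "parameters", "helmValues", "plexdImage"},
	{"spec", "parameters", "providerSecret", "bootstrapToken"},
}

// getNested returns the value at path and whether it exists.
func getNested(obj map[string]any, path ...string) (any, bool) {
	var cur any = obj
	for _, key := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = m[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// setNested writes value at path, creating intermediate maps as needed.
func setNested(obj map[string]any, value any, path ...string) {
	for _, key := range path[:len(path)-1] {
		next, ok := obj[key].(map[string]any)
		if !ok {
			next = map[string]any{}
			obj[key] = next
		}
		obj = next
	}
	obj[path[len(path)-1]] = value
}

// preserveBootstrapFields copies whichever owned paths exist on the live XR
// onto the freshly rendered one. Absent paths are skipped, so no empty
// strings are written. A nil live map (XR not found) makes this a no-op.
func preserveBootstrapFields(live, rendered map[string]any) {
	for _, p := range ownedPaths {
		if v, ok := getNested(live, p...); ok {
			setNested(rendered, v, p...)
		}
	}
}

func main() {
	live := map[string]any{"spec": map[string]any{"userData": "<rendered document>"}}
	rendered := map[string]any{"spec": map[string]any{}}
	preserveBootstrapFields(live, rendered)
	_, ok := getNested(rendered, "spec", "userData")
	fmt.Println("userData preserved:", ok)
}
```

Because the copy is keyed on presence, not on the strategy, the function never needs to know which strategy the blueprint selected.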

The same read-back-the-live-XR posture is why required blueprint parameters carry the limitation documented in broker.md — the broker persists neither the parameter values nor the rendered bootstrap document on the aggregate, treating the live XR as their durable home once applied.

The three composition-root knobs

The cloud-init and helm-values strategies need three values the broker domain cannot invent: where the booting node registers, where it fetches plexd, and which plexd image a Helm release deploys. They are required configuration, read once at boot.

| Environment variable | Reconcile Config field | Threaded into | Validation |
| --- | --- | --- | --- |
| PLEXSPHERE_PROVISIONING_BROKER_ENROL_BASE_URL | EnrolBaseURL | BootstrapMaterial.APIBaseURL: the cloud-init plexd.env + register --api-url, the helm-values apiUrl leaf, and the DaemonSet api URL. | Absolute http(s) URL with a host. |
| PLEXSPHERE_PROVISIONING_BROKER_PLEXD_DOWNLOAD_URL | PlexdDownloadURL | BootstrapMaterial.PlexdDownloadURL: the cloud-init runcmd curl that fetches the plexd binary. | Absolute http(s) URL with a host. |
| PLEXSPHERE_PROVISIONING_BROKER_PLEXD_IMAGE | PlexdImage | BootstrapMaterial.PlexdImage: the helm-values plexdImage leaf (and the DaemonSet image the e2e suite pins). | Tagged or digest-pinned; the floating :latest tag and a tag-less reference are rejected. |

All three are required when PLEXSPHERE_DSN is set — that is, whenever the provisioning broker is wired at all. The composition root reads each with strings.TrimSpace and returns a dedicated missing-knob sentinel when the broker is active but a value is blank. The NewReconciler constructor re-validates them (reconcile/config.go): ErrReconcilerEnrolBaseURLRequired, ErrReconcilerPlexdDownloadURLRequired, and ErrReconcilerPlexdImageRequired fail a mis-wired boot before the first reconcile tick, so a tag-less image or a malformed URL never surfaces deep inside an apply against live substrate. The dev overlay pins the plexd image to the repository's tracked plexd version so the local stack and the published binary never drift.

Token secrecy and TTL posture

The bootstrap token is the only secret that crosses into the substrate, and every layer of this feature treats it as one-time, short-lived bearer material:

  • Never logged, never in an error, never persisted in cleartext. Every renderer performs no logging and names only the offending field or strategy in a validation error — never the token value. The broker persists only the token id on the ProvisionedResource aggregate; the plaintext lives in-process for exactly one apply and is handed to InjectBootstrap and nowhere else.
  • Minted at most once. The HasBootstrapToken gate ensures the issuer is called once per ProvisionedResource. The token id is persisted before the apply, so a crash between the issue and the apply still leaves HasBootstrapToken true on the next tick — the issuer is never called twice for one resource.
  • Single-use. The first successful POST /v1/register consumes the token and flips its consumed_at. A second consumer — a DaemonSet replica that loses the race, a re-run of a VM's first boot — receives 403 token_consumed. The single-use semantics are owned by the identity context (see ../identity/bootstrap-tokens.md).
  • Short-lived. Every minted token carries the broker's configured TTL (PLEXSPHERE_PROVISIONING_BROKER_TOKEN_TTL, validated at boot against the issuer's [MinTTL, MaxTTL] window; see the operational model in broker.md). A token un-redeemed past its TTL cannot be consumed, and the broker does not re-issue — the at-most-once gate keeps the original id on the aggregate.
  • Least-privilege on disk. The cloud-init channel drops the token at mode 0600; the DaemonSet channel projects it at mode 0440 (owner + group read, paired with the pod's fsGroup 65532 so the non-root runtime can read it), read-only, on a non-root, read-only-root-filesystem pod.

Self-enrolment troubleshooting runbook

This section is the operator-facing companion to the contract above. Each entry follows the same shape — Symptom, Diagnostic, Remediation — and is scoped to a single self-enrolment failure mode. These entries pick up where the broker.md recovery runbook's entry 5 ("Bootstrap token issued but the Node never registers") leaves off: that entry diagnoses the broker side; these diagnose the substrate side.

1. Token expired before first boot completed

Symptom. A ProvisionedResource sits in Enrolling across many sweeps and never crosses into Ready, even though Crossplane reports the substrate up. The VM booted, or the DaemonSet rolled out, but no Node registered.

Diagnostic.

  • The token carries the broker's configured TTL, stamped at mint time. If the substrate took longer to boot, install plexd, and reach the control plane than the TTL allowed, the token expired un-redeemed and POST /v1/register now rejects it as expired.
  • On a VM, read the cloud-init execution log (for example /var/log/cloud-init-output.log): a register step that ran after the TTL elapsed reports a rejected token. On a DaemonSet, read kubectl -n plexsphere-system logs daemonset/plexd.
  • Confirm the gap: compare the token's issue time against its TTL and the substrate's first-boot timestamp. Slow image pulls, a cold cloud region, or a slow plexd download URL are the usual causes.

Remediation. The broker does not re-issue a token for an existing ProvisionedResource — the at-most-once gate keeps the original id. The substrate must be re-provisioned with a fresh token: declare a new ProvisionedResource. To widen the redemption window for substrate that is slow to come up, raise PLEXSPHERE_PROVISIONING_BROKER_TOKEN_TTL before re-declaring, and confirm the plexd download URL is reachable and fast from the substrate's network.

2. DaemonSet replica race — 403 token_consumed

Symptom. A plexd DaemonSet rolled out across a multi-node cluster. One node registered; the rest crash-loop. Their logs show 403 token_consumed on POST /v1/register.

Diagnostic.

  • The bootstrap token is single-use. A DaemonSet schedules one plexd pod per node, and every pod mounts the same plexd-bootstrap-token Secret. The first pod to call register wins and consumes the token; every other pod presents the now-consumed token and is correctly refused with 403 token_consumed.
  • This is expected behaviour for a single-token DaemonSet across more than one node, not a fault: a single-use credential cannot enrol N nodes.

Remediation. Decide the intended enrolment model. For a Kubernetes cluster that should register as one Adopted Resource, a single registration is correct — let the winning pod enrol and remove the DaemonSet (or scope it to one node) so the others stop crash-looping. For a fleet where each node should enrol independently, issue a distinct bootstrap token per node and deliver one per node rather than a single shared Secret. See ./adoption.md for the adoption enrolment model.

3. ExternalSecret never materialises — pod stuck pending-secret

Symptom. An eso-mode DaemonSet bundle was applied, but the plexd pods never start. They sit Pending (or ContainerCreating) and the plexd-bootstrap-token Secret does not exist in plexsphere-system.

Diagnostic.

  • The DaemonSet mounts the plexd-bootstrap-token Secret as a volume, so a pod cannot start until that Secret exists. In eso mode the Secret is materialised by the External Secrets Operator from the ExternalSecret, not shipped in the bundle — so an ExternalSecret that never reconciles leaves the pod waiting on a missing volume.
  • Read the ExternalSecret status: kubectl -n plexsphere-system get externalsecret plexd-bootstrap-token -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'. A SecretSyncedError points at the store; a missing status points at ESO not running.
  • The usual causes: the External Secrets Operator is not installed in the cluster; the referenced ClusterSecretStore does not exist or is not Ready; the operator's credentials cannot read the remote key; or the operator never pushed the token into the store under that key.

Remediation. Restore the ESO path: install or repair the External Secrets Operator, confirm the ClusterSecretStore named in the bundle exists and is Ready, and confirm the token plaintext was pushed into the store under the remote key the ExternalSecret reads. Once the Secret materialises, the DaemonSet pods leave Pending and the winning pod registers. Because refreshInterval is 0s, fix the store before the token TTL elapses: ESO fetches the value once on create and does not re-fetch it on a schedule. If the token expired while the ExternalSecret was stuck, fall through to entry 1 and re-provision with a fresh token.