Dev stack runbook
This runbook brings up the full local development stack on a single-node kind cluster: every dependency the plexsphere API and the plexd sibling product talk to in production. It seeds a deterministic demo tenant, authenticates plexctl against the seeded Dex IdP, and runs the end-to-end golden flow against the dev cluster. It is also the reference for the manifest-to-CI mapping, the SpiceDB wiring, the canonical troubleshooting matrix, and the reset recipe. The runbook is self-contained: every step from git clone to a green golden-flow run is below.
Reach for this runbook when you need the full stack rather than the lean cluster the Set up your local plexsphere tutorial brings up:
- You are onboarding to the repository and want the shortest path from git clone to a usable /v1 API and CLI.
- You are reproducing a CI failure on a feature branch without re-creating the cluster on every test iteration.
- You are triaging a dev-stack-smoke red run and want to map the failing job back to a single make target on your laptop.
See also:
- plexsphere local dev-stack reference: the per-dependency reference (one section per service, image pin, dev-mode flags, prod-delta). Open it whenever a Pod is unhealthy and you need to know what flags it boots with and how that diverges from production.
- CI pipeline — jobs, triggers, and reproducing failures: the per-job CI guide, including the dev-smoke schedule + label gate and the SHA-pinned third-party actions each job depends on.
- Testing — Pyramid, Build Tags, and the Shared Harness: the test-pyramid guide; the integration tier under tests/integration/dev_* and the E2E tier under tests/e2e/dev/ both exercise the same manifests this guide applies.
- How to seed tenancy Domains with plexsphere-bootstrap: the operator runbook for seeding tenancy Domains via the plexsphere-bootstrap Job.
- How to exercise the /v1/auth/* surface, How to exercise the /v1/admin/* surface, How to exercise the /v1/label-* surface, and How to exercise the /v1/domains/{id}/identities surface: per-surface walkthroughs of the four /v1/{auth,admin,label-*,domains/{id}/identities} HTTP families now wired against the production binary.
Prerequisites
The dev stack runs entirely on the contributor's laptop — no cloud credentials, no external services. The four tools below must be on $PATH before make dev is invoked. Pinned versions match the hosted runner image the CI dev-stack-smoke job uses; a newer patch release will almost always work, but the listed version is the version the test suite is proven against.
| Tool | Pinned version | Purpose |
|---|---|---|
| Go | 1.26.3 | Pinned by .go-version; drives make docker-build and make e2e. |
| Docker | >= 24.0 | Container runtime for the kind node and every dev-stack image. |
| kind | v0.26.0 (action helm/kind-action@v1.14.0) | Runs the local single-node Kubernetes cluster shared by all dev targets. |
| kubectl | >= v1.32.0 | Applies the dev overlay and waits on Deployment + Job readiness. |
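
A preflight check along the following lines can confirm the four tools are on `$PATH` before `make dev` runs. This is a minimal sketch, not a script from the repository: `missing_tools` is a hypothetical helper, and it checks presence only, not the pinned versions.

```bash
# missing_tools TOOL... prints one "missing: <tool>" line for every
# argument that cannot be resolved on $PATH.
# Hypothetical helper for illustration; not part of the repository.
missing_tools() {
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
    fi
  done
}
```

Calling `missing_tools go docker kind kubectl` prints nothing when all four tools are installed.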
The dev overlay layers seven additional dependencies on top of the narrower user-facing core (API + Postgres / NATS / Dex / SpiceDB): OpenBao (secrets), SeaweedFS (object store), Mimir (metrics), Loki (logs), Crossplane (control plane), External Secrets Operator (secret sync), and plexd (the bootstrap-token registration agent). Every image is SHA-pinned under deploy/local/base/<svc>/ — see plexsphere local dev-stack reference for the per-dependency reference.
The plexd image tag is the version recorded in the repo-root PLEXD_VERSION file. Bumping plexd is a single-line edit; the dev-stack drift gate in tests/workspace/plexd_version_test.go enforces the pin.
Pinned container images applied by the lean overlay (all SHA-referenced in deploy/local/base/*/):
- `postgres:16-alpine`: core + spicedb logical databases.
- `nats:2.10.20`: messaging plane with JetStream enabled.
- `ghcr.io/dexidp/dex:v2.41.1`: OIDC provider with the static plexsphere-test client and admin@example.com static user; the constants mirror internal/platform/testutil/containers/dex.go so local behaviour matches integration tests.
- `authzed/spicedb:v1.39.0`: ReBAC authorisation engine, gRPC-only on port 50051, Postgres datastore shared with plexsphere, preshared-key auth (token test-key). Pin mirrors internal/platform/testutil/containers/common.go::SpiceDBImage.
The plexsphere images (plexsphere:dev, plexsphere-migrate:dev) are built locally by make docker-build and side-loaded into the kind node — no registry push is required.
Bring up the dev stack
A single command builds every plexsphere image, creates the kind cluster, applies the dev overlay (lean core + the six currently-wired dev-stack dependencies), waits for every Deployment to become Ready, and seeds the demo tenant:
```bash
make dev
```

make dev is the inner-loop entrypoint and is a one-shot bring-up with no separate seed step. It provisions the kind cluster, builds the plexsphere images and side-loads them, preloads the SHA-pinned upstream dependency images through the host Docker cache (scripts/dev-preload-upstream-images.sh; the cache survives make dev-down, so a cluster re-create does not re-download them), hydrates Gateway API + Envoy Gateway, applies the deploy/local/overlays/dev overlay, and runs the plexsphere-bootstrap Job. The Job is the single seed surface: it plants the three demo Domains (Acme Corp, Beta LLC, Gamma Cooperative), binds each one to the in-cluster Dex, pre-creates the canonical admin@example.com user under every Domain keyed by the Dex sub claim, and writes the admin + auditor ReBAC tuples on each Domain for that user.
Note: plexd is deferred. The dev overlay does not currently render the plexd Deployment: the plexd image build pipeline (cmd/plexd, docker-build-plexd, scripts/kind-load.sh) is tracked as a follow-up, so re-aggregating the manifest against an unbuildable image would wedge every make dev rollout in ErrImageNeverPull. The make dev-up-plexd target remains declared so it can be invoked directly once the pipeline lands; until then it exits 69 against the absent Deployment with a tagged message. See plexsphere local dev-stack reference for the deferred plexd reference.
The target is idempotent at every layer. Re-running make dev against an existing plexsphere cluster re-applies the overlay, re-waits for readiness, and re-runs the bootstrap Job. The bootstrap binary's Domain pass treats a slug-conflict re-apply as a no-op (domain-skipped reason=already-exists), the IdP-binding pass takes the partial-unique-index conflict branch on duplicate (domain_id, issuer) pairs, and the admin-seed pass loads the existing User row via UNIQUE (domain_id, external_subject) and re-touches the same ReBAC tuples through SpiceDB's OPERATION_TOUCH semantics — so a re-run is a deterministic no-op even on a fully populated database.
Startup ordering
The dev overlay applies every manifest in one kubectl apply -k pass, so the API and signer Deployments — and the bootstrap Job — are created at the same moment as Postgres, NATS, SpiceDB, and Dex. To keep the plexsphere-built workloads from CrashLoopBackOff-ing against a not-yet-ready dependency, each one gates its boot with initContainers that hold the main container until the services it dials are online:
- plexsphere runs wait-for-deps (a TCP probe of NATS, SpiceDB, and Dex) followed by wait-for-migrate, which waits for Postgres and for the migrate Job to have applied every schema migration.
- plexsphere-signer runs wait-for-migrate: it reads the signing-key tables, so it must not boot against an unmigrated schema.
- plexsphere-bootstrap runs wait-for-deps (a TCP probe of SpiceDB) followed by wait-for-migrate (Postgres and the migrate Job). The bootstrap binary writes Domains and IdP bindings through Postgres and ReBAC tuples through SpiceDB, so it must not run until both are reachable.
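
The gating these initContainers perform amounts to polling a probe until it succeeds. The loop below is a minimal sketch of that pattern, not the actual init-container script: `wait_until` and its argument order are hypothetical, and the probe is whatever command you pass in.

```bash
# wait_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it exits 0, sleeping DELAY_SECONDS after each failed try.
# Returns 1 if CMD still fails after ATTEMPTS tries.
# Hypothetical helper; the real initContainers may be implemented differently.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  try=1
  while [ "$try" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  return 1
}
```

Assuming a netcat binary is available, `wait_until 30 2 nc -z spicedb 50051` would hold for up to 30 tries until the SpiceDB port accepts TCP connections.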
A reachable spicedb:50051 proves the SpiceDB process is listening, not that the ReBAC schema is installed. The schema is installed via a gRPC WriteSchema call, which both the plexsphere API (on boot) and the bootstrap binary perform. Because the dev overlay imposes no ordering between the API Deployment and the bootstrap Job, the bootstrap binary applies the schema itself before its admin-seed pass writes any tuple — the apply is idempotent, so whichever workload reaches SpiceDB first installs the schema and the other short-circuits. Without that self-applied schema the Job would race the API and exit with object definition "domain" not found.
A Pod sitting at Init:0/1 (or Init:0/2) in kubectl get pods is therefore waiting on a dependency, not failing — kubectl logs <pod> -c wait-for-deps (or -c wait-for-migrate) prints which endpoint it is still polling.
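
To list only the Pods that are still held by an initContainer, the STATUS column can be filtered. This is a small sketch under the assumption that the default `kubectl get pods` column order (NAME, READY, STATUS, ...) applies; `init_waiting_pods` is a hypothetical helper name.

```bash
# init_waiting_pods reads `kubectl get pods --no-headers` output on stdin
# and prints the name of every Pod whose STATUS is Init:<done>/<total>,
# i.e. a Pod still waiting on an initContainer (not Init:Error and the like).
init_waiting_pods() {
  awk '$3 ~ /^Init:[0-9]+\/[0-9]+$/ { print $1 }'
}
```

Run it as `kubectl get pods --no-headers | init_waiting_pods`, then inspect each listed Pod with `kubectl logs <pod> -c wait-for-deps`.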
Every Pod under deploy/local/base/ whose main container exists because the dev stack relaxes a production posture announces that relaxation at boot — exactly one level=WARN line with attributes env=dev, dependency=<svc>, prod_delta=<sentence>. Tail the log of any Pod to confirm the warning is emitted:
```bash
kubectl -n default logs deployment/openbao -c dev-warning
```

The dev overlay applies into the default namespace (make dev invokes kubectl apply -k deploy/local/overlays/dev without an -n flag), so log queries for the bundled dependencies target default rather than the plexsphere-dev namespace some older dev prose still references.
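
The "exactly one warning" contract can be checked mechanically. The sketch below assumes the log is plain `key=value` text, as the attribute list above suggests; `count_dev_warnings` is a hypothetical helper, and a JSON log format would need a different matcher.

```bash
# count_dev_warnings reads a Pod log on stdin and prints how many lines
# carry both level=WARN and env=dev.
# Assumes key=value (logfmt-style) log lines; hypothetical helper.
count_dev_warnings() {
  awk '/level=WARN/ && /env=dev/ { n++ } END { print n + 0 }'
}
```

For a conforming Pod, `kubectl -n default logs deployment/openbao -c dev-warning | count_dev_warnings` should print `1`.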
plexsphere API env-var contract
The plexsphere API container reads a fixed set of PLEXSPHERE_* env vars at boot; without them the corresponding /v1/* surface short-circuits to a 501 Not Implemented stub. The dev overlay wires every required var (Secret plexsphere-dev-cursor-keys, ConfigMap plexsphere-config, and the dev overlay's JSON6902 patches). The full table (env var, dev source, factory file, /v1/* surface required by, and the prod-delta rotation contract) lives in the plexsphere section of the local dev-stack reference.
After make dev reports success, verify the container exposes every required env var:
```bash
kubectl exec deployment/plexsphere -c plexsphere -- env | grep PLEXSPHERE_
```

The drift gate tests/workspace/dev_overlay_factory_env_drift_test.go walks every productionXxxConfigFromEnv FuncDecl in cmd/plexsphere/ and fails closed if a new factory adds an env reader without a corresponding manifest wiring. Running go test ./tests/workspace/... from the worktree root catches the gap before it reaches CI.
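
Rather than eyeballing the `grep` output, a specific set of names can be checked against the container environment. This is a sketch only: `missing_env_vars` is a hypothetical helper, and the variable names in the usage line are placeholders, since the authoritative list lives in the reference table rather than in this runbook.

```bash
# missing_env_vars NAME... reads `env` output (NAME=value lines) on stdin
# and prints every requested NAME that has no NAME= line.
# Hypothetical helper; pass the names from the dev-stack reference table.
missing_env_vars() {
  env_dump=$(cat)
  for name in "$@"; do
    if ! printf '%s\n' "$env_dump" | grep -q "^${name}="; then
      echo "$name"
    fi
  done
}
```

For example, `kubectl exec deployment/plexsphere -c plexsphere -- env | missing_env_vars PLEXSPHERE_EXAMPLE_A PLEXSPHERE_EXAMPLE_B` prints whichever of the two placeholder names is unset.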
Seeded Domain and binding IDs
A fresh make dev plants three demo Domains — Acme Corp, Beta LLC, Gamma Cooperative — and one active IdP binding each, all keyed to the in-cluster Dex. Their UUIDs are deterministic fixtures the bootstrap Job writes to Postgres; plexctl login and every --domain-scoped call consume them.
Read the seeded slugs, Domain IDs, and active binding IDs straight from the dev Postgres StatefulSet:
```bash
kubectl exec statefulset/postgres -- \
  env PGPASSWORD=plexsphere psql -U plexsphere -d plexsphere -tAc \
  "SELECT d.slug, d.id, b.id
     FROM plexsphere.domains d
     JOIN plexsphere.idp_bindings b
       ON b.domain_id = d.id AND b.status = 'active'
    ORDER BY d.slug"
```

Expected output lists the three slugs acme-corp, beta-llc, gamma-coop in ascending order, each with its domain_id and active idp_binding_id. A second make dev run re-seeds idempotently (the Domain pass takes the already-exists no-op branch and the IdP-binding pass takes the partial-unique-index conflict branch), so the IDs are stable across re-runs.
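
The stability claim can be verified by snapshotting the query output before and after a re-run. The helpers below are a hedged sketch: `seeded_ids` and `ids_unchanged_after` are hypothetical names, and the query is the one shown above.

```bash
# seeded_ids prints one "slug|domain_id|binding_id" row per demo Domain
# (psql -tA emits unaligned rows with "|" as the column separator).
seeded_ids() {
  kubectl exec statefulset/postgres -- \
    env PGPASSWORD=plexsphere psql -U plexsphere -d plexsphere -tAc \
    "SELECT d.slug, d.id, b.id
       FROM plexsphere.domains d
       JOIN plexsphere.idp_bindings b
         ON b.domain_id = d.id AND b.status = 'active'
      ORDER BY d.slug"
}

# ids_unchanged_after CMD [ARGS...] snapshots the IDs, runs CMD, snapshots
# again, and succeeds only if both snapshots are identical.
ids_unchanged_after() {
  before=$(seeded_ids) || return 1
  "$@" || return 1
  after=$(seeded_ids) || return 1
  [ "$before" = "$after" ]
}
```

With a running cluster, `ids_unchanged_after make dev` should exit zero if the re-seed really is a no-op.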
Sign-in with the seeded Dex test user
The bootstrap manifest at deploy/local/base/plexsphere-bootstrap/configmap.yaml plants three demo Domains, binds each one to the in-cluster Dex IdP, AND pre-creates the canonical admin@example.com user under every Domain with admin + auditor ReBAC tuples — so a real Dex-driven sign-in adopts the seed user via UNIQUE (domain_id, external_subject) and inherits its permissions. No follow-up seed step is required.
The same seeded Dex IdP backs the CLI. Authenticate with the RFC 8628 device-code flow, passing the Domain and active IdP-binding UUIDs you read from Postgres above:
```bash
plexctl login --domain-id "$DOMAIN_ID" --idp-binding-id "$BINDING_ID"
```

plexctl login prints a verification URL and a user code, then polls until the device code is approved. The seeded static-user credentials are admin@example.com / password: the canonical fixture mirrored across the integration suite, with the bcrypt hash in deploy/local/base/dex/configmap.yaml matching dexPasswordHash in internal/platform/testutil/containers/dex.go. The Dex internal userID (08a8684b-db88-4b73-90a9-3cd1661f5466) drives the canonical sub claim the bootstrap manifest's domain_admins[].external_subject keys the seeded user on; bumping the Dex userID is the only change that requires updating the manifest.
Open the printed /v1/device?user_code=… link in a browser to complete the approval: the control plane serves the verification page itself, so this stack needs no external dashboard. Sign in through Dex with the static-user credentials above and click Approve; the waiting plexctl poll then receives its token. See Cookie sessions — plexsphere_session contract on every /v1/* route for the per-request session contract the /v1/* routes enforce.
The four operator how-to guides (auth, admin, label-*, identities) all target the three bootstrap Domains — the same seed admin's tuples make every surface respond 200 without an extra grant step.
Sign in with plexctl
The in-tree web dashboard has been removed: the dev overlay no longer renders a web Deployment, the kind config no longer binds a host port for it, and there is no SPA to open on this local stack. The kind config in deploy/local/kind.yaml binds the host ports the surviving workloads use — 8080 (API) and 5556 (Dex).
Drive the platform through the CLI and the /v1 API instead. The Drive the API with plexctl section below walks the same surfaces a dashboard would have shown — Domains, Projects, Identities, Audit, Groups, Labels — gated by the same per-Domain permissions.
The per-request authentication middleware recognises the plexsphere_session cookie (or a bearer token) on every /v1/* route. See Cookie sessions — plexsphere_session contract on every /v1/* route for the session contract — cookie name, scope, lifetime, SameSite posture, and the bypass list the middleware short-circuits on for unauthenticated probes (/v1/health, /v1/version, /v1/openapi.json, /v1/auth/, /v1/register, /v1/docs).
Two follow-up surfaces remain wired against the API for when the plexd build pipeline lands:
- Registered nodes: once make dev-up-plexd is re-chained into make dev, the plexd Pod registers itself via POST /v1/register on startup. Until then plexctl node list returns an empty set against a freshly-seeded dev cluster.
- Reachability: when plexd dies (Pod Terminating), the node transitions to unreachable within the configured TTL, observable through the same /v1 node surface.
If a /v1 call hangs or fails, the most common cause is a missing or failing dependency Pod: kubectl get pods should show every workload Running. The dev overlay applies into the default namespace.
Drive the API with plexctl
The seeded Dex IdP backs the CLI. make dev does not compile the operator CLI itself — build it now, prepend $PWD/bin to $PATH for the rest of the shell session, and read both UUIDs plexctl login needs out of the dev Postgres StatefulSet:
```bash
make plexctl-build
export PATH="$PWD/bin:$PATH"
export PLEXSPHERE_URL=http://localhost:8080
plexctl --version

DOMAIN_ID=$(kubectl exec statefulset/postgres -- \
  env PGPASSWORD=plexsphere psql -U plexsphere -d plexsphere -tAc \
  "SELECT id FROM plexsphere.domains WHERE slug='acme-corp'")
BINDING_ID=$(kubectl exec statefulset/postgres -- \
  env PGPASSWORD=plexsphere psql -U plexsphere -d plexsphere -tAc \
  "SELECT id FROM plexsphere.idp_bindings WHERE domain_id='$DOMAIN_ID' AND status='active'")

plexctl login --domain-id "$DOMAIN_ID" --idp-binding-id "$BINDING_ID"
```

plexctl login prints a verification URL and a user code, then polls until the device code is approved. Open the printed /v1/device?user_code=… link in a browser to complete the approval. The control plane serves the verification page itself; sign in through Dex and click Approve to release the waiting plexctl poll.
For a permanent plexctl install on $PATH (sudo / ~/.local/bin) see How to install the plexctl CLI. For Domains with zero or multiple active IdP bindings see How to complete your first plexctl login.
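
Before calling `plexctl login`, it can help to confirm that each query returned exactly one UUID, since an empty or multi-row result is what the zero-or-multiple-bindings case looks like in the shell. The guard below is a minimal sketch; `is_single_uuid` is a hypothetical helper, not a plexctl feature.

```bash
# is_single_uuid VALUE succeeds only when VALUE is exactly one canonical
# 8-4-4-4-12 hexadecimal UUID: not empty, and not several rows.
is_single_uuid() {
  case $1 in
    # Any character that is not hex or "-" (including a newline between
    # rows) disqualifies the value.
    *[!0-9a-fA-F-]*) return 1 ;;
  esac
  printf '%s\n' "$1" |
    grep -Eqx '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
}
```

A typical use is `is_single_uuid "$BINDING_ID" || echo "expected exactly one active binding" >&2`.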
Drive the API once authenticated:
```bash
plexctl whoami
plexctl domain list
plexctl project list --domain "$DOMAIN_ID"
plexctl identity list --domain "$DOMAIN_ID"
plexctl audit entries list --domain "$DOMAIN_ID"
plexctl group list --domain "$DOMAIN_ID"
```

Expected whoami output identifies the seed admin under the picked Domain:

```text
PRINCIPAL  SUBJECT           DOMAIN            ACR  AMR
user       <seed-user-uuid>  <acme-corp-uuid>  -    -
```

Every list call returns 200: the bootstrap-seeded admin carries both admin and auditor ReBAC tuples on the addressed Domain.
For JSON output, pass --output json (long form only — plexctl does NOT bind it to -o):
```bash
plexctl project list --domain "$DOMAIN_ID" --output json
```

The result shape is { "items": [...] }. The full subcommand catalogue with flags, exit codes, and JSON output shapes lives in plexctl — the plexsphere CLI.
Run the golden flow
The end-to-end golden flow is the canonical "the stack works" assertion. It is the same test the dev-stack-smoke CI job runs nightly. On the laptop, point make e2e at the dev cluster the laptop loop already brought up rather than letting the harness spin up a per-suite cluster:
```bash
DEV_STACK=1 make e2e
```

DEV_STACK=1 mode skips the per-suite kind-load.sh and kind create cluster steps and runs the chainsaw catalog directly against kind-plexsphere. A run takes ~40 seconds against a warm cluster vs. ~3 minutes for the default cold-start mode.
The golden flow runs as the chainsaw catalog tests/e2e/dev/golden-flow/chainsaw-test.yaml — the nine-step catalog that exercises the full bootstrap-token register → Domain create → Project create → IdP binding → Identity sign-in → reachability transition flow against the /v1 API.
The default make e2e (without DEV_STACK=1) behaviour is unchanged — the harness still creates a fresh cluster per suite, which is what CI uses for clean-room reproducibility. The tests/integration/dev_stack_e2e_mode_test.go integration test asserts that default-mode is unchanged and that DEV_STACK=1 against a missing cluster fails fast with a tagged diagnostic.
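
Because `DEV_STACK=1` expects the dev cluster to exist already, a quick presence check avoids the fail-fast path. This sketch assumes the cluster is named `plexsphere` (which the `kind-plexsphere` context implies); `kind_cluster_present` is a hypothetical helper built on the real `kind get clusters` command.

```bash
# kind_cluster_present NAME succeeds when `kind get clusters` lists NAME
# as an exact, whole-line match. stderr is dropped because kind reports
# "no clusters" there.
kind_cluster_present() {
  kind get clusters 2>/dev/null | grep -Fqx -- "$1"
}
```

A guarded invocation would then read `kind_cluster_present plexsphere && DEV_STACK=1 make e2e`.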
Manifest-to-CI mapping
Every manifest applied by make dev has a mirror in the hosted CI pipeline. The table below is the authoritative cross-reference between a local manifest directory and the CI job that exercises it. The drift gate in tests/docs/docs_dev_local_kind_test.go asserts every deploy/local/base/<service>/ directory appears as a row below; adding a new service without a row fails CI.
| Manifest path | Purpose | CI job / test that exercises it |
|---|---|---|
deploy/local/kind.yaml | kind cluster config; binds host ports 8080/5556 and patches the kubelet to pull images in parallel (serializeImagePulls: false, maxParallelImagePulls: 6) so a cold boot does not queue every upstream image pull behind a single slow download. | dev-smoke (schedule + label-gated PR) and tests/workspace/dev_manifests_test.go (kubelet-patch drift gate). |
deploy/local/base/postgres/ | Postgres 16 StatefulSet + Service + Secret; core + spicedb databases. | test-integration (testcontainers Postgres) and dev-smoke (cluster-local Postgres). |
deploy/local/base/nats/ | NATS StatefulSet + Service with JetStream enabled; mirrors the testcontainers fixture. | test-integration and dev-smoke. |
deploy/local/base/dex/ | Dex Deployment + Service + ConfigMap with the static plexsphere-test client and admin@example.com user. | test-integration (testcontainers Dex) and dev-smoke (sign-in through the Gateway). |
deploy/local/base/spicedb/ | SpiceDB Deployment + Service + Secret; ReBAC engine with gRPC on port 50051, Postgres datastore shared with plexsphere, preshared-key auth, single replica with dispatch disabled. | test-integration (testcontainers SpiceDB via internal/platform/testutil/containers/spicedb.go) and dev-smoke (cluster-local SpiceDB serving the plexsphere API's authz checks). |
deploy/local/base/migrate/ | One-shot Job running tools/migrate up against the in-cluster Postgres; initContainer waits for readiness. | dev-smoke + tests/integration/dev_kind_test.go::TestDevKind_MigrateCompletesBeforeAPIReady. |
deploy/local/base/plexsphere/ | API Deployment + Service + ConfigMap; mounts Postgres/NATS/Dex connection env vars. | dev-smoke (readiness probe via Gateway /readyz). |
deploy/local/base/plexsphere-signer/ | Signer Deployment + Service + ServiceAccount + NetworkPolicy + Secret; the gRPC signing service that the API calls over the in-cluster plexsphere-signer:8443 endpoint. | test-integration (signer binary tests under tests/integration/signer_*_test.go) and tests/workspace/signer_manifests_test.go (manifest drift gate). |
deploy/local/base/plexsphere-bootstrap/ | One-shot Job that seeds the initial tenancy Domain aggregates via cmd/plexsphere-bootstrap AND seeds one IdP binding per demo Domain against the in-cluster Dex IdP (http://localhost:5556/dex, client_id: plexsphere-test), then pre-creates the canonical admin@example.com user under every Domain with admin + auditor ReBAC tuples. Reads an inline ConfigMap manifest and is idempotent on re-apply (Domain pass takes the already-exists no-op branch; IdP-binding pass takes the partial-unique-index conflict branch). initContainer waits for the migrate Job. | test-integration (tests/integration/bootstrap_seed_idempotent_test.go) + tests/workspace/dev_manifests_test.go (drift gate). |
deploy/local/base/gateway/ | EnvoyProxy (pins data-plane Service to NodePort 30080) applied directly into envoy-gateway-system + GatewayClass (envoy-gateway, references the EnvoyProxy via parametersRef) + Gateway + one HTTPRoute (api listener 8080). The EnvoyProxy is NOT part of the dev overlay because kubectl apply -n plexsphere-dev -k … rejects cross-namespace manifests; make dev and the chainsaw install step apply it separately with kubectl apply -f deploy/local/base/gateway/envoyproxy.yaml. | dev-smoke + tests/e2e/dev/chainsaw-test.yaml (asserts GatewayClass.Accepted=True, then Gateway.Accepted=True / Gateway.Programmed=True, then each HTTPRoute.Accepted=True / HTTPRoute.ResolvedRefs=True — Gateway API v1 publishes ResolvedRefs only at Listener / HTTPRoute level, not on the Gateway itself). |
deploy/local/vendor/gateway-api-crds.yaml | SHA-pinned Gateway API v1 CRDs, applied before the overlay. | dev-smoke bootstrap step. |
deploy/local/vendor/envoy-gateway.yaml | SHA-pinned Envoy Gateway install manifest. | dev-smoke bootstrap step. |
deploy/local/overlays/dev/kustomization.yaml | Dev overlay; pins :dev image tags and sets imagePullPolicy: Never for plexsphere containers. | dev-smoke + tests/workspace/dev_manifests_test.go (parse-YAML drift gate). |
deploy/local/base/openbao/ | OpenBao 2.0.0 Deployment + Service + ConfigMap + Secret in -dev mode (in-memory storage, pre-unsealed, single root token); the cluster-local secrets engine. | dev-stack-smoke + tests/integration/dev_stack_kind_test.go and tests/workspace/dev_stack_manifests_test.go (drift gate). |
deploy/local/base/seaweedfs/ | SeaweedFS 3.75 single-replica StatefulSet + Service running master + volume + filer + s3 with emptyDir-backed storage; the in-cluster S3-compatible object store. | dev-stack-smoke + tests/integration/dev_stack_kind_test.go and tests/workspace/dev_stack_manifests_test.go (drift gate). |
deploy/local/base/mimir/ | Grafana Mimir 2.14.3 single-binary Deployment + Service + ConfigMap with filesystem-backed storage and multi-tenancy disabled; the in-cluster metrics backend. | dev-stack-smoke + tests/integration/dev_stack_kind_test.go and tests/workspace/dev_stack_manifests_test.go (drift gate). |
deploy/local/base/loki/ | Grafana Loki 3.3.2 single-binary monolithic Deployment + Service + ConfigMap with filesystem-backed chunks and in-memory ring; the in-cluster logs backend. | dev-stack-smoke + tests/integration/dev_stack_kind_test.go and tests/workspace/dev_stack_manifests_test.go (drift gate). |
deploy/local/base/crossplane/ | Crossplane v2.0.2 core controller (real core start) Deployment + ServiceAccount + scoped crossplane ClusterRole/Binding + webhook TLS Secrets + Service; the management-fleet control plane (no providers shipped). | dev-stack-smoke + tests/integration/dev_stack_kind_test.go and tests/workspace/dev_stack_manifests_test.go (drift gate). |
deploy/local/base/external-secrets/ | External Secrets Operator v0.18.2 Deployment + ServiceAccount + ClusterRoleBinding + Service; pulls Secrets from cluster-external backends in production overlays. | dev-stack-smoke + tests/integration/dev_stack_kind_test.go and tests/workspace/dev_stack_manifests_test.go (drift gate). |
deploy/local/base/plexd/ | plexd sibling-product Deployment + Secret; the bootstrap-token registration agent that calls plexsphere's POST /v1/register on startup. Image tag is templated from /PLEXD_VERSION. | dev-stack-smoke + tests/integration/dev_stack_plexd_register_test.go and tests/workspace/dev_stack_manifests_test.go (drift gate). |
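
The row-per-directory rule can be approximated from a shell before pushing. The function below is only a rough sketch of the idea, not the Go drift gate itself: `undocumented_services` is a hypothetical helper, and the path of this runbook's source file is left as a parameter because it is not stated here.

```bash
# undocumented_services BASE_DIR DOC_FILE prints every immediate
# subdirectory of BASE_DIR whose "<BASE_DIR>/<name>/" path does not occur
# anywhere in DOC_FILE. Hypothetical helper; the real gate is the Go test.
undocumented_services() {
  base_dir=${1%/}
  doc_file=$2
  for dir in "$base_dir"/*/; do
    [ -d "$dir" ] || continue
    if ! grep -Fq -- "$dir" "$doc_file"; then
      echo "$dir"
    fi
  done
}
```

Run from the repository root as `undocumented_services deploy/local/base <path-to-this-runbook>`, it prints nothing when every service directory has a row.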
SpiceDB ReBAC engine
The dev stack ships SpiceDB as the authorisation engine behind the plexsphere API. Every identity-scoped HTTP endpoint checks a relation tuple against SpiceDB before serving a response, so the Deployment is a hard dependency for the /v1/... API surface.
- Role: relationship-based access control (ReBAC) evaluator. Stores subject/relation/object tuples and answers permission-check queries from the plexsphere API.
- Service: spicedb:50051 (in-cluster DNS), gRPC only. HTTP is disabled (--http-enabled=false) because the only consumer is the plexsphere API's authzed gRPC client.
- Auth: preshared-key (spicedb-credentials.SPICEDB_GRPC_PRESHARED_KEY, value test-key). Every plexsphere pod reads the same key from the Secret and attaches it as authzed token bearer metadata on each request.
- Datastore: Postgres. Uses postgres-credentials.DATABASE_URL; SpiceDB keeps its relation tuples in its own schema namespace inside the shared database.
- Dispatch: disabled (--dispatch-cluster-enabled=false). Single replica, no peers, so cluster dispatch would only introduce a resolution failure on boot. A multi-replica HA posture is tracked separately as a follow-up.
- Security: dev-only. Preshared-key auth is NOT production-safe: any pod that can reach spicedb:50051 can speak as any caller. A follow-up replaces this with mTLS + per-workload SPIFFE identities.
Inspect the Deployment (the dev overlay applies into the default namespace, so no -n flag is required):
```bash
kubectl get deployment spicedb
kubectl logs deployment/spicedb
```

Confirm the schema has loaded (the SpiceDB schema tables live in the shared Postgres database and are created by the SpiceDB migrate one-shot on first boot). Postgres is a StatefulSet, so the Pod name is postgres-0, not deployment/postgres:

```bash
kubectl exec postgres-0 -- \
  psql -U plexsphere -d plexsphere -c '\dt plexsphere.*' | grep -i relation
```

The SpiceDB wiring rationale, the base manifests, and the full set of --datastore-* flags live in the kustomize base under deploy/local/base/spicedb/. Open the directory's README.md together with the manifests it pins for the per-flag justification.
Troubleshooting
Nine failure modes account for the vast majority of red dev runs. For each, the "how to recognise" column is what the operator typically sees first, and the "how to fix" column names the exact follow-up command.
| Symptom / failure mode | How to recognise | How to fix |
|---|---|---|
| Port collision — host port 8080 or 5556 is already bound by another process. | make dev aborts with failed to create cluster: exit status 1 and kind logs address already in use during docker run. | Free the port (lsof -i :8080, kill the offender) OR edit deploy/local/kind.yaml to remap the host side, then rerun make dev-down && make dev. |
| GatewayClass never admitted — the Gateway sits with the CRD-default conditions forever because the controller has not bound to its class. | kubectl describe gateway plexsphere -n plexsphere-dev shows Accepted/Programmed conditions with Status: Unknown, Message: "Waiting for controller", and LastTransitionTime: 1970-01-01T00:00:00Z; kubectl get gatewayclass envoy-gateway lacks Accepted=True. | Re-apply the class: kubectl apply -f deploy/local/base/gateway/gatewayclass.yaml. Confirm spec.controllerName=gateway.envoyproxy.io/gatewayclass-controller. The workspace drift gate TestDevManifests_GatewayClassMatchesGateway catches this offline. |
Gateway AddressNotAssigned — listeners are Programmed=True but the Gateway reports Programmed=False / Reason: AddressNotAssigned / Message: "No addresses have been assigned to the Gateway". | kubectl describe gateway plexsphere -n plexsphere-dev shows the AddressNotAssigned reason on the top-level Programmed condition; kubectl get svc -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gateway-name=plexsphere -o wide lists the data-plane Service as type: LoadBalancer with an empty EXTERNAL-IP. | Envoy Gateway defaults to type: LoadBalancer; a fresh kind cluster has no LB. Re-apply the EnvoyProxy that pins the Service to NodePort: kubectl apply -f deploy/local/base/gateway/envoyproxy.yaml. Confirm GatewayClass.spec.parametersRef points at EnvoyProxy/envoy-proxy-config in envoy-gateway-system. The workspace drift gate TestDevManifests_EnvoyProxyPinsKindNodePorts catches drift between the EnvoyProxy pins and the kind extraPortMappings. |
Gateway readiness timeout — GatewayClass is Accepted=True but the Gateway stays Programmed=False. | kubectl get gateway shows Programmed=False; Envoy pods in envoy-gateway-system are crashlooping. | Re-apply the vendor manifests with server-side apply (the envoyproxies CRD exceeds the 256 KiB last-applied-configuration annotation cap): kubectl apply --server-side --force-conflicts -f build/dev-vendor/gateway-api-crds.yaml -f build/dev-vendor/envoy-gateway.yaml, then kubectl rollout restart deployment -n envoy-gateway-system. |
kubectl wait on an HTTPRoute hangs — a custom readiness check times out even though kubectl describe httproute shows Accepted=True + ResolvedRefs=True. | Route conditions in Gateway API v1 live under .status.parents[].conditions[] (per-parent), not top-level. kubectl wait --for=condition=X=True reads only the top-level list, so the wait can never succeed against HTTPRoute / GRPCRoute / TCPRoute / TLSRoute / UDPRoute. | Use a partial-match assertion against the nested status.parents[].conditions[] structure (see tests/e2e/dev/chainsaw-test.yaml step assert-httproutes-resolved). The workspace drift gate TestChainsawTests_NoWaitAgainstGatewayAPIRoutes enforces this offline. |
Dex clock skew — sign-in fails with id_token has expired immediately after issuance. | kubectl logs deployment/dex shows iat timestamps more than 60s apart from the kubelet clock. | Resync the host clock (sudo ntpdate -u pool.ntp.org on Linux; restart Docker Desktop on macOS), then make dev-down && make dev. |
ErrImageNeverPull — a plexsphere pod stays in ErrImageNeverPull because the image was never side-loaded. | kubectl describe pod shows Failed to pull image "plexsphere:dev": imagePullPolicy set to Never. | Re-run make dev (it re-invokes kind load docker-image); if the issue persists, confirm the image exists locally: docker image inspect plexsphere:dev. |
Dependency pods stuck in PodInitializing: on a cold boot, upstream images trickle in one at a time even though each pull is fast. | kubectl get events --field-selector reason=Pulled shows sub-second pulls followed by minutes of queue time (the "including waiting" figure in each event message); the stuck pods' init containers are long Completed. | The kubelet patch in deploy/local/kind.yaml (serializeImagePulls: false, maxParallelImagePulls: 6) removes the queueing, and the make dev preload step (scripts/dev-preload-upstream-images.sh) side-loads every SHA-pinned upstream image from the host Docker cache so a cluster re-create does not hit the network at all. A cluster created before either change keeps the old behaviour; make dev-down && make dev re-creates it. Only the very first preload on a machine still downloads each image once; a slow registry path bounds that one-time warm-up. |
kind node out of disk — pods evict with DiskPressure. | kubectl describe node plexsphere-control-plane shows Conditions: DiskPressure=True. | Tear down, prune, and rerun: make dev-down && docker system prune -af && docker volume prune -f && make dev. |
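
For ad-hoc checks of a route from a shell, the per-parent conditions can be read with kubectl's JSONPath output instead of `kubectl wait`. This is a hedged sketch, not the chainsaw assertion the repository uses: `route_condition_true` is a hypothetical helper, the route name is a placeholder, and the namespace is whatever your current context selects.

```bash
# route_condition_true ROUTE TYPE succeeds when every parent listed under
# .status.parents[] of the HTTPRoute reports condition TYPE as "True".
# Fails when the route is missing or reports no such condition at all.
route_condition_true() {
  route=$1
  cond_type=$2
  statuses=$(kubectl get httproute "$route" \
    -o "jsonpath={.status.parents[*].conditions[?(@.type==\"$cond_type\")].status}") || return 1
  [ -n "$statuses" ] || return 1
  for status in $statuses; do
    [ "$status" = "True" ] || return 1
  done
}
```

Usage: `route_condition_true <route-name> Accepted && route_condition_true <route-name> ResolvedRefs`.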
Tear down
make dev-down deletes the plexsphere kind cluster — and nothing else. It refuses to delete a foreign cluster named plexsphere that some other tooling created: the sentinel ConfigMap planted by make dev is the only signal the target trusts:
```bash
make dev-down
```

The sentinel-ConfigMap probe lives in deploy/local/base/ and is applied by make dev. Three sub-tests in tests/integration/dev_stack_down_test.go enforce the contract:
- owned-deletes: make dev planted the sentinel, so make dev-down deletes the cluster.
- foreign-preserves: a kind cluster named plexsphere without the sentinel is preserved; the target prints a tagged warning and exits zero.
- force-down-bypasses: DEV_FORCE_DOWN=1 make dev-down bypasses the sentinel check (escape hatch for an inconsistent cluster the operator deliberately wants to scrub).
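
The three cases reduce to a small decision table. The function below restates that contract as a sketch; it is not the Makefile's implementation, and `down_decision` and its `yes`/`no` argument convention are hypothetical.

```bash
# down_decision SENTINEL_PRESENT [FORCE] prints "delete" or "preserve".
#   SENTINEL_PRESENT: "yes" when the sentinel ConfigMap was found.
#   FORCE: the value of DEV_FORCE_DOWN; "1" bypasses the sentinel check.
down_decision() {
  sentinel_present=$1
  force=${2:-}
  if [ "$force" = "1" ] || [ "$sentinel_present" = "yes" ]; then
    echo delete      # owned-deletes, force-down-bypasses
  else
    echo preserve    # foreign-preserves
  fi
}
```

For instance, `down_decision no "${DEV_FORCE_DOWN:-}"` prints `preserve` unless the escape hatch is set.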
Reset recipe
When the stack is in an indeterminate state — a half-applied overlay, a hung Deployment, or a cluster that the troubleshooting matrix does not cover — the reliable reset is: delete the cluster, prune Docker state, and rerun make dev from a clean baseline.
```bash
make dev-down
docker system prune -f
make dev
```

make dev-down deletes the plexsphere kind cluster only; unrelated kind clusters on the same machine are untouched. The docker system prune -f step reclaims dangling layers and exited containers, the same hint make dev-down prints as its final line. The subsequent make dev run takes ~3 minutes from a cold cache (the plexsphere image build dominates) and ~40 seconds from a warm one.