Dev stack runbook

This runbook brings up the full local development stack on a single-node kind cluster: every dependency the plexsphere API and the plexd sibling product talk to in production. It seeds a deterministic demo tenant, authenticates plexctl against the seeded Dex IdP, and runs the end-to-end golden flow against the dev cluster. It is also the reference for the manifest-to-CI mapping, the SpiceDB wiring, the canonical troubleshooting matrix, and the reset recipe. The runbook is self-contained: every step from git clone to a green golden-flow run is below.

Reach for this runbook when you need the full stack rather than the lean cluster the Set up your local plexsphere tutorial brings up:

You are onboarding to the repository and want the shortest path from git clone to a usable /v1 API and CLI.
You are reproducing a CI failure on a feature branch without re-creating the cluster on every test iteration.
You are triaging a dev-stack-smoke red run and want to map the failing job back to a single make target on your laptop.

Prerequisites

The dev stack runs entirely on the contributor's laptop — no cloud credentials, no external services. The four tools below must be on $PATH before make dev is invoked. Pinned versions match the hosted runner image the CI dev-stack-smoke job uses; a newer patch release will almost always work, but the listed version is the version the test suite is proven against.

Tool	Pinned version	Purpose
Go	`1.26.3`	Pinned by `.go-version`; drives `make docker-build` and `make e2e`.
Docker	`>= 24.0`	Container runtime for the kind node and every dev-stack image.
kind	`v0.26.0` (action `helm/kind-action@v1.14.0`)	Runs the local single-node Kubernetes cluster shared by all dev targets.
kubectl	`>= v1.32.0`	Applies the dev overlay and waits on Deployment + Job readiness.

The dev stack declares seven additional dependencies on top of the narrower user-facing core (API + Postgres / NATS / Dex / SpiceDB): OpenBao (secrets), SeaweedFS (object store), Mimir (metrics), Loki (logs), Crossplane (control plane), External Secrets Operator (secret sync), and plexd (the bootstrap-token registration agent). The dev overlay currently wires six of them; plexd is deferred, as described under Bring up the dev stack. Every image is SHA-pinned under deploy/local/base/<svc>/. See the plexsphere local dev-stack reference for per-dependency details.

The plexd image tag is the version recorded in the repo-root PLEXD_VERSION file. Bumping plexd is a single-line edit; the dev-stack drift gate in tests/workspace/plexd_version_test.go enforces the pin.

Pinned container images applied by the lean overlay (all SHA-referenced in deploy/local/base/*/):

postgres:16-alpine — core + spicedb logical databases.
nats:2.10.20 — messaging plane with JetStream enabled.
ghcr.io/dexidp/dex:v2.41.1 — OIDC provider with the static plexsphere-test client and admin@example.com static user; the constants mirror internal/platform/testutil/containers/dex.go so local behaviour matches integration tests.
authzed/spicedb:v1.39.0 — ReBAC authorisation engine, gRPC-only on port 50051, Postgres datastore shared with plexsphere, preshared-key auth (token test-key). Pin mirrors internal/platform/testutil/containers/common.go::SpiceDBImage.

The plexsphere images (plexsphere:dev, plexsphere-migrate:dev) are built locally by make docker-build and side-loaded into the kind node — no registry push is required.

Bring up the dev stack

A single command builds every plexsphere image, creates the kind cluster, applies the dev overlay (lean core + the six currently-wired dev-stack dependencies), waits for every Deployment to become Ready, and seeds the demo tenant:

```bash
make dev
```

make dev is the inner-loop entrypoint: a one-shot bring-up with no separate seed step. It performs these steps:

- provisions the kind cluster;
- builds the plexsphere images and side-loads them into the kind node;
- preloads the SHA-pinned upstream dependency images through the host Docker cache (scripts/dev-preload-upstream-images.sh). The cache survives make dev-down, so re-creating the cluster does not re-download them;
- hydrates Gateway API and Envoy Gateway;
- applies the deploy/local/overlays/dev overlay;
- runs the plexsphere-bootstrap Job.

That Job is the single seed surface. It plants the three demo Domains (Acme Corp, Beta LLC, Gamma Cooperative) and binds each one to the in-cluster Dex. It pre-creates the canonical admin@example.com user under every Domain, keyed by the Dex sub claim. It then writes the admin and auditor ReBAC tuples on each Domain for that user.

Note: plexd is deferred. The dev overlay does not currently render the plexd Deployment. The plexd image build pipeline (cmd/plexd, docker-build-plexd, scripts/kind-load.sh) is tracked as a follow-up. Re-aggregating the manifest against an image that cannot yet be built would wedge every make dev rollout in ErrImageNeverPull. The make dev-up-plexd target remains declared so it can be invoked directly once the pipeline lands. Until then it exits 69 against the absent Deployment with a tagged message. See the plexsphere local dev-stack reference for the deferred plexd entry.

The target is idempotent at every layer. Re-running make dev against an existing plexsphere cluster re-applies the overlay, re-waits for readiness, and re-runs the bootstrap Job. The bootstrap binary's Domain pass treats a slug-conflict re-apply as a no-op (domain-skipped reason=already-exists), the IdP-binding pass takes the partial-unique-index conflict branch on duplicate (domain_id, issuer) pairs, and the admin-seed pass loads the existing User row via UNIQUE (domain_id, external_subject) and re-touches the same ReBAC tuples through SpiceDB's OPERATION_TOUCH semantics — so a re-run is a deterministic no-op even on a fully populated database.
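One way to confirm that a re-run really was a no-op is to count the Domain pass's skip lines in the bootstrap Job's log. The helpers below are a hypothetical sketch, not part of the repository. They assume three things:

- each skipped Domain produces one log line containing domain-skipped followed by reason=already-exists;
- the Job is named plexsphere-bootstrap;
- the three demo Domains are the only ones in the manifest.

```shell
# Hypothetical helpers. Assumption: one bootstrap log line per skipped Domain,
# containing "domain-skipped" followed by "reason=already-exists".

# count_skipped_domains: reads log lines on stdin and prints the number of
# already-exists skips (prints 0 when there are none).
count_skipped_domains() {
  grep -c 'domain-skipped.*reason=already-exists' || true
}

# assert_rerun_noop [EXPECTED]: succeeds when exactly EXPECTED Domains
# (default 3, the demo Domains) were skipped on this run.
assert_rerun_noop() {
  local expected=${1:-3} skipped
  skipped=$(count_skipped_domains)
  if [ "$skipped" -ne "$expected" ]; then
    echo "expected $expected already-exists skips, got $skipped" >&2
    return 1
  fi
}
```

After a second make dev, kubectl logs job/plexsphere-bootstrap | assert_rerun_noop should succeed silently. A non-zero exit means the Domain pass created or failed something instead of skipping it.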

Startup ordering

The dev overlay applies every manifest in one kubectl apply -k pass, so the API and signer Deployments — and the bootstrap Job — are created at the same moment as Postgres, NATS, SpiceDB, and Dex. To keep the plexsphere-built workloads from CrashLoopBackOff-ing against a not-yet-ready dependency, each one gates its boot with initContainers that hold the main container until the services it dials are online:

plexsphere runs wait-for-deps — a TCP probe of NATS, SpiceDB, and Dex — followed by wait-for-migrate, which waits for Postgres and for the migrate Job to have applied every schema migration.
plexsphere-signer runs wait-for-migrate: it reads the signing-key tables, so it must not boot against an unmigrated schema.
plexsphere-bootstrap runs wait-for-deps (a TCP probe of SpiceDB) followed by wait-for-migrate (Postgres and the migrate Job). The bootstrap binary writes Domains and IdP bindings through Postgres and ReBAC tuples through SpiceDB, so it must not run until both are reachable.
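In effect, the wait-for-deps and wait-for-migrate initContainers are retry loops around a readiness check. The function below is a minimal sketch of that pattern, not the actual initContainer script. The attempt count, the sleep interval, and the probe command in the usage note are assumptions.

```shell
# wait_for ATTEMPTS SLEEP_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds or ATTEMPTS runs out, sleeping between tries.
# Returns 0 on the first success and 1 if every attempt failed.
wait_for() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    echo "wait_for: attempt $i/$attempts failed: $*" >&2
    i=$((i + 1))
    # Only sleep if another attempt follows.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

For example, wait_for 60 2 nc -z spicedb 50051 polls SpiceDB's gRPC port for roughly two minutes, provided the image ships a netcat that supports -z. A TCP probe like this proves only that the port is listening, which is exactly the gap the schema self-apply described below closes.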

A reachable spicedb:50051 proves the SpiceDB process is listening, not that the ReBAC schema is installed. The schema is installed via a gRPC WriteSchema call, which both the plexsphere API (on boot) and the bootstrap binary perform. Because the dev overlay imposes no ordering between the API Deployment and the bootstrap Job, the bootstrap binary applies the schema itself before its admin-seed pass writes any tuple — the apply is idempotent, so whichever workload reaches SpiceDB first installs the schema and the other short-circuits. Without that self-applied schema the Job would race the API and exit with object definition "domain" not found.

A Pod whose STATUS in kubectl get pods reads Init:N/M is therefore waiting on a dependency, not failing. Examples are Init:0/1 for the signer, and Init:0/2 or Init:1/2 for the API and bootstrap Pods. kubectl logs <pod> -c wait-for-deps (or -c wait-for-migrate) prints which endpoint the Pod is still polling.
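A quick way to separate waiting Pods from failing ones across the whole namespace is to bucket the STATUS column of kubectl get pods. The helper below is a hypothetical sketch. Its mapping of status strings to buckets is our own heuristic, not something the dev stack defines.

```shell
# classify_pods: reads `kubectl get pods --no-headers` lines on stdin and
# prints "<pod> <bucket>", where bucket is ok, waiting, or failing.
classify_pods() {
  local name _ready status _rest bucket
  while read -r name _ready status _rest; do
    case "$status" in
      # Init-container failures must be matched before the generic Init:* case.
      Init:Error|Init:CrashLoopBackOff|Init:ImagePullBackOff|Init:ErrImageNeverPull)
        bucket=failing ;;
      Init:*|PodInitializing|ContainerCreating|Pending)
        bucket=waiting ;;
      Running|Completed)
        bucket=ok ;;
      *)
        bucket=failing ;;
    esac
    printf '%s %s\n' "$name" "$bucket"
  done
}
```

Running kubectl get pods --no-headers | classify_pods | grep -v ' ok$' lists only the Pods that still need attention. Run kubectl logs with -c wait-for-deps or -c wait-for-migrate against the waiting ones.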

Every Pod under deploy/local/base/ that relaxes a production posture for the dev stack announces the relaxation at boot. It emits exactly one level=WARN line with the attributes env=dev, dependency=<svc>, and prod_delta=<sentence>. The example below reads the warning from the OpenBao Pod's dev-warning container to confirm it is emitted:

```bash
kubectl -n default logs deployment/openbao -c dev-warning
```

The dev overlay applies into the default namespace (make dev invokes kubectl apply -k deploy/local/overlays/dev without an -n flag), so log queries for the bundled dependencies target default rather than the plexsphere-dev namespace some older dev prose still references.
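To check the warning's shape rather than eyeball it, the hypothetical helper below asserts that a log stream contains exactly one level=WARN line. That line must carry env=dev, the expected dependency=<svc>, and a prod_delta attribute. The helper assumes key=value (logfmt-style) output with space-separated attributes. That matches the attribute spelling above but is an assumption about the exact log encoder.

```shell
# check_dev_warning SVC: reads a container log on stdin and succeeds only when
# it holds exactly one level=WARN line with env=dev, dependency=SVC and
# prod_delta=. Assumes space-separated key=value attributes.
check_dev_warning() {
  local svc=$1 log count line attr
  log=$(cat)
  count=$(printf '%s\n' "$log" | grep -c 'level=WARN' || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one level=WARN line, got $count" >&2
    return 1
  fi
  # Pad with spaces so each attribute is matched as a whole token.
  line=" $(printf '%s\n' "$log" | grep 'level=WARN') "
  for attr in "env=dev " "dependency=$svc " "prod_delta="; do
    case "$line" in
      *" $attr"*) ;;
      *) echo "WARN line is missing ${attr% }" >&2; return 1 ;;
    esac
  done
}
```

Usage: kubectl -n default logs deployment/openbao -c dev-warning | check_dev_warning openbao.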

plexsphere API env-var contract

The plexsphere API container reads a fixed set of PLEXSPHERE_* env vars at boot; without them the corresponding /v1/* surface short-circuits to a 501 Not Implemented stub. The dev overlay wires every required var (Secret plexsphere-dev-cursor-keys, ConfigMap plexsphere-config, and the dev overlay's JSON6902 patches). The full table lives in the plexsphere section of the local dev-stack reference. It lists each env var, its dev source, its factory file, the /v1/* surface that requires it, and the prod-delta rotation contract.

After make dev reports success, verify the container exposes every required env var:

```bash
kubectl exec deployment/plexsphere -c plexsphere -- env | grep PLEXSPHERE_
```

The drift gate tests/workspace/dev_overlay_factory_env_drift_test.go walks every productionXxxConfigFromEnv FuncDecl in cmd/plexsphere/ and fails closed if a new factory adds an env reader without a corresponding manifest wiring — running go test ./tests/workspace/... from the worktree root catches the gap before it reaches CI.
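The sketch below turns the env grep into a pass/fail check. It reports which names from a required list are absent from an env dump. The authoritative list of required PLEXSPHERE_* names lives in the local dev-stack reference. The names used in the test are placeholders, not the real contract.

```shell
# missing_env_vars NAME...: reads `env` output on stdin and prints each NAME
# that has no NAME=... line. A variable set to the empty string counts as
# present; use "^${name}=." instead if empty values should be reported too.
missing_env_vars() {
  local dump name
  dump=$(cat)
  for name in "$@"; do
    printf '%s\n' "$dump" | grep -q "^${name}=" || printf '%s\n' "$name"
  done
}
```

Pipe kubectl exec deployment/plexsphere -c plexsphere -- env into missing_env_vars, followed by the names from the reference table. Empty output means every listed variable reached the container.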

Seeded Domain and binding IDs

A fresh make dev plants three demo Domains — Acme Corp, Beta LLC, Gamma Cooperative — and one active IdP binding each, all keyed to the in-cluster Dex. Their UUIDs are deterministic fixtures the bootstrap Job writes to Postgres; plexctl login and every --domain-scoped call consume them.

Read the seeded slugs, Domain IDs, and active binding IDs straight from the dev Postgres StatefulSet:

```bash
kubectl exec statefulset/postgres -- \
  env PGPASSWORD=plexsphere psql -U plexsphere -d plexsphere -tAc \
  "SELECT d.slug, d.id, b.id
     FROM plexsphere.domains d
     JOIN plexsphere.idp_bindings b
       ON b.domain_id = d.id AND b.status = 'active'
    ORDER BY d.slug"
```

Expected output lists the three slugs acme-corp, beta-llc, gamma-coop in ascending order, each with its domain_id and active idp_binding_id. A second make dev run re-seeds idempotently — the Domain pass takes the already-exists no-op branch and the IdP-binding pass takes the partial-unique-index conflict branch — so the IDs are stable across re-runs.
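psql -tA prints each row as pipe-separated fields (slug|domain_id|binding_id). One slug's pair can therefore be cut out with awk and read straight into the DOMAIN_ID and BINDING_ID variables that plexctl login expects. This is a minimal sketch; the function name is our own.

```shell
# ids_for_slug SLUG: reads `psql -tA` rows of the form slug|domain_id|binding_id
# on stdin and prints "domain_id binding_id" for SLUG. Fails if SLUG is absent.
ids_for_slug() {
  awk -F'|' -v slug="$1" '
    $1 == slug { print $2, $3; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

In bash, pipe the kubectl exec ... psql ... -tAc query above into ids_for_slug acme-corp, and feed that pipeline through a process substitution into read -r DOMAIN_ID BINDING_ID. The shape is read -r DOMAIN_ID BINDING_ID < <(...).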

The bootstrap manifest at deploy/local/base/plexsphere-bootstrap/configmap.yaml plants three demo Domains, binds each one to the in-cluster Dex IdP, AND pre-creates the canonical admin@example.com user under every Domain with admin + auditor ReBAC tuples — so a real Dex-driven sign-in adopts the seed user via UNIQUE (domain_id, external_subject) and inherits its permissions. No follow-up seed step is required.

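The same seeded Dex IdP backs the CLI. Authenticate with the RFC 8628 device-code flow. Pass the Domain and active IdP-binding UUIDs you read from Postgres above as DOMAIN_ID and BINDING_ID. make dev does not build plexctl; the Drive the API with plexctl section below builds it and sets both variables: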

```bash
plexctl login --domain-id "$DOMAIN_ID" --idp-binding-id "$BINDING_ID"
```

plexctl login prints a verification URL and a user code, then polls until the device code is approved. The seeded static-user credentials are admin@example.com / password. This is the canonical fixture mirrored across the integration suite: the bcrypt hash in deploy/local/base/dex/configmap.yaml matches dexPasswordHash in internal/platform/testutil/containers/dex.go. Dex derives the sub claim from the static user's internal userID (08a8684b-db88-4b73-90a9-3cd1661f5466). The bootstrap manifest keys the seeded user on that claim through domain_admins[].external_subject, so changing the Dex userID is the only change that requires updating the manifest.
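Dex does not put the raw userID in sub. As we understand Dex's encoding, sub is the unpadded base64url form of a small protobuf message. Field 1 holds the user ID and field 2 holds the connector ID, and static-password users come from the connector ID local. The sketch below reproduces that encoding so you can predict the new external_subject value before bumping the userID. Treat it as a convenience check and confirm the result against a real ID token. It only handles ASCII IDs shorter than 128 bytes, because it writes each protobuf length as a single byte.

```shell
# dex_sub USER_ID CONNECTOR_ID: prints the sub claim Dex would issue, assuming
# sub = base64url-without-padding(protobuf{1: USER_ID, 2: CONNECTOR_ID}).
dex_sub() {
  local user_id=$1 conn_id=$2
  # Each length must fit in a single-byte protobuf varint.
  if [ "${#user_id}" -ge 128 ] || [ "${#conn_id}" -ge 128 ]; then
    echo "dex_sub: IDs must be shorter than 128 bytes" >&2
    return 1
  fi
  {
    printf '\012'                                # field 1, length-delimited (0x0a)
    printf "\\$(printf '%03o' "${#user_id}")"    # length byte of USER_ID
    printf '%s' "$user_id"
    printf '\022'                                # field 2, length-delimited (0x12)
    printf "\\$(printf '%03o' "${#conn_id}")"    # length byte of CONNECTOR_ID
    printf '%s' "$conn_id"
  } | base64 | tr -d '\n=' | tr '+/' '-_'
}
```

For the seeded user, dex_sub 08a8684b-db88-4b73-90a9-3cd1661f5466 local prints CiQwOGE4Njg0Yi1kYjg4LTRiNzMtOTBhOS0zY2QxNjYxZjU0NjYSBWxvY2Fs. That is the value the manifest's external_subject should hold if it keys on Dex's encoded sub.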

Open the printed /v1/device?user_code=… link in a browser to complete the approval: the control plane serves the verification page itself, so this stack needs no external dashboard. Sign in through Dex with the static-user credentials above and click Approve; the waiting plexctl poll then receives its token. See Cookie sessions — plexsphere_session contract on every /v1/* route for the per-request session contract the /v1/* routes enforce.

The four operator how-to guides (auth, admin, label-*, identities) all target the three bootstrap Domains — the same seed admin's tuples make every surface respond 200 without an extra grant step.

The in-tree web dashboard has been removed: the dev overlay no longer renders a web Deployment, the kind config no longer binds a host port for it, and there is no SPA to open on this local stack. The kind config in deploy/local/kind.yaml binds the host ports the surviving workloads use — 8080 (API) and 5556 (Dex).

Drive the platform through the CLI and the /v1 API instead. The Drive the API with plexctl section below walks Domains, Projects, Identities, Audit entries, and Groups through plexctl. Each surface is gated by the same per-Domain permissions a dashboard would have enforced.

The per-request authentication middleware recognises the plexsphere_session cookie (or a bearer token) on every /v1/* route. See Cookie sessions — plexsphere_session contract on every /v1/* route for the session contract — cookie name, scope, lifetime, SameSite posture, and the bypass list the middleware short-circuits on for unauthenticated probes (/v1/health, /v1/version, /v1/openapi.json, /v1/auth/, /v1/register, /v1/docs).

Two follow-up surfaces remain wired against the API for when the plexd build pipeline lands:

Registered nodes — once make dev-up-plexd is re-chained into make dev, the plexd Pod registers itself via POST /v1/register on startup. Until then plexctl node list returns an empty set against a freshly-seeded dev cluster.
Reachability — when plexd dies (Pod Terminating), the node transitions to unreachable within the configured TTL, observable through the same /v1 node surface.

If a /v1 call hangs or fails, the most common cause is a missing or failing dependency Pod. kubectl get pods should show every Deployment and StatefulSet Pod Running, and the migrate and bootstrap Job Pods Completed. The dev overlay applies into the default namespace, so no -n flag is needed.

Drive the API with `plexctl`

The seeded Dex IdP backs the CLI. make dev does not compile the operator CLI itself. Build it now, prepend $PWD/bin to $PATH for the rest of the shell session, and point PLEXSPHERE_URL at the API's host port (8080). Then read both UUIDs that plexctl login needs out of the dev Postgres StatefulSet:

```bash
make plexctl-build
export PATH="$PWD/bin:$PATH"
export PLEXSPHERE_URL=http://localhost:8080
plexctl --version

DOMAIN_ID=$(kubectl exec statefulset/postgres -- \
  env PGPASSWORD=plexsphere psql -U plexsphere -d plexsphere -tAc \
  "SELECT id FROM plexsphere.domains WHERE slug='acme-corp'")
BINDING_ID=$(kubectl exec statefulset/postgres -- \
  env PGPASSWORD=plexsphere psql -U plexsphere -d plexsphere -tAc \
  "SELECT id FROM plexsphere.idp_bindings WHERE domain_id='$DOMAIN_ID' AND status='active'")

plexctl login --domain-id "$DOMAIN_ID" --idp-binding-id "$BINDING_ID"
```

plexctl login prints a verification URL and a user code, then polls until the device code is approved. Open the printed /v1/device?user_code=… link in a browser to complete the approval — the control plane serves the verification page itself; sign in through Dex and click Approve to release the waiting plexctl poll.
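While it waits, a device-code client follows the polling rules in RFC 8628, section 3.5:

- authorization_pending means keep polling at the current interval;
- slow_down means add 5 seconds to the interval for this and all later requests;
- access_denied or expired_token ends the flow.

The function below sketches that decision table. It illustrates the RFC, not plexctl's source.

```shell
# next_poll_interval INTERVAL ERROR: given the current polling interval in
# seconds and the token endpoint's error code, prints the interval to wait
# before the next poll, or returns non-zero when polling must stop.
next_poll_interval() {
  local interval=$1
  case "$2" in
    authorization_pending) echo "$interval" ;;
    slow_down) echo $((interval + 5)) ;;
    access_denied|expired_token) return 1 ;;
    *) return 2 ;;   # any other error code is terminal as well
  esac
}
```

If plexctl keeps printing its wait state long after you clicked Approve, check that the approval was made for the same user code that plexctl printed.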

For a permanent plexctl install on $PATH (sudo / ~/.local/bin) see How to install the plexctl CLI. For Domains with zero or multiple active IdP bindings see How to complete your first plexctl login.

Drive the API once authenticated:

```bash
plexctl whoami
plexctl domain list
plexctl project list       --domain "$DOMAIN_ID"
plexctl identity list      --domain "$DOMAIN_ID"
plexctl audit entries list --domain "$DOMAIN_ID"
plexctl group list         --domain "$DOMAIN_ID"
```

Expected whoami output identifies the seed admin under the picked Domain:

```text
PRINCIPAL  SUBJECT                               DOMAIN                                ACR  AMR
user       <seed-user-uuid>                      <acme-corp-uuid>                      -    -
```

Every list call returns 200 — the bootstrap-seeded admin carries both admin and auditor ReBAC tuples on the addressed Domain.

For JSON output, pass --output json (long form only — plexctl does NOT bind it to -o):

```bash
plexctl project list --domain "$DOMAIN_ID" --output json
```

The result shape is { "items": [...] }. The full subcommand catalogue with flags, exit codes, and JSON output shapes lives in plexctl — the plexsphere CLI.

Run the golden flow

The end-to-end golden flow is the canonical "the stack works" assertion. It is the same test the dev-stack-smoke CI job runs nightly. On the laptop, point make e2e at the dev cluster the laptop loop already brought up rather than letting the harness spin up a per-suite cluster:

```bash
DEV_STACK=1 make e2e
```

DEV_STACK=1 mode skips the per-suite kind-load.sh and kind create cluster steps and runs the chainsaw catalog directly against kind-plexsphere. A run takes ~40 seconds against a warm cluster vs. ~3 minutes for the default cold-start mode.

The golden flow runs as the chainsaw catalog tests/e2e/dev/golden-flow/chainsaw-test.yaml — the nine-step catalog that exercises the full bootstrap-token register → Domain create → Project create → IdP binding → Identity sign-in → reachability transition flow against the /v1 API.

The default make e2e (without DEV_STACK=1) behaviour is unchanged — the harness still creates a fresh cluster per suite, which is what CI uses for clean-room reproducibility. The tests/integration/dev_stack_e2e_mode_test.go integration test asserts that default-mode is unchanged and that DEV_STACK=1 against a missing cluster fails fast with a tagged diagnostic.
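The DEV_STACK=1 precondition amounts to checking that a kind cluster named plexsphere exists before running the catalog. The sketch below performs that check against the output of kind get clusters, which prints one cluster name per line. The function name and the diagnostic text are placeholders, not the harness's actual tagged message.

```shell
# require_dev_cluster NAME: reads `kind get clusters` output on stdin and fails
# fast with a diagnostic when no cluster called exactly NAME is listed.
require_dev_cluster() {
  if grep -qx -- "$1"; then
    return 0
  fi
  echo "dev-stack: kind cluster '$1' not found; run 'make dev' first" >&2
  return 1
}
```

Usage: kind get clusters | require_dev_cluster plexsphere && DEV_STACK=1 make e2e. The whole-line match (-x) keeps a stale cluster such as plexsphere-old from satisfying the check.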

Manifest-to-CI mapping

Every manifest applied by make dev has a mirror in the hosted CI pipeline. The table below is the authoritative cross-reference between a local manifest directory and the CI job that exercises it. The drift gate in tests/docs/docs_dev_local_kind_test.go asserts every deploy/local/base/<service>/ directory appears as a row below; adding a new service without a row fails CI.

Manifest path	Purpose	CI job / test that exercises it
`deploy/local/kind.yaml`	kind cluster config; binds host ports 8080/5556 and patches the kubelet to pull images in parallel (`serializeImagePulls: false`, `maxParallelImagePulls: 6`) so a cold boot does not queue every upstream image pull behind a single slow download.	`dev-smoke` (schedule + label-gated PR) and `tests/workspace/dev_manifests_test.go` (kubelet-patch drift gate).
`deploy/local/base/postgres/`	Postgres 16 StatefulSet + Service + Secret; core + spicedb databases.	`test-integration` (testcontainers Postgres) and `dev-smoke` (cluster-local Postgres).
`deploy/local/base/nats/`	NATS StatefulSet + Service with JetStream enabled; mirrors the testcontainers fixture.	`test-integration` and `dev-smoke`.
`deploy/local/base/dex/`	Dex Deployment + Service + ConfigMap with the static `plexsphere-test` client and `admin@example.com` user.	`test-integration` (testcontainers Dex) and `dev-smoke` (sign-in through the Gateway).
`deploy/local/base/spicedb/`	SpiceDB Deployment + Service + Secret; ReBAC engine with gRPC on port 50051, Postgres datastore shared with plexsphere, preshared-key auth, single replica with dispatch disabled.	`test-integration` (testcontainers SpiceDB via `internal/platform/testutil/containers/spicedb.go`) and `dev-smoke` (cluster-local SpiceDB serving the plexsphere API's authz checks).
`deploy/local/base/migrate/`	One-shot Job running `tools/migrate up` against the in-cluster Postgres; `initContainer` waits for readiness.	`dev-smoke` + `tests/integration/dev_kind_test.go::TestDevKind_MigrateCompletesBeforeAPIReady`.
`deploy/local/base/plexsphere/`	API Deployment + Service + ConfigMap; mounts Postgres/NATS/Dex connection env vars.	`dev-smoke` (readiness probe via Gateway `/readyz`).
`deploy/local/base/plexsphere-signer/`	Signer Deployment + Service + ServiceAccount + NetworkPolicy + Secret; the gRPC signing service that the API calls over the in-cluster `plexsphere-signer:8443` endpoint.	`test-integration` (signer binary tests under `tests/integration/signer_*_test.go`) and `tests/workspace/signer_manifests_test.go` (manifest drift gate).
`deploy/local/base/plexsphere-bootstrap/`	One-shot Job that seeds the initial tenancy Domain aggregates via `cmd/plexsphere-bootstrap` AND seeds one IdP binding per demo Domain against the in-cluster Dex IdP (`http://localhost:5556/dex`, `client_id: plexsphere-test`), then pre-creates the canonical `admin@example.com` user under every Domain with `admin` + `auditor` ReBAC tuples. Reads an inline ConfigMap manifest and is idempotent on re-apply (Domain pass takes the `already-exists` no-op branch; IdP-binding pass takes the partial-unique-index conflict branch). `initContainer` waits for the migrate Job.	`test-integration` (`tests/integration/bootstrap_seed_idempotent_test.go`) + `tests/workspace/dev_manifests_test.go` (drift gate).
`deploy/local/base/gateway/`	EnvoyProxy (pins data-plane Service to NodePort 30080) applied directly into `envoy-gateway-system` + GatewayClass (`envoy-gateway`, references the EnvoyProxy via `parametersRef`) + Gateway + one HTTPRoute (`api` listener 8080). The EnvoyProxy is NOT part of the dev overlay because `kubectl apply -n plexsphere-dev -k …` rejects cross-namespace manifests; `make dev` and the chainsaw install step apply it separately with `kubectl apply -f deploy/local/base/gateway/envoyproxy.yaml`.	`dev-smoke` + `tests/e2e/dev/chainsaw-test.yaml` (asserts `GatewayClass.Accepted=True`, then `Gateway.Accepted=True` / `Gateway.Programmed=True`, then each `HTTPRoute.Accepted=True` / `HTTPRoute.ResolvedRefs=True` — Gateway API v1 publishes `ResolvedRefs` only at Listener / HTTPRoute level, not on the Gateway itself).
`deploy/local/vendor/gateway-api-crds.yaml`	SHA-pinned Gateway API v1 CRDs, applied before the overlay.	`dev-smoke` bootstrap step.
`deploy/local/vendor/envoy-gateway.yaml`	SHA-pinned Envoy Gateway install manifest.	`dev-smoke` bootstrap step.
`deploy/local/overlays/dev/kustomization.yaml`	Dev overlay; pins `:dev` image tags and sets `imagePullPolicy: Never` for plexsphere containers.	`dev-smoke` + `tests/workspace/dev_manifests_test.go` (parse-YAML drift gate).
`deploy/local/base/openbao/`	OpenBao 2.0.0 Deployment + Service + ConfigMap + Secret in `-dev` mode (in-memory storage, pre-unsealed, single root token); the cluster-local secrets engine.	`dev-stack-smoke` + `tests/integration/dev_stack_kind_test.go` and `tests/workspace/dev_stack_manifests_test.go` (drift gate).
`deploy/local/base/seaweedfs/`	SeaweedFS 3.75 single-replica StatefulSet + Service running master + volume + filer + s3 with emptyDir-backed storage; the in-cluster S3-compatible object store.	`dev-stack-smoke` + `tests/integration/dev_stack_kind_test.go` and `tests/workspace/dev_stack_manifests_test.go` (drift gate).
`deploy/local/base/mimir/`	Grafana Mimir 2.14.3 single-binary Deployment + Service + ConfigMap with filesystem-backed storage and multi-tenancy disabled; the in-cluster metrics backend.	`dev-stack-smoke` + `tests/integration/dev_stack_kind_test.go` and `tests/workspace/dev_stack_manifests_test.go` (drift gate).
`deploy/local/base/loki/`	Grafana Loki 3.3.2 single-binary monolithic Deployment + Service + ConfigMap with filesystem-backed chunks and in-memory ring; the in-cluster logs backend.	`dev-stack-smoke` + `tests/integration/dev_stack_kind_test.go` and `tests/workspace/dev_stack_manifests_test.go` (drift gate).
`deploy/local/base/crossplane/`	Crossplane v2.0.2 core controller (real `core start`) Deployment + ServiceAccount + scoped `crossplane` ClusterRole/Binding + webhook TLS Secrets + Service; the management-fleet control plane (no providers shipped).	`dev-stack-smoke` + `tests/integration/dev_stack_kind_test.go` and `tests/workspace/dev_stack_manifests_test.go` (drift gate).
`deploy/local/base/external-secrets/`	External Secrets Operator v0.18.2 Deployment + ServiceAccount + ClusterRoleBinding + Service; pulls Secrets from cluster-external backends in production overlays.	`dev-stack-smoke` + `tests/integration/dev_stack_kind_test.go` and `tests/workspace/dev_stack_manifests_test.go` (drift gate).
`deploy/local/base/plexd/`	plexd sibling-product Deployment + Secret; the bootstrap-token registration agent that calls plexsphere's `POST /v1/register` on startup. Image tag is templated from `/PLEXD_VERSION`.	`dev-stack-smoke` + `tests/integration/dev_stack_plexd_register_test.go` and `tests/workspace/dev_stack_manifests_test.go` (drift gate).
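The drift gate's rule is that every deploy/local/base/<service>/ directory needs a row in the table above. That rule can be approximated offline with a short loop. This is a rough sketch, not a port of tests/docs/docs_dev_local_kind_test.go. It only checks that the literal directory path appears somewhere in the given doc file.

```shell
# missing_manifest_rows BASE_DIR DOC_FILE: prints each BASE_DIR/<service>/ path
# whose text does not appear in DOC_FILE. Returns 1 if any row is missing.
missing_manifest_rows() {
  local base=${1%/} doc=$2 status=0 dir
  for dir in "$base"/*/; do
    [ -d "$dir" ] || continue          # glob matched nothing: no services
    if ! grep -qF -- "$dir" "$doc"; then
      printf '%s\n' "$dir"
      status=1
    fi
  done
  return "$status"
}
```

Run it from the repository root as missing_manifest_rows deploy/local/base, passing the runbook's source file as the second argument. Any printed path is a service directory that still needs a row before the real gate will pass.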

SpiceDB ReBAC engine

The dev stack ships SpiceDB as the authorisation engine behind the plexsphere API. Every identity-scoped HTTP endpoint checks a relation tuple against SpiceDB before serving a response, so the Deployment is a hard dependency for the /v1/... API surface.

Role: relationship-based access control (ReBAC) evaluator. Stores subject/relation/object tuples and answers permission-check queries from the plexsphere API.
Service: spicedb:50051 (in-cluster DNS), gRPC only. HTTP is disabled (--http-enabled=false) because the only consumer is the plexsphere API's authzed gRPC client.
Auth: preshared-key (spicedb-credentials.SPICEDB_GRPC_PRESHARED_KEY, value test-key). Every plexsphere pod reads the same key from the Secret and attaches it as authzed token bearer metadata on each request.
Datastore: Postgres. Uses postgres-credentials.DATABASE_URL — SpiceDB keeps its relation tuples in its own schema namespace inside the shared database.
Dispatch: disabled (--dispatch-cluster-enabled=false). Single replica, no peers, so cluster dispatch would only introduce a resolution failure on boot. A multi-replica HA posture is tracked separately as a follow-up.
Security: dev-only. Preshared-key auth is NOT production-safe — any pod that can reach spicedb:50051 can speak as any caller. A follow-up replaces this with mTLS + per-workload SPIFFE identities.

Inspect the Deployment (the dev overlay applies into the default namespace, so no -n flag is required):

```bash
kubectl get deployment spicedb
kubectl logs deployment/spicedb
```

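Confirm that SpiceDB's datastore tables exist. They live in the shared Postgres database and are created by the SpiceDB migrate one-shot on first boot. Their presence shows that the datastore migration ran; it does not show that the ReBAC schema is installed (see Startup ordering). Postgres is a StatefulSet, so the Pod name is postgres-0, not deployment/postgres (statefulset/postgres also works):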

```bash
kubectl exec postgres-0 -- \
  env PGPASSWORD=plexsphere psql -U plexsphere -d plexsphere -c '\dt plexsphere.*' | grep -i relation
```

The SpiceDB wiring rationale, the base manifests, and the full set of --datastore-* flags live in the kustomize base under deploy/local/base/spicedb/ — open the directory's README.md together with the manifests it pins for the per-flag justification.
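To confirm that the dev posture flags listed above are actually set on the running Deployment, inspect the container args. The helper below is a sketch. It assumes the flags are passed as args of the first container. If the manifests set them through SPICEDB_* env vars instead, read .env rather than .args.

```shell
# missing_spicedb_flags: reads the SpiceDB container's args on stdin (any text
# form, e.g. the JSON array printed by kubectl's jsonpath) and prints each
# expected dev-posture flag that does not appear.
missing_spicedb_flags() {
  local args flag
  args=$(cat)
  for flag in --http-enabled=false --dispatch-cluster-enabled=false --datastore-engine=postgres; do
    case "$args" in
      *"$flag"*) ;;
      *) printf '%s\n' "$flag" ;;
    esac
  done
}
```

Usage: kubectl get deployment spicedb -o jsonpath='{.spec.template.spec.containers[0].args}' | missing_spicedb_flags. Empty output means all three flags are present.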

Troubleshooting

Nine failure modes account for the vast majority of red dev runs. For each, the "how to recognise" column is what the operator typically sees first, and the "how to fix" column names the exact follow-up command.

Symptom / failure mode	How to recognise	How to fix
Port collision — host port 8080 or 5556 is already bound by another process.	`make dev` aborts with `failed to create cluster: exit status 1` and kind logs `address already in use` during `docker run`.	Free the port (`lsof -i :8080`, kill the offender) OR edit `deploy/local/kind.yaml` to remap the host side, then rerun `make dev-down && make dev`.
GatewayClass never admitted — the Gateway sits with the CRD-default conditions forever because the controller has not bound to its class.	`kubectl describe gateway plexsphere` (the dev overlay applies into `default`) shows `Accepted`/`Programmed` conditions with `Status: Unknown`, `Message: "Waiting for controller"`, and `LastTransitionTime: 1970-01-01T00:00:00Z`; `kubectl get gatewayclass envoy-gateway` lacks `Accepted=True`.	Re-apply the class: `kubectl apply -f deploy/local/base/gateway/gatewayclass.yaml`. Confirm `spec.controllerName=gateway.envoyproxy.io/gatewayclass-controller`. The workspace drift gate `TestDevManifests_GatewayClassMatchesGateway` catches this offline.
Gateway `AddressNotAssigned` — listeners are `Programmed=True` but the Gateway reports `Programmed=False / Reason: AddressNotAssigned / Message: "No addresses have been assigned to the Gateway"`.	`kubectl describe gateway plexsphere` (the dev overlay applies into `default`) shows the `AddressNotAssigned` reason on the top-level `Programmed` condition; `kubectl get svc -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gateway-name=plexsphere -o wide` lists the data-plane Service as `type: LoadBalancer` with an empty `EXTERNAL-IP`.	Envoy Gateway defaults to `type: LoadBalancer`; a fresh kind cluster has no LB. Re-apply the EnvoyProxy that pins the Service to NodePort: `kubectl apply -f deploy/local/base/gateway/envoyproxy.yaml`. Confirm `GatewayClass.spec.parametersRef` points at `EnvoyProxy/envoy-proxy-config` in `envoy-gateway-system`. The workspace drift gate `TestDevManifests_EnvoyProxyPinsKindNodePorts` catches drift between the EnvoyProxy pins and the kind `extraPortMappings`.
Gateway readiness timeout — GatewayClass is `Accepted=True` but the Gateway stays `Programmed=False`.	`kubectl get gateway` shows `Programmed=False`; Envoy pods in `envoy-gateway-system` are crashlooping.	Re-apply the vendor manifests with server-side apply (the envoyproxies CRD exceeds the 256 KiB last-applied-configuration annotation cap): `kubectl apply --server-side --force-conflicts -f build/dev-vendor/gateway-api-crds.yaml -f build/dev-vendor/envoy-gateway.yaml`, then `kubectl rollout restart deployment -n envoy-gateway-system`.
`kubectl wait` on an HTTPRoute hangs — a custom readiness check times out even though `kubectl describe httproute` shows `Accepted=True` + `ResolvedRefs=True`.	Route conditions in Gateway API v1 live under `.status.parents[].conditions[]` (per-parent), not top-level. `kubectl wait --for=condition=X=True` reads only the top-level list, so the wait can never succeed against HTTPRoute / GRPCRoute / TCPRoute / TLSRoute / UDPRoute.	Use a partial-match assertion against the nested `status.parents[].conditions[]` structure (see `tests/e2e/dev/chainsaw-test.yaml` step `assert-httproutes-resolved`). The workspace drift gate `TestChainsawTests_NoWaitAgainstGatewayAPIRoutes` enforces this offline.
Dex clock skew — sign-in fails with `id_token has expired` immediately after issuance.	`kubectl logs deployment/dex` shows `iat` timestamps more than 60s apart from the kubelet clock.	Resync the host clock (`sudo ntpdate -u pool.ntp.org` on Linux; restart Docker Desktop on macOS), then `make dev-down && make dev`.
`ErrImageNeverPull` — a plexsphere pod stays in `ErrImageNeverPull` because the image was never side-loaded.	`kubectl describe pod` shows `Failed to pull image "plexsphere:dev": imagePullPolicy set to Never`.	Re-run `make dev` (it re-invokes `kind load docker-image`); if the issue persists, confirm the image exists locally: `docker image inspect plexsphere:dev`.
Dependency pods stuck in `PodInitializing` — on a cold boot, upstream images trickle in one at a time even though each pull is fast.	`kubectl get events --field-selector reason=Pulled` shows sub-second pulls with minutes of `including waiting` queue time; the stuck pods' init containers are long `Completed`.	The kubelet patch in `deploy/local/kind.yaml` (`serializeImagePulls: false`, `maxParallelImagePulls: 6`) removes the queueing, and the `make dev` preload step (`scripts/dev-preload-upstream-images.sh`) side-loads every SHA-pinned upstream image from the host Docker cache so a cluster re-create does not hit the network at all. A cluster created before either change keeps the old behaviour — `make dev-down && make dev` re-creates it. Only the very first preload on a machine still downloads each image once; a slow registry path bounds that one-time warm-up.
kind node out of disk — pods evict with `DiskPressure`.	`kubectl describe node plexsphere-control-plane` shows `Conditions: DiskPressure=True`.	Tear down, prune, and rerun: `make dev-down && docker system prune -af && docker volume prune -f && make dev`.
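The clock-skew row compares a token's iat with the node's clock. The two helpers below sketch that comparison. jwt_iat pulls iat out of a JWT payload without verifying the signature, so use it for diagnostics only. It assumes the payload is single-line JSON with a numeric iat. skew_within applies the 60-second threshold from the table. How Dex or the API actually bounds skew is not specified here.

```shell
# jwt_iat TOKEN: prints the iat claim (epoch seconds) from a JWT's payload.
# Does NOT verify the signature.
jwt_iat() {
  local payload
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="$payload==" ;;
    3) payload="$payload=" ;;
  esac
  printf '%s' "$payload" | base64 -d |
    sed -n 's/.*"iat":[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# skew_within IAT NOW [LIMIT]: succeeds when |NOW - IAT| <= LIMIT seconds
# (default 60). Both timestamps are Unix epoch seconds.
skew_within() {
  local diff=$(( $2 - $1 ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( -diff ))
  fi
  [ "$diff" -le "${3:-60}" ]
}
```

Given a freshly issued token in $TOKEN, skew_within "$(jwt_iat "$TOKEN")" "$(date +%s)" failing is a strong hint that the host clock needs resyncing.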

Tear down

make dev-down deletes the plexsphere kind cluster and nothing else. It refuses to delete a foreign cluster named plexsphere that some other tooling created. The sentinel ConfigMap planted by make dev is the only ownership signal the target trusts:

```bash
make dev-down
```

The sentinel-ConfigMap probe lives in deploy/local/base/ and is applied by make dev. Three sub-tests in tests/integration/dev_stack_down_test.go enforce the contract:

owned-deletes — make dev planted the sentinel, so make dev-down deletes the cluster.
foreign-preserves — a kind cluster named plexsphere without the sentinel is preserved; the target prints a tagged warning and exits zero.
force-down-bypasses — DEV_FORCE_DOWN=1 make dev-down bypasses the sentinel check (escape hatch for an inconsistent cluster the operator deliberately wants to scrub).
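The three sub-tests reduce to a small decision table over two inputs: whether the sentinel ConfigMap is present, and whether DEV_FORCE_DOWN=1 is set. The function below sketches that table only. It does not implement how make dev-down probes for the sentinel, and its warning text is a placeholder rather than the target's tagged message.

```shell
# dev_down_action SENTINEL_PRESENT FORCE: prints "delete" or "preserve".
#   SENTINEL_PRESENT: 1 if the make dev sentinel ConfigMap was found, else 0.
#   FORCE: value of DEV_FORCE_DOWN (1 bypasses the sentinel check).
# Always returns 0: preserving a foreign cluster is not an error.
dev_down_action() {
  if [ "${2:-0}" = "1" ]; then
    echo delete            # force-down-bypasses
  elif [ "$1" = "1" ]; then
    echo delete            # owned-deletes
  else
    echo "dev-down: cluster 'plexsphere' lacks the make dev sentinel; leaving it in place" >&2
    echo preserve          # foreign-preserves
  fi
}
```

Keeping the force check first mirrors the escape-hatch semantics. DEV_FORCE_DOWN=1 wins regardless of what the sentinel probe returns, which is what makes it useful against an inconsistent cluster.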

Reset recipe

When the stack is in an indeterminate state — a half-applied overlay, a hung Deployment, or a cluster that the troubleshooting matrix does not cover — the reliable reset is: delete the cluster, prune Docker state, and rerun make dev from a clean baseline.

```bash
make dev-down
docker system prune -f
make dev
```

make dev-down deletes the plexsphere kind cluster only; unrelated kind clusters on the same machine are untouched. The docker system prune -f step reclaims dangling layers and exited containers — the same hint make dev-down prints as its final line. The subsequent make dev run takes ~3 minutes from a cold cache (the plexsphere image build dominates) and ~40 seconds from a warm one.
