gRPC surface and operations
This page covers the wire surface the signer exposes, the network topology around it, and the debugging workflow operators reach for when something goes wrong. For the security contracts the surface relies on see Security invariants; for the entry point see the index.
gRPC surface
The Signing Service exposes exactly two RPCs on api/proto/signing/v1/signing.proto (generated Go stubs live under pkg/proto/signing/v1/):
- Sign(SignRequest) → SignResponse: produces an Ed25519 signature over canonical_bytes under the requested scope and key_id. The response echoes the actual signing key id so callers that passed an empty key_id can pin to it on subsequent requests. The signer NEVER re-canonicalises the payload.
- PublicKey(PublicKeyRequest) → PublicKeyResponse: returns the 32-byte public half plus the lifecycle state (active / rotating / retired), and a cached bool reporting the source of public_key. It is false when the bytes came from a live KeyProvider.PublicKey call (authoritative), and true when the provider was unreachable and the signer served the locally-cached bytes from the signing_key row instead. Callers that must distinguish provider-sourced from DB-cached bytes (e.g. trust-bundle publishers about to ship a re-keyed public half) branch on cached to gate retries; plain signature-verification callers can ignore it. Callers must tolerate every state: signatures minted by a retired key still need verifying.
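The contract above (sign the bytes as received, verify against the 32-byte public half regardless of lifecycle state) can be illustrated with the standard library alone. This is a minimal sketch, not the signer's implementation: signCanonical, verifyCanonical, and newKeyPair are hypothetical helpers, and the locally generated key stands in for whatever the KeyProvider holds.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

// newKeyPair generates a throwaway Ed25519 key pair for the demo.
// Hypothetical helper: the real signer obtains keys from its KeyProvider.
func newKeyPair() (ed25519.PublicKey, ed25519.PrivateKey) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	return pub, priv
}

// signCanonical stands in for the Sign RPC: it signs the bytes exactly
// as received and never re-encodes them.
func signCanonical(priv ed25519.PrivateKey, canonicalBytes []byte) []byte {
	return ed25519.Sign(priv, canonicalBytes)
}

// verifyCanonical checks a signature against the public half returned by
// PublicKey. It takes no lifecycle state on purpose: a signature minted
// by a key that has since been retired must still verify.
func verifyCanonical(pub ed25519.PublicKey, canonicalBytes, sig []byte) bool {
	if len(pub) != ed25519.PublicKeySize { // ed25519.Verify panics on a bad length
		return false
	}
	return ed25519.Verify(pub, canonicalBytes, sig)
}

func main() {
	pub, priv := newKeyPair()

	payload := []byte(`{"a":1,"b":2}`)
	sig := signCanonical(priv, payload)
	fmt.Println(verifyCanonical(pub, payload, sig)) // prints true

	// Same JSON object, different byte order: the signature no longer
	// verifies, which is why deterministic encoding is the caller's job.
	reordered := []byte(`{"b":2,"a":1}`)
	fmt.Println(verifyCanonical(pub, reordered, sig)) // prints false
}
```

The second check is the practical consequence of the signer never re-canonicalising: any difference in encoding between signing and verification, however harmless semantically, invalidates the signature.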
Stable status-detail strings the server returns (pinned by the proto doc comment and mirrored as constants next to mapSignErrorToStatus in internal/signing/grpc_server.go):
| Error class | gRPC code | Detail string |
|---|---|---|
| Provider handle down | UNAVAILABLE | signing: key provider unavailable |
| Key retired | FAILED_PRECONDITION | signing: key retired |
| Rotation in progress | FAILED_PRECONDITION | signing: rotation in progress |
| Key not found | NOT_FOUND | signing: key not found |
| Scope mismatch | NOT_FOUND | signing: scope mismatch |
| Peer identity denied | PERMISSION_DENIED | signing: client identity denied |
| Invariant violation | INVALID_ARGUMENT | signing: invariant violation |
Clients should branch on the detail string, not the human message field. Changing any of these strings is a breaking change for every client and must go through the buf breaking-change gate together with a coordinated consumer update.
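As a rough illustration of branching on the detail string, the sketch below maps each string from the table to its gRPC code name and marks only the UNAVAILABLE class as retryable. The constant names, codeForDetail, and retryable are hypothetical and need not match the constants next to mapSignErrorToStatus; the retry rule is an assumption for the example. Only the standard library is used, so the detail string is passed in as a plain string rather than extracted from a gRPC status.

```go
package main

import "fmt"

// Detail strings copied from the table above. The Go identifiers are
// hypothetical and are not taken from internal/signing/grpc_server.go.
const (
	detailProviderUnavailable = "signing: key provider unavailable"
	detailKeyRetired          = "signing: key retired"
	detailRotationInProgress  = "signing: rotation in progress"
	detailKeyNotFound         = "signing: key not found"
	detailScopeMismatch       = "signing: scope mismatch"
	detailIdentityDenied      = "signing: client identity denied"
	detailInvariantViolation  = "signing: invariant violation"
)

// codeByDetail mirrors the table: detail string to gRPC code name.
var codeByDetail = map[string]string{
	detailProviderUnavailable: "UNAVAILABLE",
	detailKeyRetired:          "FAILED_PRECONDITION",
	detailRotationInProgress:  "FAILED_PRECONDITION",
	detailKeyNotFound:         "NOT_FOUND",
	detailScopeMismatch:       "NOT_FOUND",
	detailIdentityDenied:      "PERMISSION_DENIED",
	detailInvariantViolation:  "INVALID_ARGUMENT",
}

// codeForDetail does an exact-match lookup. Unknown strings report
// known=false so the caller can treat them as a hard failure.
func codeForDetail(detail string) (code string, known bool) {
	code, known = codeByDetail[detail]
	return code, known
}

// retryable is true only for the provider-unavailable class
// (an assumption made for this sketch).
func retryable(detail string) bool {
	return detail == detailProviderUnavailable
}

func main() {
	for _, d := range []string{
		detailProviderUnavailable,
		detailKeyRetired,
		"signing: something new", // not in the table
	} {
		code, known := codeForDetail(d)
		fmt.Printf("%q code=%q known=%t retry=%t\n", d, code, known, retryable(d))
	}
}
```

Exact-match lookup is deliberate: two rows share FAILED_PRECONDITION and two share NOT_FOUND, so the code alone cannot tell a retired key from an open rotation window.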
gRPC reflection is intentionally disabled — see the DECISION block in cmd/plexsphere-signer/app.go for the rationale: exposing reflection on an mTLS-only surface would turn the authn boundary into an information-disclosure surface.
The typed in-tree client lives at internal/access/signer_client.go. It pools connections (default 4), retries UNAVAILABLE with exponential backoff capped at 200ms, and enforces a 5s per-call deadline that mirrors the server's own request deadline.
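The retry behaviour described above can be sketched as follows. This is not the code in internal/access/signer_client.go: the 200ms cap and 5s deadline come from the text, while the 25ms base delay, the doubling factor, the attempt limit, the errUnavailable sentinel, and the choice to let one deadline span all retries are assumptions made for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

const (
	baseDelay   = 25 * time.Millisecond  // assumption: not stated in the text
	maxDelay    = 200 * time.Millisecond // backoff cap from the text
	callTimeout = 5 * time.Second        // per-call deadline from the text
)

// errUnavailable is a hypothetical stand-in for a gRPC UNAVAILABLE status.
var errUnavailable = errors.New("unavailable")

// backoff returns baseDelay doubled once per prior attempt, capped at maxDelay.
func backoff(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

// callWithRetry retries only errUnavailable. One deadline covers the whole
// logical call, retries included (an assumption for this sketch).
func callWithRetry(parent context.Context, maxAttempts int, call func(context.Context) error) error {
	ctx, cancel := context.WithTimeout(parent, callTimeout)
	defer cancel()

	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		err = call(ctx)
		if !errors.Is(err, errUnavailable) {
			return err // success or a non-retryable failure
		}
		if attempt+1 == maxAttempts {
			break // no point sleeping after the final attempt
		}
		select {
		case <-time.After(backoff(attempt)):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}

// runDemo simulates a call that is unavailable twice and then succeeds.
func runDemo() (int, error) {
	attempts := 0
	err := callWithRetry(context.Background(), 5, func(ctx context.Context) error {
		attempts++
		if attempts < 3 {
			return errUnavailable
		}
		return nil
	})
	return attempts, err
}

func main() {
	attempts, err := runDemo()
	fmt.Println(attempts, err) // prints "3 <nil>"
}
```

With these assumed values the delays run 25ms, 50ms, 100ms, then stay at 200ms, so even a long run of retries stays well inside the 5s deadline.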
NetworkPolicy topology
The signer listens on two ports:
- :8443 is the mTLS gRPC signer surface. NetworkPolicy restricts ingress to pods carrying app.kubernetes.io/name: plexsphere, so a peer without that label cannot even open the TCP socket. A compromised sidecar in the same namespace does not reach the signer unless it also wears the plexsphere label.
- :8081 is the HTTP probe / metrics port (/livez, /readyz, /metrics). Kubelet reaches probe ports directly over the Pod IP and bypasses NetworkPolicy, so ingress rules on this port would be a no-op. Prometheus scraping is admitted via a separate scrape-allow policy authored by the observability stack.
Egress stays default-allow so the signer reaches the cloud KMS / HSM backing service. Tightening egress to the specific KMS endpoints is a follow-up once per-provider endpoints are known — see the in-manifest header comment.
The manifest lives at deploy/local/base/plexsphere-signer/networkpolicy.yaml; the full signer kustomization (ServiceAccount, Secret, Service, Deployment) is in deploy/local/base/plexsphere-signer/.
Debugging
When a Sign call fails in production, walk the layers outside-in. Every step is designed to narrow the blast radius without touching private key material.
- Check /metrics on the signer's :8081 surface. The per-method counters and histograms show whether denials are stacking up at the peer-identity interceptor (counter spike on insufficient_relation), at the RPC layer (spike on caveat_violation or invariant violation), or in the provider (spike on key provider unavailable).
- Tail /livez and /readyz on :8081. /livez is a liveness signal: a failure means the process is unhealthy and kubelet will restart it. /readyz additionally runs the signing reconcile probe registered by RegisterSigningReconcileProbe in cmd/plexsphere-signer/app.go, so a red /readyz usually means the KeyProvider handle is down or the per-scope reconcile call failed.
- Confirm the caller's SPIFFE identity. Extract the client certificate the caller presents on the mTLS handshake (via openssl s_client -connect signer:8443 -showcerts from a pod on the plexsphere podSelector) and read the URI SAN. Compare byte-for-byte against the --allow-spiffe-id values on the signer Deployment; the interceptor does a plain map lookup with no normalisation beyond what url.URL.String() performs.
- Tail the peer-identity audit log. Every admission decision writes one entry under service:signer:peer=<spiffe-id> with Reason=granted or Reason=insufficient_relation. A missing admission entry means the call never reached the signer (check the NetworkPolicy, the load balancer, and the client's own mTLS config).
- Decode the gRPC status detail. On a client-side failure, take the *status.Status returned by status.FromError(err) and read its Message(). FromError returns the status together with an ok bool, so the two calls cannot be chained into one expression. The stable detail string is the one the signer wrote; the client never needs to parse human prose. Branch on the exact bytes in the table above.
- List the active keys for a scope. Over the typed client in internal/access/signer_client.go, call PublicKey(scope, "") to retrieve the active key plus its lifecycle state. A NotFound response means the reconcile loop has not populated the scope; a rotating state means the rotation window is open and the old key is still signing until ClosesAt.
- Inspect the rotation repo query set. KeyRepo.ListPublicKeysForScope in internal/signing/ports.go returns every key under a scope not yet beyond the retention window, which is useful when serving trust bundles while the provider is briefly unreachable.
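The byte-for-byte SPIFFE comparison in the identity step is a common source of surprises, so a small sketch may help. It assumes the interceptor behaves as described (parse the URI SAN, look up url.URL.String() in a map) and is not the signer's code: the allowlist type, its methods, and the example SPIFFE IDs are all hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowlist holds the configured SPIFFE IDs as exact strings.
// Hypothetical type standing in for the --allow-spiffe-id values.
type allowlist map[string]struct{}

// newAllowlist stores each id verbatim, with no normalisation.
func newAllowlist(ids ...string) allowlist {
	a := make(allowlist, len(ids))
	for _, id := range ids {
		a[id] = struct{}{}
	}
	return a
}

// admits parses the URI SAN and looks up its String() form. The only
// normalisation applied is whatever net/url itself performs.
func (a allowlist) admits(uriSAN string) bool {
	u, err := url.Parse(uriSAN)
	if err != nil {
		return false
	}
	_, ok := a[u.String()]
	return ok
}

func main() {
	// Hypothetical SPIFFE ID used only for illustration.
	allowed := newAllowlist("spiffe://example.org/ns/plexsphere/sa/orchestrator")

	for _, san := range []string{
		"spiffe://example.org/ns/plexsphere/sa/orchestrator",  // exact match
		"spiffe://example.org/ns/plexsphere/sa/orchestrator/", // trailing slash
		"spiffe://Example.org/ns/plexsphere/sa/orchestrator",  // host case differs
	} {
		fmt.Printf("%-55s admitted=%t\n", san, allowed.admits(san))
	}
}
```

Only the first entry is admitted. A trailing slash or a differently cased trust domain produces a different string and therefore a different map key, which is why the comparison against the Deployment flags has to be exact.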
Operators NEVER need to read signing-private material to debug a signer incident. Any runbook step that asks for it is a bug against the invariants in this reference.
What the Signing Service is not
Pinned verbatim against ../../../README.md#signing-service. The signer exposes sign(canonical-bytes) → signature and public-key(key-id) only; everything outside that two-RPC surface is explicitly out of scope.
- NOT a private-key export API. The signer NEVER exports private key material under any circumstance. No RPC, flag, or debug endpoint returns an Ed25519 private half.
- NOT a policy engine. The signer NEVER evaluates policy. It signs what the caller hands it; policy decisions live in the Policy Engine bounded context.
- NOT an envelope / JWT persistence layer. The signer NEVER persists envelopes or session JWTs. The caller (SSE Event Bus, Access Orchestrator) owns the resulting signature byte-for-byte and is responsible for storage, replay, and distribution.
- NOT a session-state store. The signer NEVER stores session state. Every RPC is stateless at the application layer; the only state the signer owns is the SigningKey lifecycle (active / rotating / retired) and the overlap-window timestamps.
- NOT a canonicalisation service. The signer NEVER re-canonicalises the payload. Deterministic encoding is the caller's responsibility.