gRPC surface and operations

This page covers the wire surface the signer exposes, the network topology around it, and the debugging workflow operators reach for when something goes wrong. For the security contracts the surface relies on see Security invariants; for the entry point see the index.

gRPC surface

The Signing Service exposes exactly two RPCs, defined in api/proto/signing/v1/signing.proto (generated Go stubs live under pkg/proto/signing/v1/):

  • Sign(SignRequest) → SignResponse — produces an Ed25519 signature over canonical_bytes under the requested scope and key_id. The response echoes the actual signing key id so callers that passed an empty key_id can pin to it on subsequent requests. The signer NEVER re-canonicalises the payload.
  • PublicKey(PublicKeyRequest) → PublicKeyResponse — returns the 32-byte public half plus the lifecycle state (active / rotating / retired), and a cached bool reporting the source of public_key: false when the bytes came from a live KeyProvider.PublicKey call (authoritative), true when the provider was unreachable and the signer served the locally-cached bytes from the signing_key row instead. Callers that must distinguish provider-sourced from DB-cached bytes (e.g. trust-bundle publishers about to ship a re-keyed public half) branch on cached to gate retries; plain signature-verification callers can ignore it. Callers must tolerate every state: signatures minted by a retired key still need verifying.

Stable status-detail strings the server returns (pinned by the proto doc comment and mirrored as constants next to mapSignErrorToStatus in internal/signing/grpc_server.go):

| Error class | gRPC code | Detail string |
| --- | --- | --- |
| Provider handle down | UNAVAILABLE | signing: key provider unavailable |
| Key retired | FAILED_PRECONDITION | signing: key retired |
| Rotation in progress | FAILED_PRECONDITION | signing: rotation in progress |
| Key not found | NOT_FOUND | signing: key not found |
| Scope mismatch | NOT_FOUND | signing: scope mismatch |
| Peer identity denied | PERMISSION_DENIED | signing: client identity denied |
| Invariant violation | INVALID_ARGUMENT | signing: invariant violation |

Clients should branch on the detail string, not the human message field. Changing any of these strings is a breaking change for every client and must go through the buf breaking-change gate together with a coordinated consumer update.
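
A client-side branch on these strings might look like the sketch below. The constants are copied from the table; the Action type and the policy in ActionFor (retry on provider unavailability, re-resolve the key on retirement, fail otherwise) are illustrative assumptions, not the behaviour of the in-tree client. In a real gRPC client the detail value would be read from the status attached to the returned error.

```go
package main

import "fmt"

// Detail strings copied from the status-detail table.
const (
	DetailProviderUnavailable = "signing: key provider unavailable"
	DetailKeyRetired          = "signing: key retired"
	DetailRotationInProgress  = "signing: rotation in progress"
	DetailKeyNotFound         = "signing: key not found"
	DetailScopeMismatch       = "signing: scope mismatch"
	DetailIdentityDenied      = "signing: client identity denied"
	DetailInvariantViolation  = "signing: invariant violation"
)

// Action is a hypothetical client-side reaction to a failed call.
type Action string

const (
	ActionRetry      Action = "retry"
	ActionRefreshKey Action = "refresh-key"
	ActionFail       Action = "fail"
)

// ActionFor compares the detail string byte-for-byte. There is no
// trimming, case folding or substring matching, so any drift in the
// server's strings falls through to ActionFail instead of being
// silently misclassified.
func ActionFor(detail string) Action {
	switch detail {
	case DetailProviderUnavailable:
		return ActionRetry
	case DetailKeyRetired:
		return ActionRefreshKey
	default:
		return ActionFail
	}
}

func main() {
	fmt.Println(ActionFor(DetailProviderUnavailable)) // retry
	fmt.Println(ActionFor("Signing: key retired"))    // fail: case differs
}
```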

gRPC reflection is intentionally disabled — see the DECISION block in cmd/plexsphere-signer/app.go for the rationale: exposing reflection on an mTLS-only surface would turn the authn boundary into an information-disclosure surface.

The typed in-tree client lives at internal/access/signer_client.go. It pools connections (default 4), retries UNAVAILABLE with exponential backoff capped at 200ms, and enforces a 5s per-call deadline that mirrors the server's own request deadline.

NetworkPolicy topology

The signer listens on two ports:

  • :8443 — the mTLS gRPC signer surface. NetworkPolicy restricts ingress to pods carrying app.kubernetes.io/name: plexsphere, so a peer without that label cannot even open the TCP socket. A compromised sidecar in the same namespace does not reach the signer unless it also wears the plexsphere label.
  • :8081 — the HTTP probe / metrics port (/livez, /readyz, /metrics). Kubelet reaches probe ports directly over the Pod IP and bypasses NetworkPolicy, so ingress rules on this port would be a no-op. Prometheus scraping is admitted via a separate scrape-allow policy authored by the observability stack.

Egress stays default-allow so the signer reaches the cloud KMS / HSM backing service. Tightening egress to the specific KMS endpoints is a follow-up once per-provider endpoints are known — see the in-manifest header comment.

The manifest lives at deploy/local/base/plexsphere-signer/networkpolicy.yaml; the full signer kustomization (ServiceAccount, Secret, Service, Deployment) is in deploy/local/base/plexsphere-signer/.

Debugging

When a Sign call fails in production, walk the layers outside-in. Every step is designed to narrow the blast radius without touching private key material.

  1. Check /metrics on the signer's :8081 surface. The per-method counters and histograms show whether denials are stacking up at the peer-identity interceptor (counter spike on insufficient_relation), at the RPC layer (spike on caveat_violation or invariant violation), or in the provider (spike on key provider unavailable).
  2. Poll /livez and /readyz on :8081. /livez is a liveness signal: a failure means the process is unhealthy and kubelet will restart it. /readyz additionally runs the signing reconcile probe registered by RegisterSigningReconcileProbe in cmd/plexsphere-signer/app.go: a red /readyz usually means the KeyProvider handle is down or the per-scope reconcile call failed.
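  3. Confirm the caller's SPIFFE identity. Read the URI SAN of the client certificate the caller presents on the mTLS handshake, for example by running openssl x509 -noout -text against the caller's certificate file. Note that openssl s_client -connect signer:8443 -showcerts, run from a pod matched by the plexsphere podSelector, prints the signer's server chain rather than the caller's certificate, so it is useful for checking that :8443 is reachable but not for reading the caller's identity. Compare the URI SAN byte-for-byte against the --allow-spiffe-id values on the signer Deployment; the interceptor does a plain map lookup with no normalisation beyond what url.URL.String() performs.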
  4. Tail the peer-identity audit log. Every admission decision writes one entry under service:signer:peer=<spiffe-id> with Reason=granted or Reason=insufficient_relation. A missing admission entry means the call never reached the signer (check the NetworkPolicy, the load balancer, and the client's own mTLS config).
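  5. Decode the gRPC status detail. On a client-side failure, unwrap the error with status.FromError(err), which returns the status together with an ok flag, and read Message() on the returned status (status.Convert(err).Message() is the one-line form). The stable detail string is the one the signer wrote, so the client never needs to parse human prose. Branch on the exact bytes in the table above.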
  6. List the active keys for a scope. Over the typed client in internal/access/signer_client.go, call PublicKey(scope, "") to retrieve the active key plus its lifecycle state. A NotFound response means the reconcile loop has not populated the scope; a rotating state means the rotation window is open and the old key is still signing until ClosesAt.
  7. Inspect the rotation repo query set. KeyRepo.ListPublicKeysForScope in internal/signing/ports.go returns every key under a scope not yet beyond the retention window — useful when serving trust bundles while the provider is briefly unreachable.

Operators NEVER need to read signing-private material to debug a signer incident. Any runbook step that asks for it is a bug against the invariants in this reference.

What the Signing Service is not

Pinned verbatim against ../../../README.md#signing-service. The signer exposes sign(canonical-bytes) → signature and public-key(key-id) only; everything outside that two-RPC surface is explicitly out of scope.

  • NOT a private-key export API. The signer NEVER exports private key material under any circumstance. No RPC, flag, or debug endpoint returns an Ed25519 private half.
  • NOT a policy engine. The signer NEVER evaluates policy. It signs what the caller hands it; policy decisions live in the Policy Engine bounded context.
  • NOT an envelope / JWT persistence layer. The signer NEVER persists envelopes or session JWTs. The caller (SSE Event Bus, Access Orchestrator) owns the resulting signature byte-for-byte and is responsible for storage, replay, and distribution.
  • NOT a session-state store. The signer NEVER stores session state. Every RPC is stateless at the application layer; the only state the signer owns is the SigningKey lifecycle (active / rotating / retired) and the overlap-window timestamps.
  • NOT a canonicalisation service. The signer NEVER re-canonicalises the payload. Deterministic encoding is the caller's responsibility.