Observability Ingest HTTP API

This is the reference for the three node-facing batch-ingest endpoints under /v1/nodes/{id}: PostNodeMetrics, PostNodeLogs, and PostNodeAudit. It maps each operation to its OpenAPI schema, the NSK authentication seam, the path-id self-check, the request/response shapes, and the closed Problem.code taxonomy. The wire contract originates in api/openapi/plexsphere-v1.yaml; this doc is a map, not a duplicate contract.

The Observability Ingest surface is the per-Node front door for plexd telemetry. A Node pushes batches of metric samples, structured log lines, and normalised audit events; the front door admits the batch (structure, encoding, size, and per-batch record caps), quota-gates it against per-Node and per-Domain byte budgets, and hands the survivor onto a per-signal JetStream buffer stream. Metrics land in Grafana Mimir; logs and audit events land in Grafana Loki. Records are tagged server-side with the originating Node's Domain and Project — the wire body never carries an identity subject or email — so dashboards and alerts scope through the platform label model. For the bounded-context narrative — the ubiquitous language, the quota matrix, the per-signal buffer-stream subject grammar, and the de-personalised PII posture — see the context reference at ../../contexts/observability/ingest.md.

Operations

| Method | Path | Operation ID | Auth | ReBAC/authz gate | Audit relation | Body type & cap |
| --- | --- | --- | --- | --- | --- | --- |
| POST | /v1/nodes/{id}/metrics | PostNodeMetrics | NSK bearer | path-id == resolved Node (no ReBAC tuple; node-self) | observability.ingest (only on a node_id_mismatch denial) | application/json (array of MetricSample), 4 MiB wire cap |
| POST | /v1/nodes/{id}/logs | PostNodeLogs | NSK bearer | path-id == resolved Node (no ReBAC tuple; node-self) | observability.ingest (only on a node_id_mismatch denial) | application/x-ndjson (NDJSON LogLine), 4 MiB wire cap |
| POST | /v1/nodes/{id}/audit | PostNodeAudit | NSK bearer | path-id == resolved Node (no ReBAC tuple; node-self) | observability.ingest (only on a node_id_mismatch denial) | application/x-ndjson (NDJSON AuditEvent), 4 MiB wire cap |

  • The authorization model is node-self: there is no ReBAC tuple check. The NSK middleware authenticates the Node and the handler then confirms the URL path id is the very Node the NSK belongs to. A push addressed at any other Node is refused — there is no relation to grant one Node visibility of another's ingest path.
  • The audit relation observability.ingest is stamped only on the 403 node_id_mismatch denial — the security-relevant event of an NSK trying to push under a foreign path-id. A successful ingest is recorded on the byte, record, and ingest-lag metrics, not on the audit log.
  • The handler ships behind a fail-closed scaffold gate — when no observability ingest backend is configured (the PLEXSPHERE_OBS_NATS_URL opt-in is unset), the surface is disabled and every request returns 501 observability_ingest_not_provisioned. The surface is either fully wired or fully off.
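
The fail-closed gate described above can be pictured as a single check that runs ahead of the handlers. The following is a minimal sketch in Python, not the platform's implementation: the function name ingest_gate and the decision to treat an empty value like an unset one are assumptions; only the environment variable name, the 501 status, and the Problem.code come from this reference.

```python
import os
from typing import Mapping, Optional, Tuple


def ingest_gate(env: Mapping[str, str] = os.environ) -> Optional[Tuple[int, str]]:
    """Return (status, Problem.code) when the surface is disabled, else None."""
    # Assumption: an empty value is treated the same as an unset variable.
    if not env.get("PLEXSPHERE_OBS_NATS_URL"):
        return (501, "observability_ingest_not_provisioned")
    return None
```

Because the check has only two outcomes, there is no partially wired state: every request either passes the gate or receives the same 501.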

Authentication

The three ingest operations do not use the operator bearer scheme. They authenticate the Node against the per-Node NSK plaintext supplied in the Authorization: Bearer header, exactly the same seam the heartbeat and Secret Store fetch surfaces use. The NSK middleware resolves the Node from the presented credential and attaches it to the request context. The handler then double-checks the URL path id against the resolved Node and refuses a cross-Node push with 403 node_id_mismatch, so a leaked NSK cannot be replayed against another Node's ingest path.

A missing or malformed credential surfaces as 401 unauthorized; a revoked credential surfaces as 401 nsk_revoked, so log scrapers can distinguish a never-valid credential from a previously valid one that was revoked.
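
The path-id self-check might look roughly like the sketch below. It assumes the NSK middleware has already resolved the Node and passes its identifier in as a string; the function name node_self_problem is hypothetical, and treating an unparseable path id as a mismatch is an assumption this reference does not confirm.

```python
import uuid
from typing import Optional


def node_self_problem(path_id: str, resolved_node_id: str) -> Optional[str]:
    """Return the Problem.code for a cross-Node push, or None when the ids match."""
    try:
        # Compare parsed UUIDs so letter case and formatting differences do not matter.
        matches = uuid.UUID(path_id) == uuid.UUID(resolved_node_id)
    except ValueError:
        # Assumption: an unparseable id is handled as a mismatch.
        matches = False
    return None if matches else "node_id_mismatch"
```

A caller would map a non-None result to a 403 with the PermissionDenied shape and record the denial on the observability.ingest audit relation.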

Path parameters & headers

| Parameter | Type | Required | Notes |
| --- | --- | --- | --- |
| id (path) | string (uuid) | yes | Node identifier (UUIDv7); the ingest scope. Must equal the Node the NSK resolves to, else 403 node_id_mismatch. |
| Authorization (header) | string | yes | Bearer <NSK plaintext>: the per-Node Node Secret Key issued at registration. |
| X-Plexsphere-Sent-At (header) | string (date-time) | yes | RFC 3339 timestamp at which plexd dispatched the batch. Required by the handler; a missing or unparseable value is refused with 400 ingest_sent_at_invalid. Drives the ingest-lag metric the platform records on acceptance. |
| Content-Encoding (header) | string | no | gzip or identity/empty only; gzip is the on-the-wire compression. Any other value is refused with 415 ingest_encoding_unsupported. |

The X-Plexsphere-Sent-At header is deliberately declared required: false in the OpenAPI document. Were it required: true, the generated request wrapper would reject a missing header with its own generic 400 before the handler runs, so the refusal could not carry the stable ingest_sent_at_invalid code that operators alert on. The handler therefore enforces presence and RFC 3339 format itself. The Content-Encoding header is likewise enforced entirely by the handler; it is not declared as an OpenAPI parameter.
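
As an illustration of handler-side header enforcement, the sketch below checks both headers and returns the matching status and Problem.code. It is a hedged approximation: the function name validate_ingest_headers, the order of the two checks, and the plain case-sensitive header mapping are assumptions, and datetime.fromisoformat is used as a stand-in for a strict RFC 3339 parser.

```python
from datetime import datetime
from typing import Mapping, Optional, Tuple


def validate_ingest_headers(headers: Mapping[str, str]) -> Optional[Tuple[int, str]]:
    """Return (status, Problem.code) for the first header violation, else None."""
    sent_at = headers.get("X-Plexsphere-Sent-At", "")
    try:
        # fromisoformat only approximates RFC 3339; map a trailing "Z" to an offset.
        parsed = datetime.fromisoformat(sent_at.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            raise ValueError("timestamp has no UTC offset")
    except ValueError:
        return (400, "ingest_sent_at_invalid")

    # Content codings are case-insensitive; an absent header means identity.
    encoding = headers.get("Content-Encoding", "").strip().lower()
    if encoding not in ("", "identity", "gzip"):
        return (415, "ingest_encoding_unsupported")
    return None
```

Keeping both checks in the handler is what lets each refusal carry its own stable code instead of a generic wrapper-generated 400.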

Request bodies

Each operation carries a batch of records. Metrics are a JSON array; logs and audit events are newline-delimited JSON (application/x-ndjson), one JSON object per line. Across all three signals a batch may carry at most 10000 records; the compressed wire body is capped at 4 MiB and the decompressed body at 32 MiB.

  • PostNodeMetrics: application/json, a JSON array of MetricSample. Each sample requires group, name, value, and timestamp; labels is an optional dimension map.
    • group (closed enum): node_resources, tunnel_health, peer_latency, agent_stats.
    • name (string): the metric name within its group (non-empty).
    • value (number): the numeric sample value.
    • timestamp (string, date-time): RFC 3339, when the sample was observed.
    • labels (optional string-to-string map): the platform adds the Domain, Project, and Node tags itself.
  • PostNodeLogs: application/x-ndjson, one LogLine per line. Each line requires severity, message, and timestamp; unit and hostname are optional.
    • severity (closed enum, syslog keywords): emerg, alert, crit, err, warning, notice, info, debug.
    • message (string): the log message body (non-empty).
    • timestamp (string, date-time): RFC 3339, when the line was emitted.
    • unit (optional string): the originating systemd unit or source.
    • hostname (optional string): the host the line was emitted on.
  • PostNodeAudit: application/x-ndjson, one AuditEvent per line. Each line requires source, action, outcome, and timestamp.
    • source (closed enum): auditd, k8s.
    • action (string): the audited action (syscall name or Kubernetes verb; non-empty).
    • outcome (string): the outcome of the audited action (non-empty).
    • timestamp (string, date-time): RFC 3339, when the event occurred.
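
On the sending side, a client could assemble a gzip-compressed NDJSON log batch and pre-check it against the documented caps before dispatch. This is a minimal sketch, not plexd's actual code: the function name encode_log_batch and the choice to raise ValueError are assumptions; the cap values are the ones stated in this section.

```python
import gzip
import json
from typing import Dict, List

MAX_RECORDS = 10_000                    # per-batch record cap
MAX_WIRE_BYTES = 4 * 1024 * 1024        # compressed wire cap (4 MiB)
MAX_INFLATED_BYTES = 32 * 1024 * 1024   # decompressed cap (32 MiB)


def encode_log_batch(lines: List[Dict[str, str]]) -> bytes:
    """Serialise LogLine dicts as NDJSON, gzip them, and enforce the caps locally."""
    if not lines:
        raise ValueError("empty batch")
    if len(lines) > MAX_RECORDS:
        raise ValueError("too many records")
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    raw = "".join(json.dumps(line, separators=(",", ":")) + "\n" for line in lines)
    payload = raw.encode("utf-8")
    if len(payload) > MAX_INFLATED_BYTES:
        raise ValueError("decompressed body too large")
    wire = gzip.compress(payload)
    if len(wire) > MAX_WIRE_BYTES:
        raise ValueError("compressed body too large")
    return wire
```

The returned bytes would be sent with Content-Type: application/x-ndjson and Content-Encoding: gzip. Checking the caps before sending avoids a round trip that would only end in a 413.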

A record that violates the required-field, closed-enum, or shape constraints — a non-array metrics body, a non-object NDJSON line, an empty batch, a missing required field, or a value outside a closed enum — is refused with 400 ingest_batch_malformed.
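
The constraints for a metrics body can be expressed as a small validator. The sketch below is illustrative only: the function names are hypothetical, and two behaviours are assumptions this reference does not confirm, namely that unparseable JSON and an unparseable timestamp both map to ingest_batch_malformed.

```python
import json
from datetime import datetime
from typing import Any, Optional

METRIC_GROUPS = {"node_resources", "tunnel_health", "peer_latency", "agent_stats"}
MALFORMED = "ingest_batch_malformed"


def _is_timestamp(value: Any) -> bool:
    """Approximate RFC 3339 check: a string with a date, a time, and an offset."""
    if not isinstance(value, str):
        return False
    try:
        return datetime.fromisoformat(value.replace("Z", "+00:00")).tzinfo is not None
    except ValueError:
        return False


def metrics_body_problem(body: bytes) -> Optional[str]:
    """Return the Problem.code for a malformed metrics body, or None if it is valid."""
    try:
        samples = json.loads(body)
    except ValueError:  # assumption: invalid JSON is reported as malformed
        return MALFORMED
    if not isinstance(samples, list) or not samples:
        return MALFORMED  # non-array body or empty batch
    for sample in samples:
        if not isinstance(sample, dict):
            return MALFORMED
        group, name, value = sample.get("group"), sample.get("name"), sample.get("value")
        if not isinstance(group, str) or group not in METRIC_GROUPS:
            return MALFORMED  # missing field or closed-enum violation
        if not isinstance(name, str) or not name:
            return MALFORMED
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return MALFORMED
        if not _is_timestamp(sample.get("timestamp")):
            return MALFORMED
        labels = sample.get("labels", {})
        if not isinstance(labels, dict) or not all(isinstance(v, str) for v in labels.values()):
            return MALFORMED
    return None
```

The validator stops at the first violation, which matches the all-or-nothing refusal described above: one bad record rejects the whole batch with a 400.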

Success response

A 202 Accepted carries an IngestReceipt and a Cache-Control: no-store directive (the handler stamps no-store on every response, success and error alike).

| Field | Type | Meaning |
| --- | --- | --- |
| accepted_at | string (date-time) | Server-side timestamp at which the batch was accepted and handed to the ingest buffer. |
| records | integer | Number of records accepted from the batch: the array length (metrics) or the count of non-blank NDJSON lines (logs / audit). |
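
The records count for an NDJSON body follows directly from the rule above. A minimal sketch, assuming hypothetical helper names (count_ndjson_records, build_receipt) and an ISO 8601 UTC rendering for accepted_at:

```python
from datetime import datetime, timezone
from typing import Dict, Union


def count_ndjson_records(body: bytes) -> int:
    """Count non-blank lines; blank or whitespace-only lines are not records."""
    return sum(1 for line in body.splitlines() if line.strip())


def build_receipt(body: bytes) -> Dict[str, Union[str, int]]:
    """Assemble an IngestReceipt-shaped dict for an accepted logs or audit batch."""
    return {
        "accepted_at": datetime.now(timezone.utc).isoformat(),
        "records": count_ndjson_records(body),
    }
```

For example, a body of two JSON lines separated by an empty line yields records equal to 2.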

Error taxonomy

All error responses use the shared Problem envelope (application/problem+json); the 403 path uses the PermissionDenied shape for the path-id denial. The 429 and 503 arms carry a required Retry-After header. The closed Problem.code set these surfaces emit:

| HTTP status | Problem.code | Trigger | Retry-After |
| --- | --- | --- | --- |
| 400 | ingest_sent_at_invalid | X-Plexsphere-Sent-At missing or unparseable. | no |
| 400 | ingest_encoding_invalid | The gzip stream failed to inflate. | no |
| 400 | ingest_batch_malformed | Body not the shape its signal requires: non-array metrics, non-object NDJSON line, empty batch, missing required field, or a closed-enum violation. | no |
| 401 | unauthorized | NSK in the Authorization: Bearer header is missing or malformed. | no |
| 401 | nsk_revoked | The NSK has been revoked. | no |
| 403 | node_id_mismatch | The NSK authenticates but resolves to a Node other than the path id; uses the PermissionDenied shape and is audited on the observability.ingest relation. | no |
| 413 | ingest_body_too_large | The compressed wire body or the inflated batch exceeds its byte cap. | no |
| 413 | ingest_batch_too_many_records | The batch carries more than the per-batch record cap (10000). | no |
| 415 | ingest_encoding_unsupported | Content-Encoding names an encoding other than gzip or identity. | no |
| 429 | per_node_rate_limited | The per-Node byte budget is exhausted. | yes (Retry-After, ~1s) |
| 429 | capacity_exceeded | The per-Domain aggregate byte budget is exhausted. | yes (Retry-After, ~5s) |
| 501 | observability_ingest_not_provisioned | The surface is not wired (PLEXSPHERE_OBS_NATS_URL unset). | no |
| 503 | ingest_buffer_unavailable | The JetStream buffer is unreachable or saturated. | yes (Retry-After, ~5s) |

The OpenAPI spec enumerates the response codes 202, 400, 401, 403, 413, 415, 429, 501, and 503 per operation. The 500 internal arm is the transport's defensive fallback for an unexpected server-side failure — it is not enumerated in the spec; the wire body stays generic and no backend or driver text is interpolated into it.
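
A client can use the taxonomy to decide whether to retry a refused batch. The sketch below is one possible policy, not documented plexd behaviour: the function name retry_delay_seconds, the 5-second fallback, and the assumption that Retry-After arrives as whole seconds (rather than an HTTP date) are all illustrative.

```python
from typing import Optional

# Only these arms are documented as carrying a Retry-After header.
RETRYABLE_STATUSES = {429, 503}


def retry_delay_seconds(
    status: int, retry_after: Optional[str], fallback: float = 5.0
) -> Optional[float]:
    """Return how long to wait before resending, or None if the batch should not be resent."""
    if status not in RETRYABLE_STATUSES:
        return None
    try:
        # Assumption: Retry-After is expressed as delta-seconds.
        return float(max(0, int(retry_after)))
    except (TypeError, ValueError):
        # Header missing or not an integer; fall back to a conservative delay.
        return fallback
```

Under this policy the 4xx refusals other than 429 are treated as permanent for the batch as sent, since resending the same bytes would produce the same refusal.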

Cross-references