Alert Rules — per-Domain stored alerting definitions

This document is the authoritative bounded-context reference for the Alert Rule context: the half of the observability stack where a Domain declares the alert rules its telemetry should be evaluated against. An alert rule pairs an observability signal (a metric or log series expression in the backend's query language) with a comparator and a numeric threshold. Together they describe the crossing a downstream evaluator should fire on and the severity it should be raised at. The context owns the AlertRule aggregate, its value objects, the per-Domain name-uniqueness invariant, the five CRUD HTTP operations, and the persistence table that durably stores the operator's alerting intent. The domain root that pins the ubiquitous language is ../../../internal/observability/alerts.

The platform stores rules; it does not evaluate or fire them. This is the single most important scope boundary of the context. plexsphere is the durable system of record for the alert definitions an operator authors — their signal, comparator, threshold, severity, and enabled state — and nothing more. The evaluation that watches a live metric series, decides a threshold has been crossed, and raises a firing alert lives entirely downstream in the bundled Grafana Mimir / Grafana stack, which reads the telemetry the ingest and routing halves deliver. A reader should never describe this context as evaluating, triggering, or firing a rule: it persists the rule and serves it back over /v1, and the enabled flag merely records whether the operator wants the rule to participate in that downstream evaluation without forcing a delete and re-author.

Ubiquitous language

The terms below travel verbatim across the domain root, the aggregate, the value objects, the application service, the repository port, and the transport surface. Documentation, JSON fields, and persisted columns adopt the exact spelling.

Term	Definition	Code anchor
AlertRule	The aggregate root: one Domain-scoped alert rule. It records the Signal it watches, the Comparator and Threshold that describe the crossing it should be evaluated for, the Severity it should be raised at, whether it is enabled, and its persistence timestamps. Fields are unexported, so the invariants can only be established through `New` (a fresh rule) or `Hydrate` (a persisted one).	`../../../internal/observability/alerts/alert_rule.go` (`AlertRule`)
Signal	The metric or log series expression the rule watches, expressed in the backend's query language. It is a free-form bounded string (non-empty after trimming, at most 1024 runes). The context neither parses nor evaluates it; it stores it verbatim for the downstream evaluator.	`../../../internal/observability/alerts/alert_rule.go` (`AlertRule.Signal`)
Comparator	The closed set of crossing directions used to compare a rule's Signal against its Threshold: `gt`, `gte`, `lt`, `lte`. String-backed, so it JSON-encodes and persists as its wire token; the zero value is invalid by construction.	`../../../internal/observability/alerts/types.go` (`Comparator`)
Threshold	The numeric value the Signal is compared against. A `float64` that must be finite — NaN and both infinities are rejected so a non-finite threshold can never be persisted.	`../../../internal/observability/alerts/validation.go` (`validateThreshold`)
Severity	The closed set of severities a rule is raised at — `info`, `warning`, `critical`. String-backed; the zero value is invalid by construction.	`../../../internal/observability/alerts/types.go` (`Severity`)
enabled	A plain boolean recording whether the operator wants the rule to participate in downstream evaluation. A fresh rule is enabled; toggling it never fails (it carries no invariant beyond the type).	`../../../internal/observability/alerts/alert_rule.go` (`AlertRule.SetEnabled`)
name	The Domain-unique logical name a rule is addressed and displayed by. Non-empty after trimming, at most 200 runes, and unique within a Domain — the persistence layer enforces the `(domain_id, name)` uniqueness.	`../../../internal/observability/alerts/validation.go` (`validateName`)
Repository	The aggregate-shaped outbound persistence port the application service drives. Every method takes or returns whole `AlertRule` aggregates, never row structs, so the domain stays free of persistence concerns.	`../../../internal/observability/alerts/ports.go` (`Repository`)
AuditSink	The optional outbound audit port the application service records each mutating decision through. A nil sink is tolerated and the service degrades silently.	`../../../internal/observability/alerts/ports.go` (`AuditSink`)

The aggregate shape

An AlertRule carries exactly these fields, all unexported behind accessor methods:

Field	Type	Meaning
`id`	`AlertRuleID`	The app-minted UUIDv7 identifier.
`domainID`	`DomainID`	The owning Domain (an external reference to the identity context's Domain aggregate).
`name`	`string`	The Domain-unique logical name.
`signal`	`string`	The metric / log series expression.
`comparator`	`Comparator`	The crossing direction (`gt` / `gte` / `lt` / `lte`).
`threshold`	`float64`	The finite numeric breach point.
`severity`	`Severity`	The severity (`info` / `warning` / `critical`).
`enabled`	`bool`	Whether the rule participates in downstream evaluation.
`createdAt` / `updatedAt`	`time.Time`	Persistence timestamps; zero until persisted.

There is no expression / for-duration / labels triple on the aggregate: a rule is the (signal, comparator, threshold, severity, enabled) shape above, plus its name and Domain scope. The signal field is the query-language expression; the comparator and threshold are stored separately rather than folded into one expression string.

The two constructors. New builds a fresh rule, mints a UUIDv7 id, sets it enabled, and leaves the timestamps zero (the repository stamps them on insert). Hydrate rebuilds a persisted rule from a scanned row without re-minting the id. Both run the same field validators, so a corrupt row can never produce a half-formed aggregate.

The mutators. Rename, SetSignal, SetComparator, SetThreshold, and SetSeverity each validate before applying and leave the receiver untouched on rejection; SetEnabled is the one mutator that cannot fail.

Invariants and validation

Every create-time and hydrate-time rejection (except the structural zero-id / zero-domain guards Hydrate adds) wraps the single parent sentinel ErrInvalidAlertRule, so the transport boundary maps one sentinel onto one 400 Problem code.

Invariant	Rule	Rejection
Domain scope	`domainID` must be non-zero.	`ErrInvalidAlertRule` (New); a bare structural error (Hydrate).
Name	Non-empty after trimming whitespace; at most 200 runes.	`ErrInvalidAlertRule`
Signal	Non-empty after trimming whitespace; at most 1024 runes.	`ErrInvalidAlertRule`
Threshold	A finite `float64` — NaN, `+Inf`, and `-Inf` are rejected.	`ErrInvalidAlertRule`
Comparator	A member of the closed `gt` / `gte` / `lt` / `lte` set.	`ErrInvalidAlertRule` (or `ErrUnknownComparator` from `ParseComparator`).
Severity	A member of the closed `info` / `warning` / `critical` set.	`ErrInvalidAlertRule` (or `ErrUnknownSeverity` from `ParseSeverity`).
Name uniqueness	Each `(domain_id, name)` pair is unique within a Domain.	`ErrAlertRuleConflict` (enforced by the repository's unique index).

The bound constants are not operator knobs: the 200-rune name cap and the 1024-rune signal cap are domain constants that cap the persisted columns so an unbounded value can never reach the database.

The application service

The application service in ../../../internal/observability/alerts/services orchestrates the aggregate against the Repository port. It exposes Create, Get, List, Update, and Delete. The constructor panics on a nil repository (a composition-root wiring bug), while the audit sink, clock, and logger are nil-tolerated options. Two behaviours are worth pinning:

Update is a load-mutate-persist cycle over an optional-field patch. It reads the rule, applies each supplied patch field through the aggregate's mutators (a nil pointer leaves the field untouched), and persists the result. A validation error from a mutator surfaces before any persistence.
Mutations emit a post-persist audit row through the sink when one is wired (alerts.create / alerts.update / alerts.delete); a flaky audit backend is logged, never propagated, so it can never turn a successful mutation into a user-visible error.

The HTTP surface

The five CRUD operations live under /v1/domains/{domainId}/alert-rules and are implemented by the anti-corruption transport package ../../../internal/transport/http/v1/alerts. The transport package re-declares the domain port, the read-model DTO, and the error sentinels locally and never imports the domain module; the production adapter at the composition root translates the domain AlertRule and its sentinels onto the transport-local shapes.

Operation	Method	Path
`ListAlertRules`	`GET`	`/v1/domains/{domainId}/alert-rules`
`CreateAlertRule`	`POST`	`/v1/domains/{domainId}/alert-rules`
`GetAlertRule`	`GET`	`/v1/domains/{domainId}/alert-rules/{alertRuleId}`
`UpdateAlertRule`	`PATCH`	`/v1/domains/{domainId}/alert-rules/{alertRuleId}`
`DeleteAlertRule`	`DELETE`	`/v1/domains/{domainId}/alert-rules/{alertRuleId}`

The wire-contract origin is ../../../api/openapi/plexsphere-v1.yaml; this doc is a map of the surface, not a duplicate of the schema.

Cursor pagination. ListAlertRules is paginated transport-side: the handler applies an opaque cursor over the rule slice the service returns, so the list read stays stable as a Domain's rule set grows.

Order of checks (mutations). A mutating handler runs its gates in a fixed order, so an unauthorised caller never reaches the body decode or the create: a 501 when the surface is unwired (nil Service or Authz), a 401 when the request carries no authenticated principal, a 400 when the addressed Domain id is the zero UUID, then the ReBAC manage check on the addressed Domain (before the body is decoded), then a 400 on a malformed body, and finally either the domain validation / name-conflict mapping or a 201 / 200 on success.

Error-code taxonomy

Every failure surface carries a stable Problem.code. The closed set this surface emits, with its HTTP status:

HTTP	`Problem.code`	Meaning
`400`	`invalid_domain_id`	The addressed Domain id is the zero UUID.
`400`	`invalid_alert_rule_id`	The addressed alert-rule id is malformed.
`400`	`invalid_cursor`	The list cursor is malformed.
`400`	`invalid_body`	The request body is not a valid alert-rule document.
`400`	`alert_rule_invalid`	The rule failed a domain validation rule (name, signal, threshold, comparator, or severity).
`401`	`unauthenticated`	The request carries no authenticated principal.
`403`	(PermissionDenied)	The caller lacks the required relation on the addressed Domain.
`404`	`alert_rule_not_found`	No alert rule with the requested id exists.
`409`	`alert_rule_name_conflict`	A create or rename collides with an existing rule's `(domain, name)` uniqueness.
`500`	(generic)	An unexpected server-side fault; the underlying error is logged, never interpolated onto the wire.
`501`	`alert_rules_not_provisioned`	The surface is not wired in this build.

The transport-local sentinels the production adapter translates the domain errors onto are ErrAlertRuleNotFound (→ 404), ErrAlertRuleNameConflict (→ 409), and ErrAlertRuleInvalid (→ 400), declared in ../../../internal/transport/http/v1/alerts/errors.go.

ReBAC posture

Reading an alert rule gates the Domain read relation; mutating one (create / update / delete) gates the Domain manage relation. The ReBAC object is the canonical domain:<uuid>, and the check runs before any read or mutation so an unauthorised caller never observes the existence side-channel a forward-then-check flow would leak.

Note the relation names: the codebase gates on the schema relations read and manage (matching the existing Domain surfaces and the tenancy schema). The OpenAPI 403 prose names the required permissions with the operator-facing labels domain-view (reads) and domain-edit (mutations); the gate uses the schema names. This is recorded in a DECISION block in ../../../internal/transport/http/v1/alerts/errors.go.

Audit contract

The surface emits a canonical (subject, relation, object, outcome, correlation_id) audit tuple on two paths:

A denial — a caller attempting an operation it lacks the relation for — writes one permission_denied row before the 403 response is flushed, stamping the missing relation into the row's caveat context so an auditor can pivot from the denial to the gate that fired.
A successful mutation — create / update / delete — writes one granted row with the verb-style relation (alerts.create / alerts.update / alerts.delete).

A nil sink degrades silently — the audit row is dropped while the security gate still fires — and a sink error is logged through slog, never propagated to the caller.

Persistence

The single table is plexsphere.alert_rules, created by migration ../../../internal/platform/db/migrations/0052_alert_rules.sql. It is keyed on the app-minted UUID id, carries a foreign key to the owning Domain with ON DELETE CASCADE (a rule has no meaning once its Domain is gone), and enforces the (domain_id, name) uniqueness through a unique index. The comparator and severity columns carry SQL CHECK constraints pinning their closed sets, and enabled defaults to true. The migration's down arm refuses the downgrade with SQLSTATE 0A000 (feature_not_supported): the table holds operator-authored alerting configuration, and dropping it would silently discard every authored rule.

Cross-references

./ingest.md — the ingest front door that admits and buffers the telemetry a downstream evaluator reads when applying these rules.
./routing.md — the egress half that delivers the buffered telemetry to Grafana Mimir / Grafana Loki, where rule evaluation actually happens.
./incidents.md — the sibling Incident context, the operational record a firing alert may feed into.
./query.md — the read-only metrics / logs query proxy an operator uses to explore the same series a rule's signal names.
../../reference/cli/plexctl/alert.md — the plexctl alert CLI reference for managing rules from the terminal.
../index.md — the bounded-contexts landing page.
../../../internal/observability/alerts — the bounded-context root that pins the ubiquitous language.
../../../api/openapi/plexsphere-v1.yaml — the OpenAPI spec the five CRUD operations originate from.
