Incidents — per-Domain operational incidents with an append-only timeline
This document is the authoritative bounded-context reference for the Incident context — the per-Domain record of an operational incident and the append-only timeline of how it progressed. An incident carries a title, a severity, an open-then-resolved lifecycle, the instant it opened, the instant it resolved, and an ordered trail of timeline events (operator notes and status changes). The context owns the Incident aggregate, the TimelineEvent value object, the lifecycle state machine, and the two persistence tables. The domain root that pins the ubiquitous language is ../../../internal/observability/incidents.
The lifecycle has exactly two states: open and resolved. There is no acknowledged state. An incident is opened in the open state, accumulates timeline events while it is open, and transitions exactly once to the terminal resolved state. A reader should never describe an intermediate acknowledged step — the closed Status set in code is open | resolved and nothing more, and the persistence CHECK constraint encodes the same two values.
Ubiquitous language
The terms below travel verbatim across the domain root, the aggregate, the value objects, the application service, the repository port, and the transport surface. Documentation, JSON fields, and persisted columns adopt the exact spelling.
| Term | Definition | Code anchor |
|---|---|---|
| Incident | The aggregate root: one per-Domain operational incident. It records its title, severity, open -> resolved lifecycle, the instant it opened, the instant it resolved (nil while open), and an append-only timeline. Fields are unexported so the invariants are only established through Open (a fresh incident) or Hydrate (a persisted one). | ../../../internal/observability/incidents/incident.go (Incident) |
| Status | The closed lifecycle state — open or resolved. The zero value is not a member, so IsValid rejects it. There is no acknowledged member. | ../../../internal/observability/incidents/types.go (Status) |
| Severity | The closed severity classification — info, warning, critical. Defined locally in this context rather than imported from the alerts context, so the bounded contexts stay decoupled. | ../../../internal/observability/incidents/types.go (Severity) |
| TimelineEvent | A value object recording one entry on the incident's append-only timeline: a note or a status-change marker, scoped to its parent Incident and stamped with the moment it occurred. Its invariants are established through NewTimelineEvent or HydrateTimelineEvent. | ../../../internal/observability/incidents/timeline_event.go (TimelineEvent) |
| TimelineEventKind | The closed kind set for a timeline event — note (a free-form operator note) or status_change (a lifecycle-transition marker). | ../../../internal/observability/incidents/types.go (TimelineEventKind) |
| Repository | The aggregate-shaped outbound persistence port the application service drives. It exposes whole Incident aggregates (and their timeline) in domain terms, leaking no ORM or query-builder types. | ../../../internal/observability/incidents/ports.go (Repository) |
| AuditSink | The optional outbound audit port the application service records each performed operation through. A nil sink is tolerated and the service degrades silently. | ../../../internal/observability/incidents/ports.go (AuditSink) |
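The three closed sets above can be sketched as string-backed types. This sketch assumes their values match the persisted `CHECK` spellings. The constant names (`StatusOpen`, `KindNote`, and so on) are hypothetical and not copied from types.go. It shows why the zero value fails `IsValid` and why no acknowledged member can slip through.

```go
package main

// Hypothetical sketch: string-backed closed sets whose values match the
// persisted CHECK spellings. Constant names are illustrative.

// Status is the lifecycle state; the zero value "" is not a member.
type Status string

const (
	StatusOpen     Status = "open"
	StatusResolved Status = "resolved"
)

// IsValid accepts exactly open and resolved; there is no acknowledged member.
func (s Status) IsValid() bool {
	return s == StatusOpen || s == StatusResolved
}

// Severity is declared locally, not imported from the alerts context.
type Severity string

const (
	SeverityInfo     Severity = "info"
	SeverityWarning  Severity = "warning"
	SeverityCritical Severity = "critical"
)

func (s Severity) IsValid() bool {
	switch s {
	case SeverityInfo, SeverityWarning, SeverityCritical:
		return true
	}
	return false
}

// TimelineEventKind distinguishes operator notes from lifecycle markers.
type TimelineEventKind string

const (
	KindNote         TimelineEventKind = "note"
	KindStatusChange TimelineEventKind = "status_change"
)

func (k TimelineEventKind) IsValid() bool {
	return k == KindNote || k == KindStatusChange
}
```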
The aggregate shape
An Incident carries exactly these fields, all unexported behind accessor methods:
| Field | Type | Meaning |
|---|---|---|
| id | IncidentID | The app-minted UUIDv7 identifier. |
| domainID | DomainID | The owning Domain (an external reference to the identity context's Domain aggregate). |
| title | string | The operator-facing label. |
| severity | Severity | The severity (info / warning / critical). |
| status | Status | The lifecycle state (open / resolved). |
| openedAt | time.Time | The instant the incident opened. |
| resolvedAt | *time.Time | The instant it resolved; nil while open. |
| timeline | []TimelineEvent | The append-only ordered trail. |
A TimelineEvent carries its own id, the parent incidentID, the kind, a free-form message, and the occurredAt instant.
The lifecycle state machine and its invariants
The incident lifecycle is a two-state, single-transition machine, and its invariants are enforced on the aggregate (the application service holds no business rules of its own):
- `open` (initial). `Open` builds a fresh incident in the `open` state with a nil `resolvedAt` and an empty timeline, after enforcing the creation invariants: a non-zero Domain, a non-empty title within the rune bound, a member severity, and a set `openedAt`.
- `resolved` (terminal). `Resolve` transitions `open` -> `resolved` and stamps `resolvedAt`. It enforces the single-resolve guard: a resolve against an already-resolved incident returns `ErrIncidentAlreadyResolved`.
Three invariants bound the model:
| Invariant | Rule | Enforced by |
|---|---|---|
| Append-only while open | A timeline event may only be appended while the incident is open; an append against a resolved incident returns ErrIncidentResolved. | Incident.AppendEvent |
| Single resolve | An incident transitions to resolved exactly once; a second resolve returns ErrIncidentAlreadyResolved. | Incident.Resolve |
| Status / resolved-at XOR | status == resolved if and only if resolvedAt != nil. A corrupt row asserting one without the other is rejected on hydrate. | Hydrate (and the persistence CHECK constraint) |
The append-only ordered timeline. The timeline is never edited or pruned in place — it is the record of how an incident progressed. Events are appended in occurred-at order, the persisted trail is read in (incident_id, occurred_at) order, and the aggregate hands out a copy of its internal slice on Timeline() so a caller can never mutate the stored trail. The list projection of an incident drops the timeline entirely; only the single-incident read populates it.
Bounds. A title is non-empty after trimming and at most 200 runes; a timeline-event message is non-empty after trimming and at most 4000 runes. Both bounds cap the persisted columns and are domain constants, not operator knobs.
The application service
The application service in ../../../internal/observability/incidents/services orchestrates the aggregate against the Repository port. It exposes Open, AppendEvent, Resolve, Get, and List. The constructor panics on a nil repository (a composition-root wiring bug), while the clock, audit sink, and logger are nil-tolerated options. Two behaviours are worth pinning:
- `AppendEvent` and `Resolve` are read-mutate-persist cycles. Each reads the incident first (surfacing `ErrIncidentNotFound` when absent), drives the mutation through the aggregate so the status guards fire, then persists.
- Every mutating operation emits a post-persist audit row through the sink when one is wired (`incident.open` / `incident.append_event` / `incident.resolve`); a flaky audit backend is logged, never propagated.
The HTTP surface
The five operations live under /v1/domains/{domainId}/incidents and are implemented by the anti-corruption transport package ../../../internal/transport/http/v1/incidents. The transport package re-declares the domain port, the read-model DTOs, and the error sentinels locally and never imports the domain module; the production adapter at the composition root translates the domain Incident / TimelineEvent and their sentinels onto the transport-local shapes.
| Operation | Method | Path |
|---|---|---|
| ListIncidents | GET | /v1/domains/{domainId}/incidents |
| OpenIncident | POST | /v1/domains/{domainId}/incidents |
| GetIncident | GET | /v1/domains/{domainId}/incidents/{incidentId} |
| AppendIncidentEvent | POST | /v1/domains/{domainId}/incidents/{incidentId}/events |
| ResolveIncident | POST | /v1/domains/{domainId}/incidents/{incidentId}:resolve |
The wire-contract origin is ../../../api/openapi/plexsphere-v1.yaml; this doc is a map of the surface, not a duplicate of the schema.
Cursor pagination. ListIncidents is paginated transport-side over the header projection (the list omits the timeline), so the read stays stable as a Domain's incident set grows.
Error-code taxonomy
Every failure surface carries a stable Problem.code. The closed set this surface emits, with its HTTP status:
| HTTP | Problem.code | Meaning |
|---|---|---|
| 400 | invalid_domain_id | The addressed Domain id is the zero UUID. |
| 400 | invalid_incident_id | The addressed incident id is malformed. |
| 400 | invalid_cursor | The list cursor is malformed. |
| 400 | invalid_body | The request body is not a valid document. |
| 400 | incident_invalid | The open-time body failed the aggregate's validation (bad title or severity). |
| 400 | timeline_event_invalid | The append-time body failed the aggregate's validation (bad kind or message). |
| 401 | unauthenticated | The request carries no authenticated principal. |
| 403 | (PermissionDenied) | The caller lacks the required relation on the addressed Domain. |
| 404 | incident_not_found | No incident with the requested id exists. |
| 409 | incident_resolved | A timeline append was attempted against a resolved incident. |
| 409 | incident_already_resolved | A resolve was attempted against an already-resolved incident. |
| 501 | incidents_not_provisioned | The surface is not wired in this build. |
The transport-local sentinels the production adapter translates the domain errors onto are ErrIncidentNotFound (→ 404), ErrIncidentAlreadyResolved (→ 409), ErrIncidentResolved (→ 409), ErrIncidentInvalid (→ 400), and ErrTimelineEventInvalid (→ 400), declared in ../../../internal/transport/http/v1/incidents/errors.go.
ReBAC posture
Reading an incident (ListIncidents / GetIncident) gates the Domain read relation; mutating one (OpenIncident / AppendIncidentEvent / ResolveIncident) gates the Domain manage relation. The ReBAC object is the canonical domain:<uuid>, and the check runs before any read or mutation.
As with the sibling surfaces, the codebase gates on the schema relations read and manage, while the OpenAPI 403 prose names the operator-facing labels domain-view / domain-edit; this divergence is recorded in a DECISION block in ../../../internal/transport/http/v1/incidents/errors.go.
Audit contract
The surface emits a canonical (subject, relation, object, outcome, correlation_id) audit tuple:
- A denial writes one `permission_denied` row before the `403` response is flushed, stamping the missing relation into the row's caveat context.
- A successful mutation (open / append / resolve) writes one `granted` row with the verb-style relation (`incident.open` / `incident.append_event` / `incident.resolve`).
A nil sink degrades silently — the row is dropped while the security gates still fire — and a sink error is logged, never propagated.
Persistence
The context owns two tables, created by migration ../../../internal/platform/db/migrations/0053_incidents.sql:
- `plexsphere.incidents`: one row per incident, keyed on the app-minted UUID `id`, with a foreign key to the owning Domain (`ON DELETE CASCADE`). The `severity` and `status` columns carry SQL `CHECK` constraints pinning their closed sets (`status IN ('open', 'resolved')`), and a further `CHECK` encodes the lifecycle XOR: `resolved_at` is non-NULL exactly when `status` is `resolved`, so a half-resolved row is structurally impossible at rest.
- `plexsphere.incident_timeline`: one row per timeline event, keyed on the app-minted UUID `id`, with a foreign key to the owning incident (`ON DELETE CASCADE`). The `kind` column carries a `CHECK` pinning `note` / `status_change`. Rows are only ever appended; a composite `(incident_id, occurred_at)` index backs both the cascade and the in-order trail read.
The migration's down arm refuses the downgrade with SQLSTATE 0A000: the two tables hold the incident lifecycle record and its append-only audit trail, which a post-incident review depends on.
Cross-references
- `./alerts.md`: the sibling Alert Rule context; a downstream evaluator firing a stored rule may be the trigger that opens an incident.
- `./ingest.md` and `./routing.md`: the ingest and egress halves of the observability pipeline that carry the telemetry surrounding an incident.
- `./query.md`: the read-only metrics / logs query proxy for investigating an incident.
- `../index.md`: the bounded-contexts landing page.
- `../../../internal/observability/incidents`: the bounded-context root that pins the ubiquitous language.
- `../../../api/openapi/plexsphere-v1.yaml`: the OpenAPI spec the five operations originate from.