
Alarms

The Alarms page shows all system and action alarms with their lifecycle state, severity, and notification history. You can acknowledge, resolve, and bulk-manage alarms from this view.

Alarms are raised by the alarm engine when a monitored condition exceeds a threshold or when the system detects an issue. Each alarm tracks its lifecycle from firing through acknowledgement to resolution, with a full state change history.


Every alarm follows a three-state lifecycle: Firing, Acknowledged, and Resolved.

                +--------------+
   condition    |              |  condition
   triggers --> |    FIRING    |  re-triggers
                |              |  (updates last_fired_at)
                +------+-------+
                       |
                  acknowledge
                       |
                +------v-------+
                |              |
                | ACKNOWLEDGED | ---> ack expires ---> FIRING
                |              |
                +------+-------+
                       |
                    resolve
                       |
                +------v-------+
                |              |
                |   RESOLVED   |
                |              |
                +--------------+

In the Firing state, the alarm is active and the triggering condition is still present. If the same source key fires again while the alarm is already firing, the existing alarm is updated (description, metric value, last_fired_at) rather than a duplicate being created.
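
The update-instead-of-duplicate behaviour can be pictured as an upsert keyed on the source key. The following is a minimal sketch, not the alarm engine's actual code: the `Alarm` class, the `fire` function, and the in-memory `firing` dict are hypothetical names chosen for illustration. Only the updated fields (description, metric value, last_fired_at) come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Alarm:
    source_key: str
    description: str
    metric_value: Optional[float]
    first_fired_at: datetime
    last_fired_at: datetime


def fire(firing: Dict[str, Alarm], source_key: str, description: str,
         metric_value: Optional[float] = None,
         now: Optional[datetime] = None) -> Alarm:
    """Raise an alarm, or refresh the one already firing for this source key.

    `firing` is a hypothetical map of currently firing alarms by source key.
    """
    now = now or datetime.now(timezone.utc)
    existing = firing.get(source_key)
    if existing is not None:
        # Same source key is already firing: update in place, no duplicate.
        existing.description = description
        existing.metric_value = metric_value
        existing.last_fired_at = now
        return existing
    alarm = Alarm(source_key, description, metric_value, now, now)
    firing[source_key] = alarm
    return alarm
```

Keeping first_fired_at untouched on a re-fire is what lets the table show both when the alarm first appeared and when it last re-triggered.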

In the Acknowledged state, an operator has seen the alarm and acknowledged it. Acknowledgement can include an optional duration (e.g., “acknowledge for 2 hours”). When the duration expires, the alarm returns to Firing if the condition persists.

You can also un-acknowledge an alarm to return it to the Firing state if you need to re-alert on it.

In the Resolved state, the alarm is no longer active. Resolution happens in two ways:

  • Auto-resolve: The alarm engine calls Resolve(sourceKey) when the triggering condition clears
  • Manual resolve: An operator manually resolves the alarm from the GUI

Resolved alarms remain available in the alarm list through its filters, but they do not generate new notifications.
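
The lifecycle described in this section can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the engine's implementation: all class and function names are hypothetical, and allowing a resolve directly from Firing is an assumption based on the auto-resolve description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class State(Enum):
    FIRING = "firing"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


@dataclass
class Alarm:
    state: State = State.FIRING
    ack_expires_at: Optional[datetime] = None  # None means no expiry


def acknowledge(alarm: Alarm, now: datetime,
                duration: Optional[timedelta] = None) -> None:
    """Firing -> Acknowledged, with an optional acknowledgement duration."""
    if alarm.state is not State.FIRING:
        raise ValueError(f"cannot acknowledge alarm in state {alarm.state.value}")
    alarm.state = State.ACKNOWLEDGED
    alarm.ack_expires_at = (now + duration) if duration is not None else None


def unacknowledge(alarm: Alarm) -> None:
    """Acknowledged -> Firing, so the alarm can alert again."""
    if alarm.state is not State.ACKNOWLEDGED:
        raise ValueError(f"cannot un-acknowledge alarm in state {alarm.state.value}")
    alarm.state = State.FIRING
    alarm.ack_expires_at = None


def resolve(alarm: Alarm) -> None:
    """Firing or Acknowledged -> Resolved (auto or manual)."""
    alarm.state = State.RESOLVED
    alarm.ack_expires_at = None


def expire_ack(alarm: Alarm, now: datetime, condition_persists: bool) -> None:
    """Return to Firing once the ack duration has elapsed and the condition persists."""
    if (alarm.state is State.ACKNOWLEDGED
            and alarm.ack_expires_at is not None
            and now >= alarm.ack_expires_at
            and condition_persists):
        alarm.state = State.FIRING
        alarm.ack_expires_at = None
```

An acknowledgement without a duration leaves ack_expires_at unset, so expire_ack never moves it back to Firing.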


Each alarm carries a severity level that determines its visual priority and notification routing.

| Severity | Visual | Meaning | Example |
|----------|--------|---------|---------|
| Critical | Red badge | Immediate attention required, service impact | ClickHouse unreachable, disk space critical, license expired |
| Warning | Orange badge | Needs investigation, potential impact | ClickHouse slow queries, disk space low, license expiring |
| OK | Green badge | Condition cleared, informational | Used when a previously alarming metric returns to normal |
| Unknown | Gray badge | Unable to determine state | Metric collection failed |

Severity is set by the component that emits the alarm. System health checks use predefined thresholds; LLM-driven action alarms derive severity from the risk score.

If Critical / Warning / OK / Unknown looks familiar — yes, it’s a tip of the cap to Nagios. Some classics never need replacing.


The channel field indicates the source subsystem that raised the alarm.

| Channel | Source | Examples |
|---------|--------|----------|
| system | System health monitoring | ClickHouse connectivity, disk space, license status |
| action | Enforcement action events | Device blocked, device throttled, action expired |

Notification rules can filter by channel, allowing you to route system alarms to infrastructure teams and action alarms to security teams.

System alarms use structured source keys for programmatic handling:

| Source Key | Description |
|------------|-------------|
| system:clickhouse:unreachable | ClickHouse database is not responding |
| system:clickhouse:slow | ClickHouse queries are running slowly |
| system:clickhouse:storage_warning | ClickHouse data directory low on space |
| system:clickhouse:storage_critical | ClickHouse data directory critically low |
| system:disk:space_warning | Disk space below warning threshold |
| system:disk:space_critical | Disk space below critical threshold |
| system:license:expiring | License will expire soon |
| system:license:expired | License has expired |

Action alarms are built dynamically from the action type and the target device or rule. They follow these shapes:

| Pattern | Raised when |
|---------|-------------|
| action:block:<MAC-or-DUID> | A device is blocked |
| action:deny:<MAC-or-DUID> | A device is denied |
| action:throttle:<MAC-or-DUID> | A device is throttled |
| action:automation:<rule-name> | An automation rule triggered actions |

Allow, monitor, analyze, and normal actions do not raise alarms.
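
As an illustration of the key shapes above, here is a minimal sketch of building and splitting source keys. Both function names are hypothetical, and the exact formatting of the MAC or DUID (case, separators) is not specified in this page, so the target is passed through unchanged.

```python
from typing import Optional, Tuple

# Action types that raise alarms, per the table above.
ALARMING_ACTIONS = {"block", "deny", "throttle", "automation"}


def action_source_key(action: str, target: str) -> Optional[str]:
    """Build an action source key, or return None for non-alarming actions.

    `target` is a MAC/DUID for device actions or a rule name for automation.
    """
    if action not in ALARMING_ACTIONS:
        return None  # allow, monitor, analyze, normal: no alarm
    return f"action:{action}:{target}"


def split_source_key(source_key: str) -> Tuple[str, str, str]:
    """Split into (channel, kind, target), keeping colons inside the target."""
    channel, kind, target = source_key.split(":", 2)
    return channel, kind, target
```

Splitting with a maximum of two splits matters because a MAC address itself contains colons; a plain split on ":" would break the target apart.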

Source keys are not user-configurable — they are assigned by the system that raises the alarm and used internally to deduplicate repeated firings. You cannot rename them.

You can match against them in notification rules. The source_key_pattern field on a notification rule accepts SQL LIKE patterns:

  • % matches any sequence of characters
  • _ matches a single character
  • % on its own matches everything (the default)

Examples:

  • system:disk:% — all disk alarms
  • action:block:% — every device block
  • system:%:critical would not work, because the severity is not part of the source key — use the min_severity field instead
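
To check a pattern before saving a rule, the LIKE semantics above can be approximated locally. This is a rough sketch rather than the matcher the product uses: it translates `%` and `_` into a regular expression, does not handle LIKE escape characters, and assumes case-sensitive matching. The function names are hypothetical.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into an equivalent compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # any sequence of characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def source_key_matches(pattern: str, source_key: str) -> bool:
    """True if the whole source key matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(source_key) is not None
```

For example, `source_key_matches("system:disk:%", "system:disk:space_warning")` is true, while `source_key_matches("system:%:critical", "system:disk:space_critical")` is false because the key ends in `space_critical`, not `:critical`. Note that a literal underscore in a pattern also acts as a single-character wildcard.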

The main alarms table displays all alarms with filtering, sorting, and pagination.

| Column | Description |
|--------|-------------|
| Severity | Color-coded badge (critical, warning, ok, unknown) |
| State | Current lifecycle state (firing, acknowledged, resolved) |
| Channel | Source channel (system or action) |
| Description | Human-readable summary of the alarm condition |
| Device | MAC address or DUID if the alarm is device-specific (null for system alarms) |
| Metric Value | The measured value that triggered the alarm |
| Threshold | The threshold that was exceeded |
| First Fired | When the alarm first entered firing state |
| Last Fired | Most recent re-firing time |
| Acknowledged By | Username of the operator who acknowledged (if applicable) |

Click an alarm row to expand it and access per-alarm actions.

Acknowledging an alarm transitions it from Firing to Acknowledged. You can optionally set a duration (in seconds); when the duration expires, the alarm returns to Firing if it has not been resolved.

Use acknowledgement when you are aware of an issue and working on it. This prevents the alarm from generating repeat notifications.

Un-acknowledging returns an acknowledged alarm to the Firing state. Use this when you need the alarm to re-trigger notifications (e.g., when handing off to another operator).

Resolving manually clears the alarm regardless of the current metric state. Use this when you have addressed the root cause and want to clear the alarm.

Viewing the history opens the state change timeline for a single alarm, showing every transition with its timestamp and who performed it.


Select multiple alarms to acknowledge them in a single operation.

  1. Select alarms using the checkboxes in the first column
  2. Click “Acknowledge Selected” in the toolbar
  3. Optionally set an acknowledgement duration
  4. Confirm the operation

The system processes each alarm individually and reports success/failure counts. If some alarms cannot be acknowledged (e.g., already resolved), they appear in the failure list with reasons.

There is currently no bulk-resolve endpoint; POST /api/alarms/bulk-acknowledge is the only bulk operation available. As a workaround, acknowledge the alarms in bulk first. Individual alarms then auto-resolve when their condition clears, or you can resolve them one by one.
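
The per-alarm processing behind bulk acknowledgement can be sketched as follows. This is a hypothetical illustration of the "process each alarm individually, report successes and failures" behaviour; the `BulkResult` shape, the function name, and the failure messages are assumptions and do not describe the actual request or response format of POST /api/alarms/bulk-acknowledge.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BulkResult:
    succeeded: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # alarm id -> reason


def bulk_acknowledge(states: Dict[str, str], alarm_ids: List[str]) -> BulkResult:
    """Acknowledge each alarm independently; one failure does not stop the rest.

    `states` is a hypothetical map of alarm id -> lifecycle state.
    """
    result = BulkResult()
    for alarm_id in alarm_ids:
        state = states.get(alarm_id)
        if state is None:
            result.failed[alarm_id] = "alarm not found"
        elif state != "firing":
            result.failed[alarm_id] = f"alarm is already {state}"
        else:
            states[alarm_id] = "acknowledged"
            result.succeeded.append(alarm_id)
    return result
```

The success and failure counts shown in the GUI would then correspond to `len(result.succeeded)` and `len(result.failed)`.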


The bell icon in the header shows the count of unread notifications and provides quick access to recent alerts.

The bell badge displays the number of unread notifications. The count is fetched from GET /api/notifications/bell/count, which returns a total count and a per-severity breakdown.
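
As a rough picture of what "total count and severity breakdown" means, the sketch below computes both from a list of notifications. The function name and the `total` and `by_severity` field names are assumptions for illustration; the real response schema of the endpoint is not documented on this page.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bell_count(notifications: Iterable[Tuple[str, bool]]) -> Dict[str, object]:
    """Summarize unread notifications.

    Each notification is a (severity, is_read) pair; only unread ones count.
    """
    unread = Counter(severity for severity, is_read in notifications if not is_read)
    return {"total": sum(unread.values()), "by_severity": dict(unread)}
```

Marking a notification as read removes it from both the total and its severity bucket, which is why the badge decreases only after an explicit read action.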

Click the bell to open a dropdown showing the 20 most recent notifications (newest first). Each entry shows:

  • Severity badge (color-coded)
  • Description text
  • Timestamp
  • Read/unread indicator

To mark notifications as read:

  • Single: Click a notification to mark it as read
  • Mark All Read: Click “Mark All Read” at the bottom of the dropdown to clear the unread count

Notifications are stored in the bell_notifications table and persist across sessions.


Notification rules control which alarms generate bell notifications or external deliveries.

Navigate to Settings > Notifications to manage notification rules. Each rule specifies:

| Field | Description | Options |
|-------|-------------|---------|
| Name | Human-readable rule name | Free text |
| Channel Match | Which alarm channels to match | system, action, or any |
| Min Severity | Minimum severity to trigger | ok, warning, critical |
| Source Key Pattern | SQL LIKE pattern for source key | % (all), system:%, system:clickhouse:% |
| Delivery Channel | How to deliver the notification | bell (in-app) or vector (external via Vector pipeline) |
| Notify On | When to send | fired (alarm fires), resolved (alarm resolves), all (both) |

Route all critical alarms to bell notifications:

  • Channel Match: any
  • Min Severity: critical
  • Source Key Pattern: %
  • Delivery Channel: bell
  • Notify On: fired

Send ClickHouse alerts to external monitoring via Vector:

  • Channel Match: system
  • Min Severity: warning
  • Source Key Pattern: system:clickhouse:%
  • Delivery Channel: vector
  • Notify On: all

Existing rules support the following operations:

  • Enable/Disable: Toggle a rule without deleting it. Disabled rules are skipped during dispatch
  • Edit: Update any field of an existing rule. Changes take effect once the dispatcher reloads
  • Delete: Permanently remove a rule
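
How a rule's fields combine when an alarm event is dispatched can be sketched like this. It is a simplified model, not the dispatcher's code: the `Rule` class and function names are hypothetical, the severity ordering ok < warning < critical is inferred from the Min Severity options, and treating Unknown-severity alarms as non-matching is an assumption.

```python
import re
from dataclasses import dataclass

# Assumed ordering, inferred from the Min Severity options.
SEVERITY_RANK = {"ok": 0, "warning": 1, "critical": 2}


@dataclass
class Rule:
    channel_match: str             # "system", "action", or "any"
    min_severity: str              # "ok", "warning", or "critical"
    source_key_pattern: str = "%"  # SQL LIKE pattern
    delivery_channel: str = "bell"
    notify_on: str = "fired"       # "fired", "resolved", or "all"
    enabled: bool = True


def like_match(pattern: str, value: str) -> bool:
    """Approximate SQL LIKE: % is any sequence, _ is one character."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, re.DOTALL) is not None


def rule_matches(rule: Rule, channel: str, severity: str,
                 source_key: str, event: str) -> bool:
    """True if the rule should deliver for this alarm event."""
    if not rule.enabled:
        return False  # disabled rules are skipped during dispatch
    if rule.channel_match not in ("any", channel):
        return False
    if rule.notify_on not in ("all", event):
        return False
    rank = SEVERITY_RANK.get(severity)
    if rank is None or rank < SEVERITY_RANK[rule.min_severity]:
        return False
    return like_match(rule.source_key_pattern, source_key)
```

With the second example rule, `Rule("system", "warning", "system:clickhouse:%", "vector", "all")`, a critical `system:clickhouse:unreachable` alarm matches on both fire and resolve, while any action alarm is filtered out by the channel check.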

No alarms showing? Check that the alarm engine is enabled in your configuration. Verify ClickHouse connectivity — the engine stores and queries alarms from the alarms table. If the system just started, health check alarms may take one evaluation cycle (typically 60 seconds) to appear.

Alarms firing but no bell notifications? You need at least one notification rule configured with delivery channel bell. Navigate to Settings > Notifications and create a rule matching the alarm channel and severity.

Acknowledged alarm keeps coming back? Acknowledgement with a duration automatically reverts to Firing when the duration expires. Either resolve the underlying issue, acknowledge without a duration, or manually resolve the alarm.

Bell count does not decrease after clicking? Ensure you are clicking the notification entry itself, not just opening the dropdown. A notification is marked as read only when you click it explicitly or use “Mark All Read.”