Alarms

The Alarms page shows all system and action alarms with their lifecycle state, severity, and notification history. You can acknowledge, resolve, and bulk-manage alarms from this view.

Alarms are raised by the alarm engine when a monitored condition exceeds a threshold or when the system detects an issue. Each alarm tracks its lifecycle from firing through acknowledgement to resolution, with a full state change history.

Alarm Lifecycle

Every alarm follows a three-state lifecycle: Firing, Acknowledged, and Resolved.


                 +--------------+
    condition    |              |  condition
    triggers --> |    FIRING    |  re-triggers
                 |              |  (updates last_fired_at)
                 +------+-------+
                        |
                   acknowledge
                        |
                 +------v-------+
                 |              |
                 | ACKNOWLEDGED | ---> ack expires ---> FIRING
                 |              |
                 +------+-------+
                        |
                     resolve
                        |
                 +------v-------+
                 |              |
                 |   RESOLVED   |
                 |              |
                 +--------------+
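The diagram can be read as a transition table. The minimal Python sketch below encodes it, adding the un-acknowledge transition described under Acknowledged and a direct Firing to Resolved path (assumed, because an alarm can be resolved manually or auto-resolved before anyone acknowledges it). The event names (`fire`, `acknowledge`, `ack_expired`, `unacknowledge`, `resolve`) and the `next_state` helper are hypothetical labels, not the alarm engine's API.

```python
from enum import Enum


class State(Enum):
    FIRING = "firing"
    ACKNOWLEDGED = "acknowledged"
    RESOLVED = "resolved"


# (current state, event) -> next state. Event names are illustrative labels
# for the arrows in the diagram, not the alarm engine's real API.
TRANSITIONS = {
    (State.FIRING, "fire"): State.FIRING,              # re-trigger: updates last_fired_at
    (State.FIRING, "acknowledge"): State.ACKNOWLEDGED,
    (State.FIRING, "resolve"): State.RESOLVED,         # manual or auto-resolve (assumed path)
    (State.ACKNOWLEDGED, "ack_expired"): State.FIRING,
    (State.ACKNOWLEDGED, "unacknowledge"): State.FIRING,
    (State.ACKNOWLEDGED, "resolve"): State.RESOLVED,
}


def next_state(current: State, event: str) -> State:
    """Return the state after `event`, or raise ValueError if the move is not allowed."""
    try:
        return TRANSITIONS[(current, event)]
    except KeyError:
        raise ValueError(f"cannot {event!r} an alarm in state {current.value!r}") from None
```

Rejecting an undefined transition, rather than silently ignoring it, is one way to produce per-alarm failure reasons such as "already resolved".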

Firing

The alarm is active: the triggering condition is still present. If the same source key fires again while the alarm is already firing, the engine updates the existing alarm (description, metric value, last_fired_at) instead of creating a duplicate.
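Deduplication by source key can be sketched as an upsert into an index of unresolved alarms. `AlarmStore`, `fire`, and the field names are hypothetical stand-ins for the alarms table; leaving `first_fired_at` and the state untouched on a re-fire is an assumption, and so is applying the same update to an acknowledged alarm (the text only covers the firing case).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Alarm:
    source_key: str
    description: str
    metric_value: Optional[float]
    first_fired_at: datetime
    last_fired_at: datetime
    state: str = "firing"


class AlarmStore:
    """Hypothetical in-memory stand-in for the alarms table."""

    def __init__(self) -> None:
        # Unresolved alarms indexed by source key: at most one per key.
        self._open: Dict[str, Alarm] = {}

    def fire(self, source_key: str, description: str,
             metric_value: Optional[float] = None,
             now: Optional[datetime] = None) -> Alarm:
        if now is None:
            now = datetime.now(timezone.utc)
        alarm = self._open.get(source_key)
        if alarm is not None:
            # Re-fire: refresh the existing alarm instead of inserting a duplicate.
            # first_fired_at and state stay unchanged (assumption).
            alarm.description = description
            alarm.metric_value = metric_value
            alarm.last_fired_at = now
            return alarm
        alarm = Alarm(source_key, description, metric_value, now, now)
        self._open[source_key] = alarm
        return alarm
```

Calling `fire` twice with `system:disk:space_warning` returns the same object, with `last_fired_at` and `metric_value` taken from the second call.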

Acknowledged

An operator has seen the alarm and acknowledged it. Acknowledgement can include an optional duration (e.g., “acknowledge for 2 hours”). When the duration expires, the alarm returns to Firing if the condition persists.

You can also un-acknowledge an alarm to return it to the Firing state if you need to re-alert on it.
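A minimal sketch of acknowledgement with an optional duration, un-acknowledgement, and the expiry check. The names (`AckState`, `acknowledge`, `unacknowledge`, `check_expiry`) are hypothetical; the duration in seconds that the per-alarm Acknowledge action accepts maps naturally to `timedelta(seconds=...)`, and detecting expiry with a periodic check is an assumption about how the engine works.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class AckState:
    state: str = "firing"                      # "firing", "acknowledged", or "resolved"
    acked_by: Optional[str] = None
    ack_expires_at: Optional[datetime] = None  # None means the ack never expires


def acknowledge(a: AckState, user: str, now: datetime,
                duration: Optional[timedelta] = None) -> None:
    if a.state != "firing":
        raise ValueError(f"cannot acknowledge an alarm in state {a.state!r}")
    a.state, a.acked_by = "acknowledged", user
    a.ack_expires_at = now + duration if duration is not None else None


def unacknowledge(a: AckState) -> None:
    if a.state != "acknowledged":
        raise ValueError("only acknowledged alarms can be un-acknowledged")
    a.state, a.acked_by, a.ack_expires_at = "firing", None, None


def check_expiry(a: AckState, now: datetime, condition_active: bool) -> None:
    """Periodic check: an expired acknowledgement reverts to Firing if the condition persists."""
    if (a.state == "acknowledged" and a.ack_expires_at is not None
            and now >= a.ack_expires_at and condition_active):
        unacknowledge(a)
```

With `duration=timedelta(seconds=7200)` ("acknowledge for 2 hours"), the alarm stays acknowledged for two hours and then returns to Firing on the next check if the condition is still present; without a duration it stays acknowledged until someone un-acknowledges or resolves it.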

Resolved

The alarm is no longer active. Resolution happens in two ways:

Auto-resolve: The alarm engine calls Resolve(sourceKey) when the triggering condition clears
Manual resolve: An operator manually resolves the alarm from the GUI

Resolved alarms remain visible in the alarm list (use the list filters to find them) but do not generate further notifications.
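Both resolution paths can be sketched as one call keyed by source key, echoing `Resolve(sourceKey)`. `AlarmEngine`, `resolve`, and the `resolved_by` field are hypothetical; passing no user models auto-resolve, and passing a username models a manual resolve from the GUI.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Alarm:
    source_key: str
    state: str = "firing"
    resolved_by: Optional[str] = None  # hypothetical field: None means auto-resolved


class AlarmEngine:
    """Hypothetical sketch of resolution keyed by source key."""

    def __init__(self) -> None:
        self.open: Dict[str, Alarm] = {}  # unresolved alarms by source key
        self.history: List[Alarm] = []    # resolved alarms stay listable

    def resolve(self, source_key: str, user: Optional[str] = None) -> bool:
        """user=None models auto-resolve; a username models a manual resolve.

        Returns False when no unresolved alarm exists for the key.
        """
        alarm = self.open.pop(source_key, None)
        if alarm is None:
            return False
        alarm.state = "resolved"
        alarm.resolved_by = user
        self.history.append(alarm)
        return True
```

Removing the alarm from the open index means a later firing of the same source key would start a new alarm rather than reopen the resolved one; that behaviour is an assumption, since the text only describes deduplication while an alarm is firing.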

Severity Levels

Each alarm carries a severity level that determines its visual priority and notification routing.

Severity	Visual	Meaning	Example
Critical	Red badge	Immediate attention required, service impact	ClickHouse unreachable, disk space critical, license expired
Warning	Orange badge	Needs investigation, potential impact	ClickHouse slow queries, disk space low, license expiring
OK	Green badge	Condition cleared, informational	Used when a previously alarming metric returns to normal
Unknown	Gray badge	Unable to determine state	Metric collection failed

Severity is set by the component that emits the alarm. System health checks use predefined thresholds; LLM-driven action alarms derive severity from the risk score.

If Critical / Warning / OK / Unknown looks familiar — yes, it’s a tip of the cap to Nagios. Some classics never need replacing.

Alarm Channels

The channel field indicates the source subsystem that raised the alarm.

Channel	Source	Examples
system	System health monitoring	ClickHouse connectivity, disk space, license status
action	Enforcement action events	Device blocked, device throttled, action expired

Notification rules can filter by channel, allowing you to route system alarms to infrastructure teams and action alarms to security teams.

Predefined System Alarm Source Keys

System alarms use structured source keys for programmatic handling:

Source Key	Description
`system:clickhouse:unreachable`	ClickHouse database is not responding
`system:clickhouse:slow`	ClickHouse queries are running slowly
`system:clickhouse:storage_warning`	ClickHouse data directory low on space
`system:clickhouse:storage_critical`	ClickHouse data directory critically low
`system:disk:space_warning`	Disk space below warning threshold
`system:disk:space_critical`	Disk space below critical threshold
`system:license:expiring`	License will expire soon
`system:license:expired`	License has expired

Action Alarm Source Key Patterns

Action alarm source keys are built dynamically from the action type and the target device or rule. They follow these patterns:

Pattern	Raised when
`action:block:<MAC-or-DUID>`	A device is blocked
`action:deny:<MAC-or-DUID>`	A device is denied
`action:throttle:<MAC-or-DUID>`	A device is throttled
`action:automation:<rule-name>`	An automation rule triggered actions

Allow, monitor, analyze, and normal actions do not raise alarms.
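The key patterns above can be sketched as a small builder plus a splitter. `action_source_key` and `split_source_key` are hypothetical helpers. If MAC addresses appear in colon-separated form, a plain `split(":")` would break them apart, so the splitter limits itself to two splits.

```python
from typing import Optional, Tuple

# Per the pattern table: these action types raise alarms keyed by device.
DEVICE_ACTIONS = {"block", "deny", "throttle"}
# These action types never raise alarms.
SILENT_ACTIONS = {"allow", "monitor", "analyze", "normal"}


def action_source_key(action_type: str, target: str) -> Optional[str]:
    """Hypothetical helper: build the source key, or None if no alarm is raised.

    target is a MAC address or DUID for device actions, or a rule name
    for "automation".
    """
    if action_type in SILENT_ACTIONS:
        return None
    if action_type in DEVICE_ACTIONS or action_type == "automation":
        return f"action:{action_type}:{target}"
    raise ValueError(f"unknown action type: {action_type!r}")


def split_source_key(key: str) -> Tuple[str, str, str]:
    """Split into (channel, kind, remainder); maxsplit=2 keeps colons in a MAC intact."""
    channel, kind, rest = key.split(":", 2)
    return channel, kind, rest
```

For example, `action_source_key("block", "aa:bb:cc:dd:ee:ff")` yields `action:block:aa:bb:cc:dd:ee:ff`, and splitting it back gives `("action", "block", "aa:bb:cc:dd:ee:ff")`.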

Working with Source Keys

Source keys are not user-configurable — they are assigned by the system that raises the alarm and used internally to deduplicate repeated firings. You cannot rename them.

You can match against them in notification rules. The source_key_pattern field on a notification rule accepts SQL LIKE patterns:

% matches any sequence of characters
_ matches a single character
% on its own matches everything (the default)

Examples:

system:disk:% — all disk alarms
action:block:% — every device block
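system:%:critical matches nothing, because no source key ends in :critical (the predefined keys use suffixes such as space_critical), and an alarm's severity is stored separately from its source key. Use the min_severity field to filter by severity instead.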
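To check a pattern before saving a rule, the LIKE semantics listed above can be reproduced with a regular expression. `like` is a hypothetical helper; it ignores the SQL `ESCAPE` clause, and it is case-sensitive, which may not match how the dispatcher compares keys.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Evaluate a SQL LIKE pattern against value (sketch; no ESCAPE clause).

    % -> any sequence of characters (including none), _ -> exactly one character,
    everything else is matched literally. The whole value must match.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None
```

For example, `like("system:disk:space_critical", "system:%:critical")` is False, while `system:%critical` would match both the disk and ClickHouse critical keys. That pattern still filters on key wording rather than the severity field, so min_severity remains the dependable choice.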

Alarm List View

The main alarms table displays all alarms with filtering, sorting, and pagination.

Columns

Column	Description
Severity	Color-coded badge (critical, warning, ok, unknown)
State	Current lifecycle state (firing, acknowledged, resolved)
Channel	Source channel (system or action)
Description	Human-readable summary of the alarm condition
Device	MAC address or DUID if the alarm is device-specific (null for system alarms)
Metric Value	The measured value that triggered the alarm
Threshold	The threshold that was exceeded
First Fired	When the alarm first entered firing state
Last Fired	Most recent re-firing time
Acknowledged By	Username of the operator who acknowledged (if applicable)

Individual Alarm Actions

Click an alarm row to expand it and access per-alarm actions.

Acknowledge

Transitions the alarm from Firing to Acknowledged. Optionally set a duration (in seconds). When the duration expires, the alarm returns to Firing if not resolved.

Use acknowledgement when you are aware of an issue and working on it. This prevents the alarm from generating repeat notifications.

Un-Acknowledge

Returns an acknowledged alarm to the Firing state. Use this when you need the alarm to re-trigger notifications (e.g., handoff to another operator).

Resolve

Manually resolves the alarm regardless of the current metric state. Use this when you have addressed the root cause and want to clear the alarm.

View History

Opens the state change timeline for a single alarm, showing every transition with timestamps and who performed each action.

Bulk Actions

Select multiple alarms to acknowledge them in a single operation.

Bulk Acknowledge

Select alarms using the checkboxes in the first column
Click “Acknowledge Selected” in the toolbar
Optionally set an acknowledgement duration
Confirm the operation

The system processes each alarm individually and reports success/failure counts. If some alarms cannot be acknowledged (e.g., already resolved), they appear in the failure list with reasons.
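The per-alarm processing can be sketched as a loop that records each outcome instead of aborting on the first error. `bulk_acknowledge`, `BulkResult`, and the reason strings are hypothetical, and so is treating an already-acknowledged alarm as a failure (the text only names already-resolved alarms). The optional duration is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BulkResult:
    succeeded: List[str] = field(default_factory=list)
    failed: List[Tuple[str, str]] = field(default_factory=list)  # (alarm id, reason)


def bulk_acknowledge(states: Dict[str, str], alarm_ids: List[str]) -> BulkResult:
    """Acknowledge each alarm independently; one failure does not abort the batch.

    `states` maps alarm id -> lifecycle state and stands in for the alarm store.
    """
    result = BulkResult()
    for alarm_id in alarm_ids:
        state = states.get(alarm_id)
        if state is None:
            result.failed.append((alarm_id, "not found"))
        elif state != "firing":
            result.failed.append((alarm_id, f"already {state}"))
        else:
            states[alarm_id] = "acknowledged"
            result.succeeded.append(alarm_id)
    return result
```

The success and failure counts shown to the operator are then simply `len(result.succeeded)` and `len(result.failed)`.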

Bulk resolve is not currently available: POST /api/alarms/bulk-acknowledge is the only bulk operation. To handle a set of alarms together, acknowledge them in bulk; each alarm then auto-resolves when its condition clears, or you can resolve the alarms one by one.

Bell Icon and Notifications

The bell icon in the header shows the count of unread notifications and provides quick access to recent alerts.

Unread Count

The bell badge displays the number of unread notifications. The count comes from GET /api/notifications/bell/count, which returns the total unread count and a per-severity breakdown.

Click the bell to open a dropdown showing the 20 most recent notifications (newest first). Each entry shows:

Severity badge (color-coded)
Description text
Timestamp
Read/unread indicator

Mark as Read

Single: Click a notification to mark it as read
Mark All Read: Click “Mark All Read” at the bottom of the dropdown to clear the unread count

Notifications are stored in the bell_notifications table and persist across sessions.
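The bell behaviour described in this section (unread count with a severity breakdown, the 20-entry dropdown, and mark-as-read) can be sketched in memory. `BellNotification` loosely mirrors a bell_notifications row, but its fields, the function names, and the shape of the count result are hypothetical, not the real schema or API response.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Union


@dataclass
class BellNotification:
    id: int
    severity: str          # "critical", "warning", "ok", or "unknown"
    description: str
    created_at: datetime
    read: bool = False


def bell_count(items: List[BellNotification]) -> Dict[str, Union[int, Dict[str, int]]]:
    """Unread total plus a per-severity breakdown (result shape is hypothetical)."""
    unread = [n.severity for n in items if not n.read]
    return {"total": len(unread), "by_severity": dict(Counter(unread))}


def dropdown(items: List[BellNotification], limit: int = 20) -> List[BellNotification]:
    """Most recent notifications first, capped at `limit` entries."""
    return sorted(items, key=lambda n: n.created_at, reverse=True)[:limit]


def mark_read(items: List[BellNotification], notification_id: int) -> None:
    """Single: mark one notification as read."""
    for n in items:
        if n.id == notification_id:
            n.read = True


def mark_all_read(items: List[BellNotification]) -> None:
    """Mark All Read: clears the unread count."""
    for n in items:
        n.read = True
```

Opening the dropdown only calls `dropdown`, which changes nothing; the count drops only through `mark_read` or `mark_all_read`.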

Notification Rules

Notification rules control which alarms generate bell notifications or external deliveries.

Creating a Rule

Navigate to Settings > Notifications to manage notification rules. Each rule specifies:

Field	Description	Options
Name	Human-readable rule name	Free text
Channel Match	Which alarm channels to match	`system`, `action`, or `any`
Min Severity	Minimum severity to trigger	`ok`, `warning`, `critical`
Source Key Pattern	SQL LIKE pattern for source key	`%` (all), `system:%`, `system:clickhouse:%`
Delivery Channel	How to deliver the notification	`bell` (in-app) or `vector` (external via Vector pipeline)
Notify On	When to send	`fired` (alarm fires), `resolved` (alarm resolves), `all` (both)

Example Rules

Route all critical alarms to bell notifications:

Channel Match: any
Min Severity: critical
Source Key Pattern: %
Delivery Channel: bell
Notify On: fired

Send ClickHouse alerts to external monitoring via Vector:

Channel Match: system
Min Severity: warning
Source Key Pattern: system:clickhouse:%
Delivery Channel: vector
Notify On: all
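The rule fields and the two example rules above can be expressed as data plus a matching predicate. `NotificationRule`, `rule_matches`, the rule names, and the defaults for `channel_match` and `min_severity` are hypothetical; ranking `unknown` below `ok` is also an assumption, since the documented Min Severity options are only ok, warning, and critical.

```python
import re
from dataclasses import dataclass

# Assumed rank for min_severity comparisons; placing "unknown" lowest is an assumption.
SEVERITY_RANK = {"unknown": 0, "ok": 1, "warning": 2, "critical": 3}


@dataclass
class NotificationRule:
    name: str
    channel_match: str = "any"       # "system", "action", or "any"
    min_severity: str = "ok"
    source_key_pattern: str = "%"    # SQL LIKE pattern
    delivery_channel: str = "bell"   # "bell" or "vector"
    notify_on: str = "fired"         # "fired", "resolved", or "all"


def _like(value: str, pattern: str) -> bool:
    # % -> any run of characters, _ -> one character, everything else literal.
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def rule_matches(rule: NotificationRule, channel: str, severity: str,
                 source_key: str, event: str) -> bool:
    """event is "fired" or "resolved"; every field must match."""
    return (
        rule.channel_match in ("any", channel)
        and SEVERITY_RANK[severity] >= SEVERITY_RANK[rule.min_severity]
        and _like(source_key, rule.source_key_pattern)
        and rule.notify_on in ("all", event)
    )


# The two example rules (names are illustrative).
CRITICAL_TO_BELL = NotificationRule("All critical alarms", min_severity="critical")
CLICKHOUSE_TO_VECTOR = NotificationRule(
    "ClickHouse to external monitoring", channel_match="system",
    min_severity="warning", source_key_pattern="system:clickhouse:%",
    delivery_channel="vector", notify_on="all",
)
```

With these definitions, a warning from `system:clickhouse:slow` matches the Vector rule on both firing and resolution, but not the critical-only bell rule.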

Managing Rules

Enable/Disable: Toggle a rule without deleting it. Disabled rules are skipped during dispatch
Edit: Update any field of an existing rule. Changes take effect as soon as the dispatcher reloads its rules
Delete: Permanently remove a rule
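Dispatch with enable/disable can be sketched as a loop over the rule list. `DispatchRule`, `dispatch`, and the `senders` mapping are hypothetical. The `matches` callable stands in for the channel, severity, pattern, and notify-on checks from the rule fields, and delivering once per matching rule (rather than once per delivery channel) is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, str]


@dataclass
class DispatchRule:
    name: str
    enabled: bool
    delivery_channel: str               # "bell" or "vector"
    matches: Callable[[Event], bool]    # channel / severity / pattern / notify-on checks


def dispatch(event: Event, rules: List[DispatchRule],
             senders: Dict[str, Callable[[Event], None]]) -> List[str]:
    """Deliver event through every enabled rule that matches; return the rule names used."""
    used: List[str] = []
    for rule in rules:
        if not rule.enabled:
            continue  # disabled: kept in the rule list but skipped
        if rule.matches(event):
            senders[rule.delivery_channel](event)
            used.append(rule.name)
    return used
```

Disabling a rule only flips `enabled`, so the rule keeps its settings and can be re-enabled later; deleting it removes it from the list entirely.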

Troubleshooting

No alarms showing? Check that the alarm engine is enabled in your configuration, and verify ClickHouse connectivity: the engine stores alarms in, and queries them from, the alarms table. If the system has just started, health check alarms can take one evaluation cycle (typically 60 seconds) to appear.

Alarms firing but no bell notifications? You need at least one notification rule configured with delivery channel bell. Navigate to Settings > Notifications and create a rule matching the alarm channel and severity.

Acknowledged alarm keeps coming back? An alarm acknowledged with a duration reverts to Firing when the duration expires if its condition persists. Fix the underlying issue, acknowledge without a duration, or resolve the alarm manually.

Bell count does not decrease after clicking? Opening the dropdown does not mark anything as read. Click each notification entry to mark it as read, or click “Mark All Read” at the bottom of the dropdown.