Alarms
The Alarms page shows all system and action alarms with their lifecycle state, severity, and notification history. You can acknowledge, resolve, and bulk-manage alarms from this view.
Alarms are raised by the alarm engine when a monitored condition exceeds a threshold or when the system detects an issue. Each alarm tracks its lifecycle from firing through acknowledgement to resolution, with a full state change history.
Alarm Lifecycle
Every alarm follows a three-state lifecycle: Firing, Acknowledged, and Resolved.

                           +------------+   condition
                           |            |
    condition triggers --> |   FIRING   |   re-triggers
                           |            |   (updates last_fired_at)
                           +-----+------+
                                 |
                            acknowledge
                                 |
                           +-----v------+
                           |            |
                           |ACKNOWLEDGED| ---> ack expires ---> FIRING
                           |            |
                           +-----+------+
                                 |
                              resolve
                                 |
                           +-----v------+
                           |            |
                           |  RESOLVED  |
                           |            |
                           +------------+

Firing
The alarm is active. The triggering condition is still present. If the same source key fires again while already firing, the existing alarm is updated (description, metric value, `last_fired_at`) rather than creating a duplicate.
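The de-duplication described above can be sketched as an upsert keyed on the source key. This is a minimal illustration, not the alarm engine's actual code: the `Alarm` class, the `fire` function, and the in-memory `firing_by_key` dictionary are hypothetical names, and the sketch only covers alarms that are currently firing.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional


@dataclass
class Alarm:  # hypothetical record shape, for illustration only
    source_key: str
    description: str
    metric_value: Optional[float]
    first_fired_at: datetime
    last_fired_at: datetime


def fire(firing_by_key: Dict[str, Alarm], source_key: str, description: str,
         metric_value: Optional[float], now: datetime) -> Alarm:
    """Create a firing alarm, or refresh the one already firing for this key."""
    existing = firing_by_key.get(source_key)
    if existing is not None:
        # Same source key is already firing: update in place, no duplicate.
        existing.description = description
        existing.metric_value = metric_value
        existing.last_fired_at = now
        return existing
    alarm = Alarm(source_key, description, metric_value, now, now)
    firing_by_key[source_key] = alarm
    return alarm


firing_by_key: Dict[str, Alarm] = {}
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 1, 12, 5, tzinfo=timezone.utc)
first = fire(firing_by_key, "system:disk:space_warning", "Disk at 85%", 85.0, t0)
second = fire(firing_by_key, "system:disk:space_warning", "Disk at 88%", 88.0, t1)
```

After the second call there is still one alarm for the key; only its description, metric value, and last-fired time have changed, while the first-fired time is preserved.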
Acknowledged
An operator has seen the alarm and acknowledged it. Acknowledgement can include an optional duration (e.g., “acknowledge for 2 hours”). When the duration expires, the alarm returns to Firing if the condition persists.
You can also un-acknowledge an alarm to return it to the Firing state if you need to re-alert on it.
Resolved
The alarm is no longer active. Resolution happens in two ways:
- Auto-resolve: The alarm engine calls `Resolve(sourceKey)` when the triggering condition clears
- Manual resolve: An operator manually resolves the alarm from the GUI
Resolved alarms remain visible in the alarm list when you filter for them, but do not generate new notifications.
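The lifecycle above can be read as a small transition table. The sketch below is illustrative only: the action names (`acknowledge`, `unacknowledge`, `ack_expired`, `resolve`) and both helper functions are assumptions, and allowing `resolve` directly from Firing is inferred from the description of auto-resolve rather than stated in the diagram.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# (current state, action) -> next state. Action names are hypothetical.
TRANSITIONS = {
    ("firing", "acknowledge"): "acknowledged",
    ("firing", "resolve"): "resolved",            # inferred: auto or manual resolve
    ("acknowledged", "unacknowledge"): "firing",
    ("acknowledged", "ack_expired"): "firing",    # duration ran out, condition persists
    ("acknowledged", "resolve"): "resolved",
}


def next_state(state: str, action: str) -> str:
    """Return the state after applying an action, or raise if it is not allowed."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(f"cannot {action} an alarm that is {state}") from None


def ack_expires_at(now: datetime, duration_seconds: Optional[int]) -> Optional[datetime]:
    """Return when an acknowledgement lapses, or None if no duration was given."""
    if duration_seconds is None:
        return None
    return now + timedelta(seconds=duration_seconds)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
two_hours = ack_expires_at(now, 2 * 60 * 60)
```

Keeping the transitions in one table makes it easy to reject invalid requests, such as acknowledging an alarm that is already resolved.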
Severity Levels
Each alarm carries a severity level that determines its visual priority and notification routing.
| Severity | Visual | Meaning | Example |
|---|---|---|---|
| Critical | Red badge | Immediate attention required, service impact | ClickHouse unreachable, disk space critical, license expired |
| Warning | Orange badge | Needs investigation, potential impact | ClickHouse slow queries, disk space low, license expiring |
| OK | Green badge | Condition cleared, informational | Used when a previously alarming metric returns to normal |
| Unknown | Gray badge | Unable to determine state | Metric collection failed |
Severity is set by the component that emits the alarm. System health checks use predefined thresholds; LLM-driven action alarms derive severity from the risk score.
If Critical / Warning / OK / Unknown looks familiar — yes, it’s a tip of the cap to Nagios. Some classics never need replacing.
Alarm Channels
The `channel` field indicates the source subsystem that raised the alarm.
| Channel | Source | Examples |
|---|---|---|
| system | System health monitoring | ClickHouse connectivity, disk space, license status |
| action | Enforcement action events | Device blocked, device throttled, action expired |
Notification rules can filter by channel, allowing you to route system alarms to infrastructure teams and action alarms to security teams.
Predefined System Alarm Source Keys
System alarms use structured source keys for programmatic handling:
| Source Key | Description |
|---|---|
| `system:clickhouse:unreachable` | ClickHouse database is not responding |
| `system:clickhouse:slow` | ClickHouse queries are running slowly |
| `system:clickhouse:storage_warning` | ClickHouse data directory low on space |
| `system:clickhouse:storage_critical` | ClickHouse data directory critically low |
| `system:disk:space_warning` | Disk space below warning threshold |
| `system:disk:space_critical` | Disk space below critical threshold |
| `system:license:expiring` | License will expire soon |
| `system:license:expired` | License has expired |
Action Alarm Source Key Patterns
Action alarms are built dynamically from the action type and the target device or rule. They follow these shapes:
| Pattern | Raised when |
|---|---|
| `action:block:<MAC-or-DUID>` | A device is blocked |
| `action:deny:<MAC-or-DUID>` | A device is denied |
| `action:throttle:<MAC-or-DUID>` | A device is throttled |
| `action:automation:<rule-name>` | An automation rule triggered actions |
Allow, monitor, analyze, and normal actions do not raise alarms.
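As a rough illustration of how these keys are assembled, the sketch below builds a source key from an action type and a target. The function names and the example MAC address and rule name are hypothetical; only the key shapes and the list of alarming actions come from the table above.

```python
from typing import Optional

# Action types that raise alarms, per the patterns listed above.
ALARMING_ACTIONS = {"block", "deny", "throttle"}


def action_source_key(action: str, device_id: str) -> Optional[str]:
    """Return the source key for a device action, or None if it raises no alarm."""
    if action not in ALARMING_ACTIONS:
        return None  # e.g. allow, monitor, analyze, normal
    return f"action:{action}:{device_id}"


def automation_source_key(rule_name: str) -> str:
    """Return the source key for actions triggered by an automation rule."""
    return f"action:automation:{rule_name}"


blocked = action_source_key("block", "aa:bb:cc:dd:ee:ff")   # hypothetical MAC
allowed = action_source_key("allow", "aa:bb:cc:dd:ee:ff")
automated = automation_source_key("quarantine-iot")          # hypothetical rule
```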
Setting Source Key Values
Source keys are not user-configurable: they are assigned by the system that raises the alarm and used internally to deduplicate repeated firings. You cannot rename them.
You can match against them in notification rules. The source_key_pattern field on a notification rule accepts SQL LIKE patterns:
- `%` matches any sequence of characters
- `_` matches a single character
- `%` on its own matches everything (the default)
Examples:
- `system:disk:%`: all disk alarms
- `action:block:%`: every device block
- `system:%:critical` would not work, because the severity is not part of the source key; use the `min_severity` field instead
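To check a pattern against a source key before saving a rule, the LIKE wildcards can be emulated locally. This is a sketch under assumptions: `like_match` is a hypothetical helper, it ignores LIKE escape sequences, and it is case-sensitive, which may or may not match the database that evaluates the real rule.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")          # any sequence of characters
        elif ch == "_":
            parts.append(".")           # exactly one character
        else:
            parts.append(re.escape(ch)) # literal character
    return re.compile("".join(parts), re.DOTALL)


def like_match(pattern: str, source_key: str) -> bool:
    """Return True if the whole source key matches the LIKE pattern."""
    return like_to_regex(pattern).fullmatch(source_key) is not None


disk = like_match("system:disk:%", "system:disk:space_critical")
blocks = like_match("action:block:%", "action:block:aa:bb:cc:dd:ee:ff")
severity_in_key = like_match("system:%:critical", "system:disk:space_critical")
```

One consequence worth noting: because `_` is itself a wildcard, a pattern such as `system:disk:space_warning` also matches keys that have any other single character in place of the underscore.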
Alarm List View
The main alarms table displays all alarms with filtering, sorting, and pagination.
Columns
| Column | Description |
|---|---|
| Severity | Color-coded badge (critical, warning, ok, unknown) |
| State | Current lifecycle state (firing, acknowledged, resolved) |
| Channel | Source channel (system or action) |
| Description | Human-readable summary of the alarm condition |
| Device | MAC address or DUID if the alarm is device-specific (null for system alarms) |
| Metric Value | The measured value that triggered the alarm |
| Threshold | The threshold that was exceeded |
| First Fired | When the alarm first entered firing state |
| Last Fired | Most recent re-firing time |
| Acknowledged By | Username of the operator who acknowledged (if applicable) |
Individual Alarm Actions
Click an alarm row to expand it and access per-alarm actions.
Acknowledge
Transitions the alarm from Firing to Acknowledged. Optionally set a duration (in seconds). When the duration expires, the alarm returns to Firing if not resolved.
Use acknowledgement when you are aware of an issue and working on it. This prevents the alarm from generating repeat notifications.
Un-Acknowledge
Returns an acknowledged alarm to the Firing state. Use this when you need the alarm to re-trigger notifications (e.g., handoff to another operator).
Resolve
Manually resolves the alarm regardless of the current metric state. Use this when you have addressed the root cause and want to clear the alarm.
View History
Opens the state change timeline for a single alarm, showing every transition with timestamps and who performed each action.
Bulk Actions
Select multiple alarms to acknowledge them in a single operation.
Bulk Acknowledge
- Select alarms using the checkboxes in the first column
- Click “Acknowledge Selected” in the toolbar
- Optionally set an acknowledgement duration
- Confirm the operation
The system processes each alarm individually and reports success/failure counts. If some alarms cannot be acknowledged (e.g., already resolved), they appear in the failure list with reasons.
A bulk-resolve endpoint is not currently wired: `POST /api/alarms/bulk-acknowledge` is the only bulk operation available. To resolve a set of alarms in one go, acknowledge them in bulk first; individual alarms then auto-resolve when their condition clears, or you can resolve them one by one.
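The per-alarm processing behind bulk acknowledge, with its success and failure reporting, might look roughly like the sketch below. Everything here is hypothetical: the function, the in-memory `states` mapping, the alarm IDs, and the wording of the failure reasons are illustrative and do not describe the request or response format of `POST /api/alarms/bulk-acknowledge`.

```python
from typing import Dict, List, Tuple


def bulk_acknowledge(states: Dict[str, str],
                     alarm_ids: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Acknowledge each alarm independently; collect successes and failure reasons."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for alarm_id in alarm_ids:
        state = states.get(alarm_id)
        if state is None:
            failed[alarm_id] = "alarm not found"
        elif state != "firing":
            # Only firing alarms can move to acknowledged.
            failed[alarm_id] = f"alarm is {state}"
        else:
            states[alarm_id] = "acknowledged"
            succeeded.append(alarm_id)
    return succeeded, failed


states = {"a1": "firing", "a2": "resolved", "a3": "firing"}
succeeded, failed = bulk_acknowledge(states, ["a1", "a2", "a3", "a9"])
```

Because each alarm is handled on its own, one failure (here an already resolved alarm and an unknown ID) does not prevent the others from being acknowledged.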
Bell Icon and Notifications
The bell icon in the header shows the count of unread notifications and provides quick access to recent alerts.
Unread Count
The bell badge displays the number of unread notifications. The count is fetched from `GET /api/notifications/bell/count`, which returns a total count and a severity breakdown.
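The total-plus-breakdown shape can be illustrated with a small aggregation over unread notifications. This is only a sketch: the `bell_count` function and the `total` and `by_severity` field names are assumptions, not the documented response of the endpoint.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bell_count(notifications: Iterable[Tuple[str, bool]]) -> Dict[str, object]:
    """Summarise unread notifications given (severity, is_read) pairs."""
    unread = Counter(severity for severity, is_read in notifications if not is_read)
    return {"total": sum(unread.values()), "by_severity": dict(unread)}


summary = bell_count([
    ("critical", False),
    ("warning", False),
    ("warning", True),    # already read, so not counted
    ("critical", False),
])
```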
Notification Dropdown
Click the bell to open a dropdown showing the 20 most recent notifications (newest first). Each entry shows:
- Severity badge (color-coded)
- Description text
- Timestamp
- Read/unread indicator
Mark as Read
- Single: Click a notification to mark it as read
- Mark All Read: Click “Mark All Read” at the bottom of the dropdown to clear the unread count
Notifications are stored in the bell_notifications table and persist across sessions.
Notification Rules
Notification rules control which alarms generate bell notifications or external deliveries.
Creating a Rule
Navigate to Settings > Notifications to manage notification rules. Each rule specifies:
| Field | Description | Options |
|---|---|---|
| Name | Human-readable rule name | Free text |
| Channel Match | Which alarm channels to match | system, action, or any |
| Min Severity | Minimum severity to trigger | ok, warning, critical |
| Source Key Pattern | SQL LIKE pattern for source key | % (all), system:%, system:clickhouse:% |
| Delivery Channel | How to deliver the notification | bell (in-app) or vector (external via Vector pipeline) |
| Notify On | When to send | fired (alarm fires), resolved (alarm resolves), all (both) |
Example Rules
Route all critical alarms to bell notifications:
- Channel Match: `any`
- Min Severity: `critical`
- Source Key Pattern: `%`
- Delivery Channel: `bell`
- Notify On: `fired`
Send ClickHouse alerts to external monitoring via Vector:
- Channel Match: `system`
- Min Severity: `warning`
- Source Key Pattern: `system:clickhouse:%`
- Delivery Channel: `vector`
- Notify On: `all`
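To make the interaction between the rule fields concrete, here is a sketch of how the two example rules might be evaluated against an alarm event. The `Rule` class, `rule_matches`, and the severity ranking (`ok` < `warning` < `critical`, with any other severity never meeting the minimum) are assumptions for illustration; the real dispatcher may behave differently.

```python
import re
from dataclasses import dataclass

# Assumed ranking for min-severity comparisons (not confirmed by this page).
SEVERITY_RANK = {"ok": 0, "warning": 1, "critical": 2}


@dataclass
class Rule:  # hypothetical shape mirroring the fields in the table above
    channel_match: str       # "system", "action", or "any"
    min_severity: str        # "ok", "warning", or "critical"
    source_key_pattern: str  # SQL LIKE pattern
    delivery_channel: str    # "bell" or "vector"
    notify_on: str           # "fired", "resolved", or "all"
    enabled: bool = True


def _like(pattern: str, value: str) -> bool:
    """Emulate SQL LIKE: % is any sequence, _ is one character (no escapes)."""
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c)
                    for c in pattern)
    return re.fullmatch(regex, value, re.DOTALL) is not None


def rule_matches(rule: Rule, channel: str, severity: str,
                 source_key: str, event: str) -> bool:
    """Return True if the rule should deliver for this alarm event."""
    if not rule.enabled:
        return False  # disabled rules are skipped during dispatch
    if rule.channel_match not in ("any", channel):
        return False
    if rule.notify_on not in ("all", event):
        return False
    if SEVERITY_RANK.get(severity, -1) < SEVERITY_RANK[rule.min_severity]:
        return False
    return _like(rule.source_key_pattern, source_key)


critical_to_bell = Rule("any", "critical", "%", "bell", "fired")
clickhouse_to_vector = Rule("system", "warning", "system:clickhouse:%", "vector", "all")
```

With these definitions, a critical disk alarm firing matches the first rule but not the second, while a ClickHouse slow-query warning matches the second rule on both fire and resolve.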
Managing Rules
- Enable/Disable: Toggle a rule without deleting it. Disabled rules are skipped during dispatch
- Edit: Update any field of an existing rule. Changes take effect immediately after the dispatcher reloads
- Delete: Permanently remove a rule
Troubleshooting
No alarms showing? Check that the alarm engine is enabled in your configuration. Verify ClickHouse connectivity: the engine stores and queries alarms in the `alarms` table. If the system just started, health check alarms may take one evaluation cycle (typically 60 seconds) to appear.
Alarms firing but no bell notifications? You need at least one notification rule configured with delivery channel `bell`. Navigate to Settings > Notifications and create a rule matching the alarm channel and severity.
Acknowledged alarm keeps coming back? Acknowledgement with a duration automatically reverts to Firing when the duration expires. Either resolve the underlying issue, acknowledge without a duration, or manually resolve the alarm.
Bell count does not decrease after clicking? Ensure you are clicking the notification entry (not just opening the dropdown). A notification is marked as read only when you click it explicitly; to clear all of them at once, use “Mark All Read”.