Rules

blackbox.rules (last evaluation: 21.928s ago; evaluation time: 18.15ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: Ssl Certificate Expires Soon expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 25 for: 1d labels: severity: critical annotations: summary: SSL certificate on {{ $labels.instance }} will expire within 25 days | ok | - | 21.929s ago | 2.128ms
alert: Ssl Certificate Expired expr: probe_ssl_earliest_cert_expiry - time() <= 0 for: 1d labels: severity: critical annotations: summary: SSL certificate on {{ $labels.instance }} has already expired | ok | - | 21.927s ago | 1.518ms
alert: ATS down expr: probe_success{job=~"ATS"} == 0 for: 5m labels: severity: critical annotations: summary: 'PBX {{ $labels.instance }} has not responded for more than 5 minutes' | ok | - | 21.926s ago | 382.8us
alert: Radioactive Down expr: probe_success{job=~"radioactive"} == 0 for: 5m labels: severity: radioactive annotations: summary: Radioactive room {{ $labels.instance }} has not responded for more than 5 minutes | ok | - | 21.925s ago | 289.5us
alert: Slow Http expr: avg_over_time(probe_http_duration_seconds[5m]) > 5 for: 5m labels: severity: critical annotations: summary: Very slow HTTP request handling on {{ $labels.instance }} | ok | - | 21.925s ago | 11.25ms
alert: Blackbox Slow Probe expr: avg_over_time(probe_duration_seconds[5m]) > 10 for: 5m labels: severity: critical annotations: summary: 'Blackbox probe of {{ $labels.instance }} is very slow. Probe duration: {{ $value }}' | ok | - | 21.914s ago | 2.554ms
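
The rows above are the Prometheus UI's flattened rendering of YAML rule definitions, each read straight through from alert: to the State, Error, Last Evaluation and Evaluation Time cells. As a reading aid, here is a minimal sketch of how the first row would look in the blackbox.rules source file, assuming the standard Prometheus rule-file layout (the group name inside the file is an assumption):

groups:
  - name: blackbox
    rules:
      - alert: Ssl Certificate Expires Soon
        # earliest certificate in the probed chain expires within 25 days
        expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 25
        # the condition must hold for a full day before the alert fires
        for: 1d
        labels:
          severity: critical
        annotations:
          summary: SSL certificate on {{ $labels.instance }} will expire within 25 days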

custom.rules (last evaluation: 28.708s ago; evaluation time: 23.18ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: Switch Down expr: up{job=~"snmp|dmc1002"} == 0 for: 1m labels: severity: critical annotations: summary: '{{ $labels.instance }} not responding for more than 1 minute' | ok | - | 28.709s ago | 8.239ms
alert: Squid Server Down expr: squid_server_up == 0 for: 10m labels: severity: critical annotations: summary: Squid on {{ $labels.instance }} not responding for more than 10 minutes | ok | - | 28.701s ago | 383.3us
alert: Bind Server Down expr: bind_up == 0 for: 5m labels: severity: critical annotations: summary: '{{ $labels.job }} on {{ $labels.instance }} not responding for more than 5 minutes' | ok | - | 28.7s ago | 335.5us
alert: Instance Down expr: up{job=~"squid|wpad|etcd|dhcpd6|bind"} == 0 for: 5m labels: severity: critical annotations: summary: '{{ $labels.job }} {{ $labels.instance }} not responding for more than 5 minutes.' | ok | - | 28.7s ago | 717.1us
alert: Error in dhcpserv config expr: rate(dhcpserv_build_errors[1m]) > 1 for: 2m labels: severity: critical annotations: summary: '{{ $labels.instance }} has too many errors while generating the dhcpserv config: {{ $value }}' | ok | - | 28.7s ago | 335.8us
alert: Cisco Netflow Cache High expr: netflow_active_entries:ratio > 0.8 for: 5m labels: severity: critical | ok | - | 28.7s ago | 265.2us
alert: Postfix Queue High expr: postfix_showq_message_size_bytes_count > 200 for: 10m labels: severity: critical annotations: summary: Postfix queue overflow. Queue exceeded {{ $value }} mails | ok | - | 28.7s ago | 319.1us
alert: Borg Backup Missing expr: time() - borgbackup_last_modified > 86400 * 8 for: 30m labels: severity: critical annotations: summary: No backup created on {{ $labels.instance }} for the last 8 days | ok | - | 28.7s ago | 516.9us
alert: Omada device down expr: omada_device_uptime_seconds == 0 for: 5m labels: severity: critical annotations: summary: Device {{ $labels.device_type }} {{ $labels.device }} from site {{ $labels.site }} is down | ok | - | 28.699s ago | 3.195ms
alert: Unifi device down expr: unifipoller_device_uptime_seconds == 0 for: 5m labels: severity: critical annotations: summary: Device {{ $labels.type }} {{ $labels.name }} from site {{ $labels.site_name }} is down | ok | - | 28.696s ago | 3.69ms
alert: Nginx is down expr: nginx_up == 0 for: 5m labels: severity: critical annotations: summary: Nginx is down on the host {{ $labels.instance }} for more than 5 minutes | ok | - | 28.693s ago | 369.4us
alert: Nginx not handling all connections expr: rate(nginx_connections_handled{job="nginx"}[5m]) / rate(nginx_connections_accepted{job="nginx"}[5m]) < 1 for: 3m labels: severity: critical annotations: summary: Nginx is not handling all accepted connections on the host {{ $labels.instance }} for more than 3 minutes | ok | - | 28.693s ago | 782.1us
alert: Nginx high active connections expr: nginx_connections_active{job="nginx"} > 300 for: 5m labels: severity: critical annotations: summary: High number of active Nginx connections on {{ $labels.instance }} for 5 minutes | ok | - | 28.692s ago | 308.7us
alert: Certificate expires soon expr: nginx_cert_exporter_file_expired - time() < 86400 * 25 for: 12h labels: severity: critical annotations: summary: Local certificate {{ $labels.name }} on {{ $labels.instance }} will expire within 25 days | ok | - | 28.692s ago | 2.206ms
alert: DDOS expr: (rate(nginx_http_requests_total[5m]) / rate(nginx_http_requests_total[5m] offset 5m) > 5) and rate(nginx_http_requests_total[5m]) > 100 for: 5m labels: severity: critical annotations: summary: '{{ $labels.instance }} is under a DDoS attack' | ok | - | 28.69s ago | 1.096ms
alert: Fail2Ban DDOS expr: avg by(instance, jail) (f2b_jail_banned_current) > 20 for: 5m labels: severity: critical annotations: summary: 'Jail {{ $labels.jail }} is triggering too often: {{ $value }} banned on {{ $labels.instance }}' | ok | - | 28.69s ago | 350.2us
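
The DDOS expression above is the least obvious one in this group: it divides the current 5-minute request rate by the same rate taken 5 minutes earlier via offset, so the alert needs both a fivefold relative jump and an absolute floor of 100 req/s before it fires. A commented sketch of just that rule (the multi-line layout is an assumption; the semantics are taken verbatim from the row above):

- alert: DDOS
  expr: |
    # relative condition: current 5m rate is more than 5x the rate
    # measured 5 minutes ago
    (rate(nginx_http_requests_total[5m])
       / rate(nginx_http_requests_total[5m] offset 5m) > 5)
    # absolute condition: at least 100 req/s, so quiet hosts cannot
    # trigger on a small relative spike
    and rate(nginx_http_requests_total[5m]) > 100
  for: 5m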

mysql.rules (last evaluation: 10.671s ago; evaluation time: 3.229ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: MysqlDown expr: mysql_up == 0 for: 1m labels: severity: critical annotations: description: MySQL instance is down on {{ $labels.instance }} summary: MySQL down (instance {{ $labels.instance }}) | ok | - | 10.671s ago | 578.5us
alert: MysqlTooManyConnections expr: avg by(instance) (rate(mysql_global_status_threads_connected[5m])) / avg by(instance) (mysql_global_variables_max_connections) * 100 > 60 for: 5m labels: severity: critical annotations: description: More than 60% of MySQL connections are in use on {{ $labels.instance }} summary: MySQL too many connections (> 60%) (instance {{ $labels.instance }}) | ok | - | 10.671s ago | 1.024ms
alert: MysqlHighThreadsRunning expr: avg by(instance) (rate(mysql_global_status_threads_running[1m])) / avg by(instance) (mysql_global_variables_max_connections) * 100 > 60 for: 5m labels: severity: critical annotations: description: More than 60% of MySQL connections are in running state on {{ $labels.instance }} summary: MySQL high threads running (instance {{ $labels.instance }}) | ok | - | 10.67s ago | 656.8us
alert: MysqlSlowQueries expr: increase(mysql_global_status_slow_queries[2m]) > 10 for: 5m labels: severity: critical annotations: description: MySQL server on {{ $labels.instance }} has new slow queries: {{ $value }} summary: MySQL slow queries (instance {{ $labels.instance }}) | ok | - | 10.67s ago | 436us
alert: MysqlInnodbLogWaits expr: rate(mysql_global_status_innodb_log_waits[15m]) > 10 for: 5m labels: severity: critical annotations: description: MySQL InnoDB log writes are stalling: {{ $value }} summary: MySQL InnoDB log waits (instance {{ $labels.instance }}) | ok | - | 10.67s ago | 506.2us
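
MysqlTooManyConnections and MysqlHighThreadsRunning both follow the same percentage-of-limit pattern: each side of the division is wrapped in avg by(instance), so the two vectors collapse to identical label sets and match one-to-one. A commented sketch of the first of the two (layout assumed, expression as in the row above):

- alert: MysqlTooManyConnections
  expr: |
    # avg by(instance) reduces both operands to one series per instance,
    # so the division matches cleanly and yields a percentage of the
    # configured max_connections
    avg by(instance) (rate(mysql_global_status_threads_connected[5m]))
      / avg by(instance) (mysql_global_variables_max_connections)
      * 100 > 60
  for: 5m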

node.rules (last evaluation: 22.988s ago; evaluation time: 221.1ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: Node Down expr: up{job=~"node"} == 0 for: 1m labels: severity: critical annotations: summary: Node {{ $labels.instance }} not responding for more than 1 minute. | ok | - | 22.988s ago | 1.122ms
alert: Service Down expr: node_systemd_unit_state{name!="dnf-makecache.service",state=~"failed"} == 1 for: 5m labels: severity: critical annotations: summary: Service {{ $labels.name }} on {{ $labels.instance }} has been in the {{ $labels.state }} state for 5m | ok | - | 22.987s ago | 80.85ms
alert: Node Low Disk Space expr: (node_filesystem_avail_bytes * 100) / node_filesystem_size_bytes{mountpoint!~".mnt.*"} < 10 and on(instance, device, mountpoint) node_filesystem_readonly == 0 for: 5m labels: severity: critical annotations: summary: '{{ $labels.instance }} low disk space: only {{ $value }}% of {{ $labels.mountpoint }} is left free' | ok | - | 22.907s ago | 8.335ms
alert: Host Oom Kill Detected expr: increase(node_vmstat_oom_kill[1m]) > 0 labels: severity: warning annotations: summary: OOM kill detected on {{ $labels.instance }} | ok | - | 22.899s ago | 368.1us
alert: Node Out Of Memory expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 20 for: 5m labels: severity: critical annotations: summary: 'Node {{ $labels.instance }} memory is filling up (< 20% left). Free now: {{ $value }}%' | ok | - | 22.899s ago | 1.264ms
alert: Node Memory Fill Up Soon expr: predict_linear(node_memory_MemAvailable_bytes[2h], 1 * 3600) <= 0 for: 2h labels: severity: critical annotations: summary: '{{ $labels.instance }} memory will fill up within 1h' | ok | - | 22.897s ago | 2.43ms
alert: Node SWAP Out Of Memory expr: node_memory_SwapFree_bytes / node_memory_SwapTotal_bytes * 100 < 20 for: 10m labels: severity: critical annotations: summary: 'Node {{ $labels.instance }} swap memory is filling up (< 20% left). Free now: {{ $value }}%' | ok | - | 22.895s ago | 2.204ms
alert: Node Memory Under Pressure expr: (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100) < 20 and rate(node_vmstat_pgmajfault[5m]) > 100 for: 5m labels: severity: critical annotations: summary: Node {{ $labels.instance }} is under heavy memory pressure. High rate of major page faults. VALUE = {{ $value }} | ok | - | 22.893s ago | 2.218ms
alert: Node Unusual Network ThroughputIn expr: sum by(instance) (rate(node_network_receive_bytes_total[30m])) / 1024 / 1024 > 100 for: 15m labels: severity: critical annotations: summary: Unusually high inbound network throughput on {{ $labels.instance }} - {{ $value }} MB/s | ok | - | 22.891s ago | 6.588ms
alert: Node Unusual Network ThroughputOut expr: sum by(instance) (rate(node_network_transmit_bytes_total[30m])) / 1024 / 1024 > 100 for: 15m labels: severity: critical annotations: summary: Unusually high outbound network throughput on {{ $labels.instance }} - {{ $value }} MB/s | ok | - | 22.885s ago | 6.3ms
alert: Node Unusual Disk Read Rate expr: sum by(instance) (rate(node_disk_read_bytes_total[30m])) / 1024 / 1024 > 200 for: 30m labels: severity: critical annotations: summary: Unusually high disk read rate over the last 30 minutes on {{ $labels.instance }} - {{ $value }} MB/s | ok | - | 22.879s ago | 7.471ms
alert: Node Unusual Disk Write Rate expr: sum by(instance) (rate(node_disk_written_bytes_total[30m])) / 1024 / 1024 > 200 for: 30m labels: severity: critical annotations: summary: Unusually high disk write rate over the last 30 minutes on {{ $labels.instance }} - {{ $value }} MB/s | ok | - | 22.872s ago | 9.098ms
alert: Node Unusual Disk ReadLatency expr: rate(node_disk_read_time_seconds_total[30m]) / rate(node_disk_reads_completed_total[30m]) > 0.3 and rate(node_disk_reads_completed_total[30m]) > 0 for: 30m labels: severity: critical annotations: summary: Unusually high disk read latency on {{ $labels.instance }} - {{ $value }} s | ok | - | 22.863s ago | 22.25ms
alert: Node Unusual Disk WriteLatency expr: rate(node_disk_write_time_seconds_total[30m]) / rate(node_disk_writes_completed_total[30m]) > 0.15 and rate(node_disk_writes_completed_total[30m]) > 0 for: 30m labels: severity: critical annotations: summary: Unusually high disk write latency on {{ $labels.instance }} - {{ $value }} s | ok | - | 22.841s ago | 22.58ms
alert: Big IOWait expr: avg by(instance) (irate(node_cpu_seconds_total{mode="iowait"}[1m])) * 100 > 15 for: 2m labels: severity: critical annotations: summary: '{{ $labels.instance }} has high iowait: {{ $value }}%' | ok | - | 22.818s ago | 9.793ms
alert: Node High Cpu Load expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{job="node",mode="idle"}[15m])) * 100) > 80 for: 15m labels: severity: critical annotations: summary: High CPU load on {{ $labels.instance }} - {{ $value }}% | ok | - | 22.809s ago | 14.8ms
alert: Node Physical Component TooHot expr: node_hwmon_temp_celsius{chip!~"nvme.+"} > 75 for: 10m labels: severity: critical annotations: summary: Host physical component too hot (instance {{ $labels.instance }}) {{ $labels.chip }} - {{ $value }}C | ok | - | 22.794s ago | 6.492ms
alert: Node NVMe Drive TooHot expr: node_hwmon_temp_celsius{chip=~"nvme.+"} > 100 for: 10m labels: severity: critical annotations: summary: NVMe drive too hot (instance {{ $labels.instance }}) {{ $labels.chip }} - {{ $value }}C | ok | - | 22.788s ago | 539.6us
alert: Node Overtemperature Alarm expr: node_hwmon_temp_crit_alarm_celsius == 1 labels: severity: critical annotations: summary: Host node overtemperature alarm (instance {{ $labels.instance }}) - {{ $value }} | ok | - | 22.788s ago | 3.647ms
alert: Node Raid Array Got Inactive expr: node_md_state{state="inactive"} > 0 for: 10m labels: severity: critical annotations: summary: RAID array {{ $labels.device }} is in degraded state due to one or more disk failures. The number of spare drives is insufficient to fix the issue automatically | ok | - | 22.784s ago | 472.1us
alert: Node Raid Disk Failure expr: node_md_disks{state="failed"} > 0 for: 10m labels: severity: critical annotations: summary: At least one device in the RAID array on {{ $labels.instance }} has failed. Array {{ $labels.md_device }} needs attention and possibly a disk swap | ok | - | 22.784s ago | 581us
alert: Node Edac Correctable Errors Detected expr: increase(node_edac_correctable_errors_total[1m]) > 0 for: 10m labels: severity: critical annotations: summary: Host EDAC Correctable Errors detected (instance {{ $labels.instance }}) {{ $value }} | ok | - | 22.784s ago | 1.321ms
alert: Node Edac Uncorrectable Errors Detected expr: node_edac_uncorrectable_errors_total > 0 labels: severity: critical annotations: summary: Host EDAC Uncorrectable Errors detected (instance {{ $labels.instance }}) {{ $value }} | ok | - | 22.782s ago | 1.048ms
alert: Relocated Sectors expr: smart_attribute_raw{attribute_id=~"197|198"} > 0 for: 3h labels: severity: critical annotations: summary: '{{ $labels.instance }} has {{ $value }} pending or uncorrectable sectors (SMART 197/198). {{ $labels.device_model_family }} {{ $labels.device_model_name }} as {{ $labels.device_name }}, serial number {{ $labels.device_serial_number }}' | ok | - | 22.782s ago | 3.69ms
alert: Node Clock Not Synchronising expr: min_over_time(node_timex_sync_status[1m]) == 0 and node_timex_maxerror_seconds >= 16 for: 2m labels: severity: critical annotations: summary: Clock not synchronising. Ensure NTP is configured on this host {{ $labels.instance }} | ok | - | 22.778s ago | 1.194ms
alert: HostClockSkew expr: (node_timex_offset_seconds > 0.05 and deriv(node_timex_offset_seconds[5m]) >= 0) or (node_timex_offset_seconds < -0.05 and deriv(node_timex_offset_seconds[5m]) <= 0) for: 2m labels: severity: warning annotations: summary: Clock skew detected. Clock is out of sync. Ensure NTP is configured correctly on this host {{ $labels.instance }} | ok | - | 22.777s ago | 2.338ms
alert: Node Disk Is Missing expr: node_btrfs_device_size_bytes == 0 for: 1m labels: severity: critical annotations: summary: 'Btrfs device {{ $labels.device }} on {{ $labels.instance }} is missing from the filesystem' | ok | - | 22.775s ago | 2.011ms
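
Node Memory Fill Up Soon is the only predictive rule in this group: predict_linear fits a linear regression over the last two hours of MemAvailable samples and extrapolates one hour ahead, and for: 2h keeps the alert from firing on a short-lived downward trend. A commented sketch (layout assumed, expression as in the row above):

- alert: Node Memory Fill Up Soon
  expr: |
    # fit a linear trend over the past 2h of available memory and
    # extrapolate 1 * 3600 seconds (1h) forward; <= 0 means memory is
    # projected to run out within the hour
    predict_linear(node_memory_MemAvailable_bytes[2h], 1 * 3600) <= 0
  for: 2h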

self.rules (last evaluation: 27.161s ago; evaluation time: 5.295ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: Prometheus Configuration Failure expr: prometheus_config_last_reload_successful != 1 labels: severity: critical annotations: summary: Prometheus configuration reload failure (instance {{ $labels.instance }}) | ok | - | 27.161s ago | 427.9us
alert: AlertManager Configuration Failure expr: alertmanager_config_last_reload_successful != 1 labels: severity: critical annotations: summary: AlertManager configuration reload failure (instance {{ $labels.instance }}) | ok | - | 27.161s ago | 295.7us
alert: Prometheus Not Connected To Alertmanager expr: prometheus_notifications_alertmanagers_discovered < 1 labels: severity: critical annotations: summary: Prometheus not connected to alertmanager (instance {{ $labels.instance }}) | ok | - | 27.161s ago | 304.4us
alert: Prometheus Rule Evaluation Failures expr: increase(prometheus_rule_evaluation_failures_total[3m]) > 0 labels: severity: critical annotations: summary: Prometheus rule evaluation failures (instance {{ $labels.instance }}) | ok | - | 27.161s ago | 556.4us
alert: Prometheus Template Text Expansion Failures expr: increase(prometheus_template_text_expansion_failures_total[3m]) > 0 labels: severity: critical annotations: summary: Prometheus template text expansion failures (instance {{ $labels.instance }}) | ok | - | 27.16s ago | 356.3us
alert: Prometheus Rule Evaluation Slow expr: prometheus_rule_group_last_duration_seconds > prometheus_rule_group_interval_seconds for: 5m labels: severity: critical annotations: summary: Prometheus rule evaluation slow (instance {{ $labels.instance }}) | ok | - | 27.16s ago | 720.3us
alert: Prometheus Target Empty expr: prometheus_sd_discovered_targets == 0 labels: severity: critical annotations: description: Prometheus has no target in service discovery summary: Prometheus target empty (instance {{ $labels.instance }}) | ok | - | 27.16s ago | 1.662ms
alert: Prometheus Large Scrape expr: increase(prometheus_target_scrapes_exceeded_sample_limit_total[10m]) > 10 for: 5m labels: severity: critical annotations: description: Prometheus has many scrapes that exceed the sample limit summary: Prometheus large scrape (instance {{ $labels.instance }}) | ok | - | 27.158s ago | 556us
alert: Prometheus Target Scrape Duplicate expr: increase(prometheus_target_scrapes_sample_duplicate_timestamp_total[5m]) > 0 labels: severity: critical annotations: description: Prometheus has many samples rejected due to duplicate timestamps but different values summary: Prometheus target scrape duplicate (instance {{ $labels.instance }}) | ok | - | 27.158s ago | 376.5us
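
Prometheus Rule Evaluation Slow works by comparing two gauges that Prometheus exports per rule group; because both carry identical label sets, the > operator matches them one-to-one and returns (and therefore alerts on) only the groups whose last evaluation ran longer than their configured interval. A commented sketch (layout assumed, expression as in the row above):

- alert: Prometheus Rule Evaluation Slow
  expr: |
    # one-to-one vector match on identical labels: a result series exists
    # only for rule groups whose last evaluation took longer than the
    # group's evaluation interval
    prometheus_rule_group_last_duration_seconds
      > prometheus_rule_group_interval_seconds
  for: 5m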

ups.rules (last evaluation: 17.088s ago; evaluation time: 8.866ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: UPS not responding expr: up{job=~"apcsnmp|upssnmp"} == 0 for: 5m labels: severity: critical annotations: summary: UPS {{ $labels.instance }} not responding for more than 5 minutes. | ok | - | 17.088s ago | 2.391ms
alert: Server Room In FIRE! expr: iemStatusProbeCurrentTemp{instance="Symmetra.ups"} > 24 for: 10m labels: severity: critical annotations: summary: 'Server room too hot: {{ $value }}C' | ok | - | 17.086s ago | 202.2us
alert: There is no ELECTRICITY in the Server Room! expr: upsBasicInputPhase{instance="Symmetra.ups"} < 3 for: 5m labels: severity: critical annotations: summary: 'Working phases in Symmetra at the moment: {{ $value }}' | ok | - | 17.086s ago | 314.1us
alert: APC UPS too hot! expr: iemStatusProbeCurrentTemp > 30 for: 5m labels: severity: critical annotations: summary: Too hot at {{ $labels.instance }}. Temperature is {{ $value }}C | ok | - | 17.086s ago | 278.5us
alert: UPS too hot! expr: upsBatteryTemperature > 45 for: 5m labels: severity: critical annotations: summary: Too hot at {{ $labels.instance }}. Temperature is {{ $value }}C | ok | - | 17.086s ago | 1.383ms
alert: APC UPS low Input voltage expr: upsAdvInputLineVoltage < 200 for: 5m labels: severity: critical annotations: summary: 'APC UPS {{ $labels.instance }} has low voltage: {{ $value }}V' | ok | - | 17.085s ago | 265.5us
alert: UPS low Input voltage expr: upsInputVoltage < 200 for: 5m labels: severity: critical annotations: summary: 'UPS {{ $labels.instance }} has low input voltage: {{ $value }}V' | ok | - | 17.085s ago | 781.7us
alert: UPS low Output voltage expr: upsOutputVoltage{upsOutputLineIndex="1"} < 200 for: 5m labels: severity: critical annotations: summary: 'UPS {{ $labels.instance }} has low output voltage: {{ $value }}V' | ok | - | 17.084s ago | 867.2us
alert: UPS big load expr: upsOutputPercentLoad{upsOutputLineIndex="1"} > 75 for: 5m labels: severity: critical annotations: summary: 'UPS {{ $labels.instance }} has big load: {{ $value }}%' | ok | - | 17.083s ago | 762.5us
alert: Low APC UPS battery capacity expr: upsAdvBatteryCapacity < 75 for: 5m labels: severity: critical annotations: summary: '{{ $labels.instance }} has low battery capacity: {{ $value }}%' | ok | - | 17.083s ago | 367.2us
alert: Low UPS battery capacity expr: upsEstimatedChargeRemaining < 75 for: 5m labels: severity: critical annotations: summary: '{{ $labels.instance }} has low battery capacity: {{ $value }}%' | ok | - | 17.083s ago | 1.199ms
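
The electricity rule above encodes a site-specific assumption worth spelling out: judging by the < 3 threshold and the summary text, the Symmetra appears to be fed by three input phases, so any reported value below 3 means at least one phase has dropped. A commented sketch (the three-phase feed is an inference from the threshold, not something stated in the rule):

- alert: There is no ELECTRICITY in the Server Room!
  expr: |
    # assumed three-phase input feed on the Symmetra; a value below 3
    # means at least one input phase has dropped out
    upsBasicInputPhase{instance="Symmetra.ups"} < 3
  for: 5m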

windows.rules (last evaluation: 22.287s ago; evaluation time: 51.39ms)

Rule | State | Error | Last Evaluation | Evaluation Time
alert: Windows Server Collector Error expr: windows_exporter_collector_success{job="node"} == 0 labels: severity: critical annotations: summary: Windows Server collector error on {{ $labels.instance }}. Collector {{ $labels.collector }} was not successful | ok | - | 22.287s ago | 3.115ms
alert: Windows Server Service Status expr: windows_service_status{job="node",status="ok"} != 1 for: 1m labels: severity: critical annotations: summary: Windows Server service status on {{ $labels.instance }}. {{ $labels.name }} state is not OK | ok | - | 22.285s ago | 41.11ms
alert: Windows Server Cpu Usage expr: 100 - (avg by(instance) (rate(windows_cpu_time_total{job="node",mode="idle"}[10m])) * 100) > 80 for: 10m labels: severity: critical annotations: summary: Windows Server CPU usage on {{ $labels.instance }} is more than 80% | ok | - | 22.244s ago | 4.669ms
alert: Windows Server Memory Usage expr: 100 - ((windows_os_physical_memory_free_bytes{job="node"} / windows_cs_physical_memory_bytes{job="node"}) * 100) > 80 for: 5m labels: severity: critical annotations: summary: Windows Server memory usage on {{ $labels.instance }} is more than 80% | ok | - | 22.239s ago | 855us
alert: Windows Server Disk Space Usage expr: 100 - 100 * ((windows_logical_disk_free_bytes{job="node"} / 1024 / 1024) / ((windows_logical_disk_size_bytes{job="node"} / 1024 / 1024) > 20000)) > 90 for: 10m labels: severity: critical annotations: summary: Windows Server disk space usage on {{ $labels.instance }} is more than 90% on {{ $labels.volume }} | ok | - | 22.239s ago | 1.605ms
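
The disk-space expression uses a PromQL idiom that is easy to misread: the inner > 20000 comparison is a filter, not a boolean. Without the bool modifier a comparison drops non-matching series, so the division only produces results for volumes larger than about 20 GB, and smaller volumes can never raise this alert. A commented sketch (layout assumed, expression as in the row above):

- alert: Windows Server Disk Space Usage
  expr: |
    # the inner `> 20000` filters rather than returning 0/1: series for
    # volumes of 20000 MB (~20 GB) or less are dropped entirely, so the
    # division (and the alert) only applies to larger volumes
    100 - 100 * (
      (windows_logical_disk_free_bytes{job="node"} / 1024 / 1024)
        / ((windows_logical_disk_size_bytes{job="node"} / 1024 / 1024) > 20000)
    ) > 90
  for: 10m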