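Contents
- I. Collector Installation
- 1. Introduction to Categraf
- 2. Deploying Categraf
- 3. Test Server Deployment
- 4. System Monitoring Plugins
- 5. GPU Monitoring Plugin
- 6. Service Monitoring Plugins
- II. Monitoring Dashboards
- 1. Machine List
- 2. System Monitoring
- 3. Service Monitoring
- III. Alert Configuration
- 1. Email Notification
- 2. Alert Rules
- 3. Alert Self-Healing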
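I. Collector Installation
1. Introduction to Categraf
Categraf must be deployed on every machine you want to monitor, because collecting metrics such as CPU, memory, and processes requires reading information from the operating system.
Categraf pushes monitoring data to the server using the Prometheus RemoteWrite protocol.
Grafana dashboard marketplace
Categraf plugin documentation
Categraf deployment documentation
Categraf download page
Example download file: categraf-v0.3.45-linux-amd64.tar.gz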
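2. Deploying Categraf
Some monitoring plugins are hard to configure when Categraf runs in Docker, so Categraf is deployed here as a binary.
- Delete the plugins you do not use
categraf-v0.3.45-linux-amd64/conf/input.*
- Edit the plugin configuration files (*.toml)
- Edit the Categraf configuration file config.toml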
[global]
hostname = "machine-label" # placeholder: a label that identifies this machine
[[writers]]
url = "http://192.168.6.226:17000/prometheus/v1/write"
[ibex]
enable = true
servers = ["192.168.6.226:20090"]
[heartbeat]
url = "http://192.168.6.226:17000/v1/n9e/heartbeat"
- Copy Categraf
Copy all files and folders inside categraf-v0.3.45-linux-amd64 to /home/monitor/categraf on the target machine
- Install and start Categraf
cd /home/monitor/categraf && chmod +x categraf && sudo ./categraf --install && sudo ./categraf --start
- Other commands
# Install as a service (equivalent to adding the service file and running systemctl daemon-reload)
sudo ./categraf --install
# Uninstall the service (equivalent to systemctl stop categraf plus deleting the service file)
# If Categraf was installed before, uninstall it first
sudo ./categraf --remove
# Start Categraf as a service (equivalent to systemctl start categraf)
sudo ./categraf --start
# Stop the Categraf service (equivalent to systemctl stop categraf)
sudo ./categraf --stop
# Check the Categraf service status (equivalent to systemctl status categraf)
sudo ./categraf --status
# Show which MySQL metrics are collected
sudo ./categraf --test --inputs mysql
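The config.toml edits above all point at the same server address. When many machines are deployed, a small generator can keep the three endpoints in sync. The following is a minimal sketch, not part of Categraf itself: the function name `render_categraf_config` and the hostname `gpu-node-01` are hypothetical, and the ports and paths are simply copied from the configuration shown above.

```python
def render_categraf_config(server_ip: str, hostname: str) -> str:
    """Return config.toml text whose endpoints all point at one server.

    Hypothetical helper: ports and URL paths mirror the article's example.
    """
    lines = [
        "[global]",
        f'hostname = "{hostname}"',
        "",
        "[[writers]]",
        f'url = "http://{server_ip}:17000/prometheus/v1/write"',
        "",
        "[ibex]",
        "enable = true",
        f'servers = ["{server_ip}:20090"]',
        "",
        "[heartbeat]",
        f'url = "http://{server_ip}:17000/v1/n9e/heartbeat"',
        "",
    ]
    return "\n".join(lines)


# Example: the server address from the article, with a made-up machine label.
config_text = render_categraf_config("192.168.6.226", "gpu-node-01")
print(config_text)
```

The output only contains the keys shown in this article; a real config.toml shipped with Categraf has more settings, so the generated text would be merged into it rather than used as a full replacement.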
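3. Test Server Deployment
4. System Monitoring Plugins
- cpu plugin: collects local CPU usage, idle percentage, and related metrics
input.cpu/cpu.toml (the default configuration can be used)
# Collection interval (seconds)
interval = 15
# Whether to collect metrics for each individual core
collect_per_cpu = false
- Disk plugin: collects disk usage, inode usage, and related metrics
input.disk/disk.toml (the default configuration can be used)
# Collection interval (seconds)
interval = 15
# Only report the specified mount points
# mount_points = ["/"]
# Ignore mount points by file system type
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs", "nsfs", "CDFS"]
# Mount points to ignore
ignore_mount_points = ["/boot", "/var/lib/kubelet/pods"]
- Disk I/O plugin: collects disk read/write I/O metrics
input.diskio/diskio.toml (the default configuration can be used)
# Collection interval (seconds)
interval = 15
# Only report the specified devices
# devices = ["sda", "sdb", "vd*"]
- Kernel plugin: collects OS boot time, context switch counts, and related metrics
input.kernel/kernel.toml (the default configuration can be used)
# Collection interval (seconds)
interval = 15
- Memory plugin: collects memory usage and related metrics
input.mem/mem.toml (the default configuration can be used)
# Collection interval (seconds)
interval = 15
# Whether to collect platform-specific metrics
collect_platform_fields = true
- Network traffic plugin: collects network interface traffic, packet counts, and related metrics
input.net/net.toml (the default configuration can be used)
# Collection interval (seconds)
interval = 15
# Whether to collect protocol statistics on Linux
# collect_protocol_stats = false
# Only report the specified network interfaces
# interfaces = ["eth0"]
- Network connections plugin: collects the number of time_wait connections, established connections, and so on
input.netstat/netstat.toml (the default configuration can be used)
# Collection interval (seconds)
interval = 15
disable_summary_stats = false
# With many network connections, this plugin consumes considerable system resources
disable_connection_stats = true
tcp_ext = false
ip_ext = false
- NTP plugin: monitors the machine's clock offset
input.ntp/ntp.toml
# Collection interval (seconds)
interval = 15
# NTP servers
ntp_servers = ["ntp.aliyun.com"]
# Response timeout
timeout = 5
- Process plugin: collects the number of running, sleeping, and total processes
input.processes/processes.toml (the default configuration can be used)
# Collection interval (seconds)
interval = 15
# Force collection via the ps command
# force_ps = false
# Force collection via /proc
# force_proc = false
- system plugin: collects system load information
input.system/system.toml (the default configuration can be used)
# Collection interval (seconds)
interval = 15
# Whether to collect system_n_users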
# collect_user_number = false
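To make the idea of "reading information from the operating system" concrete, here is a rough sketch of how a CPU usage percentage can be derived from two samples of the aggregate `cpu` line in `/proc/stat` on Linux. This is an illustration only, not Categraf's actual code: the function names are hypothetical, the sample counters are made up, and treating everything except the `idle` column as "active" is an assumption.

```python
from typing import List


def parse_cpu_counters(stat_text: str) -> List[int]:
    """Return the counters of the aggregate 'cpu' line from /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_active_percent(before: str, after: str) -> float:
    """Percentage of non-idle CPU time between two /proc/stat samples."""
    old, new = parse_cpu_counters(before), parse_cpu_counters(after)
    deltas = [n - o for o, n in zip(old, new)]
    # Columns: user nice system idle iowait irq softirq steal (guest columns skipped,
    # because guest time is already included in user time).
    total = sum(deltas[:8])
    idle = deltas[3]  # assumption: only the 'idle' column counts as idle
    if total <= 0:
        return 0.0
    return 100.0 * (total - idle) / total


# Made-up samples taken one interval apart.
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu  160 0 70 900 70 0 0 0 0 0"
active = cpu_active_percent(sample_before, sample_after)
print(f"active: {active:.1f}%")
```

On a real machine the two samples would come from reading `/proc/stat` twice, one collection interval apart.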
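5. GPU Monitoring Plugin
- NVIDIA GPU plugin: monitors NVIDIA GPU information
input.nvidia_smi/nvidia_smi.toml
# Collection interval (seconds)
interval = 15
# Local command to execute
nvidia_smi_command = "nvidia-smi"
# Run `nvidia-smi --help-query-gpu` to list the available fields
# `AUTO` detects the fields to query automatically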
query_field_names = "AUTO"
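The plugin works by running the `nvidia-smi` command and parsing its output. As a rough illustration of that idea (not the plugin's actual implementation), the sketch below parses CSV text of the kind produced by `nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits`. The sample output is invented, the function name `parse_gpu_csv` is hypothetical, and values such as `[N/A]` that some GPUs report are not handled.

```python
import csv
import io
from typing import Dict, List

# Invented sample: index, utilization.gpu (%), memory.used (MiB), memory.total (MiB)
SAMPLE_OUTPUT = "0, 35, 2048, 8192\n1, 90, 6144, 8192\n"


def parse_gpu_csv(text: str) -> List[Dict[str, float]]:
    """Convert nvidia-smi CSV rows into ratio metrics in the range 0..1."""
    gpus = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        index, util, mem_used, mem_total = row
        gpus.append({
            "index": int(index),
            "utilization_ratio": float(util) / 100.0,
            "memory_used_ratio": float(mem_used) / float(mem_total),
        })
    return gpus


gpus = parse_gpu_csv(SAMPLE_OUTPUT)
for gpu in gpus:
    print(gpu)
```

Expressing both values as ratios matches how the dashboards later in this article treat GPU utilization and GPU memory usage (unit `percentUnit`).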
6. Service Monitoring Plugins
- docker plugin: monitors Docker containers
input.docker/docker.toml
# Collection interval (seconds)
interval = 15
[[instances]]
# interval = global.interval * interval_times
interval_times = 1
## Docker Endpoint
endpoint = "unix:///var/run/docker.sock"
# Containers to include/exclude
container_name_include = []
container_name_exclude = []
gather_services = false
gather_extend_memstats = false
container_id_label_enable = true
container_id_label_short_style = false
timeout = "5s"
perdevice_include = []
total_include = ["cpu", "blkio", "network"]
docker_label_include = []
docker_label_exclude = ["annotation*", "io.kubernetes*", "*description*", "*maintainer*", "*hash", "*author*", "*org_*", "*date*", "*url*", "*docker_compose*"]
- Log plugin: extracts log content and converts it into monitoring metrics
input.mtail/mtail.toml
# Collection interval (seconds)
interval = 15
[[instances]]
progs = "/home/monitor/categraf/conf/input.mtail/prog1" # path to the log-parsing rule files
logs = ["/home/logs/example/all.log"] # log files
labels = { log="6.221-example-log" } # log labels
override_timezone = "Asia/Shanghai" # time zone
emit_metric_timestamp = "true" # emit metric timestamps
input.mtail/prog1/rule_error.mtail
gauge error_num
/ERROR.*/ {
error_num++
}
input.mtail/prog1/rule_info.mtail
gauge info_num
/INFO.*/ {
info_num++
}
input.mtail/prog1/rule_login.mtail
gauge login_num
/登录账户.*/ {
login_num++
}
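- mysql plugin: connects to a MySQL instance, runs SQL queries, parses the output, and reports it as monitoring data
input.mysql/mysql.toml
# Collection interval (seconds)
interval = 15
# Define instances; each instance corresponds to one MySQL instance
[[instances]]
address = "192.168.6.200:3306"
username = "root"
password = "123456"
# Custom parameters, such as whether to use TLS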
parameters = "tls=false"
- nginx plugin: monitors Nginx status; this plugin depends on Nginx's http_stub_status_module
input.nginx/nginx.toml
# Collection interval (seconds)
interval = 15
[[instances]]
# URLs of the Nginx stub_status page
urls = ["http://192.168.6.223:8080/nginx_status"]
response_timeout = "5s"
The Nginx service must have the http_stub_status_module module enabled.
Add the following to nginx.conf:
http {
server {
listen 8080;
location /nginx_status {
stub_status on;
access_log off;
allow 192.168.6.226; # allow this IP
deny all; # deny all other IPs
}
}
}
http://192.168.6.223:8080/nginx_status
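- redis plugin: connects to Redis, runs the INFO command, parses the result, and reports it as monitoring data
input.redis/redis.toml
# Collection interval (seconds)
interval = 15
# Define instances; each instance corresponds to one Redis instance
[[instances]]
address = "192.168.6.223:6379"
username = ""
password = ""
pool_size = 2
# Whether to collect the slowlog
gather_slowlog = true
# Maximum number of slowlog entries to collect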
slowlog_max_len = 100
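The redis plugin description above (run INFO, parse the result, report metrics) can be sketched in a few lines. This is a simplified illustration rather than Categraf's real parser: the sample INFO text is invented, `parse_redis_info` is a hypothetical name, and the `redis_` prefix is an assumption chosen to match metric names such as `redis_connected_clients` used in the dashboards of this article.

```python
from typing import Dict

# Invented excerpt of `INFO` output; Redis separates lines with \r\n.
SAMPLE_INFO = (
    "# Server\r\n"
    "redis_version:7.0.0\r\n"
    "\r\n"
    "# Clients\r\n"
    "connected_clients:12\r\n"
    "\r\n"
    "# Memory\r\n"
    "used_memory:1048576\r\n"
    "used_memory_human:1.00M\r\n"
)


def parse_redis_info(info_text: str, prefix: str = "redis_") -> Dict[str, float]:
    """Keep only the numeric `key:value` pairs and prefix their names."""
    metrics: Dict[str, float] = {}
    for line in info_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if not sep:
            continue
        try:
            metrics[prefix + key] = float(value)
        except ValueError:
            continue  # non-numeric values such as "1.00M" are not metrics
    return metrics


metrics = parse_redis_info(SAMPLE_INFO)
print(metrics)
```

The same pattern (run a status command or fetch a status page, keep the numeric fields, attach a prefix) also describes the mysql and nginx plugins at a high level.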
II. Monitoring Dashboards
1. Machine List
- Dashboard JSON
{
"name": "Machine List",
"tags": "",
"ident": "",
"configs": {
"panels": [
{
"type": "table",
"id": "77bf513a-8504-4d33-9efe-75aaf9abc9e4",
"layout": {
"h": 11,
"i": "77bf513a-8504-4d33-9efe-75aaf9abc9e4",
"isResizable": true,
"w": 24,
"x": 0,
"y": 5
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "avg(system_uptime{ident=~\"$ident\"}) by (ident)",
"refId": "A",
"legend": "Uptime"
},
{
"expr": "avg(cpu_usage_active{cpu=\"cpu-total\", ident=~\"$ident\"}) by (ident)",
"legend": "CPU usage",
"refId": "B"
},
{
"expr": "avg(mem_used_percent{ident=~\"$ident\"}) by (ident)",
"legend": "Memory usage",
"refId": "C"
},
{
"expr": "avg(mem_total{ident=~\"$ident\"}) by (ident)",
"legend": "Total memory",
"refId": "D"
},
{
"expr": "avg(disk_used_percent{ident=~\"$ident\",path=\"/\"}) by (ident)",
"legend": "Disk usage",
"refId": "E"
},
{
"expr": "avg(disk_total{ident=~\"$ident\"}) by (ident)",
"refId": "F",
"legend": "Total disk"
},
{
"expr": "avg(rate(net_bytes_recv{ident=~\"$ident\"}[1m])) by(ident)",
"refId": "G",
"legend": "Network inbound traffic"
},
{
"expr": "avg(rate(net_bytes_sent{ident=~\"$ident\"}[1m])) by(ident)",
"refId": "H",
"legend": "Network outbound traffic"
},
{
"expr": "avg(nvidia_smi_utilization_gpu_ratio{ident=~\"$ident\"}) by (ident)",
"refId": "I",
"legend": "GPU usage"
},
{
"expr": "avg(nvidia_smi_memory_used_bytes/nvidia_smi_memory_total_bytes{ident=~\"$ident\"}) by (ident)",
"refId": "J",
"legend": "GPU memory usage"
},
{
"expr": "avg(nvidia_smi_memory_total_bytes{ident=~\"$ident\"}) by (ident)",
"refId": "K",
"legend": "Total GPU memory"
},
{
"expr": "ntp_offset_ms",
"refId": "L",
"legend": "NTP offset (ms)"
}
],
"transformations": [
{
"id": "organize",
"options": {
"renameByName": {
"ident": "Machine"
}
}
}
],
"name": "Machine List",
"maxPerRow": 4,
"custom": {
"showHeader": true,
"colorMode": "background",
"calc": "lastNotNull",
"displayMode": "labelValuesToRows",
"aggrDimension": "ident",
"sortColumn": "ident",
"sortOrder": "ascend",
"linkMode": "cellLink"
},
"options": {
"standardOptions": {}
},
"overrides": [
{
"type": "special",
"matcher": {
"id": "byFrameRefID",
"value": "A"
},
"properties": {
"standardOptions": {
"util": "humantimeSeconds"
}
}
},
{
"matcher": {
"id": "byFrameRefID",
"value": "B"
},
"properties": {
"standardOptions": {
"util": "percent",
"decimals": 1
},
"valueMappings": []
}
},
{
"matcher": {
"id": "byFrameRefID",
"value": "C"
},
"properties": {
"standardOptions": {
"util": "percent",
"decimals": 1
},
"valueMappings": []
},
"type": "special"
},
{
"matcher": {
"id": "byFrameRefID",
"value": "D"
},
"properties": {
"standardOptions": {
"decimals": 1,
"util": "bytesIEC"
},
"valueMappings": []
},
"type": "special"
},
{
"matcher": {
"id": "byFrameRefID",
"value": "E"
},
"properties": {
"standardOptions": {
"decimals": 1,
"util": "percent"
},
"valueMappings": []
},
"type": "special"
},
{
"type": "special",
"matcher": {
"id": "byFrameRefID",
"value": "F"
},
"properties": {
"standardOptions": {
"util": "bytesIEC",
"decimals": 0
}
}
},
{
"type": "special",
"matcher": {
"id": "byFrameRefID",
"value": "G"
},
"properties": {
"standardOptions": {
"util": "bytesSecIEC",
"decimals": 1
}
}
},
{
"type": "special",
"matcher": {
"id": "byFrameRefID",
"value": "H"
},
"properties": {
"standardOptions": {
"util": "bytesSecIEC",
"decimals": 1
}
}
},
{
"type": "special",
"matcher": {
"id": "byFrameRefID",
"value": "I"
},
"properties": {
"standardOptions": {
"util": "percentUnit",
"decimals": 1
}
}
},
{
"type": "special",
"matcher": {
"id": "byFrameRefID",
"value": "J"
},
"properties": {
"standardOptions": {
"util": "percentUnit",
"decimals": 1
}
}
},
{
"type": "special",
"matcher": {
"id": "byFrameRefID",
"value": "K"
},
"properties": {
"standardOptions": {
"util": "bytesIEC",
"decimals": 1
}
}
}
]
}
],
"var": [
{
"definition": "prometheus",
"name": "prom",
"type": "datasource"
},
{
"allOption": true,
"datasource": {
"cate": "prometheus",
"value": "${prom}"
},
"definition": "label_values(system_load1,ident)",
"multi": true,
"name": "ident",
"type": "query"
}
],
"version": "3.0.0"
}
}
- Dashboard preview
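Before importing a dashboard export like the one above, it can help to list the PromQL expressions it contains and to spot targets that ignore the `$ident` variable. The sketch below assumes only the `configs.panels[].targets[]` shape visible in the JSON above; the embedded `DASHBOARD_JSON` is a trimmed, hypothetical stand-in, and `list_queries` is an illustrative helper, not part of any tool.

```python
import json
from typing import List, Tuple

# Trimmed stand-in with the same shape as the exported dashboard.
DASHBOARD_JSON = r'''
{
  "name": "demo",
  "configs": {
    "panels": [
      {
        "name": "demo table",
        "targets": [
          {"expr": "avg(mem_used_percent{ident=~\"$ident\"}) by (ident)", "refId": "C"},
          {"expr": "ntp_offset_ms", "refId": "L"}
        ]
      }
    ]
  }
}
'''


def list_queries(dashboard: dict) -> List[Tuple[str, str, str]]:
    """Return (panel name, refId, expr) for every target in the dashboard."""
    rows = []
    for panel in dashboard["configs"]["panels"]:
        for target in panel.get("targets", []):
            rows.append((panel["name"], target["refId"], target["expr"]))
    return rows


dashboard = json.loads(DASHBOARD_JSON)
queries = list_queries(dashboard)
# Targets that do not reference the $ident variable are not narrowed by the machine selector.
unfiltered = [query for query in queries if "$ident" not in query[2]]
for panel_name, ref_id, expr in unfiltered:
    print(f"{panel_name} [{ref_id}] has no $ident filter: {expr}")
```

To run it against a real export, the JSON would be loaded from a file instead of the embedded string.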
2. System Monitoring
- Dashboard JSON
{
"name": "System Monitoring",
"tags": "",
"ident": "",
"configs": {
"panels": [
{
"type": "timeseries",
"id": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
"layout": {
"h": 7,
"w": 8,
"x": 0,
"y": 0,
"i": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "cpu_usage_active{ident=~\"$ident\"}",
"refId": "A",
"legend": "{{ident}}-usage"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "CPU usage",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"util": "percent",
"min": 0,
"max": 101,
"decimals": 0
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off",
"standardOptions": {
"min": null,
"max": null,
"decimals": null
}
}
}
]
},
{
"type": "timeseries",
"id": "239aacdf-1982-428b-b240-57f4ce7f946d",
"layout": {
"h": 7,
"w": 8,
"x": 8,
"y": 0,
"i": "239aacdf-1982-428b-b240-57f4ce7f946d",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "mem_used_percent{ident=~\"$ident\"}",
"refId": "A",
"legend": "{{ident}}-usage"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "Memory usage",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"util": "percent",
"min": 0,
"max": 101,
"decimals": 0
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off",
"standardOptions": {
"decimals": null,
"min": null,
"max": null
}
}
}
]
},
{
"type": "timeseries",
"id": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
"layout": {
"h": 7,
"w": 8,
"x": 16,
"y": 0,
"i": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "rate(diskio_read_bytes{ident=~\"$ident\"}[1m])",
"legend": "{{ident}}-{{name}}-read IO",
"refId": "A"
},
{
"expr": "rate(diskio_write_bytes{ident=~\"$ident\"}[1m])",
"legend": "{{ident}}-{{name}}-write IO",
"refId": "B"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "Disk IO",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"util": "bytesIEC",
"decimals": 0
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off"
}
}
]
},
{
"type": "timeseries",
"id": "f2ee5d32-737c-4095-b6b7-b15b778ffdb9",
"layout": {
"h": 7,
"w": 8,
"x": 0,
"y": 7,
"i": "f2ee5d32-737c-4095-b6b7-b15b778ffdb9",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "rate(net_bytes_recv{ident=~\"$ident\"}[1m])",
"legend": "{{ident}}-inbound",
"refId": "A"
},
{
"expr": "rate(net_bytes_sent{ident=~\"$ident\"}[1m])",
"legend": "{{ident}}-outbound",
"refId": "B"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "Network traffic",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"util": "bytesIEC",
"decimals": 0
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off"
}
}
]
},
{
"type": "timeseries",
"id": "6be9a2be-1d4c-488d-b695-aa1d82df3a3c",
"layout": {
"h": 7,
"w": 8,
"x": 8,
"y": 7,
"i": "e164a7cb-394c-4670-b83c-e9321a08cbe6",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "nvidia_smi_utilization_gpu_ratio{ident=~\"$ident\"}",
"legend": "{{ident}}-usage",
"refId": "A"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "GPU usage",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"util": "percentUnit",
"min": 0,
"max": 1.01,
"decimals": 0
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off"
}
}
]
},
{
"type": "timeseries",
"id": "7873f825-1e41-45e9-a1ee-792a87fd4351",
"layout": {
"h": 7,
"w": 8,
"x": 16,
"y": 7,
"i": "37ced102-b020-4e3f-8247-6b2c9240a762",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "nvidia_smi_memory_used_bytes/nvidia_smi_memory_total_bytes{ident=~\"$ident\"}",
"legend": "{{ident}}-usage",
"refId": "A"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "GPU memory usage",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"util": "percentUnit",
"min": 0,
"max": 1.01,
"decimals": 0
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off"
}
}
]
}
],
"var": [
{
"definition": "prometheus",
"name": "prom",
"type": "datasource"
},
{
"allOption": true,
"datasource": {
"cate": "prometheus",
"value": "${prom}"
},
"definition": "label_values(system_load1,ident)",
"multi": true,
"name": "ident",
"type": "query"
}
],
"version": "3.0.0"
}
}
- Dashboard preview
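Each panel in the export carries a `layout` with `x`, `y`, `w`, and `h`. When panels are edited by hand, a quick check for overlaps or panels that run off the grid can save an import round trip. The sketch below assumes a 24-column grid, which is inferred from the full-width table panel (`"w": 24`) in the machine list dashboard and is not confirmed by the source; the helper names and shortened panel names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

GRID_COLUMNS = 24  # assumption inferred from the "w": 24 table panel


def overlaps(a: Dict[str, int], b: Dict[str, int]) -> bool:
    """True if two layout rectangles share any grid cell."""
    return (a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
            and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"])


def find_layout_problems(panels: List[dict]) -> List[str]:
    """Report panels that exceed the grid width or overlap each other."""
    problems = []
    for panel in panels:
        layout = panel["layout"]
        if layout["x"] + layout["w"] > GRID_COLUMNS:
            problems.append(f'{panel["name"]}: exceeds {GRID_COLUMNS} columns')
    for first, second in combinations(panels, 2):
        if overlaps(first["layout"], second["layout"]):
            problems.append(f'{first["name"]} overlaps {second["name"]}')
    return problems


# Positions copied from the six panels of the system monitoring dashboard.
panels = [
    {"name": name, "layout": {"x": x, "y": y, "w": 8, "h": 7}}
    for name, x, y in [
        ("CPU", 0, 0), ("Memory", 8, 0), ("Disk IO", 16, 0),
        ("Network", 0, 7), ("GPU", 8, 7), ("GPU memory", 16, 7),
    ]
]
problems = find_layout_problems(panels)
print(problems or "layout looks consistent")
```

With the positions used in this dashboard (three panels of width 8 per row), the check reports no problems.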
3. Service Monitoring
- Dashboard JSON
{
"name": "Service Monitoring",
"tags": "",
"ident": "",
"configs": {
"panels": [
{
"type": "timeseries",
"id": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
"layout": {
"h": 6,
"w": 8,
"x": 0,
"y": 0,
"i": "043c26de-d19f-4fe8-a615-2b7c10ceb828",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "mysql_global_status_threads_connected{ident=~\"$ident\"}",
"refId": "A",
"legend": "{{ident}}-current connections"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "MySQL connections",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"min": null,
"max": null,
"decimals": null
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off",
"standardOptions": {
"min": null,
"max": null,
"decimals": null
}
}
}
]
},
{
"type": "timeseries",
"id": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
"layout": {
"h": 6,
"w": 8,
"x": 8,
"y": 0,
"i": "bbd1ebda-99f6-419c-90a5-5f84973976dd",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "mysql_global_status_slow_queries{ident=~\"$ident\"}",
"legend": "{{ident}}-slow queries",
"refId": "A"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "MySQL slow queries",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"decimals": null
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off"
}
}
]
},
{
"type": "timeseries",
"id": "3ca8db64-b25e-4e72-8dac-187cec4886ae",
"layout": {
"h": 6,
"w": 8,
"x": 16,
"y": 0,
"i": "7174939f-2742-47bd-a023-5d1d3698bf76",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "mtail_login_num{ident=~\"$ident\"}",
"legend": "{{ident}}-logins",
"refId": "A",
"time": {
"start": "now-24h",
"end": "now"
}
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "Login log count",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"decimals": 0
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off"
}
}
]
},
{
"type": "timeseries",
"id": "093b192e-e991-4590-ab4b-aa768159e00f",
"layout": {
"h": 6,
"w": 8,
"x": 0,
"y": 6,
"i": "a18a3bd3-8c2b-4fa2-81f3-7b0d00b49cc9",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "redis_connected_clients{ident=~\"$ident\"}",
"refId": "A",
"legend": "{{ident}}-current connections"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "Redis connections",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"min": null,
"max": null,
"decimals": null
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0.01,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off",
"standardOptions": {
"min": null,
"max": null,
"decimals": null
}
}
}
]
},
{
"type": "timeseries",
"id": "2674442f-937f-4027-806b-10b2286b14f6",
"layout": {
"h": 6,
"w": 8,
"x": 8,
"y": 6,
"i": "c8c061df-894d-458e-a89d-86a8428c52c9",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "redis_used_memory{ident=~\"$ident\"}",
"legend": "{{ident}}-memory",
"refId": "A"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "Redis memory usage",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"decimals": null
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off"
}
}
]
},
{
"type": "timeseries",
"id": "d26e8bc3-16a0-4a60-9aa9-36d71b85abc5",
"layout": {
"h": 6,
"w": 8,
"x": 16,
"y": 6,
"i": "0a3310ea-74ca-48fa-8c18-52c1b0f71235",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "mtail_error_num{ident=~\"$ident\"}",
"legend": "{{ident}}-errors",
"refId": "A",
"time": {
"start": "now-24h",
"end": "now"
}
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "Error log count",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"decimals": 0
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off"
}
}
]
},
{
"type": "timeseries",
"id": "7fa2cdbe-b782-4b71-bd7e-2cdba7455e77",
"layout": {
"h": 6,
"w": 8,
"x": 0,
"y": 12,
"i": "9a2e4d49-7a4f-4627-b2f6-cbe0e4ab04b1",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "nginx_active{ident=~\"$ident\"}",
"refId": "A",
"legend": "{{ident}}-active connections"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "Nginx active connections",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"min": null,
"max": null,
"decimals": null
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off",
"standardOptions": {
"min": null,
"max": null,
"decimals": null
}
}
}
]
},
{
"type": "timeseries",
"id": "0cb01432-ea29-41f4-8e6f-e6b9b71e90ab",
"layout": {
"h": 6,
"w": 8,
"x": 8,
"y": 12,
"i": "8bf97e38-e840-4804-a686-28bb65fec78d",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "docker_n_containers_running{ident=~\"$ident\"}",
"refId": "A",
"legend": "{{ident}}-running containers"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "Docker running containers",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"min": null,
"max": null,
"decimals": null
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off",
"standardOptions": {
"min": null,
"max": null,
"decimals": null
}
}
}
]
},
{
"type": "timeseries",
"id": "936b934b-6340-4743-8c12-821c63210fd6",
"layout": {
"h": 6,
"w": 8,
"x": 16,
"y": 12,
"i": "c6da1998-c1e3-4486-a24c-58e26d349206",
"isResizable": true
},
"version": "3.0.0",
"datasourceCate": "prometheus",
"datasourceValue": "${prom}",
"targets": [
{
"expr": "docker_container_mem_usage{ident=~\"$ident\"}",
"legend": "{{ident}}-{{container_name}}-memory",
"refId": "A"
}
],
"transformations": [
{
"id": "organize",
"options": {}
}
],
"name": "Docker container memory usage",
"maxPerRow": 4,
"options": {
"tooltip": {
"mode": "all",
"sort": "desc"
},
"legend": {
"displayMode": "hidden",
"behaviour": "showItem"
},
"standardOptions": {
"decimals": 0
},
"thresholds": {
"steps": [
{
"color": "#634CD9",
"value": null,
"type": "base"
}
]
}
},
"custom": {
"drawStyle": "lines",
"lineInterpolation": "smooth",
"spanNulls": false,
"lineWidth": 2,
"fillOpacity": 0,
"gradientMode": "none",
"stack": "off",
"scaleDistribution": {
"type": "linear"
}
},
"overrides": [
{
"matcher": {
"id": "byFrameRefID"
},
"properties": {
"rightYAxisDisplay": "off"
}
}
]
}
],
"var": [
{
"definition": "prometheus",
"name": "prom",
"type": "datasource"
},
{
"allOption": true,
"datasource": {
"cate": "prometheus",
"value": "${prom}"
},
"definition": "label_values(system_load1,ident)",
"multi": true,
"name": "ident",
"type": "query"
}
],
"version": "3.0.0"
}
}
- Dashboard preview
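III. Alert Configuration
1. Email Notification
- Configure SMTP
- Configure user email addresses
- Configure the email notification template
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="ie=edge">
<title>Nightingale Alert Notification</title>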
<style type="text/css">
.wrapper {
background-color: #f8f8f8;
padding: 15px;
height: 100%;
}
.main {
width: 600px;
padding: 30px;
margin: 0 auto;
background-color: #fff;
font-size: 12px;
font-family: verdana,'Microsoft YaHei',Consolas,'Deja Vu Sans Mono','Bitstream Vera Sans Mono';
}
header {
border-radius: 2px 2px 0 0;
}
header .title {
font-size: 14px;
color: #333333;
margin: 0;
}
header .sub-desc {
color: #333;
font-size: 14px;
margin-top: 6px;
margin-bottom: 0;
}
hr {
margin: 20px 0;
height: 0;
border: none;
border-top: 1px solid #e5e5e5;
}
em {
font-weight: 600;
}
table {
margin: 20px 0;
width: 100%;
}
table tbody tr{
font-weight: 200;
font-size: 12px;
color: #666;
height: 32px;
}
.succ {
background-color: green;
color: #fff;
}
.fail {
background-color: red;
color: #fff;
}
.succ th, .succ td, .fail th, .fail td {
color: #fff;
}
table tbody tr th {
width: 80px;
text-align: right;
}
.text-right {
text-align: right;
}
.body {
margin-top: 24px;
}
.body-text {
color: #666666;
-webkit-font-smoothing: antialiased;
}
.body-extra {
-webkit-font-smoothing: antialiased;
}
.body-extra.text-right a {
text-decoration: none;
color: #333;
}
.body-extra.text-right a:hover {
color: #666;
}
.button {
width: 200px;
height: 50px;
margin-top: 20px;
text-align: center;
border-radius: 2px;
background: #2D77EE;
line-height: 50px;
font-size: 20px;
color: #FFFFFF;
cursor: pointer;
}
.button:hover {
background: rgb(25, 115, 255);
border-color: rgb(25, 115, 255);
color: #fff;
}
footer {
margin-top: 10px;
text-align: right;
}
.footer-logo {
text-align: right;
}
.footer-logo-image {
width: 108px;
height: 27px;
margin-right: 10px;
}
.copyright {
margin-top: 10px;
font-size: 12px;
text-align: right;
color: #999;
-webkit-font-smoothing: antialiased;
}
</style>
</head>
<body>
<div class="wrapper">
<div class="main">
<header>
<h3 class="title">{{.RuleName}}</h3>
<p class="sub-desc"></p>
</header>
<hr>
<div class="body">
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
{{if .IsRecovered}}
<tr class="succ">
<th>Severity/status:</th>
<td>S{{.Severity}} Recovered</td>
</tr>
{{else}}
<tr class="fail">
<th>Severity/status:</th>
<td>S{{.Severity}} Triggered</td>
</tr>
{{end}}
{{if not .IsRecovered}}
<tr>
<th>Trigger value:</th>
<td>{{.TriggerValue}}</td>
</tr>
{{end}}
{{if .TargetIdent}}
<tr>
<th>Target:</th>
<td>{{.TargetIdent}}</td>
</tr>
{{end}}
<tr>
<th>Metric:</th>
<td>{{.TagsJSON}}</td>
</tr>
{{$time_duration := sub now.Unix .FirstTriggerTime }}
{{if .IsRecovered}}
<tr>
<th>Duration:</th>
<td>{{humanizeDurationInterface $time_duration}}</td>
</tr>
<tr>
<th>Recovery time:</th>
<td>{{timeformat .LastEvalTime}}</td>
</tr>
{{else}}
<tr>
<th>Trigger time:</th>
<td>
{{timeformat .TriggerTime}}
</td>
</tr>
{{end}}
</tbody>
</table>
</div>
</div>
</div>
</body>
</html>
2. Alert Rules
- CPU usage above 90%
[
{
"cate": "prometheus",
"datasource_ids": [
0
],
"name": "CPU usage above 90%",
"note": "",
"prod": "metric",
"algorithm": "",
"algo_params": null,
"delay": 0,
"severity": 0,
"severities": [
1
],
"disabled": 0,
"prom_for_duration": 60,
"prom_ql": "",
"rule_config": {
"inhibit": true,
"queries": [
{
"keys": {
"labelKey": "",
"valueKey": ""
},
"prom_ql": "cpu_usage_active > 90",
"severity": 1
}
]
},
"prom_eval_interval": 15,
"enable_stime": "00:00",
"enable_stimes": [
"00:00"
],
"enable_etime": "23:59",
"enable_etimes": [
"23:59"
],
"enable_days_of_week": [
"1",
"2",
"3",
"4",
"5",
"6",
"0"
],
"enable_days_of_weeks": [
[
"1",
"2",
"3",
"4",
"5",
"6",
"0"
]
],
"enable_in_bg": 0,
"notify_recovered": 1,
"notify_channels": [
"email"
],
"notify_repeat_step": 60,
"notify_max_number": 3,
"recover_duration": 60,
"callbacks": [],
"runbook_url": "",
"append_tags": [],
"annotations": {},
"extra_config": null
}
]
- More than 10 MySQL slow queries within 1 minute
[
{
"cate": "prometheus",
"datasource_ids": [
0
],
"name": "More than 10 MySQL slow queries within 1 minute",
"note": "",
"prod": "metric",
"algorithm": "",
"algo_params": null,
"delay": 0,
"severity": 0,
"severities": [
1
],
"disabled": 0,
"prom_for_duration": 120,
"prom_ql": "",
"rule_config": {
"inhibit": false,
"queries": [
{
"keys": {
"labelKey": "",
"valueKey": ""
},
"prom_ql": "increase(mysql_global_status_slow_queries[1m]) > 10",
"severity": 1
}
]
},
"prom_eval_interval": 15,
"enable_stime": "00:00",
"enable_stimes": [
"00:00"
],
"enable_etime": "23:59",
"enable_etimes": [
"23:59"
],
"enable_days_of_week": [
"1",
"2",
"3",
"4",
"5",
"6",
"0"
],
"enable_days_of_weeks": [
[
"1",
"2",
"3",
"4",
"5",
"6",
"0"
]
],
"enable_in_bg": 0,
"notify_recovered": 1,
"notify_channels": [
"email"
],
"notify_repeat_step": 60,
"notify_max_number": 3,
"recover_duration": 60,
"callbacks": [],
"runbook_url": "",
"append_tags": [],
"annotations": {},
"extra_config": null
}
]
- MySQL connections above 80%
[
{
"cate": "prometheus",
"datasource_ids": [
0
],
"name": "MySQL connections above 80%",
"note": "",
"prod": "metric",
"algorithm": "",
"algo_params": null,
"delay": 0,
"severity": 0,
"severities": [
1
],
"disabled": 0,
"prom_for_duration": 120,
"prom_ql": "",
"rule_config": {
"inhibit": false,
"queries": [
{
"keys": {
"labelKey": "",
"valueKey": ""
},
"prom_ql": "avg by (instance) (mysql_global_status_threads_connected) / avg by (instance) (mysql_global_variables_max_connections) * 100 > 80",
"severity": 1
}
]
},
"prom_eval_interval": 15,
"enable_stime": "00:00",
"enable_stimes": [
"00:00"
],
"enable_etime": "23:59",
"enable_etimes": [
"23:59"
],
"enable_days_of_week": [
"1",
"2",
"3",
"4",
"5",
"6",
"0"
],
"enable_days_of_weeks": [
[
"1",
"2",
"3",
"4",
"5",
"6",
"0"
]
],
"enable_in_bg": 0,
"notify_recovered": 1,
"notify_channels": [
"email"
],
"notify_repeat_step": 60,
"notify_max_number": 3,
"recover_duration": 60,
"callbacks": [],
"runbook_url": "",
"append_tags": [],
"annotations": {},
"extra_config": null
}
]
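The rule above divides the number of connected threads by `max_connections` and alerts when the ratio exceeds 80%. The arithmetic can be sketched in Python as follows; `connection_usage_percent` is a hypothetical helper, and 151 is used as the limit because it is MySQL's documented default for `max_connections`.

```python
def connection_usage_percent(threads_connected: float, max_connections: float) -> float:
    """Connected threads as a percentage of the configured connection limit."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return threads_connected / max_connections * 100


THRESHOLD_PERCENT = 80

for connected in (100, 121, 130):
    usage = connection_usage_percent(connected, 151)
    # The alert condition holds only when usage is strictly above the threshold.
    print(connected, round(usage, 1), usage > THRESHOLD_PERCENT)
```

With a limit of 151, the condition first becomes true at 121 connected threads.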
- Memory: usage above 85%
[
{
"cate": "prometheus",
"datasource_ids": [
0
],
"name": "Memory: usage above 85%",
"note": "",
"prod": "metric",
"algorithm": "",
"algo_params": null,
"delay": 0,
"severity": 0,
"severities": [
1
],
"disabled": 0,
"prom_for_duration": 60,
"prom_ql": "",
"rule_config": {
"inhibit": true,
"queries": [
{
"keys": {
"labelKey": "",
"valueKey": ""
},
"prom_ql": "mem_used_percent > 85",
"severity": 1
}
]
},
"prom_eval_interval": 15,
"enable_stime": "00:00",
"enable_stimes": [
"00:00"
],
"enable_etime": "23:59",
"enable_etimes": [
"23:59"
],
"enable_days_of_week": [
"1",
"2",
"3",
"4",
"5",
"6",
"0"
],
"enable_days_of_weeks": [
[
"1",
"2",
"3",
"4",
"5",
"6",
"0"
]
],
"enable_in_bg": 0,
"notify_recovered": 1,
"notify_channels": [
"email"
],
"notify_repeat_step": 60,
"notify_max_number": 3,
"recover_duration": 60,
"callbacks": [],
"runbook_url": "",
"append_tags": [],
"annotations": {},
"extra_config": null
}
]
- Disk: usage above 80%
[
{
"cate": "prometheus",
"datasource_ids": [
0
],
"name": "Disk: usage above 80%",
"note": "",
"prod": "metric",
"algorithm": "",
"algo_params": null,
"delay": 0,
"severity": 0,
"severities": [
1
],
"disabled": 0,
"prom_for_duration": 60,
"prom_ql": "",
"rule_config": {
"inhibit": true,
"queries": [
{
"keys": {
"labelKey": "",
"valueKey": ""
},
"prom_ql": "disk_used_percent > 80",
"severity": 1
}
]
},
"prom_eval_interval": 30,
"enable_stime": "00:00",
"enable_stimes": [
"00:00"
],
"enable_etime": "23:59",
"enable_etimes": [
"23:59"
],
"enable_days_of_week": [
"0",
"1",
"2",
"3",
"4",
"5",
"6"
],
"enable_days_of_weeks": [
[
"0",
"1",
"2",
"3",
"4",
"5",
"6"
]
],
"enable_in_bg": 0,
"notify_recovered": 1,
"notify_channels": [],
"notify_repeat_step": 60,
"notify_max_number": 3,
"recover_duration": 60,
"callbacks": [],
"runbook_url": "",
"append_tags": [],
"annotations": {},
"extra_config": null
}
]
- Network: inbound traffic above 6 MiB/s
[
{
"cate": "prometheus",
"datasource_ids": [
0
],
"name": "Network: inbound traffic above 6 MiB/s",
"note": "",
"prod": "metric",
"algorithm": "",
"algo_params": null,
"delay": 0,
"severity": 0,
"severities": [
1
],
"disabled": 0,
"prom_for_duration": 60,
"prom_ql": "",
"rule_config": {
"inhibit": false,
"queries": [
{
"keys": {
"labelKey": "",
"valueKey": ""
},
"prom_ql": "rate(net_bytes_recv[1m]) / 1024 / 1024 > 6",
"severity": 1
}
]
},
"prom_eval_interval": 15,
"enable_stime": "00:00",
"enable_stimes": [
"00:00"
],
"enable_etime": "23:59",
"enable_etimes": [
"23:59"
],
"enable_days_of_week": [
"1",
"2",
"3",
"4",
"5",
"6",
"0"
],
"enable_days_of_weeks": [
[
"1",
"2",
"3",
"4",
"5",
"6",
"0"
]
],
"enable_in_bg": 0,
"notify_recovered": 1,
"notify_channels": [
"email"
],
"notify_repeat_step": 60,
"notify_max_number": 3,
"recover_duration": 60,
"callbacks": [],
"runbook_url": "",
"append_tags": [],
"annotations": {},
"extra_config": null
}
]
- Network: outbound traffic above 6 MiB/s
[
{
"cate": "prometheus",
"datasource_ids": [
0
],
"name": "Network: outbound traffic above 6 MiB/s",
"note": "",
"prod": "metric",
"algorithm": "",
"algo_params": null,
"delay": 0,
"severity": 0,
"severities": [
1
],
"disabled": 0,
"prom_for_duration": 60,
"prom_ql": "",
"rule_config": {
"inhibit": false,
"queries": [
{
"keys": {
"labelKey": "",
"valueKey": ""
},
"prom_ql": "rate(net_bytes_sent[1m]) / 1024 / 1024 > 6",
"severity": 1
}
]
},
"prom_eval_interval": 15,
"enable_stime": "00:00",
"enable_stimes": [
"00:00"
],
"enable_etime": "23:59",
"enable_etimes": [
"23:59"
],
"enable_days_of_week": [
"1",
"2",
"3",
"4",
"5",
"6",
"0"
],
"enable_days_of_weeks": [
[
"1",
"2",
"3",
"4",
"5",
"6",
"0"
]
],
"enable_in_bg": 0,
"notify_recovered": 1,
"notify_channels": [
"email"
],
"notify_repeat_step": 60,
"notify_max_number": 3,
"recover_duration": 60,
"callbacks": [],
"runbook_url": "",
"append_tags": [],
"annotations": {},
"extra_config": null
}
]
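Both network rules above take the per-second rate of a byte counter over one minute and divide it by 1024 twice, so the threshold of 6 is expressed in MiB/s (1 MiB = 1,048,576 bytes). The Python sketch below works through that conversion. `rate_mib_per_s` is a hypothetical helper, and it uses only the first and last counter values, which is a simplification of how Prometheus computes `rate()`.

```python
MIB = 1024 * 1024  # bytes in one MiB
THRESHOLD_MIB_PER_S = 6


def rate_mib_per_s(first_bytes: int, last_bytes: int, seconds: float) -> float:
    """Average throughput in MiB/s between two readings of a byte counter."""
    return (last_bytes - first_bytes) / seconds / MIB


# The threshold expressed in plain bytes per second.
print(THRESHOLD_MIB_PER_S * MIB)

# 360,000,000 bytes in 60 s is 6 MB/s in decimal units, which stays below 6 MiB/s.
print(rate_mib_per_s(0, 360_000_000, 60) > THRESHOLD_MIB_PER_S)

# 400,000,000 bytes in 60 s is above the threshold.
print(rate_mib_per_s(0, 400_000_000, 60) > THRESHOLD_MIB_PER_S)
```

The distinction matters near the threshold: a link carrying exactly 6 MB/s in decimal units does not trigger these rules.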
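3. Alert Self-Healing
- Self-healing configuration
- Testing alert self-healing
Alert Self-Healing > Self-Healing Scripts > Create
Alert Self-Healing > Self-Healing Scripts > Create Task for the script named test > Save and Execute Immediately > Execution History > click the task under the Title column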
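As an illustration of what a script for the test task might contain, here is a minimal, read-only Python sketch that reports disk usage for the root filesystem, so its output can be inspected in the execution history. It is not the script used in this article. It assumes `python3` is installed on the target machine and that the script is run in a way that honors its shebang line; `disk_used_percent` and `format_report` are hypothetical names.

```python
#!/usr/bin/env python3
import shutil


def disk_used_percent(path: str = "/") -> float:
    """Used space on the filesystem containing `path`, as a percentage of its total size."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def format_report(path: str = "/") -> str:
    """One-line report that is easy to find in the task output."""
    return f"disk_used_percent path={path} value={disk_used_percent(path):.1f}"


# Print the report so it appears in the task's standard output.
print(format_report("/"))
```

Starting with a script that only reads state is a cautious way to confirm that task dispatch and output collection work before a script is allowed to change anything on the machine.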