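A municipal bureau's classified protection (等保) assessment required monitoring software, so I installed both Prometheus and Zabbix hands-on. I originally planned to use Zabbix, which installed without trouble on my local machine, but the server is a Huawei Kunpeng machine with an ARM architecture, and some Zabbix components have no ARM builds, so I tried both. This post walks through the Prometheus installation.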
- Install the Go environment
https://dl.google.com/go/go1.19.3.linux-arm64.tar.gz
# Extract the Go toolchain to /usr/local
[root@xxxxxx]# tar -C /usr/local -xzf go1.19.3.linux-arm64.tar.gz
# Configure the system environment variables
[root@xxxxxx]# vim /etc/profile
# Append the following line to the end of the file:
export PATH=$PATH:/usr/local/go/bin
# Reload the system profile
[root@xxxxxx]# source /etc/profile
# Verify the installation with the go version command
[root@xxxxxx go]# go version
go version go1.19.3 linux/arm64
# If you see the output above, the setup succeeded.
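Because the download file names embed the CPU architecture, it can help to derive the suffix from the machine instead of typing it by hand. The sketch below is only an illustration: go_arch is a hypothetical helper name, and it maps just the two architectures that come up in this post.

```shell
#!/bin/sh
# go_arch: hypothetical helper that maps `uname -m` output to the
# architecture suffix used in the Go / Prometheus release file names.
go_arch() {
  case "$1" in
    aarch64|arm64) echo "arm64" ;;
    x86_64|amd64)  echo "amd64" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the tarball name for the Go version used in this post.
GO_VERSION="1.19.3"
if ARCH="$(go_arch "$(uname -m)")"; then
  echo "go${GO_VERSION}.linux-${ARCH}.tar.gz"
fi
```

On an aarch64 machine such as the Kunpeng server, uname -m reports aarch64, so the script prints go1.19.3.linux-arm64.tar.gz.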
- Install Prometheus
https://github.com/prometheus/prometheus/releases/download/v2.40.3/prometheus-2.40.3.linux-arm64.tar.gz
After the download finishes, upload the package to the server (/opt in this walkthrough).
# Extract the package
[root@xxxxxx opt]# tar -zxvf prometheus-2.40.3.linux-arm64.tar.gz
# Rename the directory to something shorter
[root@xxxxxx opt]# mv /opt/prometheus-2.40.3.linux-arm64 /opt/prometheus
# Enter the software directory
[root@xxxxxx opt]# cd /opt/prometheus
# Check the software version
[root@xxxxxx prometheus]# ./prometheus --version
# The output looks like this
prometheus, version 2.40.3 (branch: HEAD, revision: 881111fec4332c33094a6fb2680c71fffc427275)
build user: root@e7f4371658bf
build date: 20220315-15:03:51
go version: go1.17.8
platform: linux/arm64
- Start Prometheus:
./prometheus
That completes the installation and startup of Prometheus. To confirm it is running, check that its port (9090 by default) is listening.
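Running ./prometheus in the foreground stops when the shell session ends. One common option is a systemd unit, similar to the node-exporter unit created later in this post. The following is a minimal sketch under assumptions: the unit name prometheus.service, the write_prometheus_unit helper, and the data directory /opt/prometheus/data are illustrative choices, not something the original setup prescribes.

```shell
#!/bin/sh
# write_prometheus_unit: hypothetical helper that writes a systemd unit
# for Prometheus into the directory given as $1.
write_prometheus_unit() {
  unit_dir="$1"
  cat > "${unit_dir}/prometheus.service" <<'EOF'
[Unit]
Description=Prometheus
Documentation=https://prometheus.io
After=network.target

[Service]
Type=simple
# Paths assume the /opt/prometheus layout used in this post.
ExecStart=/opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data
ExecReload=/bin/kill -HUP $MAINPID
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

# Real use (as root):
#   write_prometheus_unit /etc/systemd/system
#   systemctl daemon-reload && systemctl enable --now prometheus
```

With a unit like this in place, "restart the Prometheus service" becomes systemctl restart prometheus.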
- Configure the clients to monitor
4.1 Download node_exporter
https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-arm64.tar.gz
4.2 Install node_exporter
# Extract the archive
tar -xf node_exporter-1.2.2.linux-arm64.tar.gz -C /usr/local
# Create a symlink with a shorter name
ln -sv /usr/local/node_exporter-1.2.2.linux-arm64/ /usr/local/node_exporter
4.3 Verify the installed version
/usr/local/node_exporter/node_exporter --version
node_exporter, version 1.2.2 (branch: HEAD, revision: 26645363b486e12be40af7ce4fc91e731a33104e)
build user: root@b9cb4aa2eb17
build date: 20210806-13:44:18
go version: go1.16.7
platform: linux/arm64
4.4 Create the node-exporter.service file
vi /lib/systemd/system/node-exporter.service
[Unit]
Description=node_exporter
Documentation=https://prometheus.io
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter --collector.ntp --collector.mountstats --collector.systemd --collector.tcpstat
ExecReload=/bin/kill -HUP $MAINPID
TimeoutStopSec=20s
Restart=always
[Install]
WantedBy=multi-user.target
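4.5 Start node-exporter and enable it at boot
# Reload systemd so it picks up the new unit file
systemctl daemon-reload
# Enable start at boot
systemctl enable node-exporter
# Start the node-exporter service
systemctl start node-exporter
# Check the node-exporter service status
systemctl status node-exporter
# Open port 9100
firewall-cmd --zone=public --add-port=9100/tcp --permanent
# Restart firewalld so the new rule takes effect
systemctl restart firewalld
# Check the node-exporter web page at http://<node IP>:9100/metrics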
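To check the exporter from the command line instead of a browser, its /metrics endpoint can be fetched and summarized. The sketch below assumes curl is installed on the node; count_samples is a hypothetical helper that counts sample lines (everything that is not a comment or blank line) in the text exposition format.

```shell
#!/bin/sh
# count_samples: hypothetical helper; reads exposition-format text on stdin
# and prints how many sample lines (non-comment, non-blank) it contains.
count_samples() {
  awk '!/^#/ && NF { n++ } END { print n + 0 }'
}

# Real use on the monitored node (assumes curl is available):
#   curl -s http://localhost:9100/metrics | count_samples
```

A count of 0, or a curl connection error, suggests the service is not running or the port is blocked.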
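4.6 Bind the client
On the Prometheus server:
# Enter the Prometheus directory
cd /opt/prometheus
# Edit prometheus.yml
Add the following under scrape_configs:
  - job_name: "prometheus-node1" # job name for this node
    static_configs:
      - targets: ["<node IP>:9100"]
# Restart the Prometheus service and check the web UI; the status can take about a minute to refresh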
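With more than one node, the same job entry repeats with different addresses. As an illustration only, the hypothetical scrape_job helper below prints a job entry for a list of node IPs, assuming every node runs node_exporter on port 9100. The 192.0.2.x addresses are documentation placeholders.

```shell
#!/bin/sh
# scrape_job: hypothetical helper; prints a scrape job entry for the given
# job name ($1) and node IPs (remaining arguments), assuming port 9100.
scrape_job() {
  job="$1"; shift
  printf '  - job_name: "%s"\n' "$job"
  printf '    static_configs:\n'
  printf '      - targets:\n'
  for ip in "$@"; do
    printf '          - "%s:9100"\n' "$ip"
  done
}

# Example with two placeholder node addresses.
scrape_job prometheus-nodes 192.0.2.10 192.0.2.11
```

The output is indented to sit under scrape_configs in prometheus.yml. The promtool binary shipped in the Prometheus tarball can validate the edited file with ./promtool check config prometheus.yml before the restart.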
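4.7 Using the Graph page
4.7.1 Search box
Click "Open metrics explorer" to choose from the available metrics.
After selecting one, click Execute to run the query and view the resulting graph.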
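The same expressions typed into the search box can also be sent to the Prometheus HTTP API, which is handy for scripting. The sketch below assumes curl is installed and that Prometheus listens on localhost:9090; prom_query and the PROM_URL variable are hypothetical names introduced for this example.

```shell
#!/bin/sh
# prom_query: hypothetical helper that runs an instant query against
# the /api/v1/query endpoint and prints the JSON response.
prom_query() {
  curl -fsS --max-time 10 --get \
    "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Real use on the Prometheus server:
#   prom_query 'up'            # 1 = target reachable, 0 = target down
#   prom_query 'node_load1'    # 1-minute load average per node
```

The helper returns a non-zero exit status when the server cannot be reached, so it can double as a quick health probe.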
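4.8 Example alerting rule configuration
# Point Prometheus at the rule file (I created a new file for this)
vi prometheus.yml
rule_files:
  - "myrules.yml"
# Create the alerting rules
vi myrules.yml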
groups:
  - name: node_alert
    rules:
      - alert: cpu_alert
        expr: 100 - avg(irate(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) * 100 > 80
        for: 5m
        labels:
          level: warning
        annotations:
          description: "instance: {{ $labels.instance }}, cpu usage is too high! value: {{ $value }}"
          summary: "cpu usage is too high"
      - alert: Memoryusage
        expr: 100 - (node_memory_MemFree_bytes + node_memory_Cached_bytes + node_memory_Buffers_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 5m
        labels:
          status: critical
        annotations:
          summary: "Memory usage high"
          description: "Memory usage above 80% (current usage: {{ $value }})"
      - alert: Diskusage
        expr: 100 - (((node_filesystem_size_bytes{fstype=~"xfs|ext4"} - node_filesystem_free_bytes{fstype=~"xfs|ext4"}) / node_filesystem_size_bytes{fstype=~"xfs|ext4"}) * 100) > 80
        for: 5m
        labels:
          status: critical
        annotations:
          summary: "Disk usage high"
          description: "Disk usage above 80% (current usage: {{ $value }})"
      - alert: http_alert
        # sum(increase(...)) counts requests over the window; count() would only count time series
        expr: sum(increase(prometheus_http_requests_total{code=~"302|400|503"}[5m])) > 1
        for: 5m
        labels:
          status: critical
        annotations:
          summary: "More than 1 failed HTTP request"
          description: "More than 1 failed HTTP request"
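# There are four rules here
- cpu_alert: fires when CPU usage exceeds 80%
- Memoryusage: fires when memory usage exceeds 80%
- Diskusage: fires when disk usage exceeds 80%
- http_alert: fires when the number of failed HTTP requests exceeds 1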
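To see how the cpu_alert expression arrives at a percentage: irate(node_cpu_seconds_total{mode="idle"}[1m]) is the fraction of each second a core spent idle, avg by instance averages that across cores, and subtracting the result times 100 from 100 gives the busy percentage. The sketch below redoes that arithmetic with made-up idle fractions; cpu_usage is a hypothetical helper, not a Prometheus tool.

```shell
#!/bin/sh
# cpu_usage: hypothetical helper; takes per-core idle fractions (0..1) and
# prints 100 - avg(idle) * 100, the value the cpu_alert rule compares to 80.
cpu_usage() {
  printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", 100 - (sum / NR) * 100 }'
}

# Four cores idle 10%, 20%, 10%, and 20% of the time (made-up values).
cpu_usage 0.10 0.20 0.10 0.20
```

This prints 85.0, which is above the threshold of 80, so the rule would fire once the condition has held for the 5m given in its for field.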
Once the configuration is in place, restart the Prometheus service and the rules can be viewed in the web UI.
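A full restart is not the only way to load new rules: Prometheus re-reads its configuration and rule files when it receives SIGHUP, and the promtool binary in the release tarball can check a rule file first. The sketch below assumes the /opt/prometheus layout from this post and that pgrep is available; reload_prometheus is a hypothetical helper name.

```shell
#!/bin/sh
# reload_prometheus: hypothetical helper. Validates the rule file with
# promtool, then sends SIGHUP so the running Prometheus reloads it.
reload_prometheus() {
  prom_dir="${1:-/opt/prometheus}"
  rules="${2:-myrules.yml}"

  if [ ! -x "${prom_dir}/promtool" ]; then
    echo "promtool not found in ${prom_dir}" >&2
    return 1
  fi
  # Refuse to reload if the rule file fails validation.
  "${prom_dir}/promtool" check rules "${prom_dir}/${rules}" || return 1

  # -x matches the exact process name, -o picks the oldest match.
  pid="$(pgrep -xo prometheus)" || { echo "prometheus is not running" >&2; return 1; }
  kill -HUP "$pid"
}

# Real use on the Prometheus server:
#   reload_prometheus /opt/prometheus myrules.yml
```

Checking before reloading means a typo in myrules.yml is reported by promtool instead of being discovered in the Prometheus logs afterwards.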