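I. Introduction to Prometheus
Prometheus is an open-source monitoring system that collects, stores, and queries time-series data so that systems can be monitored and analyzed.
The Prometheus architecture consists of four main components:
1. Prometheus Server: the core component. It scrapes metrics from each target, then stores, aggregates, and serves queries over that data.
2. Client Libraries: Prometheus provides client libraries in several languages for instrumenting an application so that it exposes its own metrics.
3. Exporters: components that expose monitoring data from third-party systems in the Prometheus format. Many exporters are available, such as Node Exporter, MySQL Exporter, and HAProxy Exporter.
4. Alertmanager: the alert-handling component. Prometheus Server evaluates the user-defined alerting rules and sends the resulting alerts to Alertmanager, which groups, routes, and delivers them as notifications.
Features of Prometheus
1. Flexible data model: each time series is identified by a metric name plus a set of key-value labels, so the same metric can be described along several dimensions.
2. Efficient storage and querying: Prometheus uses its own time-series database, which stores and queries large volumes of metric data efficiently.
3. Visualization and alerting: Prometheus provides a web UI and an HTTP API for displaying and querying monitoring data, and works with Alertmanager for alerting.
4. Extensible: the architecture is modular, so you can choose and configure only the components you need.
5. CNCF project: as a CNCF project, Prometheus has received wide attention and support and is actively developed by contributors from around the world.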
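To make the label-based data model from feature 1 concrete, here is a minimal sketch that renders samples in the Prometheus text exposition format. The function name and the example metric and labels are illustrative assumptions, not something taken from this deployment.

```python
def format_sample(name, labels, value):
    """Render one sample line in the Prometheus text exposition format."""
    def escape(text):
        # Label values escape backslash, double quote, and newline.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    if not labels:
        return f"{name} {value}"
    pairs = ",".join(f'{key}="{escape(val)}"' for key, val in sorted(labels.items()))
    return f"{name}{{{pairs}}} {value}"


# The same metric name with different label values yields distinct time series.
print(format_sample("http_requests_total", {"method": "GET", "code": "200"}, 1027))
print(format_sample("http_requests_total", {"method": "POST", "code": "500"}, 3))
```

Each distinct combination of metric name and label values is stored as its own time series, which is why labels should not carry unbounded values such as user IDs.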
II. Deploying Prometheus
1. Deploying node_exporter
1. Download
wget https://github.com/prometheus/node_exporter/releases/download/v1.8.2/node_exporter-1.8.2.linux-amd64.tar.gz
2. Extract and deploy
tar -xf node_exporter-1.8.2.linux-amd64.tar.gz
ln -s "$PWD/node_exporter-1.8.2.linux-amd64" /usr/local/node_exporter
3. Create the startup script
vim start_noder.sh
/usr/local/node_exporter/node_exporter \
--collector.textfile.directory=/usr/local/node_exporter/tmp/ \
--web.config.file=config.yml \
--web.listen-address=0.0.0.0:19100
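The --collector.textfile.directory flag above makes node_exporter expose metrics found in *.prom files in that directory. A minimal sketch of how a job could publish such a file safely, assuming only the Python standard library; the function name is hypothetical.

```python
import os
import tempfile


def write_textfile_metric(directory, filename, lines):
    """Write metric lines to <directory>/<filename> via an atomic rename."""
    if not filename.endswith(".prom"):
        raise ValueError("the textfile collector only reads *.prom files")
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it into
    # place so that node_exporter never reads a half-written file.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    os.chmod(tmp_path, 0o644)
    final_path = os.path.join(directory, filename)
    os.replace(tmp_path, final_path)
    return final_path
```

For example, a backup job might call write_textfile_metric("/usr/local/node_exporter/tmp/", "backup.prom", ["backup_last_success_timestamp_seconds 1700000000"]), where the metric name and value are made up for illustration.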
4. Appendix: config.yml (username/password admin/123456; the same credentials are used throughout this document)
cat config.yml
basic_auth_users:
  admin: $2y$12$Y9/tZwO8FJC2I.IPt47ufOwFZRNrjSOPk0rUtOhB97cXNdvCikFDW
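config.yml stores only a bcrypt hash of the password, while a client sends the plain credentials in an HTTP Basic Authorization header. A minimal sketch of how a client could build that header with the Python standard library; the helper names are illustrative and not part of node_exporter.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of the HTTP Authorization header for basic auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_scrape_request(url, username, password):
    """Build (but do not send) an authenticated GET request for /metrics."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Address and credentials are the ones used in this document.
request = build_scrape_request("http://192.168.10.139:19100/metrics", "admin", "123456")
print(request.get_header("Authorization"))
# To actually fetch the metrics: urllib.request.urlopen(request, timeout=5).read()
```

Because basic auth only base64-encodes the credentials, they travel effectively in clear text unless TLS is also configured.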
2. Deploying proc_exporter
1. Python 3 is required. Installing it with Anaconda3 is recommended (steps omitted here). Download page:
https://www.anaconda.com/download
2. Install prometheus_client
python3 -m pip install client_python-0.13.1.tar.gz
3. Create the systemd unit so the service starts on boot
vim /usr/lib/systemd/system/proc_exporter.service
[Unit]
Description=proc_exporter
After=network.target
[Service]
Type=simple
ExecStart=/usr/bin/python3 /usr/local/proc_exporter/proc_exporter.py -c /usr/local/proc_exporter/proc_exporter.ini
Restart=on-failure
[Install]
WantedBy=multi-user.target
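4. Edit the configuration file. Add or remove service modules using the format below
vim proc_exporter.ini
## Process configuration. Changes take effect without a restart
[node_exporter]
## Process name: a keyword that uniquely identifies the process, e.g. node_exporter
name = node_exporter
## Process module: the subsystem or module the process belongs to, e.g. prometheus
moudle = prometheus
## Process owner: the developer to contact when the process misbehaves
manager =
## Core file directory (absolute path); leave empty to skip core file detection
directory =
## Core file name prefix
prefix =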
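proc_exporter.py itself is not shown in this document, so the following is only a sketch of how a file in this format can be read with Python's standard configparser. The function name and the check_core field are assumptions made for illustration; the key spellings follow the file above.

```python
import configparser

SAMPLE = """\
## Process configuration
[node_exporter]
name = node_exporter
moudle = prometheus
manager =
directory =
prefix =
"""


def load_processes(text):
    """Parse the INI text into a list of per-process dictionaries."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    processes = []
    for section in parser.sections():
        entry = dict(parser[section])
        # An empty directory means core file detection is disabled.
        entry["check_core"] = bool(entry.get("directory", "").strip())
        processes.append(entry)
    return processes


for process in load_processes(SAMPLE):
    print(process["name"], process["moudle"], process["check_core"])
```

Lines starting with ## are treated as comments by configparser, and a key followed by = with nothing after it is read as an empty string.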
5. Start
systemctl daemon-reload
systemctl enable proc_exporter.service
systemctl restart proc_exporter.service
3. Deploying Alertmanager
1. Download
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
2. Extract and deploy
tar -xf alertmanager-0.27.0.linux-amd64.tar.gz
ln -s "$PWD/alertmanager-0.27.0.linux-amd64" /usr/local/alertmanager
3. Create the systemd unit
vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target
[Service]
Type=simple
User=root
Group=root
Restart=on-abnormal
ExecStart=/usr/local/alertmanager/alertmanager \
--config.file=/usr/local/alertmanager/alertmanager.yml \
--web.listen-address=0.0.0.0:19093 \
--web.config.file=/usr/local/alertmanager/config.yml
[Install]
WantedBy=multi-user.target
4. Edit the configuration file
vim alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.mail.139.com:25' # SMTP server
  smtp_from: 'hly12599-alarm@139.com' # sender address
  smtp_auth_username: 'hly12599-alarm@139.com' # SMTP account
  smtp_auth_password: '23bb4dee88805e0fb400' # SMTP password
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 3m
  receiver: 'alert-receiver'
  routes:
    - receiver: 'data'
      continue: true
templates:
  - './templates/*.tmpl'
receivers:
  - name: 'data'
    webhook_configs:
      - url: 'http://192.168.10.139:5000/alertinfo'
  - name: 'alert-receiver'
    email_configs:
      - to: 15901283579@139.com
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'warning'
    target_match:
      severity: 'warning'
    equal: ['job', 'instance', 'severity']
#### Check the configuration: ./amtool check-config alertmanager.yml
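The 'data' receiver above posts alerts as JSON to http://192.168.10.139:5000/alertinfo. The service behind that URL is not described in this document, so the following is only a sketch of what a minimal receiver could look like using the Python standard library. It assumes the standard Alertmanager webhook payload, in which alerts appear in an "alerts" list with "status" and "labels" fields; the class and function names are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_alerts(payload):
    """Return one summary line per alert in an Alertmanager webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append("{status} {name} on {instance}".format(
            status=alert.get("status", "unknown"),
            name=labels.get("alertname", "unknown"),
            instance=labels.get("instance", "unknown"),
        ))
    return lines


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the path configured in webhook_configs is accepted.
        if self.path != "/alertinfo":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in summarize_alerts(payload):
            print(line)
        self.send_response(200)
        self.end_headers()


def serve(port=5000):
    """Listen on the port used by the webhook URL in alertmanager.yml."""
    HTTPServer(("0.0.0.0", port), AlertHandler).serve_forever()
```

Calling serve() on 192.168.10.139 would start the listener; a real receiver would typically store or forward the alerts rather than print them.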
5. Start
systemctl daemon-reload
systemctl enable alertmanager.service
systemctl restart alertmanager.service
4. Deploying Pushgateway
1. Download
wget https://github.com/prometheus/pushgateway/releases/download/v1.9.0/pushgateway-1.9.0.linux-amd64.tar.gz
2. Extract and deploy
tar -xf pushgateway-1.9.0.linux-amd64.tar.gz
ln -s "$PWD/pushgateway-1.9.0.linux-amd64" /usr/local/pushgateway
3. Create the systemd unit
vim /usr/lib/systemd/system/pushgateway.service
[Unit]
Description=pushgateway
Wants=network-online.target
After=network-online.target
[Service]
Type=simple
User=root
Group=root
Restart=always
ExecStart=/usr/local/pushgateway/pushgateway \
--web.listen-address=0.0.0.0:19091 \
--web.config.file=/usr/local/pushgateway/config.yml
[Install]
WantedBy=multi-user.target
4. Start
systemctl daemon-reload
systemctl enable pushgateway.service
systemctl restart pushgateway.service
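Once Pushgateway is running, short-lived jobs can push metrics to it over HTTP at /metrics/job/<job>/instance/<instance>. A minimal sketch of building such a request with the Python standard library, assuming the listen address and credentials configured above; the function name, job name, instance name, and metric are hypothetical.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_push_request(base_url, job, instance, metrics, username, password):
    """Build (but do not send) a request that pushes metrics to Pushgateway."""
    url = "{}/metrics/job/{}/instance/{}".format(
        base_url.rstrip("/"), quote(job, safe=""), quote(instance, safe=""))
    # The body uses the text exposition format and must end with a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    # PUT replaces every metric in this job/instance group.
    request = urllib.request.Request(url, data=body.encode("utf-8"), method="PUT")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    return request


request = build_push_request(
    "http://192.168.10.139:19091", "batch_job", "host-139",
    {"job_duration_seconds": 12.5}, "admin", "123456")
print(request.get_method(), request.full_url)
# To actually push: urllib.request.urlopen(request, timeout=5)
```

Using POST instead of PUT would replace only the metrics with the same names and leave the rest of the group untouched.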
5. Deploying Prometheus
1. Download
wget https://github.com/prometheus/prometheus/releases/download/v2.53.2/prometheus-2.53.2.linux-amd64.tar.gz
2. Extract and deploy
tar -xf prometheus-2.53.2.linux-amd64.tar.gz
ln -s "$PWD/prometheus-2.53.2.linux-amd64" /usr/local/prometheus
3. Create the systemd unit
vim /usr/lib/systemd/system/prometheus.service
[Unit]
Description=Prometheus server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target
[Service]
Type=simple
User=root
Group=root
Restart=on-abnormal
ExecStart=/usr/local/prometheus/prometheus \
--config.file=/usr/local/prometheus/prometheus.yml \
--web.listen-address=0.0.0.0:19090 \
--web.config.file=/usr/local/prometheus/config.yml \
--storage.tsdb.path=/usr/local/prometheus/data \
--storage.tsdb.retention.time=180d \
--web.console.templates=/usr/local/prometheus/consoles \
--web.console.libraries=/usr/local/prometheus/console_libraries \
--web.max-connections=512 \
--web.enable-lifecycle
[Install]
WantedBy=multi-user.target
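The unit above keeps 180 days of data under /usr/local/prometheus/data, so it is worth checking that the disk is large enough. A rough sketch based on the sizing rule of thumb from the Prometheus storage documentation (retention seconds x ingested samples per second x bytes per sample, with roughly 1 to 2 bytes per sample); the series count used here is a made-up assumption, not a measurement from this deployment.

```python
def estimate_tsdb_bytes(retention_days, series_count, scrape_interval_seconds,
                        bytes_per_sample=2.0):
    """Estimate TSDB disk usage in bytes for a given retention period."""
    retention_seconds = retention_days * 24 * 60 * 60
    # Each active series produces one sample per scrape interval.
    samples_per_second = series_count / scrape_interval_seconds
    return retention_seconds * samples_per_second * bytes_per_sample


# 180 days of retention, a hypothetical 100,000 active series, 60 s interval.
estimate = estimate_tsdb_bytes(180, 100_000, 60)
print(f"{estimate / 1024 ** 3:.1f} GiB")
```

The real figure depends on how well the data compresses, so treat the result as an order-of-magnitude estimate only.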
4. Start
systemctl daemon-reload
systemctl enable prometheus.service
systemctl restart prometheus.service
5. Edit the configuration file to add host monitoring and process monitoring
vim prometheus.yml
global:
  scrape_interval: 60s # Scrape targets every 60 seconds. The default is every 1 minute.
  evaluation_interval: 30s # Evaluate rules every 30 seconds. The default is every 1 minute.
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "./rules/*.yml"
scrape_configs:
  - job_name: "node_host"
    basic_auth:
      username: admin
      password: 123456
    scrape_interval: 1m
    static_configs:
      - targets: ["192.168.10.139:19100"]
  - job_name: "proc_host"
    scrape_interval: 1m
    scrape_timeout: 1m
    metrics_path: /metrics
    static_configs:
      - targets: ["192.168.10.140:19001"]
  - job_name: "alertmanager"
    basic_auth:
      username: admin
      password: 123456
    static_configs:
      - targets: ["192.168.10.139:19093"]
  - job_name: "pushgateway_server"
    basic_auth:
      username: admin
      password: 123456
    honor_labels: true
    scrape_interval: 1m
    scrape_timeout: 1m
    static_configs:
      - targets: ["192.168.10.139:19091"]
6. Reload the configuration
curl -X POST -u admin:123456 http://192.168.10.139:19090/-/reload
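After a reload, one way to confirm that the scrape jobs are healthy is to query the built-in up metric through the HTTP API. The sketch below assumes the /api/v1/query endpoint on the Prometheus listen address and the standard instant-query response shape; the function names are hypothetical, and a real request needs the same basic auth credentials as the reload call.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, expression):
    """Return the instant-query URL for a PromQL expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": expression})


def targets_down(response_text):
    """Return sorted 'job/instance' names whose `up` sample is not 1."""
    document = json.loads(response_text)
    if document.get("status") != "success":
        raise ValueError("query failed: {}".format(document.get("error")))
    down = []
    for series in document["data"]["result"]:
        _timestamp, value = series["value"]
        if float(value) != 1.0:
            metric = series["metric"]
            down.append("{}/{}".format(metric.get("job"), metric.get("instance")))
    return sorted(down)


print(build_query_url("http://192.168.10.139:19090", "up"))
```

Passing the body of that response to targets_down returns an empty list when every configured target was scraped successfully.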
6. Deploying Grafana
1. Download
wget https://dl.grafana.com/oss/release/grafana-10.3.7-1.x86_64.rpm
2. Install the package
rpm -Uvh grafana-10.3.7-1.x86_64.rpm
3. Set the port in the [server] section of the configuration file, then start the service
sed -i 's/^;\?http_port = .*/http_port = 13000/' /etc/grafana/grafana.ini
systemctl daemon-reload
systemctl enable grafana-server.service
systemctl restart grafana-server.service
4. Open the Grafana web UI in a browser
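Once the UI is reachable, Grafana needs a data source that points at the Prometheus server deployed above. It can be added in the web UI, or through Grafana's HTTP API with a POST to /api/datasources. A minimal sketch of the JSON body for that call, assuming the Prometheus address and basic auth credentials used in this document; the function name and the data source name are hypothetical.

```python
import json


def prometheus_datasource(name, url, username, password):
    """Build the JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",  # Grafana's backend sends the queries
        "basicAuth": True,
        "basicAuthUser": username,
        "secureJsonData": {"basicAuthPassword": password},
    }


body = prometheus_datasource("Prometheus", "http://192.168.10.139:19090", "admin", "123456")
print(json.dumps(body, indent=2))
```

The request itself must be authenticated with a Grafana account or API token, which is separate from the Prometheus credentials inside the body.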