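Prerequisites:
- A working Docker environment
Reference documentation:
- Installing Prometheus
- Getting started with Prometheus
- node_exporter agent component
I. Deploy Prometheus
1. Start a temporary container so its files can be copied out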
docker run -d prom/prometheus
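2. Copy the files out of the container (replace CONTAINER_ID with the ID printed by the previous command; the target directories must exist first)
mkdir -p /usr/share/prometheus /data/docker_data/Promthues
docker cp CONTAINER_ID:/usr/share/prometheus/console_libraries /usr/share/prometheus/
docker cp CONTAINER_ID:/usr/share/prometheus/consoles/ /usr/share/prometheus/
docker cp CONTAINER_ID:/etc/prometheus /data/docker_data/Promthues/conf
docker cp CONTAINER_ID:/prometheus /data/docker_data/Promthues/data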
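The manual steps above can also be scripted so the container ID does not have to be copied by hand. This is only a sketch: export_prometheus_files and the DEST variable are hypothetical names, and it copies just the two directories that are mounted into the container later (config and data), not the console files.

```shell
# Hypothetical helper: start a throwaway container, copy its config and data out, then remove it.
# DEST is an assumed variable; its default is the directory used in this post.
DEST="${DEST:-/data/docker_data/Promthues}"

export_prometheus_files() {
  mkdir -p "$DEST" || return 1
  # docker run -d prints the new container's ID, which is captured here.
  cid=$(docker run -d prom/prometheus) || return 1
  # If $DEST/conf or $DEST/data already exist, docker cp places the source
  # directory inside them instead of creating them, so start from an empty $DEST.
  docker cp "$cid:/etc/prometheus" "$DEST/conf" &&
    docker cp "$cid:/prometheus" "$DEST/data"
  status=$?
  docker rm -f "$cid" >/dev/null   # clean up the temporary container
  return "$status"
}
```

Calling export_prometheus_files once, before the permission step, should leave conf and data under $DEST.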
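3. Fix the permissions on the data directory. Prometheus runs as nobody:nobody inside the container, so the mounted host directory must be writable by that user; the simplest fix, used here, is mode 777.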
chmod 777 /data/docker_data/Promthues/data
4. Start the Prometheus container
docker run --name prometheus -d \
-v /data/docker_data/Promthues/data:/prometheus \
-v /data/docker_data/Promthues/conf:/etc/prometheus \
-p 9090:9090 prom/prometheus \
--config.file=/etc/prometheus/prometheus.yml \
--web.listen-address="0.0.0.0:9090" \
--storage.tsdb.path=/prometheus \
--web.console.libraries=/usr/share/prometheus/console_libraries \
--web.console.templates=/usr/share/prometheus/consoles \
--storage.tsdb.retention.time=30d \
--web.enable-lifecycle
II. Access the web console
Open http://SERVER_IP:9090 in a browser, where SERVER_IP is the address of the Docker host.
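Before opening the browser, it can help to confirm from the shell that the server answers. The sketch below assumes Prometheus is reachable on localhost:9090 and uses its /-/healthy and /-/ready endpoints; prom_url, check_prometheus, PROM_HOST and PROM_PORT are hypothetical names introduced only for this example.

```shell
# Hypothetical helpers; PROM_HOST and PROM_PORT are assumptions, adjust them to your server.
PROM_HOST="${PROM_HOST:-localhost}"
PROM_PORT="${PROM_PORT:-9090}"

# Build a URL for a given path on the Prometheus server.
prom_url() {
  printf 'http://%s:%s%s\n' "$PROM_HOST" "$PROM_PORT" "$1"
}

# Succeeds only if both the health and the readiness endpoint answer without an HTTP error.
check_prometheus() {
  curl -fsS --max-time 5 "$(prom_url /-/healthy)" >/dev/null &&
    curl -fsS --max-time 5 "$(prom_url /-/ready)" >/dev/null
}
```

For example: check_prometheus && echo "Prometheus is ready"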
III. Install the node_exporter component
docker run -d --name node_exporter \
--restart=always \
--net="host" \
--pid="host" \
-v "/proc:/host/proc:ro" \
-v "/sys:/host/sys:ro" \
-v "/:/rootfs:ro" \
prom/node-exporter \
--path.procfs=/host/proc \
--path.rootfs=/rootfs \
--path.sysfs=/host/sys \
--collector.textfile.directory=/data/docker_data/Promthues/prom \
--collector.filesystem.mount-points-exclude='^/(sys|proc|dev|host|etc)($|/)'
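To check that the exporter is actually serving CPU metrics, its text output can be filtered from the shell. This is a minimal sketch: count_cpu_series is a hypothetical helper name, and the usage line assumes node_exporter listens on its default port 9100 on the same host.

```shell
# Hypothetical helper: count the node_cpu_seconds_total series in
# Prometheus text-format metrics read from standard input.
count_cpu_series() {
  # grep -c prints the number of matching lines; "|| true" keeps the exit
  # status at 0 even when that number is 0.
  grep -c '^node_cpu_seconds_total{' || true
}
```

For example: curl -fsS http://localhost:9100/metrics | count_cpu_series
A result greater than zero suggests the CPU collector is working.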
IV. Configure Prometheus to monitor itself
global:
  scrape_interval: 15s      # How often to scrape targets (built-in default: 1m)
  scrape_timeout: 10s       # Timeout for each scrape request
  evaluation_interval: 15s  # How often to evaluate rules (built-in default: 1m)
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
        labels:
          name: Prometheus
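V. Query data in the console
1. Check the labels on the current Prometheus target again
2. Run a query, for example the 5-minute average CPU rate per job, instance, and mode: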
avg by (job, instance, mode) (rate(node_cpu_seconds_total{instance="10.1.32.231"}[5m]))
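The same kind of query can be sent to the HTTP API instead of the web console, which is handy for scripting. The sketch below assumes Prometheus is reachable at http://localhost:9090; prom_query and PROM_ADDR are hypothetical names introduced for illustration.

```shell
# Hypothetical helper: run an instant PromQL query through the HTTP API.
# PROM_ADDR is an assumed variable; the default matches the published port 9090.
PROM_ADDR="${PROM_ADDR:-http://localhost:9090}"

prom_query() {
  # -G turns the URL-encoded data into a GET query string for /api/v1/query.
  curl -fsS -G "$PROM_ADDR/api/v1/query" --data-urlencode "query=$1"
}
```

For example: prom_query 'avg by (job, instance, mode) (rate(node_cpu_seconds_total[5m]))'
The response is JSON; an empty result list usually means no series matched the selector.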
3. Edit the main Prometheus configuration file
mkdir -p /data/docker_data/Promthues/conf/rules
global:
  scrape_interval: 15s      # How often to scrape targets (built-in default: 1m)
  scrape_timeout: 10s       # Timeout for each scrape request
  evaluation_interval: 15s  # How often to evaluate rules (built-in default: 1m)
  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'codelab-monitor'
rule_files:
  - "rules/*.yml"  # Location of the custom rule files
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
        labels:
          name: Prometheus
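The configuration above only scrapes Prometheus itself, while node_cpu_seconds_total comes from node_exporter, so a scrape job for the exporter is needed as well. The sketch below prints such a job; node_scrape_job, the job name 'node', and the host:port argument are assumptions for illustration, not part of the original configuration.

```shell
# Hypothetical helper: print a scrape job for one node_exporter target.
# The job name 'node' and the host:port argument are assumptions.
node_scrape_job() {
  cat <<EOF
  - job_name: 'node'
    static_configs:
      - targets: ['$1']
EOF
}
```

For example: node_scrape_job 10.1.32.231:9100 >> /data/docker_data/Promthues/conf/prometheus.yml
Appending only works because scrape_configs is the last section of the file. With a target written this way the instance label defaults to the full host:port address, so an instance matcher in a query may need to include the port.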
4. Configure a Prometheus recording rule
vim /data/docker_data/Promthues/conf/rules/cpu-node.yml
groups:
  - name: cpu-node
    rules:
      - record: job_instance_mode:node_cpu_seconds:avg_rate5m
        expr: avg by (job, instance, mode) (rate(node_cpu_seconds_total{instance="10.1.32.231"}[5m]))
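A rule file can also be validated on its own with promtool before Prometheus is reloaded. This is a sketch under the assumption that the container is named prometheus and that the conf directory is mounted at /etc/prometheus, as in the docker run command earlier; check_rules is a hypothetical helper name.

```shell
# Hypothetical helper: validate one rule file with promtool inside the running container.
# Assumes the container name "prometheus" and the /etc/prometheus mount used in this post.
check_rules() {
  docker exec prometheus promtool check rules "/etc/prometheus/rules/$1"
}
```

For example: check_rules cpu-node.yml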
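5. Hot-reload the Prometheus service
Check that the configuration is valid:
docker exec -it prometheus promtool check config /etc/prometheus/prometheus.yml
Reload the configuration without restarting (this relies on the --web.enable-lifecycle flag set when the container was started):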
curl -XPOST http://localhost:9090/-/reload
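To confirm that the reload actually picked up the rule files, the loaded rules can be listed through the HTTP API right after the reload. The sketch assumes Prometheus at http://localhost:9090; reload_prometheus and PROM_ADDR are hypothetical names.

```shell
# Hypothetical helper: trigger a reload, then list the rules Prometheus has loaded.
# PROM_ADDR is an assumed variable; adjust it if Prometheus runs elsewhere.
PROM_ADDR="${PROM_ADDR:-http://localhost:9090}"

reload_prometheus() {
  # The second request only runs if the reload request succeeded.
  curl -fsS -X POST "$PROM_ADDR/-/reload" &&
    curl -fsS "$PROM_ADDR/api/v1/rules"
}
```

The JSON printed by reload_prometheus should contain the cpu-node group once the rule file has been loaded.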
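6. Log in to the web console again
Click the matching rule to jump straight to the web query page.
Such rules can later serve as alert thresholds for DingTalk notifications; I will record my notes on that in later posts.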