基于docker的prometheus+grafana+altermanager+prometheus-webhook-dingtalk钉钉报警

news2024/10/6 4:10:11

一、各软件功能简介

prometheus:Prometheus(是由go语言(golang)开发)是一套开源的监控&报警&时间序列数 据库的组合。主要优点:外部依赖安装使用超简单、系统集成 多等
grafana:Grafana 是一款采用 go 语言编写的开源应用,主要用于大规模指标数据的 可视化展现,是网络架构和应用分析中最流行的时序数据展示工具,目前已经支 持绝大部分常用的时序数据库。主要优点:展示方便、数据源种类多、内置通知提醒功能
altermanager:AlterManager是一个基于开源框架Prometheus和Grafana的告警管理系统。它可以帮助我们轻松地实现监控告警功能,并支持多种告警方式。主要优点:告警方式多样
prometheus-webhook-dingtalk:prometheus-webhook-dingtalk 是一个用于将 Prometheus 告警通知发送到钉钉群组的 webhook 模块。它提供了一种与钉钉无缝集成的方式,使监控团队能够及时接收和处理告警通知,并进行有效的团队协作。

二、基础准备工作

docker安装:这个太简单了我就不介绍了,后期有时间再出一篇专门安装docker的
服务器:由于我是测试就是用了一台机器centos7 ip:10.10.30.34(后期中配置文件需要用到)
安装目录:创建目录mkdir -p /data/prometheus 切换目录cd /data/prometheus/,主要用于各种软件的配置和数据文件存放

三、prometheus准备工作

3.1、prometheus配置文件和数据文件

数据文件准备:创建目录mkdir -p prometheus/data 修改权限chmod 777 prometheus/data实际应用可以根据具体使用情况设置权限
配置文件准备:vim prometheus/prometheus.yml

global:
  scrape_interval:     15s    # 多久收集一次数据
  evaluation_interval: 15s    # 多久评估一次规则
  scrape_timeout:      10s    # 每次收集数据的超时时间

scrape_configs:               #收集数据配置列表
  - job_name: prometheus            # 必须配置, 自动附加的job labels, 必须唯一
    static_configs:
      - targets: ['10.10.30.34:9090']       # 指定prometheusip端口
        labels:
          instance: prometheus                 #标签

  - job_name: ehospital-exploit-database #监控客户端
    static_configs:
      - targets: ['10.10.30.34:9100']
        labels:
          instance: eehospital-exploit-database


alerting:                         #Alertmanager相关的配置
  alertmanagers:
  - static_configs:
    - targets:
      - 10.10.30.34:9093         #指定告警模块

rule_files:                      #告警规则文件, 可以使用通配符 
  - "/etc/prometheus/rules/*.yml"

3.2、prometheus告警配置

创建目录:mkdir rules 用于存放告警和触发文件
通用规则:vim rules/alert-rules.yml

groups:
  - name: prometheus-alert
    rules:
    - alert: prometheus-down
      expr: prometheus:up == 0
      for: 1m
      labels:
        severity: 'critical'
      annotations:
        summary: "instance: {{ $labels.instance }} 宕机了"
        description: "instance: {{ $labels.instance }} \n- job: {{ $labels.job }} 关机了, 时间已经1分钟了。"
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"



    - alert: prometheus-cpu-high
      expr:  prometheus:cpu:total:percent > 80
      for: 3m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} cpu 使用率高于 {{ $value }}"
        description: "instance: {{ $labels.instance }} \n- job: {{ $labels.job }} CPU使用率已经持续一分钟高过80% 。"
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"

    - alert: prometheus-cpu-iowait-high
      expr:  prometheus:cpu:iowait:percent >= 12
      for: 3m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} cpu iowait 使用率高于 {{ $value }}"
        description: "instance: {{ $labels.instance }} \n- job: {{ $labels.job }} cpu iowait使用率已经持续三分钟高过12%"
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"



    - alert: prometheus-load-load1-high
      expr:  (prometheus:load:load1) > (prometheus:cpu:count) * 1.2
      for: 3m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} load1 使用率高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-memory-high
      expr:  prometheus:memory:used:percent > 85
      for: 3m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} memory 使用率高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-disk-high
      expr:  prometheus:disk:used:percent > 80
      for: 10m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} disk 使用率高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-disk-read:count-high
      expr:  prometheus:disk:read:count:rate > 2000
      for: 2m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} iops read 使用率高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-disk-write-count-high
      expr:  prometheus:disk:write:count:rate > 2000
      for: 2m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} iops write 使用率高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-disk-read-mb-high
      expr:  prometheus:disk:read:mb:rate > 60
      for: 2m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} 读取字节数 高于 {{ $value }}"
        description: ""
        instance: "{{ $labels.instance }}"
        value: "{{ $value }}"


    - alert: prometheus-disk-write-mb-high
      expr:  prometheus:disk:write:mb:rate > 60
      for: 2m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} 写入字节数 高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-filefd-allocated-percent-high
      expr:  prometheus:filefd_allocated:percent > 80
      for: 10m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} 打开文件描述符 高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-network-netin-error-rate-high
      expr:  prometheus:network:netin:error:rate > 4
      for: 1m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} 包进入的错误速率 高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-network-netin-packet-rate-high
      expr:  prometheus:network:netin:packet:rate > 35000
      for: 1m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} 包进入速率 高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-network-netout-packet-rate-high
      expr:  prometheus:network:netout:packet:rate > 35000
      for: 1m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} 包流出速率 高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-network-tcp-total-count-high
      expr:  prometheus:network:tcp:total:count > 40000
      for: 1m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} tcp连接数量 高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-process-zoom-total-count-high
      expr:  prometheus:process:zoom:total:count > 10
      for: 10m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} 僵死进程数量 高于 {{ $value }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"


    - alert: prometheus-time-offset-high
      expr:  prometheus:time:offset > 0.03
      for: 2m
      labels:
        severity: info
      annotations:
        summary: "instance: {{ $labels.instance }} {{ $labels.desc }}  {{ $value }} {{ $labels.unit }}"
        description: ""
        value: "{{ $value }}"
        instance: "{{ $labels.instance }}"

细化规则:vim rules/record-rules.yml

groups:
  - name: prometheus-record
    rules:
    - expr: up{job!="prometheus"}
      record: prometheus:up
      labels:
        desc: "节点是否在线, 在线1,不在线0"
        unit: " "
        job: "prometheus"
    - expr: time() - node_boot_time_seconds{}
      record: prometheus:node_uptime
      labels:
        desc: "节点的运行时间"
        unit: "s"
        job: "prometheus"

    - expr: (1 - avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode="idle"}[5m])))  * 100
      record: prometheus:cpu:total:percent
      labels:
        desc: "节点的cpu总消耗百分比"
        unit: "%"
        job: "prometheus"

    - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode="idle"}[5m])))  * 100
      record: prometheus:cpu:idle:percent
      labels:
        desc: "节点的cpu idle百分比"
        unit: "%"
        job: "prometheus"

    - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode="iowait"}[5m])))  * 100
      record: prometheus:cpu:iowait:percent
      labels:
        desc: "节点的cpu iowait百分比"
        unit: "%"
        job: "prometheus"

    - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode="system"}[5m])))  * 100
      record: prometheus:cpu:system:percent
      labels:
        desc: "节点的cpu system百分比"
        unit: "%"
        job: "prometheus"

    - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode="user"}[5m])))  * 100
      record: prometheus:cpu:user:percent
      labels:
        desc: "节点的cpu user百分比"
        unit: "%"
        job: "prometheus"

    - expr: (avg by (environment,instance) (irate(node_cpu_seconds_total{job!="prometheus",mode=~"softirq|nice|irq|steal"}[5m])))  * 100
      record: prometheus:cpu:other:percent
      labels:
        desc: "节点的cpu 其他的百分比"
        unit: "%"
        job: "prometheus"

    - expr: node_memory_MemTotal_bytes{job!="prometheus"}
      record: prometheus:memory:total
      labels:
        desc: "节点的内存总量"
        unit: byte
        job: "prometheus"

    - expr: node_memory_MemFree_bytes{job!="prometheus"}
      record: prometheus:memory:free
      labels:
        desc: "节点的剩余内存量"
        unit: byte
        job: "prometheus"

    - expr: node_memory_MemTotal_bytes{job!="prometheus"} - node_memory_MemFree_bytes{job!="prometheus"}
      record: prometheus:memory:used
      labels:
        desc: "节点的已使用内存量"
        unit: byte
        job: "prometheus"

    - expr: node_memory_MemTotal_bytes{job!="prometheus"} - node_memory_MemAvailable_bytes{job!="prometheus"}
      record: prometheus:memory:actualused
      labels:
        desc: "节点用户实际使用的内存量"
        unit: byte
        job: "prometheus"

    - expr: (1-(node_memory_MemAvailable_bytes{job!="prometheus"} / (node_memory_MemTotal_bytes{job!="prometheus"})))* 100
      record: prometheus:memory:used:percent
      labels:
        desc: "节点的内存使用百分比"
        unit: "%"
        job: "prometheus"

    - expr: ((node_memory_MemAvailable_bytes{job!="prometheus"} / (node_memory_MemTotal_bytes{job!="prometheus"})))* 100
      record: prometheus:memory:free:percent
      labels:
        desc: "节点的内存剩余百分比"
        unit: "%"
        job: "prometheus"

    - expr: sum by (instance) (node_load1{job!="prometheus"})
      record: prometheus:load:load1
      labels:
        desc: "系统1分钟负载"
        unit: " "
        job: "prometheus"

    - expr: sum by (instance) (node_load5{job!="prometheus"})
      record: prometheus:load:load5
      labels:
        desc: "系统5分钟负载"
        unit: " "
        job: "prometheus"

    - expr: sum by (instance) (node_load15{job!="prometheus"})
      record: prometheus:load:load15
      labels:
        desc: "系统15分钟负载"
        unit: " "
        job: "prometheus"

    - expr: node_filesystem_size_bytes{job!="prometheus" ,fstype=~"ext4|xfs"}
      record: prometheus:disk:usage:total
      labels:
        desc: "节点的磁盘总量"
        unit: byte
        job: "prometheus"

    - expr: node_filesystem_avail_bytes{job!="prometheus",fstype=~"ext4|xfs"}
      record: prometheus:disk:usage:free
      labels:
        desc: "节点的磁盘剩余空间"
        unit: byte
        job: "prometheus"

    - expr: node_filesystem_size_bytes{job!="prometheus",fstype=~"ext4|xfs"} - node_filesystem_avail_bytes{job!="prometheus",fstype=~"ext4|xfs"}
      record: prometheus:disk:usage:used
      labels:
        desc: "节点的磁盘使用的空间"
        unit: byte
        job: "prometheus"

    - expr:  (1 - node_filesystem_avail_bytes{job!="prometheus",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{job!="prometheus",fstype=~"ext4|xfs"}) * 100
      record: prometheus:disk:used:percent
      labels:
        desc: "节点的磁盘的使用百分比"
        unit: "%"
        job: "prometheus"

    - expr: irate(node_disk_reads_completed_total{job!="prometheus"}[1m])
      record: prometheus:disk:read:count:rate
      labels:
        desc: "节点的磁盘读取速率"
        unit: "次/秒"
        job: "prometheus"

    - expr: irate(node_disk_writes_completed_total{job!="prometheus"}[1m])
      record: prometheus:disk:write:count:rate
      labels:
        desc: "节点的磁盘写入速率"
        unit: "次/秒"
        job: "prometheus"

    - expr: (irate(node_disk_written_bytes_total{job!="prometheus"}[1m]))/1024/1024
      record: prometheus:disk:read:mb:rate
      labels:
        desc: "节点的设备读取MB速率"
        unit: "MB/s"
        job: "prometheus"

    - expr: (irate(node_disk_read_bytes_total{job!="prometheus"}[1m]))/1024/1024
      record: prometheus:disk:write:mb:rate
      labels:
        desc: "节点的设备写入MB速率"
        unit: "MB/s"
        job: "prometheus"

    - expr:   (1 -node_filesystem_files_free{job!="prometheus",fstype=~"ext4|xfs"} / node_filesystem_files{job!="prometheus",fstype=~"ext4|xfs"}) * 100
      record: prometheus:filesystem:used:percent
      labels:
        desc: "节点的inode的剩余可用的百分比"
        unit: "%"
        job: "prometheus"

    - expr: node_filefd_allocated{job!="prometheus"}
      record: prometheus:filefd_allocated:count
      labels:
        desc: "节点的文件描述符打开个数"
        unit: "%"
        job: "prometheus"

    - expr: node_filefd_allocated{job!="prometheus"}/node_filefd_maximum{job!="prometheus"} * 100
      record: prometheus:filefd_allocated:percent
      labels:
        desc: "节点的文件描述符打开百分比"
        unit: "%"
        job: "prometheus"


    - expr: avg by (environment,instance,device) (irate(node_network_receive_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
      record: prometheus:network:netin:bit:rate
      labels:
        desc: "节点网卡eth0每秒接收的比特数"
        unit: "bit/s"
        job: "prometheus"

    - expr: avg by (environment,instance,device) (irate(node_network_transmit_bytes_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
      record: prometheus:network:netout:bit:rate
      labels:
        desc: "节点网卡eth0每秒发送的比特数"
        unit: "bit/s"
        job: "prometheus"

    - expr: avg by (environment,instance,device) (irate(node_network_receive_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
      record: prometheus:network:netin:packet:rate
      labels:
        desc: "节点网卡每秒接收的数据包个数"
        unit: "个/秒"
        job: "prometheus"


    - expr: avg by (environment,instance,device) (irate(node_network_transmit_packets_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
      record: prometheus:network:netout:packet:rate
      labels:
        desc: "节点网卡发送的数据包个数"
        unit: "个/秒"
        job: "prometheus"

    - expr: avg by (environment,instance,device) (irate(node_network_receive_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
      record: prometheus:network:netin:error:rate
      labels:
        desc: "节点设备驱动器检测到的接收错误包的数量"
        unit: "个/秒"
        job: "prometheus"

    - expr: avg by (environment,instance,device) (irate(node_network_transmit_errs_total{device=~"eth0|eth1|ens33|ens37"}[1m]))
      record: prometheus:network:netout:error:rate
      labels:
        desc: "节点设备驱动器检测到的发送错误包的数量"
        unit: "个/秒"
        job: "prometheus"

    - expr: node_tcp_connection_states{job!="prometheus", state="established"}
      record: prometheus:network:tcp:established:count
      labels:
        desc: "节点当前established的个数"
        unit: "个"
        job: "prometheus"

    - expr: node_tcp_connection_states{job!="prometheus", state="time_wait"}
      record: prometheus:network:tcp:timewait:count
      labels:
        desc: "节点timewait的连接数"
        unit: "个"
        job: "prometheus"

    - expr: sum by (environment,instance) (node_tcp_connection_states{job!="prometheus"})
      record: prometheus:network:tcp:total:count
      labels:
        desc: "节点tcp连接总数"
        unit: "个"
        job: "prometheus"

四、grafana配置

创建目录:mkdir -p grafana/grafana-storage
修改权限:chmod 777 grafana/grafana-storage
grafana.ini准备:先启动一个grafana容器docker run -d --name=grafana -p 3000:3000 grafana/grafana,然后拷贝文件docker cp 93ac9e93e97a:/etc/grafana/grafana.ini ./grafana/

五、alertmanager配置

创建文件:mkdir alert
配置文件:vim alert/alertmanager.yml

route:
  group_by: ['dingding'] #根据告警规则名进行分组
  group_wait: 30s  #在组内等待配置时间,如组内30s出现同一报警,在一个组内出现
  group_interval: 1h #告警频率,一条告警消息发送后,等待1h发送第二组报警
  repeat_interval: 1h #报警间隔时间,如果1h内未修复,重新发送告警
  receiver: 'dingding.webhook1'
  routes:
  - receiver: 'dingding.webhook1'
    match_re:
      alertname: ".*"
receivers:
  - name: 'dingding.webhook1' #可设置多个接收方
    webhook_configs:
    - url: 'http://10.10.30.34:8060/dingtalk/webhook1/send'
      send_resolved: true #恢复后收到告警
inhibit_rules:
  - source_match:  #配置了仰制告警
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

六、webhook配置

创建目录:mkdir webhook
配置文件:vim webhook/config.yml

## Request timeout
# timeout: 5s

## Uncomment following line in order to write template from scratch (be careful!)
#no_builtin_template: true

## Customizable templates path
templates:
#  - contrib/templates/legacy/template.tmpl
  - /etc/prometheus-webhook-dingtalk/templates/default.tmpl

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
#default_message:
#  title: '{{ template "legacy.title" . }}'
#  text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxx  #钉钉机器人路径
    # secret for signature
    secret: SEC74939daa62xxx.xxxxxx   #钉钉机器人加密标签
  webhook2:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
  webhook_legacy:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  webhook_mention_all:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      mobiles: ['156xxxx8827', '189xxxx8325']

prometheus-webhook-dingtalk模板
创建目录:mkdir webhook/template
模板创建:vim webhook/template/default.tmpl

{{ define "__subject" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
{{ end }}
 
 
{{ define "__alert_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}
 
**告警主题**: {{ .Annotations.summary }}

**告警类型**: {{ .Labels.alertname }}
 
**告警级别**: {{ .Labels.severity }} 
 
**告警主机**: {{ .Labels.instance }} 
 
**告警信息**: {{ index .Annotations "description" }}
 
**告警时间**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}
 
{{ define "__resolved_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}

**告警主题**: {{ .Annotations.summary }}

**告警类型**: {{ .Labels.alertname }} 
 
**告警级别**: {{ .Labels.severity }}
 
**告警主机**: {{ .Labels.instance }}
 
**告警信息**: {{ index .Annotations "description" }}
 
**告警时间**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
 
**恢复时间**: {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}
 
 
{{ define "default.title" }}
{{ template "__subject" . }}
{{ end }}
 
{{ define "default.content" }}
{{ if gt (len .Alerts.Firing) 0 }}
**====侦测到{{ .Alerts.Firing | len  }}个故障====**
{{ template "__alert_list" .Alerts.Firing }}
---
{{ end }}
 
{{ if gt (len .Alerts.Resolved) 0 }}
**====恢复{{ .Alerts.Resolved | len  }}个故障====**
{{ template "__resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}

{{/* Following names for compatibility */}}
{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
{{ template "default.title" . }}
{{ template "default.content" . }}

七、docker实例创建

创建yml文件:vim docker-compose.yml

version: '3.2'
services:
  prometheus:
    image: prom/prometheus
    restart: "always"
    ports:
      - 9090:9090
    container_name: "prometheus"
    volumes:
      - "./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml"
      - "./rules:/etc/prometheus/rules"
      - "./prometheus/data:/prometheus"
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'         # 设置yml路径  跟上面挂载对应
      - '--storage.tsdb.path=/prometheus'                     #设置数据路径   跟上面挂载对应


  alertmanager:
    image: prom/alertmanager:latest
    restart: "always"
    ports:
      - 9093:9093
    container_name: "alertmanager"
    volumes:
      - "./alert/alertmanager.yml:/etc/alertmanager/alertmanager.yml"
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'         # 设置yml路径  跟上面挂载对应

  webhook:
    image: timonwong/prometheus-webhook-dingtalk
    restart: "always"
    ports:
      - 8060:8060
    container_name: "webhook"           #token指定钉钉
    volumes:
      - "./webhook/config.yml:/etc/prometheus-webhook-dingtalk/config.yml"
      - "./webhook/template/default.tmpl:/etc/prometheus-webhook-dingtalk/templates/default.tmpl"
    command:
      - '--config.file=/etc/prometheus-webhook-dingtalk/config.yml'         # 设置yml路径  跟上面挂载对应
  
  grafana:
    image: grafana/grafana
    restart: "always"
    ports:
      - 3000:3000
    container_name: "grafana"
    volumes:
      - "./grafana/grafana.ini:/etc/grafana/grafana.ini"              #配置文件自行拷贝出来
      - "./grafana/grafana-storage:/var/lib/grafana"

创建docker:docker-compose -f docker-compose.yml up -d

八、钉钉添加机器人

自己创建一个群呗,至少两个人才能建群呀!然后按下面图片操作就行了,反正点点就行就不细说了
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
在这里插入图片描述
后面就是啥名字、链接、加签啥的我懒得打马了就不截图了,还不会的就呵呵了

九、验证

安都安装好了那不得验证一下啊!稍微懂点的人应该已经知道了,我配置了监控服务器的指标,但是我没有启动node-export,那肯定会报警呀!没错报警信息如下啊!
在这里插入图片描述
安装node-exporter:vim node-exporter-compose.yml

version: '3.2'
services:
  node-exporter:
    image: prom/node-exporter
    restart: "always"
    ports:
      - 9100:9100
    container_name: "node-exporter"
    volumes:
      - "/proc:/host/proc:ro"
      - "/sys:/host/sys:ro"
      - "/:/rootfs:ro"

启动node-exporter:docker-compose -f node-exporter-compose.yml up -d
这不启动了节点吗?那肯定有修复告警呀!没错,收到了,只是我设置了1h后才再告警收到的慢了点啊!大家可以根据需求自己设置时间啊!
在这里插入图片描述
ps:水平高的大神自己看官方文档去整啊!小弟这给大家参考参考就行了啊!加油!!!

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/1898835.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

redis学习(005 java客户端 RedisTemplate学习)

黑马程序员Redis入门到实战教程,深度透析redis底层原理redis分布式锁企业解决方案黑马点评实战项目 总时长 42:48:00 共175P 此文章包含第16p-第p23的内容 文章目录 java客户端jedisSpringDataRedis项目实现hash哈希操作 java客户端 jedis 测试 ps:如果连接不上&…

PLC基础知识

1.PLC中的数据寄存器地址D表示存数据的地方。 2.PLC的物理存储器的规定:PLC存储器以字节为单位(Byte),存储单元以位(Bit)、字节(B,8Bit)、字(W,1…

jenkins搭建部署前端工程 ,从0到1

一.java环境配置 1 安装tomcatjdk17 这个也行 3 安装maven3.3.9 安装教程参考 4 安装Jenkins 下载地址 参考教程 二、相关配置 1 访问http://localhost:8080/jenkins,进入Jenkins初始化页面,第一次启动时间可能有点长,耐心等待。进入成功后会…

软考《信息系统运行管理员》-2.5信息系统运维管理系统与专用工具

2.5信息系统运维管理系统与专用工具 信息系统运维管理系统功能框架 信息系统运维管理系统是站在运维管理的整体视角,基于运维流程,以服务为导向的业务 服务管理和运维管理支撑平台,提供统一管理门户,最终帮助运维对象实现信息系…

(2024)KAN: Kolmogorov–Arnold Networks:评论

KAN: Kolmogorov–Arnold Networks: A review 公和众与号:EDPJ(进 Q 交流群:922230617 或加 VX:CV_EDPJ 进 V 交流群) 目录 0. 摘要 1. MLP 也有可学习的激活函数 2. 标题的意义 3. KAN 是具有样条基激活函数的 M…

代码随想录算法训练营第四十四天|188.买卖股票的最佳时机IV、309.最佳买卖股票时机含冷冻期、714.买卖股票的最佳时机含手续费

188.买卖股票的最佳时机IV 题目链接:188.买卖股票的最佳时机IV 文档讲解:代码随想录 状态:不会 思路: 在股票买卖1使用一维dp的基础上,升级成二维的即可。 定义dp[k1][2],其中 dp[j][0] 表示第j次交易后持…

[C++]——继承 深继承

一、继承概念 (1)、定义 继承(inheritance)机制是面向对象程序设计使代码复用最重要的手段,它允许程序员在保持原有类特性的基础上进行扩展,增加功能。继承呈现了面向对象程序设计的层次结构,体现了由简单到复杂的认知过程,是类…

Hudi 写入流程(图)

前言 主要为之前总结的源码文章补充流程图。总结一下整体流程说明 之前以Java Client为例,总结了 Insert 源码的整体流程及部分源码,由于各种原因,没有总结完。长时间不看这方面的源码,容易忘记,之前没有总结流程图,现在回忆起来比较麻烦,不如看流程图方便快捷。所以先补…

六个步骤轻松将网站从Webflow迁移到WordPress

尽管Webflow和WordPress在网站构建方法上有显著差异,但将网站从 Webflow 迁移到 WordPress 并没有想象中那么复杂。 本教程将逐步指导您完成迁移过程,确保你的网站可以顺利从Webflow过渡到功能更加齐全的WordPress上。 迁移前的准备工作 在开始迁移网…

java: 找不到符号 符号: 方法 builder()

在查看了pom依赖和sdk没问题后 跳转到需要build的类 在前面加上注解即可 一般这几个配套使用

【ABB】原点设定

【ABB】原点设定 操作流程演示 操作流程 操作轴回原点编辑电机校准偏移更新转速计数器 1.首先得了解机器手的轴,这里以6轴作参考。 注意先回456轴,后回123轴。 2.然后需要了解机器人关节运动模式,即选择如下两个模式。 3.注意机器人各轴移动…

无人机5公里WiFi低延迟图传模组,抗干扰、长距离、低延迟,飞睿智能无线通信新标杆

在科技日新月异的今天,我们见证了无数通信技术的飞跃。从开始的电报、电话,到如今的4G、5G网络,再到WiFi的广泛应用,每一次技术的革新都极大地改变了人们的生活方式。飞睿智能5公里WiFi低延迟图传模组,它以其独特的优势…

【CentOS7.6】docker部署EMQX教程,本地镜像直接导入(附下载链接),没法在云服务器上魔法拉取镜像的快来

总览 先把下载链接放在这里吧,这是 EMQX 的 tar 包,能够直接导入 CentOS 的 docker: 链接:https://pan.baidu.com/s/1rSGSLoVvj83ai6d5oolg8Q?pwd0108 提取码:0108 一、安装配置教程 1.将 EMQX-latest.tar 包导入…

七、Linux二进制安装Redis集群

目录 七、Linux二进制安装Redis集群1 安装Redis所需依赖2 单机安装Redis(7.2.4)2.1 下载Redis2.2 安装Redis 3 分布式部署模式(Redis Cluster)3.1 分布式部署模式的配置文件3.2创建集群 4 主从复制模式(Redis Sentinel…

PMP拿证捷径!七大要点让你事半功倍!

正确阅读PMBOK PMP考试的出题基本上来自于PMBOK教材,要对知识点理解透彻,清晰明了,针对考题就需要来灵活运用了。 PMBOK信息量很大,阅读起来,对每个人的耐力都是一种较量。 PMP考试是中英文对照,但教材中…

echarts图表加载显示空白

数据请求了,图表加载显示空白 报错: Error: Initialize failed: invalid dom. at Object.init (echarts.js:2273:1) 方案 1. 通过this.$nexttick(()>{}) , 试过, 还是不行 2、把 this.lineChart2 this.$echarts.init(document.g…

通义灵码入选 2024 世界人工智能大会最高荣誉「镇馆之宝」

7 月 4 日,2024 上海世界人工智能大会正式开幕,并揭晓了今年的「镇馆之宝」名单,通义灵码入选,是首个入围该名单的 AI 编程助手。 镇馆之宝是世界人工智能大会展览的最高荣誉,从科技含量、市场前景、创新性以及社会经济…

全生命周期陪伴,企业成长的最佳伙伴

国际数字影像产业园的全生命周期服务方案致力于为入驻企业提供从初创期到成熟期的各个阶段都能得到的全方位支持和帮助,确保企业能够稳健成长,实现可持续发展。 一、初创期服务 1、孵化器和加速器服务:为初创企业提供先进的硬件设施&#xf…

小白 | Linux安装java8

一、更新包列表 sudo apt update 二、安装 Java 8 sudo apt install openjdk-8-jdk 安装问题 遇见Unable to locate package openjdk-8-jdk错误 1.添加 PPA 存储库 sudo add-apt-repository ppa:openjdk-r/ppa sudo apt update 2.重新尝试安装 sudo apt install openjdk8-jdk…

昇思学习打卡-3-张量Tensor

本章节系统的学习了张量的相关内容,张量是由若干个当坐标系改变时满足转换关系的分量组成的集合。它是一个可用来表示在一些矢量、标量和其他张量之间的线性关系的多线性函数。是一种类似于矩阵的特殊的数据结构。包括 创建张量的方式;张量的属性&#…