k8s集群部署vmalert和prometheusalert实现钉钉告警

news2025/1/11 21:42:38

先决条件

安装以下软件包:git, kubectl, helm, helm-docs,请参阅本教程。

1、安装 helm

wget https://xxx-xx.oss-cn-xxx.aliyuncs.com/helm-v3.8.1-linux-amd64.tar.gz
tar xvzf helm-v3.8.1-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin
rm -rf linux-amd64

2、安装victoria-metrics-alert

(1)使用以下命令添加 helm chart存储库

helm repo add vm https://victoriametrics.github.io/helm-charts/

helm repo update

(2)列出vm/victoria-metrics-alert可供安装的helm版本

helm search repo vm/victoria-metrics-alert -l

(3)victoria-metrics-alert将图表的默认值导出到文件values.yaml

helm show values vm/victoria-metrics-alert > values.yaml

(4)根据环境需要更改values.yaml文件中的值,完整配置参考如下

# Default values for victoria-metrics-alert.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

serviceAccount:
  # Specifies whether a service account should be created
  create: true
  # Annotations to add to the service account
  annotations: {}
  # The name of the service account to use.
  # If not set and create is true, a name is generated using the fullname template
  name:
  # mount API token to pod directly
  automountToken: true

imagePullSecrets: []

rbac:
  create: true
  pspEnabled: true
  namespaced: false
  extraLabels: {}
  annotations: {}

server:
  name: server
  enabled: true
  image:
    repository: victoriametrics/vmalert
    tag: "" # rewrites Chart.AppVersion
    pullPolicy: IfNotPresent
  nameOverride: ""
  fullnameOverride: ""

  ## See `kubectl explain poddisruptionbudget.spec` for more
  ## ref: https://kubernetes.io/docs/tasks/run-application/configure-pdb/
  podDisruptionBudget:
    enabled: false
    # minAvailable: 1
    # maxUnavailable: 1
    labels: {}

  # -- Additional environment variables (ex.: secret tokens, flags) https://github.com/VictoriaMetrics/VictoriaMetrics#environment-variables
  env:
    []
    # - name: VM_remoteWrite_basicAuth_password
    #   valueFrom:
    #     secretKeyRef:
    #       name: auth_secret
    #       key: password

  replicaCount: 1

  # deployment strategy, set to standard k8s default
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%

  # specifies the minimum number of seconds for which a newly created Pod should be ready without any of its containers crashing/terminating
  # 0 is the standard k8s default
  minReadySeconds: 0

  # vmalert reads metrics from source, next section represents its configuration. It can be any service which supports
  # MetricsQL or PromQL.
  datasource:
    url: "http://192.168.47.9:8481/select/0/prometheus/"
    basicAuth:
      username: ""
      password: ""

  remote:
    write:
      url: ""
    read:
      url: ""

  notifier:
    alertmanager:
      url: "http://192.168.112.68:9093"

  extraArgs:
    envflag.enable: "true"
    envflag.prefix: VM_
    loggerFormat: json

  # Additional hostPath mounts
  extraHostPathMounts:
    []
    # - name: certs-dir
    #   mountPath: /etc/kubernetes/certs
    #   subPath: ""
    #   hostPath: /etc/kubernetes/certs
  #   readOnly: true

  # Extra Volumes for the pod
  extraVolumes:
    []
     #- name: example
     #  configMap:
     #    name: example

  # Extra Volume Mounts for the container
  extraVolumeMounts:
    []
    # - name: example
    #   mountPath: /example

  extraContainers:
    []
    #- name: config-reloader
    #  image: reloader-image

  service:
    annotations: {}
    labels: {}
    clusterIP: ""
    ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
    ##
    externalIPs: []
    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    servicePort: 8880
    type: ClusterIP
    # Ref: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
    # externalTrafficPolicy: "local"
    # healthCheckNodePort: 0

  ingress:
    enabled: false
    annotations: {}
    #   kubernetes.io/ingress.class: nginx
    #   kubernetes.io/tls-acme: 'true'

    extraLabels: {}
    hosts: []
    #   - name: vmselect.local
    #     path: /select
    #     port: http

    tls: []
    #   - secretName: vmselect-ingress-tls
    #     hosts:
    #       - vmselect.local

    # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
    # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
    # ingressClassName: nginx
    # -- pathType is only for k8s >= 1.1=
    pathType: Prefix

  podSecurityContext: {}
  # fsGroup: 2000

  securityContext:
    {}
    # capabilities:
    #   drop:
    #   - ALL
    # readOnlyRootFilesystem: true
    # runAsNonRoot: true
  # runAsUser: 1000

  resources:
    {}
    # We usually recommend not to specify default resources and to leave this as a conscious
    # choice for the user. This also increases chances charts run on environments with little
    # resources, such as Minikube. If you do want to specify resources, uncomment the following
    # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    # limits:
    #   cpu: 100m
    #   memory: 128Mi
    # requests:
    #   cpu: 100m
  #   memory: 128Mi

  # Annotations to be added to the deployment
  annotations: {}
  # labels to be added to the deployment
  labels: {}

  # Annotations to be added to pod
  podAnnotations: {}

  podLabels: {}

  nodeSelector: {}

  priorityClassName: ""

  tolerations: []

  affinity: {}

  # vmalert alert rules configuration configuration:
  # use existing configmap if specified
  # otherwise .config values will be used
  configMap: ""
  config:
    alerts:
      groups:
          - name: 磁盘挂载错误
            rules:
            - alert: 磁盘挂载错误
              expr: mount_error == 1
              for: 1m
              labels:
                level: 1
                severity: warning
              annotations:
                description: "{{$labels.job}}链{{$labels.instance}}节点磁盘挂载错误!"

serviceMonitor:
  enabled: false
  extraLabels: {}
  annotations: {}
#    interval: 15s
#    scrapeTimeout: 5s
  # -- Commented. HTTP scheme to use for scraping.
#    scheme: https
  # -- Commented. TLS configuration to use when scraping the endpoint
#    tlsConfig:
#      insecureSkipVerify: true

alertmanager:
  enabled: true
  replicaCount: 1
  podMetadata:
    labels: {}
    annotations: {}
  image: prom/alertmanager
  tag: v0.20.0
  retention: 120h
  nodeSelector: {}
  priorityClassName: ""
  resources: {}
  tolerations: []
  imagePullSecrets: []
  podSecurityContext: {}
  extraArgs: {}
  # key: value

  # external URL, that alertmanager will expose to receivers
  baseURL: ""
  # use existing configmap if specified
  # otherwise .config values will be used
  configMap: ""
  config:
    global:
      resolve_timeout: 5m
    route:
      # default receiver
      receiver: ops_notify
      # tag to group by
      group_by: [alertname]
      # How long to initially wait to send a notification for a group of alerts
      group_wait: 30s
      # How long to wait before sending a notification about new alerts that are added to a group
      group_interval: 60s
      # How long to wait before sending a notification again if it has already been sent successfully for an alert
      repeat_interval: 1h
    receivers:
      - name: ops_notify
        webhook_configs:
        - url: http://192.168.157.59:8080/prometheusalert?type=dd&tpl=prometheus-dd&split=false
          send_resolved: true
    inhibit_rules:
      - source_match:
          severity: 'warning'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'job']

  templates: {}
  #  alertmanager.tmpl: |-
  service:
    annotations: {}
    type: ClusterIP
    port: 9093
    # if you want to force a specific nodePort. Must be use with service.type=NodePort
    # nodePort:
  ingress:
    enabled: false
    annotations: {}
    #   kubernetes.io/ingress.class: nginx
    #   kubernetes.io/tls-acme: 'true'
    extraLabels: {}
    hosts: []
    #   - name: alertmanager.local
    #     path: /
    #     port: web

    tls: []
    #   - secretName: alertmanager-ingress-tls
    #     hosts:
    #       - alertmanager.local

    # For Kubernetes >= 1.18 you should specify the ingress-controller via the field ingressClassName
    # See https://kubernetes.io/blog/2020/04/02/improvements-to-the-ingress-api-in-kubernetes-1.18/#specifying-the-class-of-an-ingress
    # ingressClassName: nginx
    # -- pathType is only for k8s >= 1.1=
    pathType: Prefix
  persistentVolume:
    # -- Create/use Persistent Volume Claim for alertmanager component. Empty dir if false
    enabled: false
    # -- Array of access modes. Must match those of existing PV or dynamic provisioner. Ref: [http://kubernetes.io/docs/user-guide/persistent-volumes/](http://kubernetes.io/docs/user-guide/persistent-volumes/)
    accessModes:
      - ReadWriteOnce
    # -- Persistant volume annotations
    annotations: {}
    # -- StorageClass to use for persistent volume. Requires alertmanager.persistentVolume.enabled: true. If defined, PVC created automatically
    storageClass: ""
    # -- Existing Claim name. If defined, PVC must be created manually before volume will be bound
    existingClaim: ""
    # -- Mount path. Alertmanager data Persistent Volume mount root path.
    mountPath: /data
    # -- Mount subpath
    subPath: ""
    # -- Size of the volume. Better to set the same as resource limit memory property.
    size: 50Mi

(5)使用命令测试安装:

helm install vmalert vm/victoria-metrics-alert -f values.yaml -n victoria-metrics --debug --dry-run

(6)使用以下命令安装

helm install vmalert vm/victoria-metrics-alert -f values.yaml -n victoria-metrics

(7)通过运行以下命令获取 pod 列表

kubectl get pods -A | grep 'alert'

(8)通过运行以下命令获取应用程序

helm list -f vmalert -n victoria-metrics

(9)使用命令查看应用程序版本的历史记录vmalert

helm history vmalert -n victoria-metrics

(10)更新配置

cd /root/vmalert
#修改value.yaml文件
helm upgrade vmalert vm/victoria-metrics-alert -f values.yaml -n victoria-metrics

(11)查看service

kubectl get svc -n victoria-metrics

3、安装prometheusalert

(1)使用helm部署

git clone https://github.com/feiyu563/PrometheusAlert.git
cd PrometheusAlert/example/helm/prometheusalert
#如需修改配置文件,请更新config中的app.conf
helm install -n victoria-metrics prometheus-alert .

(2)values.yaml配置文件参考

cat /root/PrometheusAlert/example/helm/prometheusalert/values.yaml

# Default values for prometheusalert.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

global:
  imagePullSecrets: []
    # - name: "registry-secret"

replicaCount: 1

image:
  # 支持配置自定义模版需要重出镜像,或者使用本人构建镜像:lusson/prometheusalert:v1.0
  repository: feiyu563/prometheus-alert:v4.8
  pullPolicy: IfNotPresent

nameOverride: ""
fullnameOverride: ""

service:
  type: ClusterIP
  port: 8080

ingress:
  enabled: true
  annotations: {}
    # kubernetes.io/ingress.class: nginx
    # kubernetes.io/tls-acme: "true"
  hosts:
    - host: prometheusalert.xxxxx.com
      paths: ["/"]

  tls: []

resources:
  limits:
   cpu: 1000m
   memory: 1024Mi
  requests:
   cpu: 100m
   memory: 128Mi

nodeSelector: {}

tolerations: []

affinity: {}

(3)app.conf配置参考

cat /root/PrometheusAlert/example/helm/prometheusalert/config/app.conf

(4)ingress.yaml配置参考

cat /root/PrometheusAlert/example/helm/prometheusalert/templates/ingress.yaml

{{- if .Values.ingress.enabled -}}
{{- $fullName := include "prometheusalert.fullname" . -}}
{{- $svcPort := .Values.service.port -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ $fullName }}
  labels:
{{ include "prometheusalert.labels" . | indent 4 }}
  {{- with .Values.ingress.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
spec:
{{- if .Values.ingress.tls }}
  tls:
  {{- range .Values.ingress.tls }}
    - hosts:
      {{- range .hosts }}
        - {{ . | quote }}
      {{- end }}
      secretName: {{ .secretName }}
  {{- end }}
{{- end }}
  rules:
  {{- range .Values.ingress.hosts }}
    - host: {{ .host | quote }}
      http:
        paths:
        {{- range .paths }}
          - path: {{ . }}
            pathType: Prefix
            backend:
              service:
                name: {{ $fullName }}
                port:
                  number: {{ $svcPort }}
        {{- end }}
  {{- end }}
{{- end }}

(5)更新配置

cd /root/PrometheusAlert/example/helm/prometheusalert
helm upgrade -n victoria-metrics prometheus-alert .

(6)重启pod

删除Pod
helm delete prometheus-alert -n victoria-metrics

查看pods和service
kubectl get pods -n victoria-metrics
kubectl get svc -n victoria-metrics

重新安装
helm install -n victoria-metrics prometheus-alert .

查看pods和service
kubectl get pods -n victoria-metrics
kubectl get svc -n victoria-metrics

(1)告警测试

模板内容:

{{ $var := .externalURL}}{{ range $k,$v:=.alerts }}
{{if eq $v.status "resolved"}}
## [巡检恢复信息]({{$v.generatorURL}})
#### [{{$v.labels.alertname}}]({{$var}})
###### 告警级别:{{$v.labels.level}}
###### 开始时间:{{$v.startsAt}}
###### 故障主机:{{$v.labels.instance}}
##### {{$v.annotations.description}}
{{else}}
## [巡检告警信息]({{$v.generatorURL}})
#### [{{$v.labels.alertname}}]({{$var}})
###### 告警级别:{{$v.labels.level}}
###### 开始时间:{{$v.startsAt}}
###### 故障主机:{{$v.labels.instance}}
##### {{$v.annotations.description}}
{{end}}
{{ end }}

(2)查看日志

(3)查看钉钉告警

参考文档:https://github.com/VictoriaMetrics/helm-charts/tree/master/charts/victoria-metrics-alert

参考文档:https://github.com/feiyu563/PrometheusAlert/tree/master/example/helm/prometheusalert

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/881453.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

从零开始的机械臂yolov5抓取gazebo仿真(二)

使用moveit_setup_assistant配置机械臂(上) 观察机械臂模型 上一节中拿到了sunday_description功能包,将功能包放进工作空间进行编译,可将工作空间路径写进.bashrc文件中,这样就不必每次都source了 例如&#xff1a…

I2C连续读写实现

IIC系列文章: (1)I2C 接口控制器理论讲解 (2)I2C接口控制设计与实现 (3)I2C连续读写实现 文章目录 前言一、 i2c_bit_shift 模块分析二、 i2c_control 模块实现三、 i2c_control 模块仿真测试前言 上文的 i2c_bit_shift 模块说完了,我们发现实现一个字节的写操作还是可以实现…

为什么CAN要采取双绞线布局?

摘要: 在CAN总线应用中,一般建议使用屏蔽双绞线进行组网、布线,本文将详细讲解为什么CAN总线要采取双绞线的布局。 CAN(Controller Area Network)是一种用于实时应用的串行通讯协议总线,它可以使用双绞线来…

【解决】Kafka Exception thrown when sending a message with key=‘null‘ 异常

问题原因: 如下图,kafka 中配置的是监听域名的方式,但程序里使用的是 ip:port 的连接方式。 解决办法: kafka 中配置的是域名的方式,程序里也相应配置成 域名:port 的方式(注意:本地h…

Medical Isolated Power Supply System in Angola

安科瑞 华楠 Abstract: Diagnosis and treatment in modern hospitals are inseparable from advanced medical equipment, which are inseparable from safe and reliable power supply. Many operations often last for several hours, and the consequences of a sudden pow…

js 构造函数

js 构造函数 new Pig() ---- 创建新的空对象 this 指向新对象 this.name name --------修改this,添加新的属性。 最后返回新的对象

C++遍历std::tuple(C++14 ~ C++20)

本文展示了遍历std::tuple的方式&#xff1a; 首先比较容易想到的是利用C14的std::make_index_sequence与std::get结合取值&#xff0c;然后配合std::initializer_list进行包展开&#xff1a; // since C14 class Func0 {template<typename T, typename F, size_t... I>…

bert,transformer架构图及面试题

Transformer详解 - mathor atten之后经过一个全连接层残差层归一化 class BertSelfOutput(nn.Module):def __init__(self, config):super().__init__()self.dense nn.Linear(config.hidden_size, config.hidden_size)self.LayerNorm nn.LayerNorm(config.hidden_size, epscon…

疫情打卡 vue+springboot疾病防控管理系统java jsp源代码

本项目为前几天收费帮学妹做的一个项目&#xff0c;Java EE JSP项目&#xff0c;在工作环境中基本使用不到&#xff0c;但是很多学校把这个当作编程入门的项目来做&#xff0c;故分享出本项目供初学者参考。 一、项目描述 疫情打卡 vuespringboot 系统有1权限&#xff1a;管理…

vue3+ts-tsconfig.json报错Option ‘importsNotUsedAsValues’

vue3ts-tsconfig.json报错Option ‘importsNotUsedAsValues’ is deprecated and will stop functioning in TypeScript 5.5. Specify compilerOption ‘“ignoreDeprecations”: “5.0”’ to silence this error. Use ‘verbatimModuleSyntax’ instead 自我记录 翻译 选项…

网工软考 | 软考到底考哪个证比较好?

最近来咨询软考的同学比较多&#xff0c;对软考有哪些证书&#xff0c;怎么来选择比较困扰&#xff0c;目前刚好是学习的最佳阶段。 本期就统一来解答一下。 01 软考分为五个专业 计算机软件、计算机网络、计算机应用技术、信息系统和信息服务共五个专业&#xff0c;并在各专…

股市杠杆操作是什么意思?从三方面分析

股市杠杆操作是指投资者通过借用资金进行证券交易&#xff0c;以放大投资回报的一种金融工具。这种操作可以使投资者借用额外的资金进行交易&#xff0c;增加投资收益的潜力&#xff0c;但也伴随着更高的风险。下面从三个方面对股市杠杆操作进行分析。 1. 操作原理和优势&#…

Nginx的块、变量以及重定向

目录 绪论 1、location匹配 1.1 常见的Nginx正则表达式 1.2 正则表达式&#xff1a;匹配的是文件内容 1.3 location匹配uri 1.4 location常用的匹配规则 1.5 location优先级 1.6 匹配小结 1.7 生产环境中的匹配规则 2、nginx的内置变量 3、rewrite 3.1 rewrite作用 …

Spring中bean生命周期的PostProcessor的每个方法的作用

可结合这个博客看 https://blog.csdn.net/riemann_/article/details/118500805、https://cloud.tencent.com/developer/article/1409315、https://blog.csdn.net/qq_43631716/article/details/120239438 本篇内容借鉴于chatgtp,应该有错误&#xff0c;仅作方法应用的参考&#…

python爬虫——爬取天气预报信息

在本文中&#xff0c;我们将学习如何使用代理IP爬取天气预报信息。我们将使用 Python 编写程序&#xff0c;并使用 requests 和 BeautifulSoup 库来获取和解析 HTML。此外&#xff0c;我们还将使用代理服务器来隐藏我们的 IP 地址&#xff0c;以避免被目标网站封禁。 1. 安装必…

数据结构与算法-栈(LIFO)(经典面试题)

一&#xff1a;面试经典 1. 如何设计一个括号匹配的功能&#xff1f;比如给你一串括号让你判断是否符合我们的括号原则&#xff0c; 栈 力扣 2. 如何设计一个浏览器的前进和后退功能&#xff1f; 思想&#xff1a;两个栈&#xff0c;一个栈存放前进栈&…

【BERTopic应用 03/3】:微调参数

一、说明 一般来说&#xff0c;BERTopic 在开箱即用的模型中工作得很好。但是&#xff0c;当您有数百万个数据要处理时&#xff0c;使用基本模型处理数据可能需要一些时间。在这篇文章中&#xff0c;我将向您展示如何微调BERTopic中的一些参数并比较它们的结果。让我们潜入。 二…

简单学生信息管理系统springboot,ssm,ssh学生教师java jsp源代码

本项目为前几天收费帮学妹做的一个项目&#xff0c;Java EE JSP项目&#xff0c;在工作环境中基本使用不到&#xff0c;但是很多学校把这个当作编程入门的项目来做&#xff0c;故分享出本项目供初学者参考。 一、项目描述 简单学生信息管理系统springboot,ssm,ssh 系统有1权限…

【C++】移动构造函数

2023年8月15日 概述 移动构造函数是一个特殊的构造函数&#xff0c;用于从一个对象中移动&#xff08;转移&#xff09;资源到另一个对象&#xff0c;而不是进行复制操作。它通常与右值引用一起使用&#xff0c;以实现高效的资源转移&#xff0c;提高性能。 语法 class MyCla…

操作系统-笔记-第二章

目录 二、第二章——【进程】 1、进程的概念 &#xff08;1&#xff09;PID & PCD 进程控制块 &#xff08;2&#xff09;程序段 & 数据段 &#xff08;3&#xff09;特征 &#xff08;特性&#xff09; property &#xff08;4&#xff09;总结 2、进程的状态 …