一、下载对应的kube-prometheus源码
github地址:GitHub - prometheus-operator/kube-prometheus: Use Prometheus to monitor Kubernetes and applications running on Kubernetes
1)进入目录
[root@k8s-master ~]# cd kube-prometheus
[root@k8s-master kube-prometheus]# ls
build.sh docs jsonnet manifests
CHANGELOG.md example.jsonnet jsonnetfile.json README.md
code-of-conduct.md examples jsonnetfile.lock.json RELEASE.md
CONTRIBUTING.md experimental kustomization.yaml scripts
DCO go.mod LICENSE sync-to-internal-registry.jsonnet
developer-workspace go.sum Makefile tests
2)可以看到有个manifests目录这里面是我们所需的yaml,并且先运行manifests目录下setup中的yaml文件
[root@k8s-master kube-prometheus]# cd manifests/
[root@k8s-master manifests]# ls
会看到一个setup的文件夹
我们先执行这个文件夹这个里面会为我们创建命名空间
和一些基础清单
[root@k8s-master manifests]# kubectl create -f setup/
3)修改prometheus,grafana,alertmanager的yaml文件修改端口暴露为nodeport模式 为了能从外网访问
修改prometheus-service.yaml,添加NodePort类型和端口
修改prometheus-service.yaml文件,添加NodePort类型和端口
修改alertmanager-service.yaml文件添加NodePort类型和端口
访问测试
通过以下命令查看相应的服务:
kubectl get svc -n monitoring
其中红框圈出的是比较关键的服务以及其对应的访问端口,但现在还无法访问grafan、prometheus以及alertmanger,因为prometheus operator内部默认配置了NetworkPolicy,需要删除其对应的资源,才可以通过外网访问:
cd .. #需要到manifests的同级目录下运行
kubectl delete -f manifests/prometheus-networkPolicy.yaml
kubectl delete -f manifests/grafana-networkPolicy.yaml
kubectl delete -f manifests/alertmanager-networkPolicy.yaml
删除后,通过服务器ip:服务端口的形式,即可访问对应的服务了,在此,kube-prometheus的部署彻底完成。
grafana的默认账号和密码:admin/admin
alertmanager的登录界面
prometheus的登录界面
问题:
镜像:registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.12.0替换为:v5cn/prometheus-adapter:v0.12.0
registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0替换为:
quay.io/coreos/kube-state-metrics:latest
二、安装钉钉,并且配置钉钉
1)创建钉钉机器人
群聊设置中【添加机器人】-【自定义】
安全设置中勾选加签
需要保存Webhook和加签的秘钥,后面k3s往钉钉机器人群聊中发信息需要。
2)自定义机器人的监控配置文件
#cat dingtalk-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: dingtalk-config
namespace: monitoring
data:
config.yml: |-
templates:
- /data/lai/prometheus-webhook-dingtalk/contrib/k8s/config/template.tmpl #这个文件需要存在
targets:
webhook:
url: https://oapi.dingtalk.com/robot/send?access_token=cbc36a81873b58b2374becf8a33f9053e02692a114ac7ecc1cc451caf19792a6 #这个根据自己的钉钉配置
secret: SEC5d83c04905da4d00454782242d3e5d36857f6088ee284523041521d6cc025b0d #根据自己的钉钉配置
mention:
all: true #@所有人
webhook2:
url: https://oapi.dingtalk.com/robot/send?access_token=4df2745e8df1de6d0429e35caf15e03
secret: SECe079af795abd316a7e1f431ee8ebcf082cc0b0611a859da
template.tmpl: |-
{{ define "__subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}] {{ .GroupLabels.SortedPairs.Values | join " " }} {{ if gt (len .CommonLabels) (len .GroupLabels) }}({{ with .CommonLabels.Remove .GroupLabels.Names }}{{ .Values | join " " }}{{ end }}){{ end }}{{ end }}
{{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver }}{{ end }}
{{ define "__text_alert_list" }}{{ range . }}
**Labels**
{{ range .Labels.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Annotations**
{{ range .Annotations.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**Source:** [{{ .GeneratorURL }}]({{ .GeneratorURL }})
{{ end }}{{ end }}
{{ define "default.__text_alert_list" }}{{ range . }}
---
**告警级别:** {{ .Labels.severity | upper }}
**运营团队:** {{ .Labels.team | upper }}
**触发时间:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**事件信息:**
{{ range .Annotations.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**事件标签:**
{{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") (ne (.Name) "team") }} - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}{{ end }}
{{ end }}
{{ end }}
{{ define "default.__text_alertresovle_list" }}{{ range . }}
---
**告警级别:** {{ .Labels.severity | upper }}
**运营团队:** {{ .Labels.team | upper }}
**触发时间:** {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**结束时间:** {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
**事件信息:**
{{ range .Annotations.SortedPairs }} - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}
**事件标签:**
{{ range .Labels.SortedPairs }}{{ if and (ne (.Name) "severity") (ne (.Name) "summary") (ne (.Name) "team") }} - {{ .Name }}: {{ .Value | markdown | html }}
{{ end }}{{ end }}
{{ end }}
{{ end }}
{{/* Default */}}
{{ define "default.title" }}{{ template "__subject" . }}{{ end }}
{{ define "default.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ if gt (len .Alerts.Firing) 0 -}}
{{ template "default.__text_alert_list" .Alerts.Firing }}
{{- end }}
{{ if gt (len .Alerts.Resolved) 0 -}}
{{ template "default.__text_alertresovle_list" .Alerts.Resolved }}
{{- end }}
{{- end }}
{{/* Legacy */}}
{{ define "legacy.title" }}{{ template "__subject" . }}{{ end }}
{{ define "legacy.content" }}#### \[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}\] **[{{ index .GroupLabels "alertname" }}]({{ template "__alertmanagerURL" . }})**
{{ template "__text_alert_list" .Alerts.Firing }}
{{- end }}
{{/* Following names for compatibility */}}
{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
#cat dingtalk-deployment.yaml
apiVersion: v1
kind: Service
metadata:
name: dingtalk
namespace: monitoring
labels:
app: dingtalk
annotations:
prometheus.io/scrape: 'false'
spec:
selector:
app: dingtalk
ports:
- name: dingtalk
port: 8060
protocol: TCP
targetPort: 8060
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: dingtalk
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: dingtalk
template:
metadata:
name: dingtalk
labels:
app: dingtalk
spec:
containers:
- name: dingtalk
image: timonwong/prometheus-webhook-dingtalk:latest
imagePullPolicy: IfNotPresent
ports:
- containerPort: 8060
volumeMounts:
- name: config
mountPath: /etc/prometheus-webhook-dingtalk
volumes:
- name: config
configMap:
name: dingtalk-config
3、启动
kubectl apply -f dingtalk-config.yaml -f dingtalk-deployment.yaml
kubectl get pod -n monitoring
4、配置alertmanager-secret.yaml
#cat alertmanager-secret.yaml
apiVersion: v1
kind: Secret
metadata:
labels:
app.kubernetes.io/component: alert-router
app.kubernetes.io/instance: main
app.kubernetes.io/name: alertmanager
app.kubernetes.io/part-of: kube-prometheus
app.kubernetes.io/version: 0.23.0
name: alertmanager-main
namespace: monitoring
stringData:
alertmanager.yaml: |-
"global":
"resolve_timeout": "5m"
"receivers":
- "name": "Webhook"
"webhook_configs":
- "url": "http://dingtalk.monitoring.svc.cluster.local:8060/dingtalk/webhook/send"
"route":
"group_by":
- "namespace"
"group_wait": "30s" #组告警等待时间,也就是告警产生后等待30s,如果有同一组告警一起发出
"receiver": "Webhook"
"repeat_interval": "2m" #重复告警的间隔时间,减少报警发送频率
"routes":
- "matchers":
- "alertname = Webhook"
"receiver": "Webhook"
type: Opaque
5.启动
kubectl apply -f alertmanager-secret.yaml
参考:
kube-prometheus部署(无坑版)-CSDN博客
Prometheus监控K8S集群并实现告警
bilibili视频:
让你快速入门Prometheus监控并实现邮箱报警_哔哩哔哩_bilibili
k8s 1.23.1 部署 prometheus 钉钉推送 自定义监控配置 promql基础语法_kube-prometheus 钉钉-CSDN博客