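Table of Contents
- Preface
- Creating the User
- Copying the Token
- Configuration File
- Global Configuration
- Master Node Discovery
- Node Discovery
- Namespace Pod Discovery
- Custom Pod Discovery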
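Preface
The previous chapter, 8-5 "Implementing Kubernetes apiserver and CoreDNS service discovery in Prometheus", used a Prometheus instance installed inside the K8s cluster, where adding service discovery is more convenient. Prometheus can be installed in several ways; for details see 8-1 "Installing Prometheus with the Operator and from binaries".
A binary deployment of Prometheus is a monitoring system that runs outside the cluster. Configuring service discovery for it involves creating a user, granting permissions, adding jobs, and rewriting labels.
Creating the User
Create the prometheus ServiceAccount and a Secret that holds its token: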
---
# Create the ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
# Create the token Secret
apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: monitoring-token
  namespace: monitoring
  annotations:
    kubernetes.io/service-account.name: "prometheus"
Define the permissions and bind them to the user:
---
# Define the permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
  - apiGroups:
      - ""
    resources:
      - nodes
      - services
      - endpoints
      - pods
      - nodes/proxy
    # Basic resources: read and watch
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      # Ingress is served from networking.k8s.io on current clusters;
      # "extensions" only applies to old Kubernetes versions.
      - "extensions"
      - "networking.k8s.io"
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - configmaps
      - nodes/metrics
    # Configuration resources: read-only
    verbs:
      - get
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
---
# Bind the permissions
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
Copying the Token
On the K8s cluster, inspect the Secret and copy the value of token:
sudo kubectl describe secret monitoring-token -n monitoring
Name: monitoring-token
Namespace: monitoring
Labels: <none>
Annotations: kubernetes.io/service-account.name: prometheus
kubernetes.io/service-account.uid: da94b15f-55bb-4eba-9d20-52f0b33a9852
Type: kubernetes.io/service-account-token
Data
====
ca.crt: 1302 bytes
namespace: 10 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IkF3Y3h6QklHbXh3S1g3Nl9LNlBIcTNTVEQ3MWpJRU9NcEdJM2hTZDU4SzgifQ.eyJpc3
MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5poTd1njdbdMYlmaUuAPIT_5hY5D3pRgabQ6tysWc0QuFN_mn6U-E
nbBlka6ZUB3gjlvk4XBKZJutqHyFHtkc6RYN98kKSPeRBCXFd8vZROx9PsOjL1uIseox4IeaZ8BvGje3RkGHiyTp_djmc8eyBBA6DwtKKldsd
3hhuD0eX2hbbg2YZVbiYOkLK976gL5pX_8BPQeZ66McDTCPlaoiYOIcegVGwZs49kA4YlYV_A5bO8WUSvnKQfPK_74qLy0BGp-rx0gjTc7w
On the Prometheus server outside the K8s cluster, paste the token value into a file as one continuous string, without line breaks:
vim /apps/prometheus/k8s.token
eyJhbGciOiJSUzI1NiIsImtpZCI6IkF3Y3h6QklHbXh3S1g3Nl9LNlBIcTNTVEQ3MWpJRU9NcEdJM2hTZDU4SzgifQ.eyJpc3MiOiJrdWJlcm5l
GVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5poTd1njdbdMYlmaUuAPIT_5hY5D3pRgabQ6tysWc0QuFN_mn6U-EnbBlka6ZUB3gjlvk4
XBKZJutqHyFHtkc6RYN98kKSPeRBCXFd8vZROx9PsOjL1uIseox4IeaZ8BvGje3RkGHiyTp_djmc8eyBBA6DwtKKldsd3hhuD0eX2hbbg2YZVbi
OkLK976gL5pX_8BPQeZ66McDTCPlaoiYOIcegVGwZs49kA4YlYV_A5bO8WUSvnKQfPK_74qLy0BGp-rx0gjTc7w
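A token copied from terminal output often arrives wrapped across several lines, as in the listings above. The following is a minimal Python sketch, not part of the original setup, that strips the line breaks from pasted text and reads the unverified JWT payload to check which ServiceAccount the token belongs to. The token is synthetic and built inside the code, and the helper names `normalize_token`, `jwt_claims`, and `b64url` are hypothetical.

```python
import base64
import json

def normalize_token(pasted):
    """Remove all whitespace, including line breaks added by terminal wrapping."""
    return "".join(pasted.split())

def jwt_claims(token):
    """Decode the payload segment of a JWT without verifying its signature."""
    _header, payload, _signature = token.split(".")
    padded = payload + "=" * (-len(payload) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(padded))

def b64url(data):
    """Encode a dict the way JWT segments are encoded (used only to build the sample)."""
    return base64.urlsafe_b64encode(json.dumps(data).encode()).decode().rstrip("=")

# Synthetic token for illustration only; it is not a real credential.
sample = ".".join([
    b64url({"alg": "RS256"}),
    b64url({
        "iss": "kubernetes/serviceaccount",
        # Claim carried by Secret-based ServiceAccount tokens
        "kubernetes.io/serviceaccount/service-account.name": "prometheus",
    }),
    "signature",
])

# Simulate a paste that was wrapped every 40 characters.
wrapped = "\n".join(sample[i:i + 40] for i in range(0, len(sample), 40))

token = normalize_token(wrapped)
claims = jwt_claims(token)
print(claims["kubernetes.io/serviceaccount/service-account.name"])  # prometheus
```

Running the same two helpers on the contents of /apps/prometheus/k8s.token is a quick way to confirm that the file holds the token for the prometheus ServiceAccount.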
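Configuration File
Modify the Prometheus global configuration, then add the jobs that collect nodes, pods, services, endpoints, and so on, one at a time.
Global Configuration
On the server running the binary Prometheus deployment, locate the configuration file and edit it: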
vim /apps/prometheus/prometheus.yml
# my global config
global:
  # Scrape targets every 15 seconds
  scrape_interval: 15s
  # Evaluate rules every 15 seconds
  evaluation_interval: 15s
# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'
    # Example job: scrape the Prometheus server's own metrics
    static_configs:
      - targets: ["localhost:9090"]
Master Node Discovery
Append the API Server job to the end of the prometheus.yml shown above:
  # API Server node discovery
  - job_name: 'kubernetes-apiservers-monitor'
    kubernetes_sd_configs:
      - role: endpoints
        # One master address is enough; all three masters are discovered automatically.
        api_server: https://192.168.100.191:6443
        # Credentials required to connect to this master
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /apps/prometheus/k8s.token
    scheme: https
    # Credentials required to connect to the other masters (used when scraping the discovered targets)
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /apps/prometheus/k8s.token
    relabel_configs:
      # Keep only targets whose namespace, service name, and port name match.
      # The values are joined with ';', so the regex must not contain spaces.
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
      # Rewrite the discovered service port and scheme
      - source_labels: [__address__]
        regex: '(.*):6443'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - source_labels: [__scheme__]
        regex: https
        replacement: http
        target_label: __scheme__
        action: replace
The service port and scheme have now become 9100 and http.
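To make the effect of these relabel rules concrete, here is a minimal Python sketch that imitates the keep and replace actions on one discovered endpoint. It is an approximation for illustration, not Prometheus code: the helper names are hypothetical, Python's `re` stands in for Prometheus's RE2 engine, and full-string matching mirrors the anchoring Prometheus applies to relabel regexes.

```python
import re

def joined_value(labels, source_labels, separator=";"):
    """Concatenate the source label values with the separator (default ';')."""
    return separator.join(labels.get(name, "") for name in source_labels)

def keep(labels, source_labels, regex):
    """keep action: the target survives only if the joined value fully matches."""
    return re.fullmatch(regex, joined_value(labels, source_labels)) is not None

def replace(labels, source_labels, regex, replacement, target_label):
    """replace action: on a full match, write the expanded replacement to target_label."""
    match = re.fullmatch(regex, joined_value(labels, source_labels))
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Prometheus writes group references as ${1}; Python's syntax is \g<1>.
    template = re.sub(r"\$\{(\d+)\}", r"\\g<\1>", replacement)
    return {**labels, target_label: match.expand(template)}

# One endpoint of the "kubernetes" service, as service discovery would present it.
target = {
    "__meta_kubernetes_namespace": "default",
    "__meta_kubernetes_service_name": "kubernetes",
    "__meta_kubernetes_endpoint_port_name": "https",
    "__address__": "192.168.100.191:6443",
    "__scheme__": "https",
}
selector = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]

print(keep(target, selector, "default;kubernetes;https"))    # True
print(keep(target, selector, "default; kubernetes; https"))  # False

rewritten = replace(target, ["__address__"], r"(.*):6443", "${1}:9100", "__address__")
rewritten = replace(rewritten, ["__scheme__"], "https", "http", "__scheme__")
print(rewritten["__address__"], rewritten["__scheme__"])  # 192.168.100.191:9100 http
```

The second `keep` call shows why whitespace matters: the joined value is `default;kubernetes;https`, so a regex containing spaces after the semicolons matches nothing and every target would be dropped.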
Node Discovery
Append the node discovery job to the end of prometheus.yml:
  # Node discovery
  - job_name: 'kubernetes-nodes-monitor'
    # Connect to a master to obtain the cluster's node information
    kubernetes_sd_configs:
      - role: node
        api_server: https://192.168.100.192:6443
        # Token required to connect to the api-server
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /apps/prometheus/k8s.token
    scheme: http
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /apps/prometheus/k8s.token
    relabel_configs:
      - source_labels: [__address__]
        # 10250 is the kubelet port, i.e. the node itself.
        regex: '(.*):10250'
        # Rewrite it to the exporter port 9100 to collect node metrics.
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region]
        regex: '(.*)'
        replacement: '${1}'
        action: replace
        target_label: LOC
      - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region]
        regex: '(.*)'
        replacement: 'NODE'
        action: replace
        target_label: Type
      - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region]
        regex: '(.*)'
        replacement: 'K8S-test'
        action: replace
        target_label: Env
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
The masters also run kubelet, so the cluster shows 6 nodes.
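The label rules in this job do two different things: the `(.*)` rules attach labels such as Type and Env, and labelmap copies every node label onto the target. The sketch below imitates both in Python under stated assumptions: the node name `k8s-node1` is a made-up sample, the helper names are hypothetical, and Python's `\1` stands in for Prometheus's `${1}`. It also illustrates that a replace rule producing an empty value leaves the target label unset, which is what happens to LOC on a node without a region label.

```python
import re

REGION = "__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region"

def set_label(labels, source_label, target_label, replacement):
    """replace action with regex '(.*)': it matches any value, including an empty one."""
    match = re.fullmatch(r"(.*)", labels.get(source_label, ""))
    value = match.expand(replacement)
    result = dict(labels)
    if value:
        result[target_label] = value
    else:
        result.pop(target_label, None)  # an empty result leaves the label unset
    return result

def labelmap(labels, regex, replacement=r"\1"):
    """labelmap action: copy each label whose name fully matches to the captured name."""
    result = dict(labels)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            result[match.expand(replacement)] = value
    return result

# Hypothetical node that carries a hostname label but no region label.
node = {
    "__address__": "192.168.100.194:9100",
    "__meta_kubernetes_node_label_kubernetes_io_hostname": "k8s-node1",
}

labels = set_label(node, REGION, "LOC", r"\1")        # region is empty, so LOC stays unset
labels = set_label(labels, REGION, "Type", "NODE")    # constant value, always set
labels = set_label(labels, REGION, "Env", "K8S-test")
labels = labelmap(labels, r"__meta_kubernetes_node_label_(.+)")

print("LOC" in labels)                   # False
print(labels["Type"], labels["Env"])     # NODE K8S-test
print(labels["kubernetes_io_hostname"])  # k8s-node1
```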
Namespace Pod Discovery
Append the namespace Pod discovery job to the end of prometheus.yml:
  # Pods in a specific namespace
  - job_name: 'kubernetes-namespace-pod'
    kubernetes_sd_configs:
      - role: pod
        api_server: https://192.168.100.193:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /apps/prometheus/k8s.token
        # Restrict discovery to the monitoring namespace
        namespaces:
          names:
            - monitoring
    relabel_configs:
      # Keep the Pod labels and their values
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      # Rename labels
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
Of the 7 discovered instances, 6 can be scraped; 1 pod is down because cadvisor is not installed.
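Custom Pod Discovery
To be discovered, a custom Pod must carry the annotation prometheus.io/scrape with the value true. The discovery job in this section also reads the optional annotations prometheus.io/path and prometheus.io/port. Set the annotation in the Pod template when creating the workload: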
apiVersion: apps/v1
kind: Deployment
...
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
...
Then append the custom Pod discovery job to the end of prometheus.yml:
  # Custom Pod discovery
  - job_name: 'kubernetes-condition-pod'
    kubernetes_sd_configs:
      - role: pod
        api_server: https://192.168.100.191:6443
        tls_config:
          insecure_skip_verify: true
        bearer_token_file: /apps/prometheus/k8s.token
    relabel_configs:
      # Keep only Pods that have scrape enabled
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        # Keep labels whose names start with this prefix, along with their values
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        # Rename the label
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_label_pod_template_hash]
        regex: '(.*)'
        replacement: 'K8S-test'
        action: replace
        target_label: Env
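Prometheus has discovered the 6 pods that match the condition, but all of them are down. A Prometheus plugin (exporter) still has to be installed in the pods before monitoring displays correctly.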
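The address rule in this job is the least obvious one: it joins `__address__` and the prometheus.io/port annotation with `;` and rebuilds the address from the captured host and port. A minimal Python sketch of that single rule follows, using the same regex; the Pod IP and port numbers are made-up sample values and `rewrite_address` is a hypothetical helper, not a Prometheus function.

```python
import re

# Same pattern as the relabel rule: host, optional ":port", then ";" and the annotated port.
ADDRESS_RE = r"([^:]+)(?::\d+)?;(\d+)"

def rewrite_address(address, port_annotation):
    """Return the scrape address after applying the port annotation, if it matches."""
    joined = f"{address};{port_annotation}"  # source label values joined with ';'
    match = re.fullmatch(ADDRESS_RE, joined)
    if match is None:
        return address  # no usable annotation: the discovered address is kept
    return f"{match.group(1)}:{match.group(2)}"

print(rewrite_address("10.244.1.5:8080", "9102"))  # 10.244.1.5:9102
print(rewrite_address("10.244.1.5", "9102"))       # 10.244.1.5:9102
print(rewrite_address("10.244.1.5:8080", ""))      # 10.244.1.5:8080
```

The last call corresponds to a Pod that only sets prometheus.io/scrape: the regex requires digits after the semicolon, so the rule does not match and Prometheus keeps the container port it discovered.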