问题
书接上回,对EKS(AWS云k8s)启用AMP(AWS云Prometheus)监控+AMG(AWS云 grafana),上次我们只是配通了EKS+AMP+AMG的监控路径。这次使用一位大卫老师的grafana的面板,具体地址如下:
https://grafana.com/grafana/dashboards/15757-kubernetes-views-global/
安装kube-state-metrics
为了想Prometheus暴露一些有用的性能指标,需要在k8s集群中,安装kube-state-metrics。
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-state-metrics prometheus-community/kube-state-metrics -n kube-system
测试验证:
kubectl port-forward svc/kube-state-metrics -n kube-system 8080:8080
使用PromQL测试:
count(kube_pod_status_ready{condition="false"}) by (namespace, pod)
prometheus配置
scrape_configs:
- job_name: kube-state-metrics
honor_timestamps: true
scrape_interval: 1m
scrape_timeout: 1m
metrics_path: /metrics
scheme: http
static_configs:
- targets:
- kube-state-metrics.kube-system.svc.cluster.local:8080
安装 prometheus-node-exporter
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-node-exporter prometheus-community/prometheus-node-exporter -n kube-system
测试:
export POD_NAME=$(kubectl get pods --namespace kube-system -l "app.kubernetes.io/name=prometheus-node-exporter,app.kubernetes.io/instance=prometheus-node-exporter" -o jsonpath="{.items[0].metadata.name}")
kubectl port-forward --namespace kube-system $POD_NAME 9100
prometheus配置
scrape_configs:
- job_name: 'node-exporter'
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: replace
source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:9100'
target_label: __address__
整体prometheus配置
global:
scrape_interval: 30s
# external_labels:
# clusterArn: <REPLACE_ME>
scrape_configs:
# pod metrics
- job_name: pod_exporter
kubernetes_sd_configs:
- role: pod
# container metrics
- job_name: cadvisor
scheme: https
authorization:
credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- replacement: kubernetes.default.svc:443
target_label: __address__
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
# apiserver metrics
- bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
job_name: kubernetes-apiservers
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- action: keep
regex: default;kubernetes;https
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_service_name
- __meta_kubernetes_endpoint_port_name
scheme: https
# kube proxy metrics
- job_name: kube-proxy
honor_labels: true
kubernetes_sd_configs:
- role: pod
relabel_configs:
- action: keep
source_labels:
- __meta_kubernetes_namespace
- __meta_kubernetes_pod_name
separator: '/'
regex: 'kube-system/kube-proxy.+'
- source_labels:
- __address__
action: replace
target_label: __address__
regex: (.+?)(\\:\\d+)?
replacement: $1:10249
# kube-state-metrics
- job_name: kube-state-metrics
honor_timestamps: true
scrape_interval: 1m
scrape_timeout: 1m
metrics_path: /metrics
scheme: http
static_configs:
- targets:
- kube-state-metrics.kube-system.svc.cluster.local:8080
# node-exporter
- job_name: 'node-exporter'
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: replace
source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:9100'
target_label: __address__
这里需要重新创建一个抓取程序。
效果
参考
- grafana-dashboards-kubernetes
- kube-state-metrics
- Monitoring Kubernetes Clusters with kube-state-metrics
- kube-state-metrics公共指标
- Kubernetes 对象状态的指标
- helm-charts/charts/kube-state-metrics
- Prometheus 结合 Node Exporter 监控 Kubernetes 集群节点