k8s 1.28.2 集群部署 Thanos 对接 MinIO 实现 Prometheus 数据长期存储

news2024/11/26 12:17:52

文章目录

    • @[toc]
    • 什么是 Thanos
    • Thanos 的主要功能
    • Thanos 的架构组件
    • Thanos 部署架构
      • Sidecar
      • Receive
      • 架构选择
    • 开始部署
      • 部署架构
      • 创建 namespace
      • node-exporter 部署
      • kube-state-metrics 部署
      • Prometheus + Thanos-Sidecar 部署
        • 固定节点创建 label
        • 生成 secret
          • MinIO 配置
          • etcd 证书
        • 启动 Prometheus + Thanos-Sidecar
      • Thanos-store-gateway 部署
      • Thanos-compact 部署
      • Thanos-query 部署
      • Thanos-query-globle 部署
      • Thanos-query-frontend 部署
      • Grafana 部署
      • 增加 Thanos 和 MinIO 监控
    • Grafana dashboard
      • coredns
      • etcd
      • Thanos
      • node-exporter
    • 最后

什么是 Thanos

  • Thanos 官网
  • Thanos quay.io 镜像仓库
  • Thanos Github

Thanos 是一个强大的 Prometheus 扩展解决方案,能够解决 Prometheus 在大规模环境下的存储、扩展性和高可用性问题。

它非常适合大规模集群监控需求,尤其是需要长期存储监控数据和全局查询。

Thanos 的主要功能

  • 全局查询(Global Query View)
    • 通过其 Querier 组件提供从多个 Prometheus 实例查询的能力,并能对跨多个数据源进行全局去重查询
    • 即使在大规模集群中运行多个 Prometheus 实例,用户也可以从一个接口统一查询所有的监控数据
  • 长期存储(Unlimited Retention)
    • Prometheus 默认只适用于短期数据存储,而 Thanos 提供了将监控数据推送到长期存储(如 Amazon S3、Google Cloud Storage、MinIO 等对象存储)的能力
  • Prometheus 集成(Prometheus Compatible)
    • Grafana 和其他支持 Prometheus 查询 API 的工具都可以通过 Thanos 查询 Prometheus 数据
  • 数据压缩与去重(Downsampling & Compaction)
    • Thanos 的 Compactor 组件会定期对存储在对象存储中的数据进行压缩、去重和优化,以减少存储开销并提高查询性能

Thanos 的架构组件

遵循 KISS 和 Unix 理念,Thanos 由一组组件组成,每个组件都扮演一个特定的角色

  • Sidecar
    • 与每个 Prometheus 实例一起部署,负责将数据推送到对象存储,并暴露出 Prometheus 的数据给 Querier
  • Store Gateway
    • 简称为 Store,专门用于从对象存储(如 AWS S3、Google Cloud Storage、MinIO 等)中检索历史监控数据的组件
  • Compactor
    • 负责对存储在对象存储中的数据进行压缩、去重和优化,提升查询性能并减少存储开销
  • Receiver
    • 专门用于接收和存储 Prometheus 实例通过 Remote Write 发送数据的组件(强烈建议使用 Prometheus v2.13.0+,因为它的远程读取功能得到了改进。)
  • Ruler/Rule
    • 类似 Prometheus 的 Alertmanager,它允许用户基于存储的数据执行告警和规则评估
  • Querier/Query
    • 一个用于全局查询的组件,能够从多个 Prometheus 实例和对象存储中提取数据,并提供统一的查询接口
  • Query Frontend
    • Query 的前端页面,通过查询分片缓存请求队列等机制,加速复杂查询,并提升查询在高负载环境下的响应速度

Thanos 部署架构

Sidecar

Sidecar 使用 Prometheus 的 reload 接口。确保 Prometheus 启用 --web.enable-lifecycle 参数

在这里插入图片描述

  • 优点
    • 轻量级:Sidecar 是一个轻量的代理,只需要运行在 Prometheus 实例旁边即可,无需对 Prometheus 进行大的改动。
    • 实时数据访问:Sidecar 允许 Thanos 直接访问 Prometheus 的实时监控数据,保证了最新监控信息的可查询性。
    • 长期存储集成:可以将 Prometheus 的数据定期上传到对象存储,解决了 Prometheus 原生不具备长期存储的缺陷。
  • 缺点
    • 依赖 Prometheus:Sidecar 必须依赖于运行的 Prometheus 实例,如果 Prometheus 实例宕机,Sidecar 也无法提供数据查询功能。
    • 水平扩展有限:Sidecar 并不设计用于大规模数据接收,它主要是作为 Prometheus 的配套组件,无法像 Receiver 那样水平扩展来处理大量的数据。

Receive

在这里插入图片描述

  • 优点

    • 大规模数据接收:Receiver 能够高效接收大量来自 Prometheus 实例的数据,适用于大规模部署。
    • 多租户支持:可以处理和隔离多个租户的数据,在需要监控多个独立环境时非常有用。
    • 水平扩展:通过数据分片和扩展 Receiver 实例,能够处理越来越多的数据接收任务。
    • 去重和高可用性:Receiver 能够通过去重机制,确保多实例高可用性,并避免重复数据存储。
  • 缺点

    • 无直接查询功能:Receiver 本身不具备查询功能,接收到的数据需要依赖其他 Thanos 组件(如 Querier 和 Store)进行查询和分析。

      实时性较低:相比直接从 Prometheus 实例查询数据,Receiver 可能在数据处理和查询时存在一定的延迟。

Sidecar 与 Receiver 的区别对比(抄自 ChatGPT)

特性Thanos SidecarThanos Receiver
主要功能集成 Prometheus 实例,提供实时数据访问和长期存储接收 Prometheus 实例的远程写入数据,并存储
数据源直接从 Prometheus 获取数据Prometheus 的 Remote Write 数据
数据存储方式定期上传 Prometheus 数据块到对象存储将接收到的数据存储在本地或对象存储中
水平扩展性无法扩展,只与单个 Prometheus 实例集成可以通过增加实例水平扩展
实时数据查询支持 Prometheus 实时数据查询无法直接查询数据
多租户支持不支持支持,适用于多租户环境
高可用性依赖 Prometheus 实例支持高可用部署和去重机制
适用场景与现有 Prometheus 实例集成,长期存储数据大规模、多租户环境的数据接收和存储

架构选择

  • 多集群 thanos 监控告警实践
  • 打造云原生大型分布式监控系统 (三): Thanos 部署与实践
  • 以下的建议取自这两个博客,具体的架构选择,也只能大家根据自己的实际情况验证和判断
  • Sidecar 与 Receiver 的最主要的区分就是最新数据的查询方式不同
    • Sidecar 最新数据直接读取 Promethues 数据目录
    • Receiver 的所有数据都在存储服务里面(S3 等存储服务)
  • Prometheus 集群不大,采集的服务不多的情况下,即使 Sidecar 和 全局查询的 Query 不在一个机房,只要都是国内的,查询延迟一般不会太高
  • Prometheus 集群很大,要采集的数据也非常多的情况下,尽可能还是选择 Sidecar 架构,因为数据一旦激增,Receiver 的压力会非常非常大,需要很大的资源,也需要很强大的存储性能
  • 除非主要目的是针对指标历史做分析使用,或者 Prometheus 有某些特殊场景无法持久化数据,这些以外的场景,建议使用 Sidecar

开始部署

采用 sidecar 模式部署

部署架构

考虑用 Prometheus 自带的 rule 做告警,这边没打算部署 Thanos-rule

k8s 集群 Ak8s 集群 B
Prometheus:v2.54.1Prometheus:v2.54.1
node-exporter:v1.8.2node-exporter:v1.8.2
kube-state-metrics:v2.11.0kube-state-metrics:v2.11.0
Thanos-sidecar:v0.36.1Thanos-sidecar:v0.36.1
Thanos-query:v0.36.1Thanos-query:v0.36.1
Thanos-store-gateway:v0.36.1Thanos-store-gateway:v0.36.1
Thanos-compact:v0.36.1
Thanos-query-globle:v0.36.1
Thanos-query-frontend:v0.36.1
Grafana

MinIO 部署可以看我之前的博客:k8s 1.28.2 集群部署 MinIO 分布式集群,先提前准备好 MinIO 集群

创建 namespace

以下所有的 k 命令都代表 kubectl 命令,部署这块只展示一个环境的,我这边是两套 k8s 集群,需要部署两套 Prometheus

k create ns monitor

node-exporter 部署

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: node-exporter
  name: node-exporter-svc
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 9100
    protocol: TCP
  selector:
    app.kubernetes.io/name: node-exporter
  type: ClusterIP
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app.kubernetes.io/name: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      containers:
      - args:
        - --path.rootfs=/rootfs
        - --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$
        image: docker.m.daocloud.io/prom/node-exporter:v1.8.2
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: http
        volumeMounts:
        - mountPath: /rootfs
          name: root
          readOnly: true
      hostIPC: true
      hostNetwork: true
      hostPID: true
      volumes:
      - hostPath:
          path: /
        name: root

kube-state-metrics 部署

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics-sa
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - secrets
  - nodes
  - pods
  - services
  - serviceaccounts
  - resourcequotas
  - replicationcontrollers
  - limitranges
  - persistentvolumeclaims
  - persistentvolumes
  - namespaces
  - endpoints
  verbs:
  - list
  - watch
- apiGroups:
  - apps
  resources:
  - statefulsets
  - daemonsets
  - deployments
  - replicasets
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - cronjobs
  - jobs
  verbs:
  - list
  - watch
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  verbs:
  - list
  - watch
- apiGroups:
  - authentication.k8s.io
  resources:
  - tokenreviews
  verbs:
  - create
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
- apiGroups:
  - policy
  resources:
  - poddisruptionbudgets
  verbs:
  - list
  - watch
- apiGroups:
  - certificates.k8s.io
  resources:
  - certificatesigningrequests
  verbs:
  - list
  - watch
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs:
  - list
  - watch
- apiGroups:
  - storage.k8s.io
  resources:
  - storageclasses
  - volumeattachments
  verbs:
  - list
  - watch
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - list
  - watch
- apiGroups:
  - networking.k8s.io
  resources:
  - networkpolicies
  - ingressclasses
  - ingresses
  verbs:
  - list
  - watch
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - list
  - watch
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - clusterrolebindings
  - clusterroles
  - rolebindings
  - roles
  verbs:
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
  - name: telemetry
    port: 8081
    targetPort: telemetry
  selector:
    app.kubernetes.io/name: kube-state-metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
  name: kube-state-metrics
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
    spec:
      automountServiceAccountToken: true
      containers:
      - image: docker.m.daocloud.io/registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.11.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          httpGet:
            path: /livez
            port: http-metrics
          initialDelaySeconds: 5
          timeoutSeconds: 5
        name: kube-state-metrics
        ports:
        - containerPort: 8080
          name: http-metrics
        - containerPort: 8081
          name: telemetry
        readinessProbe:
          httpGet:
            path: /readyz
            port: telemetry
          initialDelaySeconds: 5
          timeoutSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: kube-state-metrics-sa

Prometheus + Thanos-Sidecar 部署

固定节点创建 label
k label node 192.168.22.125 prometheus=true
生成 secret
MinIO 配置

因为包含 MinIO 的 access_key 和 secret_key,尽量别用 configmap 去明文读取,用 secret 读取,一会输出的内容,合并成一行后,需要放到下面的 secret 里面去替换掉

cat <<EOF | base64 -
type: S3
config:
  bucket: "prom-thanos-sidecar"
  endpoint: "minio.api.devops.icu"
  access_key: "gsl2dzAHviNzabSn0ikw"
  secret_key: "82zQ0UMDlOo3LxCQM9TqSygEYrMuxSSRYQdO1KXF"
  insecure: true
EOF
etcd 证书

我是 kubeadm 部署的 k8s 集群,我的证书路径是 /etc/kubernetes/pki/etcd,我直接把本地文件生成 secret

certs_dir=/etc/kubernetes/pki/etcd; \
k create secret generic etcd-pki -n monitoring \
--from-file=ca=${certs_dir}/ca.crt \
--from-file=cert=${certs_dir}/server.crt \
--from-file=key=${certs_dir}/server.key
启动 Prometheus + Thanos-Sidecar

Prometheus 的数据存储用的是本地 hostpath 的方式,由于 Thanos 需要读取 Prometheus 的数据,所以要保持用户一致,不然会因为权限问题,Thanos 没法读取数据,也没法将数据上传到 MinIO,具体的报错参考:ts=2024-10-21T06:09:16.284378709Z caller=sidecar.go:410 level=warn err="upload 01JAP2JAZ0AQT8BEYFY30A4VVD: hard link block: hard link file chunks/000001: link /etc/prometheus/data/01JAP2JAZ0AQT8BEYFY30A4VVD/chunks/000001 /etc/prometheus/data/thanos/upload/01JAP2JAZ0AQT8BEYFY30A4VVD/chunks/000001: operation not permitted" uploaded=0

  • Prometheus 参数简介
    • --storage.tsdb.min-block-duration=2h:最小2小时生成一次新的数据块
    • --storage.tsdb.max-block-duration=2h:最大2小时生成一次新的数据块
    • --storage.tsdb.retention.time=6h:Prometheus 本地数据保留时长,默认是15天,这个可以自己根据实际磁盘情况调整
    • --storage.tsdb.wal-compression:启用 WAL 日志压缩,减少 WAL 文件的大小,降低存储空间的需求
    • --storage.tsdb.no-lockfile:禁用锁文件,避免影响 Thanos 上传数据块到 MinIO
    • --web.enable-lifecycle:支持热更新 localhost:9090/-/reload 热加载配置文件
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-sa
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: prometheus-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: prometheus
  name: prometheus-svc
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 9090
    targetPort: 9090
  - name: grpc
    port: 10901
    targetPort: 10901
  selector:
    app: prometheus
  type: ClusterIP
---
apiVersion: v1
data:
  prometheus.yml: |
    global:
      scrape_interval: 30s
      evaluation_interval: 30s
      scrape_timeout: 10s
      external_labels:
        cluster: devops
        replica: $(POD_NAME)
    rule_files:
    - /etc/prometheus/rules/*.yml
    scrape_configs:
    - job_name: prometheus
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: prometheus
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9090
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

    - job_name: kube-apiserver
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

    - job_name: kubelet
      metrics_path: /metrics/cadvisor
      scheme: https
      tls_config:
        insecure_skip_verify: true
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [instance]
        action: replace
        target_label: node
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

    - job_name: etcd
      kubernetes_sd_configs:
      - role: pod
      scheme: https
      tls_config:
        ca_file: /etc/prometheus/etcd-ssl/ca
        cert_file: /etc/prometheus/etcd-ssl/cert
        key_file: /etc/prometheus/etcd-ssl/key
        insecure_skip_verify: false
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_component]
        regex: etcd
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:2379
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

    - job_name: coredns
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_k8s_app]
        regex: kube-dns
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9153
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

    - job_name: node-exporter
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - source_labels: [__meta_kubernetes_node_address_InternalIP]
        action: replace
        target_label: ip
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

    - job_name: kube-state-metrics
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: monitoring;kube-state-metrics
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:8080
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
kind: ConfigMap
metadata:
  name: prometheus-cm
  namespace: monitoring
---
apiVersion: v1
data:
  config: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogInByb20tdGhhbm9zLXNpZGVjYXIiCiAgZW5kcG9pbnQ6ICJtaW5pby5hcGkuZGV2b3BzLmljdSIKICBhY2Nlc3Nfa2V5OiAiZ3NsMmR6QUh2aU56YWJTbjBpa3ciCiAgc2VjcmV0X2tleTogIjgyelEwVU1EbE9vM0x4Q1FNOVRxU3lnRVlyTXV4U1NSWVFkTzFLWEYiCiAgaW5zZWN1cmU6IHRydWUK
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: prometheus
  name: thanos-config
  namespace: monitoring
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: prometheus
  name: prometheus
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: prometheus
                operator: In
                values:
                - "true"
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - prometheus
            topologyKey: kubernetes.io/hostname
      containers:
      - args:
        - --config.file=/etc/prometheus/config/prometheus.yml
        - --storage.tsdb.path=/etc/prometheus/data
        - --storage.tsdb.min-block-duration=2h
        - --storage.tsdb.max-block-duration=2h
        - --storage.tsdb.retention.time=6h
        - --storage.tsdb.wal-compression
        - --storage.tsdb.no-lockfile
        - --web.enable-lifecycle
        command:
        - /bin/prometheus
        env:
        - name: TZ
          value: Asia/Shanghai
        image: quay.io/prometheus/prometheus:v2.54.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 60
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: http
          timeoutSeconds: 1
        name: prometheus
        ports:
        - containerPort: 9090
          name: http
        readinessProbe:
          failureThreshold: 60
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: http
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 500m
            memory: 1024Mi
          requests:
            cpu: 100m
            memory: 100Mi
        volumeMounts:
        - mountPath: /etc/prometheus/data
          name: prometheus-home
        - mountPath: /etc/prometheus/config
          name: prometheus-config
        - mountPath: /etc/prometheus/etcd-ssl
          name: etcd-ssl
      - args:
        - sidecar
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --tsdb.path=/etc/prometheus/data
        - --prometheus.url=http://localhost:9090
        - --objstore.config-file=/etc/thanos/config/thanos-sidecar.yml
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        name: thanos-sidecar
        ports:
        - containerPort: 10901
          name: grpc
        volumeMounts:
        - mountPath: /etc/prometheus/data
          name: prometheus-home
        - mountPath: /etc/thanos/config/thanos-sidecar.yml
          name: thanos-config
          readOnly: true
          subPath: config
      imagePullSecrets:
      - name: harbor-secret
      initContainers:
      - command:
        - sh
        - -c
        - '[ -d /etc/prometheus/data/thanos ] || chown -R 65534:65534 /etc/prometheus/data'
        image: quay.io/prometheus/prometheus:v2.54.1
        imagePullPolicy: IfNotPresent
        name: init-dir
        securityContext:
          runAsUser: 0
        volumeMounts:
        - mountPath: /etc/prometheus/data
          name: prometheus-home
      securityContext:
        runAsUser: 65534
      serviceAccount: prometheus-sa
      terminationGracePeriodSeconds: 0
      volumes:
      - hostPath:
          path: /approot/k8s_data/prometheus
          type: DirectoryOrCreate
        name: prometheus-home
      - configMap:
          name: prometheus-cm
        name: prometheus-config
      - name: thanos-config
        secret:
          secretName: thanos-config
      - name: etcd-ssl
        secret:
          secretName: etcd-pki

Thanos-store-gateway 部署

secret 里面涉及的内容,和 sidecar 里面的是一样的,记得替换成自己的

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway-sa
  namespace: monitoring
---
apiVersion: v1
data:
  config: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogInByb20tdGhhbm9zLXNpZGVjYXIiCiAgZW5kcG9pbnQ6ICJtaW5pby5hcGkuZGV2b3BzLmljdSIKICBhY2Nlc3Nfa2V5OiAiZ3NsMmR6QUh2aU56YWJTbjBpa3ciCiAgc2VjcmV0X2tleTogIjgyelEwVU1EbE9vM0x4Q1FNOVRxU3lnRVlyTXV4U1NSWVFkTzFLWEYiCiAgaW5zZWN1cmU6IHRydWUK
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-objstore-config
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway-headless
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-store-gateway
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-store-gateway
  serviceName: thanos-store-gateway-headless
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-store-gateway
    spec:
      containers:
      - args:
        - store
        - --log.level=info
        - --log.format=logfmt
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --no-cache-index-header
        - --objstore.config-file=/etc/thanos/objstore.yaml
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-store-gateway
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /etc/thanos/objstore.yaml
          name: objstore-config
          readOnly: true
          subPath: config
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-store-gateway-sa
      volumes:
      - name: objstore-config
        secret:
          secretName: thanos-objstore-config
      - emptyDir:
          sizeLimit: 100Mi
        name: data

Thanos-compact 部署

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway-sa
  namespace: monitoring
---
apiVersion: v1
data:
  config: dHlwZTogUzMKY29uZmlnOgogIGJ1Y2tldDogInByb20tdGhhbm9zLXNpZGVjYXIiCiAgZW5kcG9pbnQ6ICJtaW5pby5hcGkuZGV2b3BzLmljdSIKICBhY2Nlc3Nfa2V5OiAiZ3NsMmR6QUh2aU56YWJTbjBpa3ciCiAgc2VjcmV0X2tleTogIjgyelEwVU1EbE9vM0x4Q1FNOVRxU3lnRVlyTXV4U1NSWVFkTzFLWEYiCiAgaW5zZWN1cmU6IHRydWUK
kind: Secret
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-objstore-config
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway-headless
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-store-gateway
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-store-gateway
  name: thanos-store-gateway
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-store-gateway
  serviceName: thanos-store-gateway-headless
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-store-gateway
    spec:
      containers:
      - args:
        - store
        - --log.level=info
        - --log.format=logfmt
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --no-cache-index-header
        - --objstore.config-file=/etc/thanos/objstore.yaml
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-store-gateway
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /etc/thanos/objstore.yaml
          name: objstore-config
          readOnly: true
          subPath: config
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-store-gateway-sa
      volumes:
      - name: objstore-config
        secret:
          secretName: thanos-objstore-config
      - emptyDir:
          sizeLimit: 100Mi
        name: data
root@dream:/approot/chen2ha/kubetpl 13:58:08 # cat output/thanos-compact.yaml
---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact-headless
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-compact
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: thanos-compact
  name: thanos-compact
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-compact
  serviceName: thanos-compact-headless
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-compact
    spec:
      containers:
      - args:
        - compact
        - --wait
        - --log.level=info
        - --log.format=logfmt
        - --data-dir=/var/thanos/compact
        - --http-address=0.0.0.0:10902
        - --objstore.config-file=/etc/thanos/objstore.yaml
        - --compact.enable-vertical-compaction
        - --deduplication.replica-label=replica
        - --deduplication.func=penalty
        - --delete-delay=1d
        - --retention.resolution-raw=7d
        - --retention.resolution-5m=15d
        - --retention.resolution-1h=30d
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-compact
        ports:
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /etc/thanos/objstore.yaml
          name: objstore-config
          readOnly: true
          subPath: config
        - mountPath: /var/thanos/compact
          name: data
          readOnly: false
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-compact-sa
      volumes:
      - name: objstore-config
        secret:
          secretName: thanos-objstore-config
      - emptyDir:
          sizeLimit: 100Mi
        name: data

Thanos-query 部署

  • --query.replica-label 参数指定依据哪个标签做数据的去重,在 Prometheus 的 external_labels 里面配置的
  • 给 Thanos-query 的 gRPC 端口配一个独立的 svc ,通过 nodeport 的方式暴露端口,再由一个全局的 Thanos-query 来注册各个集群的 Thanos-query,最终通过 Thanos-query-frontend 来查询
    • 当然,如果资源足够,也完全可以每个集群再多部署一个 Thanos-query 来当作全局查询,内外查询做一个分流
---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query-svc
  namespace: monitoring
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-query
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query-np-svc
  namespace: monitoring
spec:
  ports:
  - name: grpc
    nodePort: 31901
    port: 10901
    targetPort: grpc
  selector:
    app.kubernetes.io/name: thanos-query
  type: NodePort
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: thanos-query
  name: thanos-query
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-query
    spec:
      containers:
      - args:
        - query
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --query.replica-label=replica
        - --endpoint=dnssrv+_grpc._tcp.thanos-store-gateway-headless.monitoring.svc.cluster.local
        - --endpoint=dnssrv+_grpc._tcp.prometheus-svc.monitoring.svc.cluster.local
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-query
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-query-sa

Thanos-query-globle 部署

--endpoint 我是两个集群各挑了两个节点

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-globle
  name: thanos-query-globle-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-globle
  name: thanos-query-globle-svc
  namespace: monitoring
spec:
  ports:
  - name: grpc
    port: 10901
    targetPort: grpc
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-query-globle
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-globle
  name: thanos-query-globle
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-query-globle
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-query-globle
    spec:
      containers:
      - args:
        - query
        - --log.level=info
        - --log.format=logfmt
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --query.replica-label=replica
        - --endpoint=192.168.22.112:31901
        - --endpoint=192.168.22.113:31901
        - --endpoint=192.168.22.122:31901
        - --endpoint=192.168.22.123:31901
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-query-globle
        ports:
        - containerPort: 10901
          name: grpc
          protocol: TCP
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-query-globle-sa

Thanos-query-frontend 部署

---
apiVersion: v1
automountServiceAccountToken: false
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-frontend
  name: thanos-query-frontend-sa
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-frontend
  name: thanos-query-frontend-svc
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 10902
    protocol: TCP
    targetPort: http
  selector:
    app.kubernetes.io/name: thanos-query-frontend
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/name: thanos-query-frontend
  name: thanos-query-frontend
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: thanos-query-frontend
  template:
    metadata:
      labels:
        app.kubernetes.io/name: thanos-query-frontend
    spec:
      containers:
      - args:
        - query-frontend
        - --log.level=info
        - --log.format=logfmt
        - --http-address=0.0.0.0:10902
        - --query-frontend.downstream-url=http://thanos-query-globle-svc.monitoring.svc.cluster.local:10902
        env:
        - name: HOST_IP_ADDRESS
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        image: quay.io/thanos/thanos:v0.36.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 4
          httpGet:
            path: /-/healthy
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 1
        name: thanos-query-frontend
        ports:
        - containerPort: 10902
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: http
            scheme: HTTP
          initialDelaySeconds: 0
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          runAsGroup: 65532
          runAsNonRoot: true
          runAsUser: 65534
          seccompProfile:
            type: RuntimeDefault
      securityContext:
        fsGroup: 65534
        runAsGroup: 65532
        runAsNonRoot: true
        runAsUser: 65534
        seccompProfile:
          type: RuntimeDefault
      serviceAccountName: thanos-query-frontend-sa
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: thanos-query-frontend
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: thanos.devops.icu
    http:
      paths:
      - backend:
          service:
            name: thanos-query-frontend-svc
            port:
              number: 10902
        path: /
        pathType: Prefix

Grafana 部署

这边采用了 nfs 针对 dashboard 的 json 文件做了持久化,有修改或者增加就比较方便,直接上传到 nfs 就可以了

---
apiVersion: v1
data:
  grafana.ini: |
    provisioning = /etc/grafana/provisioning
kind: ConfigMap
metadata:
  name: grafana-cm
  namespace: monitoring
---
apiVersion: v1
data:
  prometheus.yaml: |
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        url: http://thanos-query-globle-svc.monitoring.svc.cluster.local:10902
kind: ConfigMap
metadata:
  name: grafana-datasource
  namespace: monitoring
---
apiVersion: v1
data:
  dashboards.yaml: |
    apiVersion: 1
    providers:
    - name: 'a unique provider name'
      orgId: 1
      folder: ''
      folderUid: ''
      type: file
      disableDeletion: false
      editable: true
      updateIntervalSeconds: 10
      allowUiUpdates: true
      options:
        # <string, required> path to dashboard files on disk. Required
        path: /etc/grafana/provisioning/dashboards/views
kind: ConfigMap
metadata:
  name: grafana-dashboard
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: grafana
  name: grafana-svc
  namespace: monitoring
spec:
  ports:
  - port: 3000
    protocol: TCP
    targetPort: http-grafana
  selector:
    app.kubernetes.io/name: grafana
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/name: grafana
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: grafana
  template:
    metadata:
      labels:
        app.kubernetes.io/name: grafana
    spec:
      containers:
      - env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: docker.m.daocloud.io/grafana/grafana:11.3.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 3000
          timeoutSeconds: 1
        name: grafana
        ports:
        - containerPort: 3000
          name: http-grafana
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /robots.txt
            port: 3000
            scheme: HTTP
          initialDelaySeconds: 10
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 2
        resources:
          limits:
            cpu: 1000m
            memory: 1024Mi
          requests:
            cpu: 250m
            memory: 750Mi
        volumeMounts:
        - mountPath: /etc/grafana/grafana.ini
          name: grafana-config
          subPath: grafana.ini
        - mountPath: /etc/grafana/provisioning/datasources/prometheus.yaml
          name: grafana-datasource
          subPath: prometheus.yaml
        - mountPath: /etc/grafana/provisioning/dashboards/grafana-dashboard.yaml
          name: grafana-dashboard
          subPath: dashboards.yaml
        - mountPath: /etc/grafana/provisioning/dashboards/views
          name: grafana
          subPathExpr: $(POD_NAME)
      securityContext:
        fsGroup: 472
        supplementalGroups:
        - 0
      volumes:
      - configMap:
          name: grafana-cm
        name: grafana-config
      - configMap:
          name: grafana-datasource
        name: grafana-datasource
      - configMap:
          name: grafana-dashboard
        name: grafana-dashboard
  volumeClaimTemplates:
  - metadata:
      name: grafana
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
      storageClassName: nfs-client
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: monitoring
spec:
  ingressClassName: nginx
  rules:
  - host: grafana.devops.icu
    http:
      paths:
      - backend:
          service:
            name: grafana-svc
            port:
              number: 3000
        path: /
        pathType: Prefix

增加 Thanos 和 MinIO 监控

Prometheus 采集 MinIO 指标需要鉴权,需要通过 mc 命令配置 JWT 认证,可以查看官方文档:mc admin prometheus generate

或者 MinIO 配置 MINIO_PROMETHEUS_AUTH_TYPE=public 参数,需要重启 MinIO 生效,使 Prometheus 可以直接访问 metrics api

    - job_name: minio
      metrics_path: /minio/v2/metrics/cluster
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: storage;minio-svc
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:9000
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

    - job_name: thanos-query
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: monitoring;thanos-query-svc
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:10902
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

    - job_name: thanos-store-gateway
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: monitoring;thanos-store-gateway-headless
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:10902
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

    - job_name: thanos-compact
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name]
        regex: monitoring;thanos-compact-headless
        action: keep
      - source_labels: [__meta_kubernetes_pod_ip]
        regex: (.+)
        target_label: __address__
        replacement: ${1}:10902
      - source_labels: [__meta_kubernetes_endpoints_name]
        action: replace
        target_label: endpoint
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: service
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace

Grafana dashboard

记录几个我这边配置的 dashboard id,因为我这边是双 k8s 集群,所以要加上 cluster 这个变量,大部分都需要自己再细调一下

coredns

14981

在这里插入图片描述

etcd

用的官方给的模板:grafana.json

在这里插入图片描述

Thanos

12937

在这里插入图片描述

node-exporter

12633 或者 21902

在这里插入图片描述

16098

在这里插入图片描述

最后

yaml 和 dashboard 的 json 文件可以从 gitee 自取:https://gitee.com/chen2ha/yaml_for_kubernetes/tree/master/thanos

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2228271.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

13、基于AT89C52轮询方式扫描按键proteus仿真

一、仿真原理图: 二、仿真效果: 三、相关代码: 1、main主函数: void main(void) { unsigned char ll; VariableInit(); while(1) { value = keyScan(); display(); …

春季测试 2023 我的题解

T1 涂色游戏 这道题目还是比较简单的 容易发现&#xff0c;位于 ( x i , y i ) (x_i,y_i) (xi​,yi​) 的格子的颜色只取决于 ​ x i x_i xi​ 行与 y i y_i yi​ 列的颜色。 这时候可以想到开两个数组&#xff0c;分别存储列与行的绘画信息&#xff0c;然后发现前后的互相…

magic-api简单使用二:自定义返回结果

背景 在上一章 中我们学习了搭建项目和导入文件&#xff0c; 这二天稍微有点时间&#xff0c;研究下这个magic-api的写法。 后续如果需要维护或者更改&#xff0c;也能在项目中尽快上手。 今天我们主要学习自定义返回结果&#xff0c;当然也可以使用官方的。不需要任何更改。…

“启动智能制造引擎 推动新质生产力发展”高峰论坛在京成功举办

在智能制造技术的助力下,北京经开区的智能制造产业稳步发展。2024年10月25日,北京电子科技职业学院图书馆暨北京经济技术开发区公共图书馆举办一场以“启动智能制造引擎 推动新质生产力发展”为主题的高峰论坛于北京亦庄经开区盛元书院圆满落幕。此次论坛为北京经开区智能制造产…

Java 异常处理(6/30)

目录 Java 异常处理 1. 什么是异常 2. 异常处理的关键字 2.1 try-catch 语句 2.2 多个 catch 块 2.3 finally 块 2.4 throw 和 throws 3. 自定义异常 4. 异常处理的最佳实践 总结与后续 Java 异常处理 在软件开发中&#xff0c;异常处理&#xff08;Exception Handl…

ubuntu22.04安装向日葵

1、下载deb安装包 进入官网下载图形版本&#xff1a;https://sunlogin.oray.com/download/linux?typepersonal 2、命令行安装 sudo chmod x 文件名.deb sudo dpkg -i 文件名.deb 3、开始报错的看这里&#xff01; 首先展示一下安装成功的效果图&#xff1a; 接下来是我安…

小米手机投屏到Windows笔记本电脑的3个方法,随便选一个

方法一&#xff1a;Windows系统自带的投屏功能 Windows系统本身的功能很多&#xff0c;其中一项就是接收别的电脑或手机的投屏。 操作步骤一&#xff0c;在Windows电脑里进入【设置】&#xff0c;点击【系统】&#xff0c;往下翻页&#xff0c;找到【投影到此电脑】。 如果这…

漏洞挖掘 | 通过域混淆绕过实现账户接管

由于这是一个私有项目&#xff0c;我将使用 example.com 来代替。 很长一段时间以来&#xff0c;我一直想在漏洞赏金项目中找到一个账户接管&#xff08;ATO&#xff09;漏洞。于是&#xff0c;我开始探索项目范围内的 account.example.com。 我做的第一件事就是注册一个新账…

加密软件有什么功能?

加密软件是一种专门用于保护数据安全的工具&#xff0c;它通过复杂的加密算法来确保数据在传输和存储过程中的机密性和完整性。以下是加密软件通常具备的主要功能&#xff1a; 一、数据加密 文件加密&#xff1a;能够对单个文件或整个文件夹进行加密&#xff0c;确保只有授权…

Python bs4 结合 Scrapy,进行数据爬取和处理

Python bs4 结合 Scrapy&#xff0c;进行数据爬取和处理 在现代数据分析和机器学习领域&#xff0c;数据爬取是获取网页数据的常用方法。Python 提供了许多工具来进行网页爬取&#xff0c;其中 Scrapy 和 BeautifulSoup&#xff08;bs4&#xff09;是最常用的两个库。Scrapy 是…

【p2p、分布式,区块链笔记 Torrent】webtorrent.min.js的实现之appendTo()函数

官方给出的示例通过appendTo函数渲染文件对象&#xff0c;通过torrent.files的元素对象调用&#xff1a; const WebTorrent require(webtorrent)const client new WebTorrent()// Sintel, a free, Creative Commons movie const torrentId magnet:?xturn:btih:08ada5a7a61…

ALIGN_ Tuning Multi-mode Token-level Prompt Alignment across Modalities

文章汇总 当前的问题 目前的工作集中于单模提示发现&#xff0c;即一种模态只有一个提示&#xff0c;这可能不足以代表一个类[17]。这个问题在多模态提示学习中更为严重&#xff0c;因为视觉和文本概念及其对齐都需要推断。此外&#xff0c;仅用全局特征来表示图像和标记是不…

MySQL数据集成至金蝶云星空的解决方案

MySQL数据集成至金蝶云星空的解决方案 SYB生产用料清单新增-深圳天一-半成品-好&#xff1a;MySQL数据集成到金蝶云星空的技术实现 在企业信息化系统中&#xff0c;数据的高效流转和准确对接是确保业务顺畅运行的关键。本文将聚焦于一个具体案例——如何将MySQL中的生产用料清…

javaweb----VS code

前端开发神器&#xff1a;VS Code → 速度快、体积小、插件多 VS Code 安装官网&#xff1a;https://code.visualstudio.com/download VS Code一些必备的插件安装&#xff1a; 1、Chinese (Simplified) 简体中文 2、Code Spell Checker 检查拼写 3、HTML CSS Support 4…

【新闻转载】“假冒 LockBit”来袭:勒索软件借助 AWS S3 偷窃数据,威胁升级

关键要点 Trend团队发现了一些利用 Amazon S3&#xff08;简单存储服务&#xff09;传输加速功能的 Golang 勒索软件样本&#xff0c;用于窃取受害者的文件并上传至攻击者控制的 S3 存储桶。 这些样本中硬编码的 Amazon Web Services (AWS) 凭证被用于追踪与恶意活动关联的 AW…

第十五章 Vue工程化开发及Vue CLI脚手架

目录 一、引言 二、Vue CLI 基本介绍 三、安装Vue CLI 3.1. 安装npm和yarn 3.2. 安装Vue CLI 3.3. 查看 Vue 版本 四、创建启动工程 4.1. 创建项目架子 4.2. 启动工程 五、脚手架目录文件介绍 六、核心文件讲解 6.1. index.html 6.2. main.js 6.3. App.vue 一、…

BEV:针孔相机坐标转换

一 、背景 自动驾驶中经常涉及到不同坐标系之间的坐标转换&#xff0c;在BEV方案中用的比较多的是自车坐标到图像坐标的转换&#xff0c;系统整理了一下坐标转换过程流程。 二 、方法 旋转矩阵计算方法&#xff1a; translation: 平移参数[‘x’, ‘y’, ‘z’] 高阶畸变模型…

开关灯问题(c语言)

样例&#xff1a;10 10 &#xff0c;输出&#xff1a;1&#xff0c;4&#xff0c;9 5 5 &#xff0c;输出&#xff1a;1&#xff0c;4 代码如下 #include<stdio.h> //引入bool值的概念 #include<stdbool.h> int main() {int n 0;//n为灯的数量int m 0;…

centos虚拟机部署opengauss数据库

一、基本信息 1、虚拟机安装的centos版本 2、opengauss版本 地址&#xff1a;https://opengauss.org/zh/download/ 3、opengauss和gaussdb的区别 高斯数据库&#xff08;GaussDB&#xff09;是云数据库&#xff0c;需要购买。 openGaussDB是开源数据库&#xff0c;可以免费…

搜索TV 1.2.4 | 适用于TV端的浏览器应用,设计简洁,功能强大

Klonsdif搜索TV版是一款专为TV端设计的浏览器应用&#xff0c;界面简洁&#xff0c;操作简单&#xff0c;保留最纯粹的浏览体验。支持使用百度、必应、360、搜狗、秘塔AI搜索、360AI搜索、bilibili等内置搜索引擎&#xff0c;也可以直接输入网址访问。全免费、无广告&#xff0…