【求助】【建议放弃】【谷粒商城版】Kubernetes

本文作者： slience_me

文章目录

Kubernetes【谷粒商城版】【建议放弃】
- 1. docker安装
- 2. kubernetes安装前
- 3. kubeadm,kubelet,kubectl
- - 3.1 简介
  - - kubeadm
    - kubelet
    - kubectl
    - - 常用指令
  - 3.2 安装
  - 3.3 kubeadm初始化
  - 3.4 加入从节点(工作节点)
  - 3.5 安装Pod网络插件（CNI）
  - 3.6 KubeSphere安装
  - 3.7 卡住
- 配置文件
- - openebs-operator-1.5.0.yaml
  - kubesphere-minimal.yaml
- 【Q&A】问题汇总
- - 01-coredns工作不正常
  - 02-kubeSphere安装异常
  - 03-kubeSphere相关node不正常

Kubernetes【谷粒商城版】【建议放弃】

官方网站 Kubernetes

下面为kubernetes的使用流程，从安装开始

版本号：

kubectl version

Client Version: version.Info{Major:“1”, Minor:“17”, GitVersion:“v1.17.3”, GitCommit:“06ad960bfd03b39c8310aaf92d1e7c12ce618213”, GitTreeState:“clean”, BuildDate:“2020-02-11T18:14:22Z”, GoVersion:“go1.13.6”, Compiler:“gc”, Platform:“linux/amd64”}
Server Version: version.Info{Major:“1”, Minor:“17”, GitVersion:“v1.17.3”, GitCommit:“06ad960bfd03b39c8310aaf92d1e7c12ce618213”, GitTreeState:“clean”, BuildDate:“2020-02-11T18:07:13Z”, GoVersion:“go1.13.6”, Compiler:“gc”, Platform:“linux/amd64”}

helm version

Client: &version.Version{SemVer:“v2.16.3”, GitCommit:“1ee0254c86d4ed6887327dabed7aa7da29d7eb0d”, GitTreeState:“clean”}
Server: &version.Version{SemVer:“v2.16.3”, GitCommit:“1ee0254c86d4ed6887327dabed7aa7da29d7eb0d”, GitTreeState:“dirty”}

PS：常言道，一坑更比一坑高，我把全部的坑都踩一边，做过这个后，整个人都练就了强大的内心，有了随时重头再来的勇气！

在这个教程之前，已经完成了虚拟机集群的构建，如果没有完成，见基于VMware虚拟机集群搭建教程

1. docker安装

我使用的是ubuntu-20.04.6-live-server-amd64版本 docker安装教程

注意：本实验建议docker版本为 19.03, 其余的同以上教程

sudo apt install -y docker-ce=5:19.03.15~3-0~ubuntu-$(lsb_release -cs) \
                    docker-ce-cli=5:19.03.15~3-0~ubuntu-$(lsb_release -cs) \
                    containerd.io

2. kubernetes安装前

在安装之前，需要对系统做一些调整，实现更好的性能

关闭防火墙

sudo systemctl stop ufw
sudo systemctl disable ufw

关闭 SELinux

sudo systemctl stop apparmor
sudo systemctl disable apparmor

关闭 Swap

sudo swapoff -a  # 临时关闭 swap
sudo sed -i '/swap/d' /etc/fstab  # 永久禁用 swap

free -g  # Swap 需要为 0 | 验证 Swap 是否关闭

配置主机名和 Hosts 映射

sudo hostnamectl set-hostname <newhostname>  # 修改主机名
sudo vim /etc/hosts  # 添加 IP 与主机名的映射

192.168.137.130 k8s-node1
192.168.137.131 k8s-node2
192.168.137.132 k8s-node3

让 IPv4 流量正确传递到 iptables

# 启用桥接流量
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system

处理只读文件系统问题

# 如果遇到 "只读文件系统（read-only filesystem）" 问题，可以尝试重新挂载：
sudo mount -o remount,rw /

同步时间

# 如果时间不同步，可能会影响 Kubernetes 组件的运行，可以使用 ntpdate 进行时间同步：
sudo apt update
sudo apt install -y ntpdate
sudo ntpdate time.windows.com

sudo timedatectl set-local-rtc 1   # 修改硬件时钟
sudo timedatectl set-timezone Asia/Shanghai  # 修改时钟

3. kubeadm,kubelet,kubectl

3.1 简介

在 Kubernetes（K8s）集群的安装和管理过程中，kubeadm、kubelet 和 kubectl 是三个最重要的组件，它们分别承担不同的职责：

组件	作用	适用对象
`kubeadm`	初始化和管理 Kubernetes 集群	集群管理员
`kubelet`	运行在每个节点上，管理 Pod 和容器	Kubernetes 节点
`kubectl`	用于操作 Kubernetes 资源的命令行工具	开发者 & 运维

kubeadm

Kubernetes 集群初始化工具

kubeadm 是官方提供的 快速部署和管理 Kubernetes 集群 的工具，主要用于：

# 初始化 Kubernetes 控制平面（主节点）
kubeadm init
# 加入新的工作节点（Worker Node）
kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
# 升级 Kubernetes 版本
kubeadm upgrade apply v1.28.0

特点:

✅ 官方推荐方式，简化集群安装
✅ 自动生成证书、配置文件、Pod 网络
✅ 可升级 Kubernetes（但不管理工作负载）

📌 注意：kubeadm 仅用于 集群初始化和管理，不会持续运行，而是一次性命令。

kubelet

运行在每个节点上，管理 Pod 和容器

kubelet 是 Kubernetes 关键组件，在 每个节点（Master & Worker） 上运行，负责：

与 API Server 通信，获取调度任务
管理节点上的 Pod（创建、监控、重启）
与容器运行时（如 Docker, containerd）交互
健康检查，自动恢复失败的 Pod

启动 kubelet

在每个节点上，kubelet 作为系统服务运行：

systemctl enable --now kubelet

📌 注意：

kubelet 不会自己创建 Pod，它只负责运行 API Server 指定的 Pod。
如果 kubelet 退出或崩溃，节点上的所有 Pod 可能会停止工作。

kubectl

Kubernetes 命令行工具

kubectl 是 Kubernetes 的命令行客户端，用于与 Kubernetes API 交互，管理集群中的资源。

常用指令

kubectl cluster-info    # 查看集群信息
kubectl get nodes       # 查看集群中的节点信息
kubectl get namespaces  # 查看所有命名空间
kubectl get pods        # 查看所有 Pod
kubectl get svc         # 查看所有服务
kubectl get deployments # 查看所有部署
kubectl get replicasets # 查看所有 ReplicaSets
kubectl get configmaps  # 查看所有 ConfigMap
kubectl get secrets     # 查看所有 Secrets

kubectl apply -f <file>.yaml    										# 使用 YAML 文件创建或更新资源
kubectl create deployment <deployment-name> --image=<image-name>    	# 创建一个新的 Deployment

kubectl set image deployment/<deployment-name> <container-name>=<new-image>    # 更新 Deployment 镜像
kubectl apply -f <updated-file>.yaml    			# 更新资源配置

kubectl delete pod <pod-name>   				 	# 删除指定的 Pod
kubectl delete deployment <deployment-name>    		# 删除指定的 Deployment
kubectl delete svc <service-name>    				# 删除指定的 Service
kubectl delete pod --all    						# 删除所有 Pod（例如：已退出的容器）

kubectl logs <pod-name>    							# 查看指定 Pod 的日志
kubectl logs <pod-name> -c <container-name>    		# 查看指定容器的日志
kubectl logs <pod-name> --previous    				# 查看最近已删除 Pod 的日志

kubectl exec -it <pod-name> -- /bin/bash    		# 在 Pod 中执行命令（进入容器 shell）
kubectl exec -it <pod-name> -c <container-name> -- /bin/bash    # 执行命令到指定容器

kubectl describe pod <pod-name>    					# 查看 Pod 的详细信息
kubectl describe deployment <deployment-name>    	# 查看 Deployment 的详细信息

kubectl top nodes    								# 查看节点的资源使用情况
kubectl top pods     								# 查看 Pod 的资源使用情况

kubectl port-forward pod/<pod-name> <local-port>:<pod-port>    			# 本地端口转发到 Pod 内部端口
kubectl port-forward svc/<service-name> <local-port>:<service-port>    	# 本地端口转发到 Service

kubectl get events    								# 查看集群中的事件
kubectl get events --field-selector involvedObject.kind=Node    		# 查看节点的事件

kubectl delete pod <pod-name> --force --grace-period=0    				# 强制删除未响应的 Pod

kubectl get pods -o yaml    # 获取 Pod 的详细 YAML 输出
kubectl get pods -o json    # 获取 Pod 的详细 JSON 输出
kubectl get pods -o wide    # 获取 Pod 的详细输出

kubectl help    			# 获取 kubectl 命令帮助信息
kubectl get pod --help    	# 获取特定命令（如 Pod）帮助信息
kubectl get all -o wide		# 获取集群中所有资源的详细信息，包括节点、Pod、Service 等

📌 注意：

kubectl 需要配置 kubeconfig 文件才能连接 Kubernetes API Server。
常见 kubectl 配置文件路径：~/.kube/config。

总结

组件	作用	运行环境
`kubeadm`	初始化 & 管理 Kubernetes 集群	仅在主节点（Master）上运行一次
`kubelet`	运行和管理 Pod	所有节点（Master & Worker）长期运行
`kubectl`	操作 Kubernetes 资源	运维 & 开发者本地工具

👉 kubeadm 负责安装，kubelet 负责运行，kubectl 负责操作。 🚀

3.2 安装

列出已安装的所有软件包apt list --installed | grep kube

查找可用的 Kubernetes 相关包apt search kube

#使apt支持ssl传输
sudo apt-get install -y apt-transport-https

#下载gpg密钥
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -

#添加apt源
sudo apt-add-repository "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main"

#查看可安装版本
sudo apt-cache madison kubectl

#安装指定版本
sudo apt-get install kubelet=1.17.3-00 kubeadm=1.17.3-00 kubectl=1.17.3-00

#阻止自动更新
sudo apt-mark hold kubelet kubeadm kubectl

3.3 kubeadm初始化

处理一下异常 [WARNING IsDockerSystemdCheck]: detected “cgroupfs” as the Docker cgroup driver

sudo tee /etc/docker/daemon.json <<EOF
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

sudo systemctl restart docker
docker info | grep -i cgroup

这个指令只在master节点(控制平面节点 (Control Plane Node))执行，就是主节点执行

# apiserver-advertise-address 集群某一个master节点
kubeadm init \
--apiserver-advertise-address=192.168.137.130 \
--image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
--kubernetes-version v1.17.3 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=10.244.0.0/16

打印的内容为：

# 正常日志
# W0317 21:29:46.144585   79912 validation.go:28] Cannot validate kube-proxy config - no validator is available
# W0317 21:29:46.144788   79912 validation.go:28] Cannot validate kubelet config - no validator is available
# ...............
# Your Kubernetes control-plane has initialized successfully!

# To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

# You should now deploy a pod network to the cluster.
# Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
#   https://kubernetes.io/docs/concepts/cluster-administration/addons/

# Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.137.130:6443 --token mkaq9m.k8d544uq97ppg3jd \
    --discovery-token-ca-cert-hash sha256:f18f2a44ba2dec0495d0be020e2e2c28ab66d02ae4b39b496720ff43a5657150

执行一下这个

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# 检测所有节点
kubectl get nodes

# NAME        STATUS     ROLES    AGE     VERSION
# k8s-node1   NotReady   master   3m43s   v1.17.3

3.4 加入从节点(工作节点)

在从节点(工作节点)执行该指令

kubeadm join 192.168.137.130:6443 --token mkaq9m.k8d544uq97ppg3jd \
    --discovery-token-ca-cert-hash sha256:f18f2a44ba2dec0495d0be020e2e2c28ab66d02ae4b39b496720ff43a5657150

# 正常日志
# W0317 21:35:27.501099   64808 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
# [preflight] Running pre-flight checks
# [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
# [preflight] Reading configuration from the cluster...
# [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
# [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
# [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
# [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
# [kubelet-start] Starting the kubelet
# [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

# This node has joined the cluster:
# * Certificate signing request was sent to apiserver and a response was received.
# * The Kubelet was informed of the new secure connection details.
# Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

# 去master节点查一下
kubectl get nodes

# 正常日志
# NAME        STATUS     ROLES    AGE    VERSION
# k8s-node1   NotReady   master   7m4s   v1.17.3
# k8s-node2   NotReady   <none>   96s    v1.17.3
# k8s-node3   NotReady   <none>   96s    v1.17.3

# 查一下全部名称空间的节点情况
kubectl get pods --all-namespaces

# 正常日志
# NAMESPACE     NAME                                READY   STATUS    RESTARTS   AGE
# kube-system   coredns-7f9c544f75-67njd            0/1     Pending   0          7m46s
# kube-system   coredns-7f9c544f75-z82nl            0/1     Pending   0          7m46s
# kube-system   etcd-k8s-node1                      1/1     Running   0          8m1s
# kube-system   kube-apiserver-k8s-node1            1/1     Running   0          8m1s
# kube-system   kube-controller-manager-k8s-node1   1/1     Running   0          8m1s
# kube-system   kube-proxy-fs2p6                    1/1     Running   0          2m37s
# kube-system   kube-proxy-x7rkp                    1/1     Running   0          2m37s
# kube-system   kube-proxy-xpbvt                    1/1     Running   0          7m46s
# kube-system   kube-scheduler-k8s-node1            1/1     Running   0          8m1s

3.5 安装Pod网络插件（CNI）

kubectl apply -f \
	https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
	
# 正常日志
# namespace/kube-flannel created
# clusterrole.rbac.authorization.k8s.io/flannel created
# clusterrolebinding.rbac.authorization.k8s.io/flannel created
# serviceaccount/flannel created
# configmap/kube-flannel-cfg created
# daemonset.apps/kube-flannel-ds created

# 查一下全部名称空间的节点情况
kubectl get pods --all-namespaces

# 正常日志
# NAMESPACE      NAME                                READY   STATUS    RESTARTS   AGE
# kube-flannel   kube-flannel-ds-66bcs               1/1     Running   0          93s
# kube-flannel   kube-flannel-ds-ntwwx               1/1     Running   0          93s
# kube-flannel   kube-flannel-ds-vb6n9               1/1     Running   0          93s
# kube-system    coredns-7f9c544f75-67njd            1/1     Running   0          12m
# kube-system    coredns-7f9c544f75-z82nl            1/1     Running   0          12m
# kube-system    etcd-k8s-node1                      1/1     Running   0          12m
# kube-system    kube-apiserver-k8s-node1            1/1     Running   0          12m
# kube-system    kube-controller-manager-k8s-node1   1/1     Running   0          12m
# kube-system    kube-proxy-fs2p6                    1/1     Running   0          7m28s
# kube-system    kube-proxy-x7rkp                    1/1     Running   0          7m28s
# kube-system    kube-proxy-xpbvt                    1/1     Running   0          12m
# kube-system    kube-scheduler-k8s-node1            1/1     Running   0          12m

# 去master节点查一下
kubectl get nodes

# 正常日志
# NAME        STATUS   ROLES    AGE     VERSION
# k8s-node1   Ready    master   14m     v1.17.3
# k8s-node2   Ready    <none>   8m52s   v1.17.3
# k8s-node3   Ready    <none>   8m52s   v1.17.3

# 获取集群中所有资源的详细信息，包括节点、Pod、Service 等
kubectl get all -o wide

# 正常日志
# NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE   SELECTOR
# service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   17m   <none>

3.6 KubeSphere安装

官方网站 kubesphere.io

先安装helm（master节点执行）

Helm是Kubernetes的包管理器。包管理器类似于我们在Ubuntu中使用的apt、Centos 中使用的yum或者Python中的pip一样，能快速查找、下载和安装软件包。Helm由客户端组件helm和服务端组件Tiller组成,能够将一组K8S资源打包统一管理,是查找、共享和使用为Kubernetes构建的软件的最佳方式。

# 删除旧版
sudo rm -rf /usr/local/bin/helm
sudo rm -rf /usr/local/bin/tiller
kubectl get deployments -n kube-system
kubectl delete deployment tiller-deploy -n kube-system
kubectl get svc -n kube-system
kubectl delete service tiller-deploy -n kube-system
kubectl get serviceaccounts -n kube-system
kubectl delete serviceaccount tiller -n kube-system
kubectl get clusterrolebindings
kubectl delete clusterrolebinding tiller
kubectl get configmap -n kube-system
kubectl delete configmap tiller-config -n kube-system
kubectl get pods -n kube-system
kubectl get all -n kube-system

# 安装(master执行)
HELM_VERSION=v2.16.3 curl -L https://git.io/get_helm.sh | bash # 版本不同

# 用这个指令这个版本，否则会出错
curl -LO https://get.helm.sh/helm-v2.16.1-linux-amd64.tar.gz
tar -zxvf helm-v2.16.1-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm
sudo mv linux-amd64/tiller /usr/local/bin/tiller
sudo rm -rf linux-amd64

#  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
#                                  Dload  Upload   Total   Spent    Left  Speed
# 100 24.0M  100 24.0M    0     0  1594k      0  0:00:15  0:00:15 --:--:-- 1833k
# root@k8s-node1:/home/slienceme# tar -zxvf helm-v2.16.3-linux-amd64.tar.gz
# linux-amd64/
# linux-amd64/README.md
# linux-amd64/LICENSE
# linux-amd64/tiller
# linux-amd64/helm

# 验证版本(master执行)
helm version

# 正常日志
# Client: &version.Version{SemVer:"v2.16.3", GitCommit:"dd2e5695da88625b190e6b22e9542550ab503a47", GitTreeState:"clean"}
# Error: could not find tiller

# 更新 Helm 权限
sudo chmod +x /usr/local/bin/helm
sudo chmod +x /usr/local/bin/tiller

# 创建权限(master执行) 文件见下面
vim helm-rbac.yaml

helm-rbac.yaml内容

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

# 继续上面的操作
# 应用配置
kubectl apply -f helm-rbac.yaml

# 正常日志
# serviceaccount/tiller created
# clusterrolebinding.rbac.authorization.k8s.io/tiller created

# 安装Tiller(master执行)
helm init --service-account=tiller --tiller-image=jessestuart/tiller:v2.16.3 --history-max 300

# 默认的 stable chart 仓库不再维护 手动更换
helm init --service-account=tiller --tiller-image=jessestuart/tiller:v2.16.3 --history-max 300 --stable-repo-url https://charts.helm.sh/stable

# 正常日志
# Creating /root/.helm
# Creating /root/.helm/repository
# Creating /root/.helm/repository/cache
# Creating /root/.helm/repository/local
# Creating /root/.helm/plugins
# Creating /root/.helm/starters
# Creating /root/.helm/cache/archive
# Creating /root/.helm/repository/repositories.yaml
# Adding stable repo with URL: https://charts.helm.sh/stable
# Adding local repo with URL: http://127.0.0.1:8879/charts
# $HELM_HOME has been configured at /root/.helm.
# 
# Tiller (the Helm server-side component) has been installed into your Kubernetes Cluster.
# 
# Please note: by default, Tiller is deployed with an insecure 'allow unauthenticated users' policy.
# To prevent this, run `helm init` with the --tiller-tls-verify flag.
# For more information on securing your installation see: https://v2.helm.sh/docs/securing_installation/

# 可能报错
# Adding stable repo with URL: https://kubernetes-charts.storage.googleapis.com
# Error: error initializing: Looks like "https://kubernetes-charts.storage.googleapis.com" is not a valid chart repository or cannot be reached: Failed to fetch https://kubernetes-charts.storage.googleapis.com/index.yaml : 404 Not Found	
# 默认的 stable chart 仓库不再维护 手动更换

# 验证安装成功：回车后有帮助菜单算安装成功
helm
tiller

kubectl get pods --all-namespaces

# 正常日志
# NAMESPACE      NAME                                READY   STATUS             RESTARTS   AGE
# kube-flannel   kube-flannel-ds-66bcs               1/1     Running            0          11h
# kube-flannel   kube-flannel-ds-ntwwx               1/1     Running            0          11h
# kube-flannel   kube-flannel-ds-vb6n9               1/1     Running            0          11h
# kube-system    coredns-7f9c544f75-67njd            0/1     CrashLoopBackOff   9          12h
# kube-system    coredns-7f9c544f75-z82nl            0/1     CrashLoopBackOff   9          12h
# kube-system    etcd-k8s-node1                      1/1     Running            0          12h
# kube-system    kube-apiserver-k8s-node1            1/1     Running            0          12h
# kube-system    kube-controller-manager-k8s-node1   1/1     Running            0          12h
# kube-system    kube-proxy-fs2p6                    1/1     Running            0          11h
# kube-system    kube-proxy-x7rkp                    1/1     Running            0          11h
# kube-system    kube-proxy-xpbvt                    1/1     Running            0          12h
# kube-system    kube-scheduler-k8s-node1            1/1     Running            0          12h
# kube-system    tiller-deploy-6ffcfbc8df-mzwd7      1/1     Running            0          2m6s

kubectl get nodes -o wide

# 正常日志
# NAME STATUS ROLES  AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE        KERNEL-VERSION     CONTAINER-RUNTIME
# k8s-node1 Ready  master 12h v1.17.3 192.168.137.130 <none>  Ubuntu 20.04.6 LTS  5.4.0-208-generic  docker://19.3.15
# k8s-node2 Ready  <none> 11h v1.17.3 192.168.137.131 <none>  Ubuntu 20.04.6 LTS  5.4.0-208-generic  docker://19.3.15
# k8s-node3 Ready  <none> 11h v1.17.3 192.168.137.132 <none>  Ubuntu 20.04.6 LTS  5.4.0-208-generic  docker://19.3.15

# 确定master节点有taint
kubectl describe node k8s-node1 | grep Taint  

# 正常日志
# Taints:             node-role.kubernetes.io/master:NoSchedule  

# 有则取消taint
kubectl taint nodes k8s-node1 node-role.kubernetes.io/master:NoSchedule-

# 正常日志
# node/k8s-node1 untainted

OpenEBS 是一个开源的 容器化存储解决方案，它为 Kubernetes 提供了 持久存储（Persistent Storage） 支持。OpenEBS 可以帮助 Kubernetes 提供 块存储 和 文件存储，并且支持动态存储卷的创建和管理。它通过在 Kubernetes 上运行的 CStor、Jiva、Mayastor 等存储引擎，来为容器提供持久存储卷。

OpenEBS 的主要功能：

持久化存储管理：在 Kubernetes 集群中为容器提供持久化存储卷，保证容器的状态在重启或迁移后得以保留。
动态存储卷管理：OpenEBS 支持根据需求动态创建存储卷，避免手动管理存储的复杂性。
高可用性和容错：提供存储的高可用性和容错机制，确保容器的存储不会因为节点故障而丢失数据。
易于扩展：可以根据需求横向扩展存储容量，支持容器化的环境。

# 创建命名空间
kubectl create ns openebs

# 正常日志
# namespace/openebs created

# 查询一下当前全部命名空间
kubectl get ns

# 正常日志
# NAME              STATUS   AGE
# default           Active   13h
# kube-flannel      Active   13h
# kube-node-lease   Active   13h
# kube-public       Active   13h
# kube-system       Active   13h
# openebs           Active   38s

# 直接采用文件形式安装
vim openebs-operator-1.5.0.yaml
kubectl apply -f openebs-operator-1.5.0.yaml

# 正常日志
# Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply namespace/openebs configured
# serviceaccount/openebs-maya-operator created
# clusterrole.rbac.authorization.k8s.io/openebs-maya-operator created
# clusterrolebinding.rbac.authorization.k8s.io/openebs-maya-operator created
# deployment.apps/maya-apiserver created
# service/maya-apiserver-service created
# deployment.apps/openebs-provisioner created
# deployment.apps/openebs-snapshot-operator created
# configmap/openebs-ndm-config created
# daemonset.apps/openebs-ndm created
# deployment.apps/openebs-ndm-operator created
# deployment.apps/openebs-admission-server created
# deployment.apps/openebs-localpv-provisioner created

# 检查一下安装的效果
kubectl get sc -n openebs

# 正常日志
# NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
# openebs-device            openebs.io/local             Delete  WaitForFirstConsumer   false 62s
# openebs-hostpath          openebs.io/local             Delete  WaitForFirstConsumer   false 62s
# openebs-jiva-default      openebs.io/provisioner-iscsi Delete  Immediate              false 62s
# openebs-snapshot-promoter volumesnapshot.external-storage.k8s.io/snapshot-promoter Delete Immediate false 62s


# 检查一下
kubectl get pods --all-namespaces

# 正常日志
# NAMESPACE      NAME                                           READY   STATUS    RESTARTS   AGE
# kube-flannel   kube-flannel-ds-66bcs                          1/1     Running   2          13h
# kube-flannel   kube-flannel-ds-ntwwx                          1/1     Running   0          13h
# kube-flannel   kube-flannel-ds-vb6n9                          1/1     Running   6          13h
# kube-system    coredns-57dd576fcb-44722                       1/1     Running   1          57m
# kube-system    coredns-57dd576fcb-pvhvv                       1/1     Running   1          57m
# kube-system    etcd-k8s-node1                                 1/1     Running   5          13h
# kube-system    kube-apiserver-k8s-node1                       1/1     Running   5          13h
# kube-system    kube-controller-manager-k8s-node1              1/1     Running   4          13h
# kube-system    kube-proxy-fs2p6                               1/1     Running   1          13h
# kube-system    kube-proxy-x7rkp                               1/1     Running   0          13h
# kube-system    kube-proxy-xpbvt                               1/1     Running   4          13h
# kube-system    kube-scheduler-k8s-node1                       1/1     Running   4          13h
# kube-system    tiller-deploy-6ffcfbc8df-mzwd7                 1/1     Running   1          98m
# openebs        maya-apiserver-7f664b95bb-s2zv6                1/1     Running   0          5m8s
# openebs        openebs-admission-server-889d78f96-v5pwd       1/1     Running   0          5m8s
# openebs        openebs-localpv-provisioner-67bddc8568-ms88s   1/1     Running   0          5m8s
# openebs        openebs-ndm-hv64z                              1/1     Running   0          5m8s
# openebs        openebs-ndm-lpv9t                              1/1     Running   0          5m8s
# openebs        openebs-ndm-operator-5db67cd5bb-pl6gc          1/1     Running   1          5m8s
# openebs        openebs-ndm-sqqv2                              1/1     Running   0          5m8s
# openebs        openebs-provisioner-c68bfd6d4-sxpb2            1/1     Running   0          5m8s
# openebs        openebs-snapshot-operator-7ffd685677-x9wzj     2/2     Running   0          5m8s

# 将openebs-hostpath设置为默认的StorageClass：
kubectl patch storageclass openebs-hostpath -p \
 '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'

# 正常日志
# storageclass.storage.k8s.io/openebs-hostpath patched


# kubectl taint nodes 用于在节点上添加污点，从而控制哪些 Pods 可以调度到这些节点。
# 结合容忍（Toleration），你可以实现精细的资源调度控制，避免某些 Pods 被调度到不适合的节点。

# 重新打上污点
kubectl taint nodes k8s-node1 node-role.kubernetes.io=master:NoSchedule

# 正常日志
# node/k8s-node1 tainted

# 最小化安装kubesphere  kubesphere-minimal.yaml见下面
vim kubesphere-minimal.yaml
kubectl apply -f kubesphere-minimal.yaml 
# 或 网络安装
kubectl apply -f https://raw.githubusercontent.com/kubesphere/ks-installer/v2.1.1/kubesphere-minimal.yaml

# 正常日志
# namespace/kubesphere-system created
# configmap/ks-installer created
# serviceaccount/ks-installer created
# clusterrole.rbac.authorization.k8s.io/ks-installer created
# clusterrolebinding.rbac.authorization.k8s.io/ks-installer created
# deployment.apps/ks-installer created

# 查看安装日志，请耐心等待安装成功
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f
# 这个日志太长 单独列出

# 正常日志
# 2025-03-18T04:46:27Z INFO     : shell-operator v1.0.0-beta.5
# 2025-03-18T04:46:27Z INFO     : Use temporary dir: /tmp/shell-operator
# 2025-03-18T04:46:27Z INFO     : Initialize hooks manager ...
# 2025-03-18T04:46:27Z INFO     : Search and load hooks ...
# 2025-03-18T04:46:27Z INFO     : Load hook config from '/hooks/kubesphere/installRunner.py'
# 2025-03-18T04:46:27Z INFO     : HTTP SERVER Listening on 0.0.0.0:9115
# 2025-03-18T04:46:27Z INFO     : Initializing schedule manager ...
# 2025-03-18T04:46:27Z INFO     : KUBE Init Kubernetes client
# 2025-03-18T04:46:27Z INFO     : KUBE-INIT Kubernetes client is configured successfully
# 2025-03-18T04:46:27Z INFO     : MAIN: run main loop
# 2025-03-18T04:46:27Z INFO     : MAIN: add onStartup tasks
# 2025-03-18T04:46:27Z INFO     : Running schedule manager ...
# 2025-03-18T04:46:27Z INFO     : QUEUE add all HookRun@OnStartup
# 2025-03-18T04:46:27Z INFO     : MSTOR Create new metric shell_operator_live_ticks
# 2025-03-18T04:46:27Z INFO     : MSTOR Create new metric shell_operator_tasks_queue_length
# 2025-03-18T04:46:27Z INFO     : GVR for kind 'ConfigMap' is /v1, Resource=configmaps
# 2025-03-18T04:46:27Z INFO     : EVENT Kube event 'b85d730a-83a0-4b72-b9a3-0653c27dc89e'
# 2025-03-18T04:46:27Z INFO     : QUEUE add TASK_HOOK_RUN@KUBE_EVENTS kubesphere/installRunner.py
# 2025-03-18T04:46:30Z INFO     : TASK_RUN HookRun@KUBE_EVENTS kubesphere/installRunner.py
# 2025-03-18T04:46:30Z INFO     : Running hook 'kubesphere/installRunner.py' binding 'KUBE_EVENTS' ...
# [WARNING]: No inventory was parsed, only implicit localhost is available
# [WARNING]: provided hosts list is empty, only localhost is available. Note that
# the implicit localhost does not match 'all'
# .................................................................
# .................................................................
# .................................................................
# Start installing monitoring
# **************************************************
# task monitoring status is successful
# total: 1     completed:1
# **************************************************
#####################################################
###              Welcome to KubeSphere!           ###
#####################################################

# Console: http://192.168.137.130:30880
# Account: admin
# Password: P@88w0rd

# NOTES：
#   1. After logging into the console, please check the
#      monitoring status of service components in
#      the "Cluster Status". If the service is not
#      ready, please wait patiently. You can start
#      to use when all components are ready.
#   2. Please modify the default password after login.
# 
########################################################

3.7 卡住

TASK [common : Kubesphere | Deploy openldap] ***********************************
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "/usr/local/bin/helm upgrade --install ks-openldap /etc/kubesphere/openldap-ha -f /etc/kubesphere/custom-values-openldap.yaml --set fullnameOverride=openldap --namespace kubesphere-system\n", "delta": "0:00:00.710229", "end": "2025-03-18 09:15:43.271042", "msg": "non-zero return code", "rc": 1, "start": "2025-03-18 09:15:42.560813", "stderr": "Error: UPGRADE FAILED: \"ks-openldap\" has no deployed releases", "stderr_lines": ["Error: UPGRADE FAILED: \"ks-openldap\" has no deployed releases"], "stdout": "", "stdout_lines": []}

配置文件

openebs-operator-1.5.0.yaml

openebs-operator-1.5.0.yaml内容

# This manifest deploys the OpenEBS control plane components, with associated CRs & RBAC rules
# NOTE: On GKE, deploy the openebs-operator.yaml in admin context

# Create the OpenEBS namespace
apiVersion: v1
kind: Namespace
metadata:
  name: openebs
---
# Create Maya Service Account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: openebs-maya-operator
  namespace: openebs
---
# Define Role that allows operations on K8s pods/deployments
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: openebs-maya-operator
rules:
- apiGroups: ["*"]
  resources: ["nodes", "nodes/proxy"]
  verbs: ["*"]
- apiGroups: ["*"]
  resources: ["namespaces", "services", "pods", "pods/exec", "deployments", "deployments/finalizers", "replicationcontrollers", "replicasets", "events", "endpoints", "configmaps", "secrets", "jobs", "cronjobs"]
  verbs: ["*"]
- apiGroups: ["*"]
  resources: ["statefulsets", "daemonsets"]
  verbs: ["*"]
- apiGroups: ["*"]
  resources: ["resourcequotas", "limitranges"]
  verbs: ["list", "watch"]
- apiGroups: ["*"]
  resources: ["ingresses", "horizontalpodautoscalers", "verticalpodautoscalers", "poddisruptionbudgets", "certificatesigningrequests"]
  verbs: ["list", "watch"]
- apiGroups: ["*"]
  resources: ["storageclasses", "persistentvolumeclaims", "persistentvolumes"]
  verbs: ["*"]
- apiGroups: ["volumesnapshot.external-storage.k8s.io"]
  resources: ["volumesnapshots", "volumesnapshotdatas"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: [ "get", "list", "create", "update", "delete", "patch"]
- apiGroups: ["*"]
  resources: [ "disks", "blockdevices", "blockdeviceclaims"]
  verbs: ["*" ]
- apiGroups: ["*"]
  resources: [ "cstorpoolclusters", "storagepoolclaims", "storagepoolclaims/finalizers", "cstorpoolclusters/finalizers", "storagepools"]
  verbs: ["*" ]
- apiGroups: ["*"]
  resources: [ "castemplates", "runtasks"]
  verbs: ["*" ]
- apiGroups: ["*"]
  resources: [ "cstorpools", "cstorpools/finalizers", "cstorvolumereplicas", "cstorvolumes", "cstorvolumeclaims"]
  verbs: ["*" ]
- apiGroups: ["*"]
  resources: [ "cstorpoolinstances", "cstorpoolinstances/finalizers"]
  verbs: ["*" ]
- apiGroups: ["*"]
  resources: [ "cstorbackups", "cstorrestores", "cstorcompletedbackups"]
  verbs: ["*" ]
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["get", "watch", "list", "delete", "update", "create"]
- apiGroups: ["admissionregistration.k8s.io"]
  resources: ["validatingwebhookconfigurations", "mutatingwebhookconfigurations"]
  verbs: ["get", "create", "list", "delete", "update", "patch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
- apiGroups: ["*"]
  resources: [ "upgradetasks"]
  verbs: ["*" ]
---
# Bind the Service Account with the Role Privileges.
# TODO: Check if default account also needs to be there
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: openebs-maya-operator
subjects:
- kind: ServiceAccount
  name: openebs-maya-operator
  namespace: openebs
roleRef:
  kind: ClusterRole
  name: openebs-maya-operator
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: maya-apiserver
  namespace: openebs
  labels:
    name: maya-apiserver
    openebs.io/component-name: maya-apiserver
    openebs.io/version: 1.5.0
spec:
  selector:
    matchLabels:
      name: maya-apiserver
      openebs.io/component-name: maya-apiserver
  replicas: 1
  strategy:
    type: Recreate
    rollingUpdate: null
  template:
    metadata:
      labels:
        name: maya-apiserver
        openebs.io/component-name: maya-apiserver
        openebs.io/version: 1.5.0
    spec:
      serviceAccountName: openebs-maya-operator
      containers:
      - name: maya-apiserver
        imagePullPolicy: IfNotPresent
        image: quay.io/openebs/m-apiserver:1.5.0
        ports:
        - containerPort: 5656
        env:
        # OPENEBS_IO_KUBE_CONFIG enables maya api service to connect to K8s
        # based on this config. This is ignored if empty.
        # This is supported for maya api server version 0.5.2 onwards
        #- name: OPENEBS_IO_KUBE_CONFIG
        #  value: "/home/ubuntu/.kube/config"
        # OPENEBS_IO_K8S_MASTER enables maya api service to connect to K8s
        # based on this address. This is ignored if empty.
        # This is supported for maya api server version 0.5.2 onwards
        #- name: OPENEBS_IO_K8S_MASTER
        #  value: "http://172.28.128.3:8080"
        # OPENEBS_NAMESPACE provides the namespace of this deployment as an
        # environment variable
        - name: OPENEBS_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        # OPENEBS_SERVICE_ACCOUNT provides the service account of this pod as
        # environment variable
        - name: OPENEBS_SERVICE_ACCOUNT
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        # OPENEBS_MAYA_POD_NAME provides the name of this pod as
        # environment variable
        - name: OPENEBS_MAYA_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        # If OPENEBS_IO_CREATE_DEFAULT_STORAGE_CONFIG is false then OpenEBS default
        # storageclass and storagepool will not be created.
        - name: OPENEBS_IO_CREATE_DEFAULT_STORAGE_CONFIG
          value: "true"
        # OPENEBS_IO_INSTALL_DEFAULT_CSTOR_SPARSE_POOL decides whether default cstor sparse pool should be
        # configured as a part of openebs installation.
        # If "true" a default cstor sparse pool will be configured, if "false" it will not be configured.
        # This value takes effect only if OPENEBS_IO_CREATE_DEFAULT_STORAGE_CONFIG
        # is set to true
        - name: OPENEBS_IO_INSTALL_DEFAULT_CSTOR_SPARSE_POOL
          value: "false"
        # OPENEBS_IO_CSTOR_TARGET_DIR can be used to specify the hostpath
        # to be used for saving the shared content between the side cars
        # of cstor volume pod.
        # The default path used is /var/openebs/sparse
        #- name: OPENEBS_IO_CSTOR_TARGET_DIR
        #  value: "/var/openebs/sparse"
        # OPENEBS_IO_CSTOR_POOL_SPARSE_DIR can be used to specify the hostpath
        # to be used for saving the shared content between the side cars
        # of cstor pool pod. This ENV is also used to indicate the location
        # of the sparse devices.
        # The default path used is /var/openebs/sparse
        #- name: OPENEBS_IO_CSTOR_POOL_SPARSE_DIR
        #  value: "/var/openebs/sparse"
        # OPENEBS_IO_JIVA_POOL_DIR can be used to specify the hostpath
        # to be used for default Jiva StoragePool loaded by OpenEBS
        # The default path used is /var/openebs
        # This value takes effect only if OPENEBS_IO_CREATE_DEFAULT_STORAGE_CONFIG
        # is set to true
        #- name: OPENEBS_IO_JIVA_POOL_DIR
        #  value: "/var/openebs"
        # OPENEBS_IO_LOCALPV_HOSTPATH_DIR can be used to specify the hostpath
        # to be used for default openebs-hostpath storageclass loaded by OpenEBS
        # The default path used is /var/openebs/local
        # This value takes effect only if OPENEBS_IO_CREATE_DEFAULT_STORAGE_CONFIG
        # is set to true
        #- name: OPENEBS_IO_LOCALPV_HOSTPATH_DIR
        #  value: "/var/openebs/local"
        - name: OPENEBS_IO_JIVA_CONTROLLER_IMAGE
          value: "quay.io/openebs/jiva:1.5.0"
        - name: OPENEBS_IO_JIVA_REPLICA_IMAGE
          value: "quay.io/openebs/jiva:1.5.0"
        - name: OPENEBS_IO_JIVA_REPLICA_COUNT
          value: "3"
        - name: OPENEBS_IO_CSTOR_TARGET_IMAGE
          value: "quay.io/openebs/cstor-istgt:1.5.0"
        - name: OPENEBS_IO_CSTOR_POOL_IMAGE
          value: "quay.io/openebs/cstor-pool:1.5.0"
        - name: OPENEBS_IO_CSTOR_POOL_MGMT_IMAGE
          value: "quay.io/openebs/cstor-pool-mgmt:1.5.0"
        - name: OPENEBS_IO_CSTOR_VOLUME_MGMT_IMAGE
          value: "quay.io/openebs/cstor-volume-mgmt:1.5.0"
        - name: OPENEBS_IO_VOLUME_MONITOR_IMAGE
          value: "quay.io/openebs/m-exporter:1.5.0"
        - name: OPENEBS_IO_CSTOR_POOL_EXPORTER_IMAGE
          value: "quay.io/openebs/m-exporter:1.5.0"
        - name: OPENEBS_IO_HELPER_IMAGE
          value: "quay.io/openebs/linux-utils:1.5.0"
        # OPENEBS_IO_ENABLE_ANALYTICS if set to true sends anonymous usage
        # events to Google Analytics
        - name: OPENEBS_IO_ENABLE_ANALYTICS
          value: "true"
        - name: OPENEBS_IO_INSTALLER_TYPE
          value: "openebs-operator"
        # OPENEBS_IO_ANALYTICS_PING_INTERVAL can be used to specify the duration (in hours)
        # for periodic ping events sent to Google Analytics.
        # Default is 24h.
        # Minimum is 1h. You can convert this to weekly by setting 168h
        #- name: OPENEBS_IO_ANALYTICS_PING_INTERVAL
        #  value: "24h"
        livenessProbe:
          exec:
            command:
            - /usr/local/bin/mayactl
            - version
          initialDelaySeconds: 30
          periodSeconds: 60
        readinessProbe:
          exec:
            command:
            - /usr/local/bin/mayactl
            - version
          initialDelaySeconds: 30
          periodSeconds: 60
---
apiVersion: v1
kind: Service
metadata:
  name: maya-apiserver-service
  namespace: openebs
  labels:
    openebs.io/component-name: maya-apiserver-svc
spec:
  ports:
  - name: api
    port: 5656
    protocol: TCP
    targetPort: 5656
  selector:
    name: maya-apiserver
  sessionAffinity: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openebs-provisioner
  namespace: openebs
  labels:
    name: openebs-provisioner
    openebs.io/component-name: openebs-provisioner
    openebs.io/version: 1.5.0
spec:
  selector:
    matchLabels:
      name: openebs-provisioner
      openebs.io/component-name: openebs-provisioner
  replicas: 1
  strategy:
    type: Recreate
    rollingUpdate: null
  template:
    metadata:
      labels:
        name: openebs-provisioner
        openebs.io/component-name: openebs-provisioner
        openebs.io/version: 1.5.0
    spec:
      serviceAccountName: openebs-maya-operator
      containers:
      - name: openebs-provisioner
        imagePullPolicy: IfNotPresent
        image: quay.io/openebs/openebs-k8s-provisioner:1.5.0
        env:
        # OPENEBS_IO_K8S_MASTER enables openebs provisioner to connect to K8s
        # based on this address. This is ignored if empty.
        # This is supported for openebs provisioner version 0.5.2 onwards
        #- name: OPENEBS_IO_K8S_MASTER
        #  value: "http://10.128.0.12:8080"
        # OPENEBS_IO_KUBE_CONFIG enables openebs provisioner to connect to K8s
        # based on this config. This is ignored if empty.
        # This is supported for openebs provisioner version 0.5.2 onwards
        #- name: OPENEBS_IO_KUBE_CONFIG
        #  value: "/home/ubuntu/.kube/config"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: OPENEBS_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        # OPENEBS_MAYA_SERVICE_NAME provides the maya-apiserver K8s service name,
        # that provisioner should forward the volume create/delete requests.
        # If not present, "maya-apiserver-service" will be used for lookup.
        # This is supported for openebs provisioner version 0.5.3-RC1 onwards
        #- name: OPENEBS_MAYA_SERVICE_NAME
        #  value: "maya-apiserver-apiservice"
        livenessProbe:
          exec:
            command:
            - pgrep
            - ".*openebs"
          initialDelaySeconds: 30
          periodSeconds: 60
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openebs-snapshot-operator
  namespace: openebs
  labels:
    name: openebs-snapshot-operator
    openebs.io/component-name: openebs-snapshot-operator
    openebs.io/version: 1.5.0
spec:
  selector:
    matchLabels:
      name: openebs-snapshot-operator
      openebs.io/component-name: openebs-snapshot-operator
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        name: openebs-snapshot-operator
        openebs.io/component-name: openebs-snapshot-operator
        openebs.io/version: 1.5.0
    spec:
      serviceAccountName: openebs-maya-operator
      containers:
        - name: snapshot-controller
          image: quay.io/openebs/snapshot-controller:1.5.0
          imagePullPolicy: IfNotPresent
          env:
          - name: OPENEBS_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          livenessProbe:
            exec:
              command:
              - pgrep
              - ".*controller"
            initialDelaySeconds: 30
            periodSeconds: 60
        # OPENEBS_MAYA_SERVICE_NAME provides the maya-apiserver K8s service name,
        # that snapshot controller should forward the snapshot create/delete requests.
        # If not present, "maya-apiserver-service" will be used for lookup.
        # This is supported for openebs provisioner version 0.5.3-RC1 onwards
        #- name: OPENEBS_MAYA_SERVICE_NAME
        #  value: "maya-apiserver-apiservice"
        - name: snapshot-provisioner
          image: quay.io/openebs/snapshot-provisioner:1.5.0
          imagePullPolicy: IfNotPresent
          env:
          - name: OPENEBS_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
        # OPENEBS_MAYA_SERVICE_NAME provides the maya-apiserver K8s service name,
        # that snapshot provisioner  should forward the clone create/delete requests.
        # If not present, "maya-apiserver-service" will be used for lookup.
        # This is supported for openebs provisioner version 0.5.3-RC1 onwards
        #- name: OPENEBS_MAYA_SERVICE_NAME
        #  value: "maya-apiserver-apiservice"
          livenessProbe:
            exec:
              command:
              - pgrep
              - ".*provisioner"
            initialDelaySeconds: 30
            periodSeconds: 60
---
# This is the node-disk-manager related config.
# It can be used to customize the disks probes and filters
apiVersion: v1
kind: ConfigMap
metadata:
  name: openebs-ndm-config
  namespace: openebs
  labels:
    openebs.io/component-name: ndm-config
data:
  # udev-probe is default or primary probe which should be enabled to run ndm
  # filterconfigs contails configs of filters - in their form fo include
  # and exclude comma separated strings
  node-disk-manager.config: |
    probeconfigs:
      - key: udev-probe
        name: udev probe
        state: true
      - key: seachest-probe
        name: seachest probe
        state: false
      - key: smart-probe
        name: smart probe
        state: true
    filterconfigs:
      - key: os-disk-exclude-filter
        name: os disk exclude filter
        state: true
        exclude: "/,/etc/hosts,/boot"
      - key: vendor-filter
        name: vendor filter
        state: true
        include: ""
        exclude: "CLOUDBYT,OpenEBS"
      - key: path-filter
        name: path filter
        state: true
        include: ""
        exclude: "loop,/dev/fd0,/dev/sr0,/dev/ram,/dev/dm-,/dev/md"
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: openebs-ndm
  namespace: openebs
  labels:
    name: openebs-ndm
    openebs.io/component-name: ndm
    openebs.io/version: 1.5.0
spec:
  selector:
    matchLabels:
      name: openebs-ndm
      openebs.io/component-name: ndm
  updateStrategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        name: openebs-ndm
        openebs.io/component-name: ndm
        openebs.io/version: 1.5.0
    spec:
      # By default the node-disk-manager will be run on all kubernetes nodes
      # If you would like to limit this to only some nodes, say the nodes
      # that have storage attached, you could label those node and use
      # nodeSelector.
      #
      # e.g. label the storage nodes with - "openebs.io/nodegroup"="storage-node"
      # kubectl label node <node-name> "openebs.io/nodegroup"="storage-node"
      #nodeSelector:
      #  "openebs.io/nodegroup": "storage-node"
      serviceAccountName: openebs-maya-operator
      hostNetwork: true
      containers:
      - name: node-disk-manager
        image: quay.io/openebs/node-disk-manager-amd64:v0.4.5
        imagePullPolicy: Always
        securityContext:
          privileged: true
        volumeMounts:
        - name: config
          mountPath: /host/node-disk-manager.config
          subPath: node-disk-manager.config
          readOnly: true
        - name: udev
          mountPath: /run/udev
        - name: procmount
          mountPath: /host/proc
          readOnly: true
        - name: sparsepath
          mountPath: /var/openebs/sparse
        env:
        # namespace in which NDM is installed will be passed to NDM Daemonset
        # as environment variable
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        # pass hostname as env variable using downward API to the NDM container
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        # specify the directory where the sparse files need to be created.
        # if not specified, then sparse files will not be created.
        - name: SPARSE_FILE_DIR
          value: "/var/openebs/sparse"
        # Size(bytes) of the sparse file to be created.
        - name: SPARSE_FILE_SIZE
          value: "10737418240"
        # Specify the number of sparse files to be created
        - name: SPARSE_FILE_COUNT
          value: "0"
        livenessProbe:
          exec:
            command:
            - pgrep
            - ".*ndm"
          initialDelaySeconds: 30
          periodSeconds: 60
      volumes:
      - name: config
        configMap:
          name: openebs-ndm-config
      - name: udev
        hostPath:
          path: /run/udev
          type: Directory
      # mount /proc (to access mount file of process 1 of host) inside container
      # to read mount-point of disks and partitions
      - name: procmount
        hostPath:
          path: /proc
          type: Directory
      - name: sparsepath
        hostPath:
          path: /var/openebs/sparse
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openebs-ndm-operator
  namespace: openebs
  labels:
    name: openebs-ndm-operator
    openebs.io/component-name: ndm-operator
    openebs.io/version: 1.5.0
spec:
  selector:
    matchLabels:
      name: openebs-ndm-operator
      openebs.io/component-name: ndm-operator
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        name: openebs-ndm-operator
        openebs.io/component-name: ndm-operator
        openebs.io/version: 1.5.0
    spec:
      serviceAccountName: openebs-maya-operator
      containers:
        - name: node-disk-operator
          image: quay.io/openebs/node-disk-operator-amd64:v0.4.5
          imagePullPolicy: Always
          readinessProbe:
            exec:
              command:
                - stat
                - /tmp/operator-sdk-ready
            initialDelaySeconds: 4
            periodSeconds: 10
            failureThreshold: 1
          env:
            - name: WATCH_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            # the service account of the ndm-operator pod
            - name: SERVICE_ACCOUNT
              valueFrom:
                fieldRef:
                  fieldPath: spec.serviceAccountName
            - name: OPERATOR_NAME
              value: "node-disk-operator"
            - name: CLEANUP_JOB_IMAGE
              value: "quay.io/openebs/linux-utils:1.5.0"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openebs-admission-server
  namespace: openebs
  labels:
    app: admission-webhook
    openebs.io/component-name: admission-webhook
    openebs.io/version: 1.5.0
spec:
  replicas: 1
  strategy:
    type: Recreate
    rollingUpdate: null
  selector:
    matchLabels:
      app: admission-webhook
  template:
    metadata:
      labels:
        app: admission-webhook
        openebs.io/component-name: admission-webhook
        openebs.io/version: 1.5.0
    spec:
      serviceAccountName: openebs-maya-operator
      containers:
        - name: admission-webhook
          image: quay.io/openebs/admission-server:1.5.0
          imagePullPolicy: IfNotPresent
          args:
            - -alsologtostderr
            - -v=2
            - 2>&1
          env:
            - name: OPENEBS_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: ADMISSION_WEBHOOK_NAME
              value: "openebs-admission-server"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openebs-localpv-provisioner
  namespace: openebs
  labels:
    name: openebs-localpv-provisioner
    openebs.io/component-name: openebs-localpv-provisioner
    openebs.io/version: 1.5.0
spec:
  selector:
    matchLabels:
      name: openebs-localpv-provisioner
      openebs.io/component-name: openebs-localpv-provisioner
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        name: openebs-localpv-provisioner
        openebs.io/component-name: openebs-localpv-provisioner
        openebs.io/version: 1.5.0
    spec:
      serviceAccountName: openebs-maya-operator
      containers:
      - name: openebs-provisioner-hostpath
        imagePullPolicy: Always
        image: quay.io/openebs/provisioner-localpv:1.5.0
        env:
        # OPENEBS_IO_K8S_MASTER enables openebs provisioner to connect to K8s
        # based on this address. This is ignored if empty.
        # This is supported for openebs provisioner version 0.5.2 onwards
        #- name: OPENEBS_IO_K8S_MASTER
        #  value: "http://10.128.0.12:8080"
        # OPENEBS_IO_KUBE_CONFIG enables openebs provisioner to connect to K8s
        # based on this config. This is ignored if empty.
        # This is supported for openebs provisioner version 0.5.2 onwards
        #- name: OPENEBS_IO_KUBE_CONFIG
        #  value: "/home/ubuntu/.kube/config"
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: OPENEBS_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        # OPENEBS_SERVICE_ACCOUNT provides the service account of this pod as
        # environment variable
        - name: OPENEBS_SERVICE_ACCOUNT
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: OPENEBS_IO_ENABLE_ANALYTICS
          value: "true"
        - name: OPENEBS_IO_INSTALLER_TYPE
          value: "openebs-operator"
        - name: OPENEBS_IO_HELPER_IMAGE
          value: "quay.io/openebs/linux-utils:1.5.0"
        livenessProbe:
          exec:
            command:
            - pgrep
            - ".*localpv"
          initialDelaySeconds: 30
          periodSeconds: 60
---

kubesphere-minimal.yaml

---
apiVersion: v1
kind: Namespace
metadata:
  name: kubesphere-system

---
apiVersion: v1
data:
  ks-config.yaml: |
    ---

    persistence:
      storageClass: ""

    etcd:
      monitoring: False
      endpointIps: 192.168.0.7,192.168.0.8,192.168.0.9
      port: 2379
      tlsEnable: True

    common:
      mysqlVolumeSize: 20Gi
      minioVolumeSize: 20Gi
      etcdVolumeSize: 20Gi
      openldapVolumeSize: 2Gi
      redisVolumSize: 2Gi

    metrics_server:
      enabled: False

    console:
      enableMultiLogin: False  # enable/disable multi login
      port: 30880

    monitoring:
      prometheusReplicas: 1
      prometheusMemoryRequest: 400Mi
      prometheusVolumeSize: 20Gi
      grafana:
        enabled: False

    logging:
      enabled: False
      elasticsearchMasterReplicas: 1
      elasticsearchDataReplicas: 1
      logsidecarReplicas: 2
      elasticsearchMasterVolumeSize: 4Gi
      elasticsearchDataVolumeSize: 20Gi
      logMaxAge: 7
      elkPrefix: logstash
      containersLogMountedPath: ""
      kibana:
        enabled: False

    openpitrix:
      enabled: False

    devops:
      enabled: False
      jenkinsMemoryLim: 2Gi
      jenkinsMemoryReq: 1500Mi
      jenkinsVolumeSize: 8Gi
      jenkinsJavaOpts_Xms: 512m
      jenkinsJavaOpts_Xmx: 512m
      jenkinsJavaOpts_MaxRAM: 2g
      sonarqube:
        enabled: False
        postgresqlVolumeSize: 8Gi

    servicemesh:
      enabled: False

    notification:
      enabled: False

    alerting:
      enabled: False

kind: ConfigMap
metadata:
  name: ks-installer
  namespace: kubesphere-system

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ks-installer
  namespace: kubesphere-system

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: null
  name: ks-installer
rules:
- apiGroups:
  - ""
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - extensions
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - batch
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - apiregistration.k8s.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - tenant.kubesphere.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - certificates.k8s.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - devops.kubesphere.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - monitoring.coreos.com
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - logging.kubesphere.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - jaegertracing.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - storage.k8s.io
  resources:
  - '*'
  verbs:
  - '*'
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - '*'
  verbs:
  - '*'

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: ks-installer
subjects:
- kind: ServiceAccount
  name: ks-installer
  namespace: kubesphere-system
roleRef:
  kind: ClusterRole
  name: ks-installer
  apiGroup: rbac.authorization.k8s.io

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    app: ks-install
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ks-install
  template:
    metadata:
      labels:
        app: ks-install
    spec:
      serviceAccountName: ks-installer
      containers:
      - name: installer
        image: kubesphere/ks-installer:v2.1.1
        imagePullPolicy: "Always"

【Q&A】问题汇总

01-coredns工作不正常

# 问题描述
# root@k8s-node1:/home/slienceme# kubectl get pods --all-namespaces
# NAMESPACE      NAME                                READY   STATUS             RESTARTS   AGE
# kube-flannel   kube-flannel-ds-66bcs               1/1     Running            0          12h
# kube-flannel   kube-flannel-ds-ntwwx               1/1     Running            0          12h
# kube-flannel   kube-flannel-ds-vb6n9               1/1     Running            5          12h
# kube-system    coredns-7f9c544f75-67njd            0/1     CrashLoopBackOff   17         12h   // [!code focus:8]
# kube-system    coredns-7f9c544f75-z82nl            0/1     CrashLoopBackOff   17         12h
# kube-system    etcd-k8s-node1                      1/1     Running            4          12h
# kube-system    kube-apiserver-k8s-node1            1/1     Running            4          12h
# kube-system    kube-controller-manager-k8s-node1   1/1     Running            3          12h
# kube-system    kube-proxy-fs2p6                    1/1     Running            0          12h
# kube-system    kube-proxy-x7rkp                    1/1     Running            0          12h
# kube-system    kube-proxy-xpbvt                    1/1     Running            3          12h
# kube-system    kube-scheduler-k8s-node1            1/1     Running            3          12h
# kube-system    tiller-deploy-6ffcfbc8df-mzwd7      1/1     Running            0          39m

通过对问题的追踪，在github issue找到 CoreDNS pod goes to CrashLoopBackOff State

解决流程如下：

你需要修改 Kubernetes 中的 CoreDNS 配置，具体操作步骤如下：

步骤 1: 修改 CoreDNS 配置ConfigMap

kubectl edit configmap coredns -n kube-system

步骤 2: 修改 Corefile 配置

修改 Corefile 中的 forward 配置，当前配置为：

forward . /etc/resolv.conf

你需要将其修改为指向外部 DNS 服务器，比如 Google 的公共 DNS 服务器 8.8.8.8 或 1.1.1.1（Cloudflare DNS）。例如，你可以修改为：

forward . 8.8.8.8

或者如果你希望使用多个 DNS 服务器，可以配置多个地址：

forward . 8.8.8.8 8.8.4.4

步骤 3: 保存并退出

步骤 4: 验证配置生效

验证 CoreDNS 配置是否生效，可以查看 CoreDNS Pod 是否正常运行，并且配置是否正确生效。

kubectl get pods -n kube-system -l k8s-app=kube-dns

如果需要，也可以重启 CoreDNS Pods，以确保新的配置生效：

kubectl rollout restart deployment coredns -n kube-system

通过这些步骤，你就能避免 CoreDNS 发生循环请求，确保 DNS 请求被转发到外部的 DNS 服务器，而不是 CoreDNS 本身。

02-kubeSphere安装异常

要彻底清理 KubeSphere 之前的安装并重新进行安装，建议按照以下步骤操作：

步骤 1: 删除之前的 KubeSphere 安装资源

使用 kubectl delete 命令删除所有相关资源。这将删除 KubeSphere 安装过程中创建的所有对象（例如 Deployment、Pod、ConfigMap、Service 等）。

kubectl delete -f kubesphere-minimal.yaml

如果你使用了其他 YAML 文件来安装 KubeSphere，还需要分别删除它们。

删除 KubeSphere 的命名空间：

如果不再需要 kubesphere-system 命名空间，可以删除该命名空间，它包含所有 KubeSphere 相关资源。

kubectl delete namespace kubesphere-system

注意：删除命名空间后，该命名空间中的所有资源都会被删除。

步骤 2: 重新部署 KubeSphere

确保所有资源已清理干净：

你可以通过以下命令检查 KubeSphere 的资源是否已经完全删除：

kubectl get namespaces
kubectl get all -n kubesphere-system

确保 KubeSphere 的相关资源已经不再列出。

重新应用 kubesphere-minimal.yaml 配置文件：

重新部署 KubeSphere，运行以下命令：

kubectl apply -f kubesphere-minimal.yaml
# 或
kubectl apply -f https://raw.githubusercontent.com/kubesphere/ks-installer/v2.1.1/kubesphere-minimal.yaml

这将重新启动 KubeSphere 安装过程，确保你有一个干净的安装环境。

步骤 3: 监控安装过程

你可以使用以下命令监控安装过程的日志，特别是监控安装器 Pod 的日志：

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f

步骤 4: 验证安装

检查 KubeSphere 是否成功部署：

安装完成后，检查 KubeSphere 的组件是否正常运行：

kubectl get pods -n kubesphere-system

确保所有 Pod 的状态为 Running 或 Completed。

访问 KubeSphere 控制台：

如果你配置了 ingress 或 NodePort 进行外部访问，可以通过浏览器访问 KubeSphere 控制台，确保安装完成并能够登录。

03-kubeSphere相关node不正常

ks-apigateway-78bcdc8ffc-z7cgm

kubectl get pods -n kubesphere-system

# NAMESPACE                      NAME                                          READY   STATUS      RESTARTS   AGE
# kubesphere-system              ks-account-596657f8c6-l8qzq                    0/1     Init:0/2   0          11m
# kubesphere-system              ks-apigateway-78bcdc8ffc-z7cgm                 0/1     Error      7          11m
# kubesphere-system              ks-apiserver-5b548d7c5c-gj47l                  1/1     Running    0          11m
# kubesphere-system              ks-console-78bcf96dbf-64j7h                    1/1     Running    0          11m
# kubesphere-system              ks-controller-manager-696986f8d9-kd5bl         1/1     Running    0          11m
# kubesphere-system              ks-installer-75b8d89dff-mm5z5                  1/1     Running    0          12m
# kubesphere-system              redis-6fd6c6d6f9-x2l5w                         1/1     Running    0          12m


# 先看看日志
kubectl logs -n kubesphere-system ks-apigateway-78bcdc8ffc-z7cgm

# 2025/03/18 04:59:10 [INFO][cache:0xc0000c1130] Started certificate maintenance routine
# [DEV NOTICE] Registered directive 'authenticate' before 'jwt'
# [DEV NOTICE] Registered directive 'authentication' before 'jwt'
# [DEV NOTICE] Registered directive 'swagger' before 'jwt'
# Activating privacy features... done.
# E0318 04:59:15.962304       1 redis.go:51] unable to reach redis hostdial tcp 10.96.171.117:6379: i/o timeout 
# 2025/03/18 04:59:15 dial tcp 10.96.171.117:6379: i/o timeout

# unable to reach redis hostdial tcp 10.96.171.117:6379: i/o timeout 
# 发现Redis问题
kubectl logs -n kubesphere-system redis-6fd6c6d6f9-x2l5w

# 1:C 18 Mar 2025 04:47:14.085 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
# ..........
# 1:M 18 Mar 2025 04:47:14.086 * Ready to accept connections
# 看着是正常的

# 查看一下服务
kubectl get svc -n kubesphere-system
# NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
# ks-account      ClusterIP   10.96.85.94     <none>        80/TCP         18m
# ks-apigateway   ClusterIP   10.96.114.5     <none>        80/TCP         18m
# ks-apiserver    ClusterIP   10.96.229.139   <none>        80/TCP         18m
# ks-console      NodePort    10.96.18.47     <none>        80:30880/TCP   18m
# redis           ClusterIP   10.96.171.117   <none>        6379/TCP       18m

# 重启 redis 和 ks-apigateway
kubectl delete pod -n kubesphere-system redis-6fd6c6d6f9-x2l5w
kubectl delete pod -n kubesphere-system ks-apigateway-78bcdc8ffc-z7cgm
# ...这个解决不了，突然就好了

ks-account-596657f8c6-l8qzq

# 先看看日志
kubectl logs -n kubesphere-system ks-account-596657f8c6-l8qzq

# W0318 06:00:23.940798       1 client_config.go:549] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
# E0318 06:00:25.758353       1 ldap.go:78] unable to read LDAP response packet: unexpected EOF
# E0318 06:00:25.758449       1 im.go:87] create default users unable to read LDAP response packet: unexpected EOF
# Error: unable to read LDAP response packet: unexpected EOF
# ......
# 2025/03/18 06:00:25 unable to read LDAP response packet: unexpected EOF