集群管理
- 1 集群管理
- 1.1 节点管理
- 1.1.1 令牌管理
- 1.1.2 集群扩缩容
- 1.1.3 集群升级
- 1.1.4 证书管理
- 1.2 数据管理
- 1.2.1 ETCD基础
- 1.2.2 ETCD实践
- 1.2.3 备份还原
- 1.2.4 ETCD集群
1 集群管理
1.1 节点管理
1.1.1 令牌管理
学习目标
这一节,我们从 令牌基础、令牌实践、小结 三个方面来学习。
令牌基础
简介
默认情况下,kubeadm在创建集群的时候,会使用tls的方式传输信息,所有集群node节点在加入集群的时候,也会应用该信息 -- 即token,默认情况下,该token是有存活时间的,也就是说,当token时间过期后,我们就无法使用相同的kubeadm join命令将新的节点加入到集群了。
查看集群初始化时候设定的token存活时间
[root@kubernetes-master1 ~]# grep ttl /data/kubernetes/cluster_init/kubeadm_init_1.23.8.yml
ttl: 96h0m0s
在token没有过期的时候查看当前集群的token列表
[root@kubernetes-master1 ~]# kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
abcdef.0123456789abcdef 2d 2062-07-28T11:25:18Z authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
结果显示:
还剩下2天的存活时间
token过期后的效果演示
[root@kubernetes-master1 ~]# kubeadm token list
[root@kubernetes-master1 ~]#
结果显示:
没有可用的token了
需求思路:
对于token过期后的新节点添加,我们只需要重新获取一个token,然后构建新的kubeadm join的命令即可。
准备新节点环境
从当前集群中移除 kubernetes-node3环境
[root@kubernetes-master1 ~]# kubectl delete node kubernetes-node3
node "kubernetes-node3" deleted
确认效果
[root@kubernetes-master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetes-master1 Ready control-plane,master 4d2h v1.23.8
kubernetes-master2 Ready control-plane,master 4d2h v1.23.8
kubernetes-master3 Ready control-plane,master 4d2h v1.23.8
kubernetes-node1 Ready <none> 4d2h v1.23.8
kubernetes-node2 Ready <none> 4d2h v1.23.8
kubernetes-node3环境清空所有集群环境
[root@kubernetes-node3 ~]# kubeadm reset
[root@kubernetes-node3 ~]# rm -f /etc/cni/net.d/*
[root@kubernetes-node3 ~]# reboot
令牌实践
查看历史token的列表
查看当前的token信息
[root@kubernetes-master1 ~]# kubeadm token list
[root@kubernetes-master1 ~]#
结果显示:
这是没有token历史记录
生成token方法
[root@kubernetes-master1 ~]# kubeadm token create
o4hkeo.yfs5qr4ashdjeic6
查看效果
[root@kubernetes-master1 ~]# kubeadm token list
TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
o4hkeo.yfs5qr4ashdjeic6 23h 2062-07-29T14:28:02Z authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
结果显示:
生成了一个默认周期为24小时的token
获取ca证书sha256编码hash值
[root@kubernetes-master1 ~]# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
d8a77a69fb0b54cd72a692be83fdcd2c39f203bdc6e729b9d7d63cca3030cfcc
结果显示:
这个token就是我们为新节点加入在集群生成的内容。
生成添加结点命令
[root@kubernetes-node3 ~]# kubeadm join 10.0.0.200:6443 --token o4hkeo.yfs5qr4ashdjeic6 --discovery-token-ca-cert-hash sha256:d8a77a69fb0b54cd72a692be83fdcd2c39f203bdc6e729b9d7d63cca3030cfcc
主节点查看效果:
[root@kubernetes-master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetes-master1 Ready control-plane,master 4d3h v1.23.8
kubernetes-master2 Ready control-plane,master 4d3h v1.23.8
kubernetes-master3 Ready control-plane,master 4d3h v1.23.8
kubernetes-node1 Ready <none> 4d3h v1.23.8
kubernetes-node2 Ready <none> 4d3h v1.23.8
kubernetes-node3 Ready <none> 39s v1.23.8
结果显示:
新的节点已经添加成功了
精简方法
[root@kubernetes-master1 ~]# kubeadm token create --print-join-command
kubeadm join 10.0.0.200:6443 --token e36qre.9nhy8a3qofq2lgaa --discovery-token-ca-cert-hash sha256:d8a77a69fb0b54cd72a692be83fdcd2c39f203bdc6e729b9d7d63cca3030cfcc
结果显示:
使用 --print-join-command 方法可以更快的输出完整的新阶段添加到集群的命令。
小结
1.1.2 集群扩缩容
学习目标
这一节,我们从 集群缩容、集群扩容、小结 三个方面来学习。
集群缩容
简介
所谓的集群缩容,一般来说指的是工作节点的删减,对于工作节点的删减流程,我们还是需要遵循以下基本的流程的:
1 基本环境确认
2 冻结待删除节点
3 驱离待删除节点资源
4 执行节点删除命令
5 被删除节点清理集群环境
6 重启被删除节点主机
1 基本环境确认
准备应用环境
[root@kubernetes-master1 ~]# kubectl create deployment nginx-web --image=kubernetes-register.superopsmsb.com/superopsmsb/nginx_web:v0.1 --replicas=3
deployment.apps/nginx-web created
查看效果
[root@kubernetes-master1 ~]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-web-6c84585c7b-b6xc9 1/1 Running 0 6s 10.244.2.41 kubernetes-node2 <none> <none>
nginx-web-6c84585c7b-fsxdk 1/1 Running 0 6s 10.244.1.38 kubernetes-node1 <none> <none>
nginx-web-6c84585c7b-kxhc4 1/1 Running 0 6s 10.244.3.2 kubernetes-node3 <none> <none>
2 冻结待删除节点
使用cordon冻结节点
[root@kubernetes-master1 ~]# kubectl cordon kubernetes-node3
node/kubernetes-node3 cordoned
查看效果
[root@kubernetes-master1 ~]# kubectl get node kubernetes-node3
NAME STATUS ROLES AGE VERSION
kubernetes-node3 Ready,SchedulingDisabled <none> 5m3s v1.23.8
3 驱离待删除节点资源
使用drain命令清理一般资源对象
[root@kubernetes-master1 ~]# kubectl drain kubernetes-node3 --delete-emptydir-data --force
这种方式对于ds类型的资源无效
[root@kubernetes-master1 ~]# kubectl get pod -o wide -n kube-flannel | grep node3
kube-flannel-ds-5w746 1/1 Running 0 10m 10.0.0.17 kubernetes-node3 <none> <none>
强制将ds等相关资源驱离
[root@kubernetes-master1 ~]# kubectl taint node kubernetes-node3 diskfull=true:NoExecute
[root@kubernetes-master1 ~]# kubectl get pod -o wide -n kube-flannel | grep node3
[root@kubernetes-master1 ~]# kubectl get pod -o wide -n kube-system | grep node3
kube-proxy-69sgn 1/1 Running 0 11m 10.0.0.17 kubernetes-node3 <none> <none>
结果显示:
因为当前节点还需要被管理,所以需要留一个集群组件服务
node3节点确认效果
[root@kubernetes-node3 ~]# docker ps | grep -v NAME | wc -l
2
结果显示:
只剩下一个pause和应用容器了
4 执行节点删除命令
使用delete移除节点
[root@kubernetes-master1 ~]# kubectl delete nodes kubernetes-node3
node "kubernetes-node3" deleted
master节点确认效果
[root@kubernetes-master1 ~]# kubectl get nodes kubernetes-node3
Error from server (NotFound): nodes "kubernetes-node3" not found
6 被删除节点清理集群环境
集群环境还原
[root@kubernetes-node3 ~]# kubeadm reset
清理网络配置
[root@kubernetes-node3 ~]# rm -f /etc/cni/net.d/*
7 重启被删除节点主机
重启主机
[root@kubernetes-node3 ~]# reboot
注意:
这步的目的是将当期节点上遗留的网络相关信息记录全部清理
集群扩容
简介
所谓的集群扩容,一般来说指的是工作节点的增加,对于工作节点的增加流程,我们还是需要遵循以下基本的流程的:
1 待添加节点准备集群环境
主机名环境定制
主机内核调整
集群环境软件环境配置
2 添加节点到集群
定制添加节点的集群命令
执行节点添加命令
master节点确认效果
1 待添加节点准备集群环境
设定主机名
[root@localhost ~]# hostnamectl set-hostname kubernetes-node4
[root@localhost ~]# exec /bin/bash
[root@kubernetes-node4 ~]#
获取文件
[root@kubernetes-node4 ~]# mkdir /data/scripts -p ; cd /data/scripts
[root@kubernetes-node4 /data/scripts]# scp root@10.0.0.12:/data/scripts/* ./
[root@kubernetes-node4 /data/scripts]# scp root@10.0.0.12:/etc/yum.repos.d/kubernetes.repo /etc/yum.repos.d/kubernetes.repo
[root@kubernetes-node4 /data/scripts]# scp root@10.0.0.12:/etc/hosts /etc/hosts
主机内核调整
[root@kubernetes-node4 /data/scripts]# /bin/bash 02_kubernetes_kernel_conf.sh
vm.swappiness = 0
vm.swappiness = 0
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
集群环境软件环境配置
[root@kubernetes-node4 /data/scripts]# /bin/bash 03_kubernetes_docker_install.sh
准备docker环境
[root@kubernetes-node4 /data/scripts]# scp root@10.0.0.12:/etc/docker/daemon.json /etc/docker/daemon.json
重启docker服务
[root@kubernetes-node4 /data/scripts]# systemctl restart docker
[root@kubernetes-node4 /data/scripts]# systemctl enable docker
确认效果
[root@kubernetes-node4 /data/scripts]# docker info | egrep 'systemd|superopsmsb'
Cgroup Driver: systemd
kubernetes-register.superopsmsb.com
安装基础软件
[root@kubernetes-node4 ~]# yum install -y kubelet-1.23.8-0 kubeadm-1.23.8-0
2 添加节点的到集群
master节点定制节点添加命令
[root@kubernetes-master1 ~]# kubeadm token create --print-join-command
kubeadm join 10.0.0.200:6443 --token 0om798.7e1x521h7fog3o50 --discovery-token-ca-cert-hash sha256:d8a77a69fb0b54cd72a692be83fdcd2c39f203bdc6e729b9d7d63cca3030cfcc
新节点执行添加命令
[root@kubernetes-node4 ~]# kubeadm join 10.0.0.200:6443 --token 0om798.7e1x521h7fog3o50 --discovery-token-ca-cert-hash sha256:d8a77a69fb0b54cd72a692be83fdcd2c39f203bdc6e729b9d7d63cca3030cfcc
master节点确认效果
[root@kubernetes-master1 ~]# kubectl get nodes kubernetes-node4
NAME STATUS ROLES AGE VERSION
kubernetes-node4 NotReady <none> 118s v1.23.8
注意:
因为是新的节点,所以依赖的镜像需要准备一段时间,等待一段时间后,就可以转变为正常状态了
[root@kubernetes-master1 ~]# kubectl get nodes kubernetes-node4
NAME STATUS ROLES AGE VERSION
kubernetes-node4 Ready <none> 4m59s v1.23.8
小结
1.1.3 集群升级
学习目标
这一节,我们从 升级原理、升级实践、小结 三个方面来学习。
升级原理
简介
kubeadm 是 kubernetes 提供的一个初始化集群的工具,使用起来非常方便,但是因为 版本的更新 和 证书的有效期 等原因,我们可能在某个阶段会对当前的软件版本进行调整,可能是版本升级,也可能是版本回退。但是这些场景的操作方法基本上是一样的。
在k8s环境进行变动的时候,需要根据场景的不同采取不同的措施:
对于一般的 开发环境、测试环境、预发布环境等非生产环境之外的其他环境的更新,没有什么限制。
对于生产环境,k8s环境在更新的时候,需要
1 在一个 对业务影响非常小的时段,进行软件版本的更新
2 如果是集群场景
- 不要一下子对所有节点进行更新
- 更新前,首先要将更新的节点从 高可用集群中屏蔽
- 更新完毕后,再将更新后的节点加入到 高可用集群中
- 循环上面的两步,直到所有节点都更新完毕。
集群更新原则
主角色更新原则
1 将待更新节点从高可用集群的反向代理中剔除
2 更新指定的集群环境软件
3 确认更新计划
4 执行更新计划
5 将更新后的节点加入到高可用集群
7 对于其他节点执行1-5步骤
从角色更新原则
1 冻结待删除节点
2 驱离待删除节点资源
3 确认更新计划
4 执行更新计划
5 删除驱离和冻结动作
6 对于其他节点执行1-5步骤
主角色升级实践
1 将待更新节点从高可用集群的反向代理中剔除
所有的高可用节点更新haproxy配置
[root@kubernetes-ha1 ~]# vim /etc/haproxy/haproxy.cfg
...
listen kubernetes-master-6443
bind 10.0.0.200:6443
mode tcp
# server kubernetes-master1 10.0.0.12:6443 check inter 3s fall 3 rise 5
server kubernetes-master2 10.0.0.13:6443 check inter 3s fall 3 rise 5
server kubernetes-master3 10.0.0.14:6443 check inter 3s fall 3 rise 5
重启服务
[root@kubernetes-ha1 ~]# systemctl restart haproxy.service
2 更新指定的集群环境软件
安装新版本软件
[root@kubernetes-master1 ~]# yum install -y kubeadm-1.23.9-0 kubectl-1.23.9-0 kubelet-1.23.9-0
检查效果
[root@kubernetes-master1 ~]# kubectl version
[root@kubernetes-master1 ~]# kubeadm version
[root@kubernetes-master1 ~]# kubelet --version
注意:
kubectl 会同时出现两个版本
3 确认更新计划
查看更新条件
[root@kubernetes-master1 ~]# kubeadm upgrade plan
...
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.23.8
[upgrade/versions] kubeadm version: v1.23.9
I0728 23:14:42.949711 11880 version.go:255] remote version is much newer: v1.24.3; falling back to: stable-1.23
[upgrade/versions] Target version: v1.23.9
[upgrade/versions] Latest version in the v1.23 series: v1.23.9
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT CURRENT TARGET
kubelet 6 x v1.23.8 v1.23.9
Upgrade to the latest version in the v1.23 series:
COMPONENT CURRENT TARGET
kube-apiserver v1.23.8 v1.23.9
kube-controller-manager v1.23.8 v1.23.9
kube-scheduler v1.23.8 v1.23.9
kube-proxy v1.23.8 v1.23.9
CoreDNS v1.8.6 v1.8.6
etcd 3.5.1-0 3.5.1-0
You can now apply the upgrade by executing the following command:
kubeadm upgrade apply v1.23.9
_____________________________________________________________________
The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.
API GROUP CURRENT VERSION PREFERRED VERSION MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io v1alpha1 v1alpha1 no
kubelet.config.k8s.io v1beta1 v1beta1 no
_____________________________________________________________________
4 执行更新计划
根据提示命令,更新到指定的软件版本
[root@kubernetes-master1 ~]# kubeadm upgrade apply v1.23.9
...
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
...
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.23.9". Enjoy!
...
kubernetes集群软件版本更新,kubelet软件文件变动,需要重载后才能重启
[root@kubernetes-master3 ~]# systemctl daemon-reload
[root@kubernetes-master3 ~]# systemctl restart docker kubelet
确认效果
[root@kubernetes-master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetes-master1 Ready control-plane,master 4d3h v1.23.9
kubernetes-master2 Ready control-plane,master 4d3h v1.23.8
kubernetes-master3 Ready control-plane,master 4d3h v1.23.8
kubernetes-node1 Ready <none> 4d3h v1.23.8
kubernetes-node2 Ready <none> 4d3h v1.23.8
kubernetes-node4 Ready <none> 20m v1.23.8
5 将更新后的节点加入到高可用集群
更新haproxy配置
[root@kubernetes-ha1 ~]# vim /etc/haproxy/haproxy.cfg
...
listen kubernetes-master-6443
bind 10.0.0.200:6443
mode tcp
server kubernetes-master1 10.0.0.12:6443 check inter 3s fall 3 rise 5
# server kubernetes-master2 10.0.0.13:6443 check inter 3s fall 3 rise 5
server kubernetes-master3 10.0.0.14:6443 check inter 3s fall 3 rise 5
重启服务
[root@kubernetes-ha1 ~]# systemctl restart haproxy.service
7 对于其他节点执行1-5步骤
针对其他所有节点更新后的集群最终效果
[root@kubernetes-master3 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetes-master1 Ready control-plane,master 4d4h v1.23.9
kubernetes-master2 Ready control-plane,master 4d4h v1.23.9
kubernetes-master3 Ready control-plane,master 4d4h v1.23.9
kubernetes-node1 Ready <none> 4d4h v1.23.8
kubernetes-node2 Ready <none> 4d4h v1.23.8
kubernetes-node4 Ready <none> 37m v1.23.8
从角色升级实践
1 冻结待删除节点
按照原计划将标记节点为不可调度
[root@kubernetes-master3 ~]# kubectl cordon kubernetes-node1
node/kubernetes-node1 cordoned
[root@kubernetes-master3 ~]# kubectl get nodes kubernetes-node1
NAME STATUS ROLES AGE VERSION
kubernetes-node1 Ready,SchedulingDisabled <none> 4d4h v1.23.8
2 驱离待删除节点资源
驱逐当前节点上的应用资源
[root@kubernetes-master3 ~]# kubectl drain kubernetes-node1 --delete-emptydir-data --force
注意:
升级的时候,daemonset可以不用管理,因为他会随着服务的重启而重启
3 确认更新计划
安装新版本软件
[root@kubernetes-node1 ~]# yum install -y kubeadm-1.23.9-0 kubelet-1.23.9-0
检查效果
[root@kubernetes-node1 ~]# kubeadm version
[root@kubernetes-node1 ~]# kubelet --version
4 执行更新计划
升级节点
[root@kubernetes-node1 ~]# kubeadm upgrade node
[upgrade] Reading configuration from the cluster...
...
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.
重启kubelet 和docker服务
[root@kubernetes-node1 ~]# systemctl daemon-reload
[root@kubernetes-node1 ~]# systemctl restart docker kubelet
查看效果
[root@kubernetes-master1 ~]# kubectl get nodes kubernetes-node1
NAME STATUS ROLES AGE VERSION
kubernetes-node1 Ready,SchedulingDisabled <none> 4d4h v1.23.9
5 删除冻结动作
取消冻结标识
[root@kubernetes-master1 ~]# kubectl uncordon kubernetes-node1
node/kubernetes-node1 uncordoned
确认最终效果
[root@kubernetes-master1 ~]# kubectl get nodes kubernetes-node1
NAME STATUS ROLES AGE VERSION
kubernetes-node1 Ready <none> 4d4h v1.23.9
6 对于其他节点执行1-5步骤
[root@kubernetes-master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetes-master1 Ready control-plane,master 4d4h v1.23.9
kubernetes-master2 Ready control-plane,master 4d4h v1.23.9
kubernetes-master3 Ready control-plane,master 4d4h v1.23.9
kubernetes-node1 Ready <none> 4d4h v1.23.9
kubernetes-node2 Ready <none> 4d4h v1.23.9
kubernetes-node3 Ready <none> 53m v1.23.9
kubernetes-node4 Ready <none> 53m v1.23.9
小结
1.1.4 证书管理
学习目标
这一节,我们从 基础知识、证书实践、小结 三个方面来学习。
基础知识
简介
Kubernetes 集群内部为了实现高质量的安全通信,需要 PKI 证书才能进行基于 TLS 的身份验证。kubeadm 部署的 Kubernetes集群, 会自动生成集群所需的证书。如果我们定制自己的证书,在kubernetes环境中也可以使用。
kubeadm 部署的 Kubernetes集群,大部分证书都存储在 /etc/kubernetes/pki目录中,只有kubernetes集群的用户证书文件在/etc/kubernetes目录中。
集群证书的简介
核心的证书和私钥
[root@kubernetes-master1 ~]# ls /etc/kubernetes/pki/{ca.*,sa.*,etcd/ca.*}
/etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/etcd/ca.crt /etc/kubernetes/pki/sa.pub
/etc/kubernetes/pki/ca.key /etc/kubernetes/pki/etcd/ca.key
/etc/kubernetes/pki/ca.srl /etc/kubernetes/pki/sa.key
证书的有效期
由 kubeadm 默认生成的客户端证书在 1 年后到期,如果需要更新证书的话,kubeadm支持自定义证书来实现更新,前提是必须将证书文件放置在通过 --cert-dir 命令行参数或者 kubeadm 配置中的 certificatesDir 配置项指明的目录中。
默认的值是 /etc/kubernetes/pki
检测证书是否过期
[root@kubernetes-master1 ~]# kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
CERTIFICATE EXPIRES RESIDUAL TIME ...
admin.conf Jul 28, 2063 15:17 UTC 364d ...
apiserver Jul 28, 2063 15:15 UTC 364d ...
apiserver-etcd-client Jul 28, 2063 15:15 UTC 364d ...
apiserver-kubelet-client Jul 28, 2063 15:15 UTC 364d ...
controller-manager.conf Jul 28, 2063 15:16 UTC 364d ...
etcd-healthcheck-client Jul 24, 2063 11:24 UTC 360d ...
etcd-peer Jul 24, 2063 11:24 UTC 360d ...
etcd-server Jul 24, 2063 11:24 UTC 360d ...
front-proxy-client Jul 28, 2063 15:15 UTC 364d ...
scheduler.conf Jul 28, 2063 15:16 UTC 364d ...
CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME ...
ca Jul 21, 2072 11:24 UTC 9y ...
etcd-ca Jul 21, 2072 11:24 UTC 9y ...
front-proxy-ca Jul 21, 2072 11:24 UTC 9y ...
注意:
kubeadm 不能管理由外部 CA 签名的证书,否则需要自己手动去更新外部证书
kubeadm 和 kubelet 之间的证书同步是自动方式来实现的
默认证书有效期为 1 年,CA 根证书是 10 年
查看方法2
[root@kubernetes-master1 ~]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep ' Not '
Not Before: Jul 24 11:24:51 2062 GMT
Not After : Jul 28 15:15:56 2063 GMT
准备工作
为了避免证书更新对于环境的异常影响,我们这里首先将相关文件进行备份
mkdir /etc/kubernetes-bak
cd /etc/kubernetes
cp -r $(ls | grep -v tmp) ../kubernetes-bak/
cp -r /var/lib/etcd /etc/kubernetes-bak/lib-etcd
注意:
所有的master节点的操作内容一致
证书实践
注意:
kubeadm alpha 命令暂未使用,不推荐使用某些文章中的这个命令
参考资料:
https://kubernetes.io/zh-cn/docs/reference/setup-tools/kubeadm/kubeadm-alpha/
手动延长证书的使用时间
[root@kubernetes-master1 ~]# kubeadm certs renew -h
This command is not meant to be run on its own. See list of available subcommands.
Usage:
kubeadm certs renew [flags]
kubeadm certs renew [command]
Available Commands:
admin.conf Renew the certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself
all Renew all available certificates
...
注意:
指定证书,则更新指定的证书,all代表更新所有证书
certs renew 使用现有的证书作为属性(CN/O/L等)的权威来源
无论证书的到期时间如何,都会无条件地续订一年。
renew执行后,需要重启各组件,才能生效。
所有master节点上更新所有证书
[root@kubernetes-master1 ~]# kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
确认效果
[root@kubernetes-master1 ~]# kubeadm certs check-expiration
方法2:
[root@kubernetes-master1 ~]# openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text |grep ' Not '
重启组件服务
[root@kubernetes-master1 ~]# kubectl delete pod -n kube-system -l component
确认效果
[root@kubernetes-master1 ~]# kubectl get pod -n kube-system -l component
小结
1.2 数据管理
1.2.1 ETCD基础
学习目标
这一节,我们从 方案解读、简单实践、小结 三个方面来学习。
方案解读
简介
etcd 是 CoreOS 团队于 2013 年 6 月发起的开源项目,采用 Go 语言编写,它具有出色的跨平台支持,于 2018 年 12 月正式加入云原生计算基金会,目前由 CNCF 孵化托管,etcd 作为云原生架构中重要的基础组件,在微服务和 Kubernates 集群中不仅可以作为服务注册与发现,还可以作为 key-value 存储的中间件。
官方地址: https://etcd.io/
etcd是干什么的
简单的数据接口,存储简单、数据的监听机制,性能非常高
应用场景
ETCD 有很多使用场景,包括但不限于:
配置管理
服务注册于发现
选主
应用调度
分布式队列
分布式锁
简单实践
k8s集群内部的etcd
查看etcd的pod
[root@kubernetes-master1 ~]# kubectl get pod -n kube-system | grep etcd
etcd-kubernetes-master1 1/1 Running 44 (19h ago) 29h
etcd-kubernetes-master2 1/1 Running 2 (19h ago) 29h
etcd-kubernetes-master3 1/1 Running 2 (19h ago) 29h
集群的etcd服务是以静态pod方式来管理的
[root@kubernetes-master1 ~]# ls /etc/kubernetes/manifests/etcd.yaml
/etc/kubernetes/manifests/etcd.yaml
查看pod的基本信息查看
[root@kubernetes-master1 ~]# cat /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
...
name: etcd
namespace: kube-system
spec:
containers:
- command:
- etcd
# 启用客户端证书验证
- --client-cert-auth=true
# 客户端服务器TLS证书文件的路径
- --cert-file=/etc/kubernetes/pki/etcd/server.crt
# 客户端服务器TLS密钥文件的路径
- --key-file=/etc/kubernetes/pki/etcd/server.key
# 客户端服务器的路径TLS可信CA证书文件
- --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
# 数据存储目录,在容器内部以ETCD_DATA_DIR变量存在
# --wal-dir存放预写式日志,记录了数据变化的历程,不写与data-dir一致。
- --data-dir=/var/lib/etcd
# 节点名称,该值与集群初始化时候的--initial-cluster值一致
- --name=kubernetes-master1
# 监听数据地址地址
- --listen-client-urls=https://127.0.0.1:2379,https://10.0.0.12:2379
# 要监听的其他URL列表将响应端点/metrics和/health端点
- --listen-metrics-urls=http://127.0.0.1:2381
# 与其他节点进行数据交换(选举,数据同步)的监听地址
- --listen-peer-urls=https://10.0.0.12:2380
# 用于通知其他ETCD节点,客户端接入本节点的监听地址
- --advertise-client-urls=https://10.0.0.12:2379
# 通知其他节点与本节点进行数据交换(选举,同步)的地址
# 属于listen-peer-urls属性值的子集
- --initial-advertise-peer-urls=https://10.0.0.12:2380
# 初始化etcd集群时候,设定集群中所有节点的名称信息
- --initial-cluster=kubernetes-master1=https://10.0.0.12:2380
# etcd服务器集群间通信启用对等客户端证书验证
- --peer-client-cert-auth=true
# etcd服务器集群间通信用TLS证书文件的路径
- --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
# etcd服务器集群间通信用TLS密钥文件的路径
- --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
# etcd服务器集群间通信用TLS可信CA文件的路径
- --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
# 设定事务提交后,数据快照触发数量,容器内部以ETCD_SNAPSHOT_COUNT变量方式存在
- --snapshot-count=10000
# 数据传输时候,检查数据的有效性
- --experimental-initial-corrupt-check=true
...
volumes:
- hostPath:
path: /etc/kubernetes/pki/etcd
type: DirectoryOrCreate
name: etcd-certs
- hostPath:
path: /var/lib/etcd
type: DirectoryOrCreate
name: etcd-data
注意:
这里了解一些基础的数据卷方便我们后续做命令操作,因为etcd的默认镜像环境实在是太恶心了
查看etcd的运行日志
[root@kubernetes-master1 ~]# kubectl -n kube-system logs etcd-kubernetes-master1
结果显示:
他们在节点内部实现了对端主机的信息数据同步
进入etcd集群
登录到任意一个etcd pod中检测集群状态
[root@kubernetes-master1 ~]# kubectl -n kube-system exec -it etcd-kubernetes-master1 -- /bin/sh
查看命令帮助
sh-5.1# etcdctl --help
...
COMMANDS:
...
get Gets the key or a range of keys
help Help about any command
endpoint health Checks the healthiness of endpoints specified in `--endpoints` flag
endpoint status Prints out the status of endpoints specified in `--endpoints` flag
...
member add Adds a member into the cluster
member list Lists all members in the cluster
...
version Prints the version of etcdctl
OPTIONS:
--cacert="" verify certificates of TLS-enabled secure servers using this CA bundle
--cert="" identify secure client using this TLS certificate file
...
--endpoints=[127.0.0.1:2379] gRPC endpoints
...
--key="" identify secure client using this TLS key file
...
-w, --write-out="simple" etcd的信息支持四种格式(fields, json, protobuf, simple, table)
注意:
etcd的集群环境中,只能通过sh来连接,这个终端中不支持回退甚至ls、grep命令都不支持
命令内容超出一行内容会出现结构性变形,可以借助于换行符号来实现内容的持续输入
etcdctl 目前主要有两种命令模式 2版本和3版本,通过辩论指定命令的版本
export ETCDCTL_API=3
基本信息查看
检查etcd的版本信息
sh-5.1# ETCDCTL_API=3 etcdctl \
--endpoints 10.0.0.12:2379,10.0.0.14:2379,10.0.0.13:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
version
etcdctl version: 3.5.1
API version: 3.5
确认集群成员信息
sh-5.1# ETCDCTL_API=3 etcdctl \
--endpoints 10.0.0.12:2379,10.0.0.14:2379,10.0.0.13:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
member list
278be81a99993ec, started, kubernetes-master2, https://10.0.0.13:2380, https://10.0.0.13:2379, false
2f4d6688eb4a75f7, started, kubernetes-master3, https://10.0.0.14:2380, https://10.0.0.14:2379, false
f7a9c20602b8532e, started, kubernetes-master1, https://10.0.0.12:2380, https://10.0.0.12:2379, false
etcd内部环境过于繁琐,所以我们准备用环境变量的方式改造一下命令的基本使用方式
sh-5.1# export ETCDCTL_API=3
sh-5.1# etcdctl version
etcdctl version: 3.5.1
API version: 3.5
定制命令别名,自动附加认证信息
sh-5.1# alias etcdctl='etcdctl --endpoints 10.0.0.12:2379,10.0.0.14:2379,10.0.0.13:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key'
sh-5.1# etcdctl endpoint status
10.0.0.12:2379, f7a9c20602b8532e, 3.5.1, 7.4 MB, false, false, 62, 294909, 294909,
10.0.0.14:2379, 2f4d6688eb4a75f7, 3.5.1, 7.7 MB, false, false, 62, 294909, 294909,
10.0.0.13:2379, 278be81a99993ec, 3.5.1, 7.7 MB, true, false, 62, 294909, 294909,
注意:
在定制别名的时候,最好让 etcdctl --endpoints 在同一行,否则会出现异常情况
确认集群成员信息
sh-5.1# etcdctl member list
278be81a99993ec, started, kubernetes-master2, https://10.0.0.13:2380, https://10.0.0.13:2379, false
2f4d6688eb4a75f7, started, kubernetes-master3, https://10.0.0.14:2380, https://10.0.0.14:2379, false
f7a9c20602b8532e, started, kubernetes-master1, https://10.0.0.12:2380, https://10.0.0.12:2379, false
以表格方式查看信息
sh-5.1# etcdctl -w table endpoint status
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| 10.0.0.12:2379 | f7a9c20602b8532e | 3.5.1 | 6.6 MB | false | false | 10 | 160474 | 160474 | |
| 10.0.0.14:2379 | 2f4d6688eb4a75f7 | 3.5.1 | 6.7 MB | false | false | 10 | 160474 | 160474 | |
| 10.0.0.13:2379 | 278be81a99993ec | 3.5.1 | 6.6 MB | true | false | 10 | 160474 | 160474 | |
+----------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
结果显示:
三个etcd,一个为true,二个为false,实现了内部集群状态一主两从
登录到任意一个etcd pod中检测集群状态
sh-5.1# etcdctl endpoint health
10.0.0.13:2379 is healthy: successfully committed proposal: took = 34.19307ms
10.0.0.12:2379 is healthy: successfully committed proposal: took = 35.242549ms
10.0.0.14:2379 is healthy: successfully committed proposal: took = 34.912959ms
结果显示:
etcd集群的所有节点状态都是正常的
切换主角色的id
sh-5.1# etcdctl move-leader f7a9c20602b8532e
Leadership transferred from 2f4d6688eb4a75f7 to f7a9c20602b8532e
再次查看效果
sh-5.1# etcdctl endpoint status
10.0.0.12:2379, f7a9c20602b8532e, 3.5.1, 7.4 MB, true, false, 96, 310291, 310291,
10.0.0.14:2379, 2f4d6688eb4a75f7, 3.5.1, 7.7 MB, false, false, 96, 310291, 310291,
10.0.0.13:2379, 278be81a99993ec, 3.5.1, 7.7 MB, false, false, 96, 310291, 310291,
小结
1.2.2 ETCD实践
学习目标
这一节,我们从 命令解读、数据操作、小结 三个方面来学习。
命令解读
简介
查看与数据操作相关的命令帮助
sh-5.1# etcdctl --help
...
COMMANDS:
...
del Removes the specified key or range of keys
...
get Gets the key or a range of keys
...
put Puts the given key into the store
...
注意:
etcd的集群环境中,只能通过sh来连接,这个终端中不支持回退甚至ls、grep命令都不支持
命令内容超出一行内容会出现结构性变形,可以借助于换行符号来实现内容的持续输入
etcdctl 目前主要有两种命令模式 2版本和3版本,通过辩论指定命令的版本
export ETCDCTL_API=3
k8s集群基本信息查看
查看集群所有的key信息
sh-5.1# etcdctl get / --prefix --keys-only
...
/registry/secrets/kube-system/horizontal-pod-autoscaler-token-46bs2
...
注意:
etcd的命令行里面不支持 ls、grep、echo等基础linux命令
查看制定目录下的文件
sh-5.1# etcdctl get /registry/services/specs/default --keys-only --prefix
/registry/services/specs/default/kubernetes
/registry/services/specs/default/nginx-web
/registry/services/specs/default/superopsmsb-flask-web
/registry/services/specs/default/superopsmsb-nginx-web
查看制定的key信息
sh-5.1# etcdctl get /registry/services/specs/default/nginx-web -w json
注意:
默认查看的信息是乱码,需要格式化输出
相关信息输出到宿主机
sh-5.1# etcdctl \
get /registry/serviceaccounts/kube-public/default -w json \
> /var/lib/etcd/default.txt
linux系统安装jq格式化命令
[root@kubernetes-master1 ~]# yum install jq
[root@kubernetes-master1 ~]# cat /var/lib/etcd/default.txt | jq .
{
"header": {
"cluster_id": 12231819682772950000,
"member_id": 17846008329503527000,
"revision": 127005,
"raft_term": 10
},
"kvs": [
{
"key": "L3JlZ2lzdHJ5L3NlcnZpY2VhY2NvdW50cy9rdWJlLXB1YmxpYy9kZWZhdWx0",
"create_revision": 371,
"mod_revision": 377,
"version": 2,
"value": "azhzAAoUCgJ2MRIOU2VydmljZUFjY291bnQSdQpQCgdkZWZhdWx0EgAaC2t1YmUtcHVibGljIgAqJDM1YmVmYmRjLTdkMjItNDA0YS05ZGRkLTQxY2JkZDYyYzcwNjIAOABCCAjnkYuXBhAAegASIQoAEgAaE2RlZmF1bHQtdG9rZW4tcTl0ZHIiACoAMgA6ABoAIgA="
}
],
"count": 1
}
将所有的key导出到外部
sh-5.1# etcdctl get / --prefix --keys-only > /var/lib/etcd/keys.txt
外部主机查看效果
[root@kubernetes-master1 ~]# grep superopsmsb-nginx-web /var/lib/etcd/keys.txt
/registry/deployments/default/superopsmsb-nginx-web
/registry/endpointslices/default/superopsmsb-nginx-web-hvsj6
...
查看网络信息
[root@kubernetes-master1 ~]# kubectl calico get workloadEndpoint
WORKLOAD NODE NETWORKS INTERFACE
superopsmsb-flask-web-7f89d69844-6jsh8 kubernetes-node3 10.244.3.9/32 calie65b968039f
superopsmsb-flask-web-7f89d69844-f9l68 kubernetes-node1 10.244.1.7/32 cali55f61d402c9
superopsmsb-flask-web-7f89d69844-kjzm9 kubernetes-node2 10.244.2.8/32 cali9cf2e2fbec4
superopsmsb-nginx-web-757bcb8fc9-6qkhd kubernetes-node2 10.244.2.9/32 cali87afd7523de
superopsmsb-nginx-web-757bcb8fc9-kdr2c kubernetes-node3 10.244.3.10/32 cali1e4ee599ca3
superopsmsb-nginx-web-757bcb8fc9-nmtrw kubernetes-node1 10.244.1.8/32 cali55c3108ec35
过滤网络信息key
[root@kubernetes-master1 ~]# grep ipamblocks /var/lib/etcd/keys.txt
/registry/apiextensions.k8s.io/customresourcedefinitions/ipamblocks.crd.projectcalico.org
/registry/crd.projectcalico.org/ipamblocks/10-244-1-0-24
/registry/crd.projectcalico.org/ipamblocks/10-244-2-0-24
/registry/crd.projectcalico.org/ipamblocks/10-244-3-0-24
/registry/crd.projectcalico.org/ipamblocks/10-244-5-0-24
etcd中获取相关信息
[root@kubernetes-master1 /var/lib/etcd]# egrep -v 'ipamblocks' /var/lib/etcd/ipab.txt | jq .
[root@kubernetes-master1 /var/lib/etcd]# egrep -v 'ipamblocks' /var/lib/etcd/ipab.txt | jq . | grep '"cidr":'
"cidr": "10.244.1.0/24",
数据操作
添加数据实践
添加数据
sh-5.1# etcdctl put /testkey "test etcd key"
OK
查看数据
sh-5.1# etcdctl get /testkey
/testkey
test etcd key
改动数据实践
改动数据
sh-5.1# etcdctl put /testkey "test etcd edit key"
OK
查看数据
sh-5.1# etcdctl get /testkey
/testkey
test etcd edit key
删除数据实践
删除数据
sh-5.1# etcdctl del /testkey
1
查看效果
sh-5.1# etcdctl get /testkey
模拟etcd管理k8s操作
查看当前的deployment
[root@kubernetes-master1 ~]# kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
superopsmsb-flask-web 3/3 3 3 10h
superopsmsb-nginx-web 3/3 3 3 10h
etcd获取deployment的内容
[root@kubernetes-master1 ~]# grep superopsmsb-flask-web /var/lib/etcd/keys.txt | grep -v events
/registry/deployments/default/superopsmsb-flask-web
/registry/endpointslices/default/superopsmsb-flask-web-zxkv6
/registry/pods/default/superopsmsb-flask-web-7f89d69844-6jsh8
/registry/pods/default/superopsmsb-flask-web-7f89d69844-f9l68
/registry/pods/default/superopsmsb-flask-web-7f89d69844-kjzm9
/registry/replicasets/default/superopsmsb-flask-web-7f89d69844
/registry/services/endpoints/default/superopsmsb-flask-web
/registry/services/specs/default/superopsmsb-flask-web
etcd删除资源对象
sh-5.1# etcdctl del /registry/deployments/default/superopsmsb-flask-web
1
查看外部kubectl的监控效果
[root@kubernetes-master1 ~]# kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
superopsmsb-nginx-web 3/3 3 3 10h
结果显示:
与superopsmsb-flask-web相关的资源全部清理了
小结
1.2.3 备份还原
学习目标
这一节,我们从 备份实践、还原实践、小结 三个方面来学习。
备份实践
命令简介
查看备份相关的命令
sh-5.1# etcdctl --help
...
snapshot restore 从默认数据目录还原备份数据
snapshot save 备份etcd数据到默认数据目录
snapshot status 查看备份状态
注意:
备份ETCD集群时,只需要备份一个ETCD就行,因为集群节点间数据会自动同步
etcd在进行备份的时候,禁止出现多个数据入口,否则会发生报错
sh-5.1# etcdctl snapshot save /var/lib/etcd/a.db
Error: snapshot must be requested to one selected node, not multiple [10.0.0.12:2379 10.0.0.14:2379 10.0.0.13:2379]
制作命令别名
sh-5.1# alias etcdctl='etcdctl \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key'
数据备份
sh-5.1# etcdctl --endpoints=https://10.0.0.12:2379 \
> snapshot save /var/lib/etcd/snapshot-etcd-1.db
{"level":"info","ts":1659233955.8515584,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"/var/lib/etcd/snapshot-etcd-1.db.part"}
...
Snapshot saved at /var/lib/etcd/snapshot-etcd-1.db
宿主机查看效果
[root@kubernetes-master1 ~]# ll -h /var/lib/etcd/snapshot-etcd-1.db
-rw------- 1 root root 7.1M 7月 31 10:19 /var/lib/etcd/snapshot-etcd-1.db
删除一些数据
[root@kubernetes-master1 ~]# kubectl delete deployments superopsmsb-nginx-web
deployment.apps "superopsmsb-nginx-web" deleted
还原实践
还原流程
基本步骤:
停止kube-apiserver --> 停止ETCD --> 恢复数据 --> 启动ETCD --> 启动kube-apiserve
注意:
备份ETCD集群时,只需要备份一个ETCD就行,恢复时,拿同一份备份数据恢复。
准备工作: etcd恢复的时候需要etcdctl命令,所有节点安装etcd的相关命令
[root@kubernetes-master1 ~]# yum install etcd -y
所有节点都停止kube-apiserver和etcd服务
通过移除静态pod文件方式关闭服务
[root@kubernetes-master1 ~]# mv /etc/kubernetes/manifests /etc/kubernetes/manifests.bak
确认效果
[root@kubernetes-master1 ~]# docker ps|grep k8s_
注意:
确保 etcd 和 api-server没有UP
所有节点清理etcd数据
[root@kubernetes-master1 ~]# mv /var/lib/etcd /var/lib/etcd.bak
etcd数据恢复
etcd集群用同一份snapshot恢复。
[root@kubernetes-master1 ~]# for i in {12..14}; do scp /var/lib/etcd.bak/snapshot-etcd-1.db root@10.0.0.$i:/tmp/; done
snapshot-etcd-1.db 100% 7188KB 73.6MB/s 00:00
snapshot-etcd-1.db 100% 7188KB 55.9MB/s 00:00
snapshot-etcd-1.db 100% 7188KB 57.6MB/s 00:00
kubernetes-master1节点上还原etcd数据
[root@kubernetes-master1 ~]# export ETCDCTL_API=3 etcdctl
[root@kubernetes-master1 ~]# etcdctl snapshot restore /tmp/snapshot-etcd-1.db \
--endpoints=https://10.0.0.12:2379 \
--name=kubernetes-master1 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--initial-cluster="kubernetes-master1=https://10.0.0.12:2380,kubernetes-master2=https://10.0.0.13:2380,kubernetes-master3=https://10.0.0.14:2380" \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://10.0.0.12:2380 \
--data-dir=/var/lib/etcd
kubernetes-master2节点上还原etcd数据
[root@kubernetes-master2 ~]# export ETCDCTL_API=3
[root@kubernetes-master2 ~]# etcdctl snapshot restore /tmp/snapshot-etcd-1.db \
--endpoints=https://10.0.0.13:2379 \
--name=kubernetes-master2 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--initial-cluster="kubernetes-master1=https://10.0.0.12:2380,kubernetes-master2=https://10.0.0.13:2380,kubernetes-master3=https://10.0.0.14:2380" \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://10.0.0.13:2380 \
--data-dir=/var/lib/etcd
kubernetes-master3节点上还原etcd数据
[root@kubernetes-master3 ~]# export ETCDCTL_API=3
[root@kubernetes-master3 ~]# etcdctl snapshot restore /tmp/snapshot-etcd-1.db \
--endpoints=https://10.0.0.14:2379 \
--name kubernetes-master3 \
--cert=/etc/kubernetes/pki/etcd/server.crt \
--key=/etc/kubernetes/pki/etcd/server.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--initial-cluster="kubernetes-master1=https://10.0.0.12:2380,kubernetes-master2=https://10.0.0.13:2380,kubernetes-master3=https://10.0.0.14:2380" \
--initial-cluster-token=etcd-cluster-0 \
--initial-advertise-peer-urls=https://10.0.0.14:2380 \
--data-dir=/var/lib/etcd
所有节点都恢复kube-apiserver和etcd服务
将静态pod还原回来
[root@kubernetes-master1 ~]# mv /etc/kubernetes/manifests.bak /etc/kubernetes/manifests
确认效果
稍等十数秒后查看集群效果
[root@kubernetes-master1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubernetes-master1 Ready control-plane,master 2d9h v1.23.9
kubernetes-master2 Ready control-plane,master 2d9h v1.23.9
kubernetes-master3 Ready control-plane,master 2d9h v1.23.9
kubernetes-node1 Ready <none> 2d9h v1.23.9
kubernetes-node2 Ready <none> 2d9h v1.23.9
kubernetes-node3 Ready <none> 2d9h v1.23.9
查看数据效果
[root@kubernetes-master1 ~]# kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
superopsmsb-nginx-web 3/3 3 3 12h
小结
1.2.4 ETCD集群
学习目标
这一节,我们从 案例解读、证书实践、环境实践、小结 三个方面来学习。
案例解读
简介
根据我们刚才在kubernetes上的etcd方式的备份还原,无论是操作方面来说,还是从集群管理上面来说,都不太方便,所以我们希望一种专用的etcd高可用集群方式来实现为kubernetes集群提供数据基础设施服务。
高可用方案解读
etcd的高可用基本有三种思路:
独立的etcd集群:
使用3台或者5台服务器只运行etcd,独立维护和升级。etcd集群将作为数据存储基础设施用于构建整个集群。etcd集群的节点增减需要显式的通知整个集群,保证etcd集群节点稳定可以更方便的用程序完成集群滚动升级,减轻维护负担。
特点:成本高,可控性高,维护难度大。
static pod的etcd集群:
将多台Kubernetes Master上的静态pod组成etcd集群,各个服务器的etcd实例被注册进了Kubernetes当中,虽然无法直接使用kubectl来管理这部分实例,但是监控以及日志搜集组件均可正常工作。在这一模式运行下的etcd可管理性更强。
特点:折中方式。
self-hosted etcd方案:
以容器应用方式部署到Kubernetes集群中,实现Kubernetes对自身依赖组件的管理。需要借助etcd-operator来维护etcd集群,最符合Kubernetes的使用习惯。
特点:自由简单、但有风险 -- pod变化频率可知。
部署方案
我们这里以第一种方法来进行etcd的高可用实践,因为这种方式是作为kubernetes的数据基础的,所以,我们需要在部署kubernetes之前将这个环境部署完毕,然后作为数据存储位置在集群初始化的时候,进行配置。
准备工作
所有节点破坏kubernetes集群环境
kubeadm reset
rm -f /etc/cni/net.d/*
reboot
准备etcd高可用环境
CFSSL是CloudFlare开源的一款PKI/TLS工具。 CFSSL 包含一个命令行工具 和一个用于 签名,验证并且捆绑TLS证书的 HTTP API 服务。 使用Go语言编写。
代码地址: https://github.com/cloudflare/cfssl
官网地址: https://pkg.cfssl.org/
准备证书环境
mkdir /data/kubernetes/etcd ; cd /data/kubernetes/etcd
for i in cfssl cfssljson
do
wget https://github.com/cloudflare/cfssl/releases/download/v1.6.1/$i_1.6.1_linux_amd64
chmod +x $i_1.6.1_linux_amd64
mv $i_1.6.1_linux_amd64 /usr/local/bin/$i
done
环境实践
创建证书
创建ca证书,客户端,服务端,节点之间的证书
ca证书 自己给自己签名的权威证书,用来给其他证书签名
server证书 etcd的证书
client证书 客户端,比如etcdctl的证书
peer证书 节点与节点之间通信的证书
创建专属文件
[root@kubernetes-master1 ~]# mkdir -p /etc/etcd/ssl
[root@kubernetes-master1 ~]# cd /etc/etcd/ssl
生成证书初始化模板文件
[root@kubernetes-master1 /etc/etcd/ssl]# cfssl print-defaults config
{
"signing": {
"default": {
"expiry": "168h"
},
"profiles": {
"www": {
"expiry": "8760h",
"usages": [
"signing",
"key encipherment",
"server auth"
]
},
"client": {
"expiry": "8760h",
"usages": [
"signing",
"key encipherment",
"client auth"
]
}
}
}
}
属性解析:
server auth表示client可以用该ca对server提供的证书进行验证
client auth表示server可以用该ca对client提供的证书进行验证
生成证书签名请求初始化模板文件
[root@kubernetes-master1 /etc/etcd/ssl]# cfssl print-defaults csr
{
"CN": "example.net",
"hosts": [
"example.net",
"www.example.net"
],
"key": {
"algo": "ecdsa",
"size": 256
},
"names": [
{
"C": "US",
"ST": "CA",
"L": "San Francisco"
}
]
}
定制ca证书文件 ca-config.json
{
"signing": {
"default": {
"expiry": "438000h"
},
"profiles": {
"server": {
"expiry": "438000h",
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
]
},
"client": {
"expiry": "438000h",
"usages": [
"signing",
"key encipherment",
"client auth"
]
},
"peer": {
"expiry": "438000h",
"usages": [
"signing",
"key encipherment",
"server auth",
"client auth"
]
}
}
}
}
注意:
需要定义server、client、peer 三种角色的认证通信
定制前面请求文件 ca-csr.json
{
"CN": "etcd",
"key": {
"algo": "rsa",
"size": 2048
}
}
注意:
只需要定制算法和名称即可
生成私钥和证书
[root@kubernetes-master1 /etc/etcd/ssl]# cfssl gencert -initca ca-csr.json | cfssljson -bare ca
...
110542548803478808677950627582407176512561912812
[root@kubernetes-master1 /etc/etcd/ssl]# ls
ca-config.json ca.csr ca-csr.json ca-key.pem ca.pem
生成客户端证书
定制专属配置文件 client.json
{
"CN": "client",
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BJ",
"O": "k8s",
"OU": "System"
}
]
}
生成证书
[root@kubernetes-master1 /etc/etcd/ssl]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client -
...
specifically, section 10.2.3 ("Information Requirements").
查看效果
[root@kubernetes-master1 /etc/etcd/ssl]# ls
ca-config.json ca-csr.json ca.pem client.json client.pem
ca.csr ca-key.pem client.csr client-key.pem
定制etcd节点间证书
定制集群节点间证书文件 etcd.json
{
"CN": "etcd",
"hosts": [
"10.0.0.12",
"10.0.0.13",
"10.0.0.14",
"127.0.0.1"
],
"key": {
"algo": "rsa",
"size": 2048
},
"names": [
{
"C": "CN",
"ST": "BeiJing",
"L": "BJ",
"O": "k8s",
"OU": "System"
}
]
}
生成服务端通信专属的证书
[root@kubernetes-master1 /etc/etcd/ssl]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server etcd.json | cfssljson -bare server
[root@kubernetes-master1 /etc/etcd/ssl]# cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer etcd.json | cfssljson -bare peer
确认效果
[root@kubernetes-master1 /data/etcd/ssl]# ls
ca-config.json ca-key.pem client.json etcd.json peer.pem server.pem
ca.csr ca.pem client-key.pem peer.csr server.csr
ca-csr.json client.csr client.pem peer-key.pem server-key.pem
环境实践
获取etcd
获取软件
cd /data/softs
wget https://github.com/etcd-io/etcd/releases/download/v3.5.2/etcd-v3.5.2-linux-amd64.tar.gz
解压文件到命令目录
tar zxf etcd-v3.5.2-linux-amd64.tar.gz
cp etcd-v3.5.2-linux-amd64/{etcd etcdctl} /usr/local/bin/
rm -rf etcd-v3.5.2-linux-amd64
准备服务启动文件
[root@kubernetes-master1 ~]# vim /usr/lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=neCNork.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
ExecStart=/usr/local/bin/etcd \
--name=kubernetes-master1 \
--data-dir=/var/lib/etcd/default.etcd \
--listen-peer-urls=https://10.0.0.12:2380 \
--listen-client-urls=https://10.0.0.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls=https://10.0.0.12:2379 \
--initial-advertise-peer-urls=https://10.0.0.12:2380 \
--initial-cluster=kubernetes-master1=https://10.0.0.12:2380,kubernetes-master2=https://10.0.0.13:2380,kubernetes-master3=https://10.0.0.14:2380 \
--initial-cluster-token=etcd-cluster \
--initial-cluster-state=new \
--cert-file=/etc/etcd/ssl/server.pem \
--key-file=/etc/etcd/ssl/server-key.pem \
--peer-cert-file=/etc/etcd/ssl/peer.pem \
--peer-key-file=/etc/etcd/ssl/peer-key.pem \
--trusted-ca-file=/etc/etcd/ssl/ca.pem \
--peer-trusted-ca-file=/etc/etcd/ssl/ca.pem
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
同步所有文件到所有etcd节点
for i in 13 14
do
ssh root@10.0.0.$i mkdir -p /etc/etcd
scp -r /etc/etcd/* root@10.0.0.$i:/etc/etcd/
scp /usr/lib/systemd/system/etcd.service root@10.0.0.$i:/usr/lib/systemd/system/etcd.service
done
master2节点修改服务启动文件
[root@kubernetes-master2 ~]# sed -i '/name/s/master1/master2/' /usr/lib/systemd/system/etcd.service
[root@kubernetes-master2 ~]# sed -i '/urls/s/0.12/0.13/g' /usr/lib/systemd/system/etcd.service
[root@kubernetes-master2 ~]# egrep 'name|urls' /usr/lib/systemd/system/etcd.service
--name=kubernetes-master2 \
--initial-advertise-peer-urls=https://10.0.0.13:2380 \
--listen-peer-urls=https://10.0.0.13:2380 \
--listen-client-urls=https://10.0.0.13:2379 \
--advertise-client-urls=https://10.0.0.13:2379 \
master3节点修改服务启动文件
[root@kubernetes-master3 ~]# sed -i '/name/s/master1/master3/' /usr/lib/systemd/system/etcd.service
[root@kubernetes-master3 ~]# sed -i '/urls/s/0.12/0.14/g' /usr/lib/systemd/system/etcd.service
[root@kubernetes-master3 ~]# egrep 'name|urls' /usr/lib/systemd/system/etcd.service
--name=kubernetes-master3 \
--initial-advertise-peer-urls=https://10.0.0.14:2380 \
--listen-peer-urls=https://10.0.0.14:2380 \
--listen-client-urls=https://10.0.0.14:2379 \
--advertise-client-urls=https://10.0.0.14:2379 \
启动etcd服务
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
systemctl status etcd
检查效果
cd /etc/etcd/ssl
查看集群状态
etcdctl --endpoints="https://10.0.0.12:2379,https://10.0.0.13:2379,https://10.0.0.14:2379," --cacert=ca.pem --cert=server.pem --key=server-key.pem endpoint status
查看集群主机
etcdctl --endpoints="https://10.0.0.12:2379,https://10.0.0.13:2379,https://10.0.0.14:2379," --cacert=ca.pem --cert=server.pem --key=server-key.pem member list
k8s集群改造
传输证书
etcd集群的ca证书
cp /etc/etcd/ssl/ca.pem /etc/kubernetes/pki/etcd/ca.pem
etcd集群的client证书,apiserver访问etcd使用
cp /etc/etcd/ssl/client.pem /etc/kubernetes/pki/apiserver-etcd-client.pem
etcd集群的client私钥
cp /etc/etcd/ssl/client-key.pem /etc/kubernetes/pki/apiserver-etcd-client-key.pem
改造配置文件
[root@kubernetes-master1 /etc/kubernetes]# cat /data/kubernetes/cluster_init/kubeadm_init_1.23.8.yml
...
# 原内容
# etcd:
# local:
# dataDir: /var/lib/etcd
# 修改后内容
etcd:
external:
endpoints:
- https://10.0.0.12:2379
- https://10.0.0.13:2379
- https://10.0.0.14:2379
caFile: /etc/kubernetes/pki/etcd/ca.pem
certFile: /etc/kubernetes/pki/apiserver-etcd-client.pem
keyFile: /etc/kubernetes/pki/apiserver-etcd-client-key.pem
...
集群初始化
kubeadm init --config=kubeadm-init.yaml
后续操作,与第二单元完全一致。
curl模拟访问k8s资源
定制资源访问的client证书
grep client-cert ~/.kube/config |cut -d" " -f 6 | base64 -d > ./client.pem
定制资源访问的client秘钥
grep client-key-data ~/.kube/config |cut -d" " -f 6 | base64 -d > ./client-key.pem
定制资源访问的ca证书
grep certificate-authority-data ~/.kube/config |cut -d" " -f 6 | base64 -d > ./ca.pem
获取集群的入口
kubectl config view |grep server
server: https://10.0.0.200:6443
模拟访问效果
curl --cert ./client.pem --key ./client-key.pem --cacert ./ca.pem https://10.0.0.12:6443/api/v1/namespaces/default/pods/flask-client-default
小结