k8s Migration: 岁月云 Hands-on Notes

The new system runs Rocky Linux 9.5; the virtual machines of the old system run CentOS 7.

1 Target servers

1.1 Disable swap

swapoff -a
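vi /etc/fstab
# Comment out the swap entry so that swap stays off after a reboot:
#/dev/mapper/rl-swap     none                    swap    defaults        0 0
# Verify: every value in the Swap row should be 0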
free -h
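
The manual fstab edit above can also be scripted. This is a minimal sketch, assuming GNU sed and an fstab whose swap entries carry `swap` in the third field. The function name `disable_swap_entries` is hypothetical, and a `.bak` copy of the file is kept.

```shell
#!/usr/bin/env bash
# Comment out every active swap entry in an fstab-style file.
# Assumption: the filesystem type is the third whitespace-separated field.
disable_swap_entries() {
  local fstab="${1:-/etc/fstab}"
  # Prefix uncommented lines whose third field is "swap" with '#'; keep a .bak copy.
  sed -i.bak -E 's/^([^#[:space:]][^[:space:]]*[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]].*)$/#\1/' "$fstab"
}

# Usage (as root):
#   swapoff -a && disable_swap_entries /etc/fstab && free -h
```

Lines that are already commented out are left untouched, so the function can be run more than once.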

1.2 Disable the firewall

This is done only to reduce maintenance overhead.

systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld

1.3 Disable SELinux

# Temporary: SELinux is enforced again after a reboot
setenforce 0
# Permanent
vi /etc/selinux/config
# Change SELINUX=enforcing to SELINUX=disabled

1.4 Change the hostname

hostnamectl set-hostname master7

1.5 Add hosts entries

vi /etc/hosts
10.101.10.6 master6
10.101.10.7 master7
10.101.10.8 master8

1.6 Configure ip_forward

# Load the bridge netfilter module, then set the kernel parameters
modprobe br_netfilter
# If net.ipv4.ip_forward is 0, pod IP traffic cannot be forwarded
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
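# sysctl -w changes only the running kernel; sysctl -p reloads /etc/sysctl.conf, so add the three keys there to keep them after a reboot
sysctl -p
# Verify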
sysctl -a | grep net.ipv4.ip_forward
sysctl -a | grep net.bridge.bridge-nf-call-iptables
sysctl -a | grep net.bridge.bridge-nf-call-ip6tables
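
Because `sysctl -w` and `modprobe` affect only the running system, the settings above are lost on reboot unless they are written to disk. The following is a sketch of one way to persist them, assuming a systemd-based system that reads `/etc/sysctl.d/` and `/etc/modules-load.d/`. The file names `99-kubernetes.conf` and `k8s.conf` and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Write the three kernel parameters to a sysctl drop-in file.
write_k8s_sysctl() {
  local conf="${1:-/etc/sysctl.d/99-kubernetes.conf}"   # hypothetical file name
  cat > "$conf" <<'EOF'
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
}

# Ask systemd-modules-load to load br_netfilter at boot.
write_module_autoload() {
  local conf="${1:-/etc/modules-load.d/k8s.conf}"       # hypothetical file name
  echo br_netfilter > "$conf"
}

# Usage (as root):
#   write_k8s_sysctl && write_module_autoload && modprobe br_netfilter && sysctl --system
```

`sysctl --system` reloads every configuration directory, so the new drop-in takes effect without a reboot.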

1.7 Time synchronization

sudo dnf install chrony
sudo systemctl start chronyd
sudo systemctl enable chronyd

# Edit the configuration
vi /etc/chrony.conf
# Add the following entries
pool ntp1.aliyun.com iburst
pool ntp2.aliyun.com iburst


server ntp1.aliyun.com iburst
server ntp2.aliyun.com iburst
server ntp3.aliyun.com iburst
server ntp4.aliyun.com iburst
server ntp5.aliyun.com iburst
server ntp7.aliyun.com iburst


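# Restart chronyd so the new sources are loaded, then step the clock immediately
sudo systemctl restart chronyd
sudo chronyc -a makestep

# Check the time status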
timedatectl status
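
To check synchronization from a script rather than by eye, the leap status reported by `chronyc tracking` can be tested. This is a small sketch, assuming the usual `Leap status     : Normal` line in that output. The function name `chrony_synced` is hypothetical.

```shell
#!/usr/bin/env bash
# Read `chronyc tracking` output on stdin; succeed only if the leap status is Normal.
chrony_synced() {
  grep -Eq '^Leap status[[:space:]]*:[[:space:]]*Normal[[:space:]]*$'
}

# Usage:
#   chronyc tracking | chrony_synced && echo "clock synchronised"
```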

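1.8 Add the rancher user

useradd rancher
# The docker group exists only after Docker is installed; rke needs this user to reach the Docker socket
usermod -aG docker rancher
# Example password only; use a strong one in practice
echo 123456 | passwd --stdin rancher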
cat /etc/group | grep docker

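2 Source server

New nodes are added from the original master node, so the steps in this section run on the source server.

2.1 Passwordless login

# Run on the original master node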
su - rancher
ssh-copy-id rancher@master7

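2.2 Install the new rke

# Caution: get.rke2.io installs RKE2, not the RKE v1 binary (v1.6.5) that the rke up commands in these notes use
curl -sfL https://get.rke2.io | sh -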
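
The `rke up` commands in these notes belong to RKE v1 (the log in section 3.1 shows v1.6.5), which ships as a single binary on the project's GitHub releases page, whereas get.rke2.io installs RKE2. Here is a minimal sketch of fetching that binary, assuming the release asset is named `rke_linux-amd64`. The function names and the `/usr/local/bin` install directory are hypothetical.

```shell
#!/usr/bin/env bash
# Build the GitHub release URL for an RKE v1 binary.
rke_download_url() {
  local version="$1" os="${2:-linux}" arch="${3:-amd64}"
  printf 'https://github.com/rancher/rke/releases/download/%s/rke_%s-%s\n' "$version" "$os" "$arch"
}

# Download the binary and install it as `rke` (install directory is an assumption).
install_rke() {
  local version="$1" dest="${2:-/usr/local/bin}"
  curl -fL -o "$dest/rke" "$(rke_download_url "$version")" && chmod +x "$dest/rke"
}

# Usage (as root):
#   install_rke v1.6.5 && rke --version
```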

2.3 Add nodes

rke manages adding and removing k8s nodes: edit cluster.yml, then run rke up --update-only --config cluster.yml. Because this step adds etcd members, schedule it for an off-peak window.
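
Since the update touches etcd, it may be worth keeping a copy of the rke state files and taking an etcd snapshot before running it. This is a sketch under those assumptions: `backup_rke_state` is a hypothetical helper, and the snapshot name `before-add-node` is only an example.

```shell
#!/usr/bin/env bash
# Copy the rke state files in a directory to timestamped .bak files.
backup_rke_state() {
  local dir="${1:-.}" stamp f
  stamp="$(date +%Y%m%d-%H%M%S)"
  for f in cluster.yml cluster.rkestate kube_config_cluster.yml; do
    if [ -f "$dir/$f" ]; then
      cp -a "$dir/$f" "$dir/$f.$stamp.bak"
    fi
  done
}

# Usage (as the rancher user, in the directory that holds cluster.yml):
#   backup_rke_state .
#   rke etcd snapshot-save --config cluster.yml --name before-add-node
#   rke up --update-only --config cluster.yml
```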

2.4 Install kubectl

Install the kubectl build that matches the cluster version:

curl -LO https://dl.k8s.io/release/v1.30.7/bin/linux/amd64/kubectl

chmod +x kubectl
cp -a kubectl /usr/bin
cd /root
mkdir -p .kube
cp /home/rancher/kube_config_cluster.yml /root/.kube/config
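
The kubectl download can be verified before the binary is copied into /usr/bin. This is a sketch, assuming GNU `sha256sum` and the digest-only `kubectl.sha256` file published next to the binary on dl.k8s.io. The function name `verify_sha256` is hypothetical.

```shell
#!/usr/bin/env bash
# Verify a file against a checksum file that contains only the hex digest.
verify_sha256() {
  local file="$1" sumfile="$2"
  # sha256sum --check expects "<digest>  <path>"; --status reports via exit code only.
  echo "$(cat "$sumfile")  $file" | sha256sum --check --status
}

# Usage, in the directory where kubectl was downloaded:
#   curl -LO https://dl.k8s.io/release/v1.30.7/bin/linux/amd64/kubectl.sha256
#   verify_sha256 kubectl kubectl.sha256 && echo "kubectl checksum OK"
```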

3 Some problems

3.1 Docker version incompatibility

su - rancher
rke up --update-only --config cluster.yml

After the command runs, the error below appears. The Rancher website documents the same error: Failed to set up SSH tunneling for host [xxx.xxx.xxx.xxx]: Can't retrieve Docker Info

WARN[0000] Failed to set up SSH tunneling for host [master6]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [master6:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain 
WARN[0000] Removing host [master6] from node lists      
INFO[0000] [network] No hosts added existing cluster, skipping port check

However, the following command succeeds when run on the source server:

ssh -i ~/.ssh/id_rsa rancher@master6
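
rke has to log in over SSH as the configured user and then reach the Docker socket on the node, so it can help to test both in one non-interactive command. This is a sketch: `preflight_node` is a hypothetical helper, and `SSH_BIN` is a hypothetical override that allows a dry run with `echo`.

```shell
#!/usr/bin/env bash
# Ask a node for its Docker server version over SSH.
# BatchMode=yes makes ssh fail instead of prompting for a password.
preflight_node() {
  local host="$1" user="${2:-rancher}" key="${3:-$HOME/.ssh/id_rsa}"
  "${SSH_BIN:-ssh}" -o BatchMode=yes -i "$key" "$user@$host" \
    docker version --format '{{.Server.Version}}'
}

# Usage:
#   for h in master6 master7 master8; do preflight_node "$h" || echo "$h: check failed"; done
```

A node that prints a version here accepts key-based login for that user and lets the user talk to Docker, so a remaining rke failure on that node probably has another cause.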

Check the Docker versions; the Docker version is the likely cause.

# Target server
[root@master6 ~]# docker --version
Docker version 27.4.0, build bde2b89
# Source server
[root@master1 ~]# docker --version
Docker version 19.03.8, build afacb8b

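The newest Docker is not always the best choice. The current rke release is v1.6.5, and the run below reports that Docker 27.4.1 is not supported yet, so Docker has to be rolled back to an older version.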

[rancher@master8 ~]$ rke up --config cluster.yml
INFO[0000] Running RKE version: v1.6.5                  
INFO[0000] Initiating Kubernetes cluster                
INFO[0000] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates 
INFO[0000] [certificates] Generating Kubernetes API server certificates 
INFO[0000] [certificates] Generating admin certificates and kubeconfig 
INFO[0000] [certificates] Generating kube-etcd-master6 certificate and key 
INFO[0000] [certificates] Generating kube-etcd-master7 certificate and key 
INFO[0000] [certificates] Generating kube-etcd-master8 certificate and key 
INFO[0000] Successfully Deployed state file at [./cluster.rkestate] 
INFO[0000] Building Kubernetes cluster                  
INFO[0000] [dialer] Setup tunnel for host [master7]     
INFO[0000] [dialer] Setup tunnel for host [master8]     
INFO[0000] [dialer] Setup tunnel for host [master6]     
FATA[0001] Unsupported Docker version found [27.4.1] on host [master8], supported versions are [1.13.x 17.03.x 17.06.x 17.09.x 18.06.x 18.09.x 19.03.x 20.10.x 23.0.x 24.0.x 25.0.x 26.0.x 26.1.x 27.0.x 27.1.x 27.2.x]
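
The same check can be run on each node before calling rke. This is a sketch that copies the supported series from the FATA line above, so the list is specific to rke v1.6.5. The names `SUPPORTED_SERIES` and `docker_supported` are hypothetical.

```shell
#!/usr/bin/env bash
# Docker release series accepted by rke v1.6.5, copied from its error message.
SUPPORTED_SERIES="1.13 17.03 17.06 17.09 18.06 18.09 19.03 20.10 23.0 24.0 25.0 26.0 26.1 27.0 27.1 27.2"

# Succeed if the given Docker version (e.g. 27.2.1) belongs to a supported series.
docker_supported() {
  local version="$1" series
  series="${version%.*}"   # drop the patch component: 27.4.1 -> 27.4
  case " $SUPPORTED_SERIES " in
    *" $series "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a node:
#   docker_supported "$(docker version --format '{{.Server.Version}}')" || echo "unsupported Docker version"
```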

Reset the Docker environment

systemctl disable docker
sudo systemctl stop docker.socket
systemctl stop docker
dnf remove docker-ce docker-ce-cli containerd.io docker-compose-plugin -y
# Delete the Docker data
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
rm -rf /home/docker
# Clean up leftover files; when reinstalling, these two steps can be skipped
sudo rm -rf /etc/docker
sudo rm -rf /etc/systemd/system/docker.service.d
# List the available Docker versions
sudo yum list docker-ce --showduplicates | sort -r
# Install a specific Docker version
yum install docker-ce-27.2.1-1.el9 docker-ce-cli-27.2.1-1.el9 containerd.io -y
# Change the Docker path
vi /lib/systemd/system/docker.service
# Reload the edited unit file, then start Docker
systemctl daemon-reload
systemctl start docker
systemctl enable docker

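3.2 rke cannot download files

Even after /etc/docker/daemon.json has been changed, rke up --config cluster.yml still fails to download the images. Pull the required images by hand on every node, as in the example below, and rke up --config cluster.yml can then proceed.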

docker pull rancher/rke-tools:v0.1.105
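
When several images have to be pulled on every node, a small loop saves typing. This is a sketch that reads image names from standard input. `pull_images` is a hypothetical helper, and `DOCKER_BIN` is a hypothetical override for a dry run with `echo`.

```shell
#!/usr/bin/env bash
# Pull every image named on stdin (one per line); return non-zero if any pull fails.
pull_images() {
  local image rc=0
  while IFS= read -r image; do
    [ -n "$image" ] || continue          # skip blank lines
    "${DOCKER_BIN:-docker}" pull "$image" </dev/null || rc=1
  done
  return "$rc"
}

# Usage on each node:
#   printf '%s\n' rancher/rke-tools:v0.1.105 | pull_images
```

The loop keeps going after a failed pull, so one slow or missing image does not block the rest.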

The screenshot from this run showed that some of the rancher images are fairly large (shown as 16.GB), while others were still downloading.

3.3 canal installation failed

calico-kube-controllers also failed to install, but fixing the problem below resolves both.

# This command shows the detailed error log
kubectl describe pod canal-5vznx -n kube-system

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  32m                   default-scheduler  Successfully assigned kube-system/canal-5vznx to master7
  Normal   Pulling    27m (x4 over 32m)     kubelet            Pulling image "rancher/calico-cni:v3.28.1-rancher1"
  Warning  Failed     25m (x4 over 31m)     kubelet            Error: ErrImagePull
  Warning  Failed     24m (x7 over 31m)     kubelet            Error: ImagePullBackOff
  Warning  Failed     11m (x7 over 31m)     kubelet            Failed to pull image "rancher/calico-cni:v3.28.1-rancher1": rpc error: code = Canceled desc = context canceled
  Normal   BackOff    2m44s (x77 over 31m)  kubelet            Back-off pulling image "rancher/calico-cni:v3.28.1-rancher1"

# So pull the images by hand
docker pull rancher/calico-cni:v3.28.1-rancher1
docker pull rancher/mirrored-calico-node:v3.28.1
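
The image names to pull can be extracted from the event log instead of being copied by hand. This is a sketch that matches the `Failed to pull image "..."` wording seen in the events above. The function name `failed_images` is hypothetical.

```shell
#!/usr/bin/env bash
# Read `kubectl describe pod` output on stdin and print each image that failed to pull, once.
failed_images() {
  grep -oE 'Failed to pull image "[^"]+"' | cut -d'"' -f2 | sort -u
}

# Usage:
#   kubectl describe pod canal-5vznx -n kube-system | failed_images
```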

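3.5 kuboard installation failed

This is the same problem again: the image cannot be pulled. Here the cause is that kuboard needs a secret to pull its image from the local harbor.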

Events:
  Type     Reason                           Age                From               Message
  ----     ------                           ----               ----               -------
  Normal   Scheduled                        46s                default-scheduler  Successfully assigned kube-system/kuboard-559bccdc6-zf67z to master6
  Normal   BackOff                          18s (x2 over 44s)  kubelet            Back-off pulling image "10.101.10.2:8081/mid/eipwork/kuboard:latest"
  Warning  Failed                           18s (x2 over 44s)  kubelet            Error: ImagePullBackOff
  Warning  FailedToRetrieveImagePullSecret  3s (x5 over 46s)   kubelet            Unable to retrieve some image pull secrets (regcred); attempting to pull the image may not succeed.
  Normal   Pulling                          3s (x3 over 45s)   kubelet            Pulling image "10.101.10.2:8081/mid/eipwork/kuboard:latest"
  Warning  Failed                           3s (x3 over 45s)   kubelet            Failed to pull image "10.101.10.2:8081/mid/eipwork/kuboard:latest": Error response from daemon: unauthorized: unauthorized to access repository: mid/eipwork/kuboard, action: pull: unauthorized to access repository: mid/eipwork/kuboard, action: pull
  Warning  Failed                           3s (x3 over 45s)   kubelet            Error: ErrImagePull

kubectl create secret docker-registry regcred \
  --docker-server=<harbor-ip>:<port> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email> \
  -n kube-system

The API needs the kuboard token, which can be read as follows:

echo $(kubectl -n kube-system get secret $(kubectl -n kube-system get secret | grep kuboard-user | awk '{print $1}') -o go-template='{{.data.token}}' | base64 -d)