The new machines run Rocky Linux 9.5; the old virtual machines run CentOS 7.
1 Target servers
1.1 Disable swap
swapoff -a
vi /etc/fstab
#/dev/mapper/rl-swap none swap defaults 0 0
# Run this; the Swap row should show all zeros
free -h
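The manual fstab edit above can also be scripted. The sketch below comments out every active swap entry with sed. The function name comment_out_swap is hypothetical, and the snippet assumes GNU sed, as shipped with Rocky Linux.

```shell
# Comment out every active swap entry in an fstab-style file.
# Hypothetical helper; assumes GNU sed. A backup is kept as <file>.bak.
comment_out_swap() {
  local fstab="${1:-/etc/fstab}"
  # Match uncommented lines that contain a whitespace-delimited "swap" field.
  sed -i.bak -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$fstab"
}

# Usage (as root):
#   swapoff -a
#   comment_out_swap /etc/fstab
#   free -h    # the Swap row should show all zeros
```

Lines that are already commented out are skipped, so running the helper twice does not stack extra # characters.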
1.2 Disable the firewall
This is only to reduce maintenance overhead.
systemctl stop firewalld
systemctl disable firewalld
systemctl status firewalld
1.3 Disable SELinux
# Temporary (switches to permissive mode); enforcing mode returns after a reboot
setenforce 0
# Permanent
vi /etc/selinux/config
# Change SELINUX=enforcing to SELINUX=disabled
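For repeatable setups, the same change can be made without opening an editor. This is a minimal sketch, assuming GNU sed; set_selinux_mode is a hypothetical helper name, not part of any SELinux tooling.

```shell
# Set the SELINUX= mode in an SELinux config file.
# Hypothetical helper; assumes GNU sed.
set_selinux_mode() {
  local mode="$1" config="${2:-/etc/selinux/config}"
  # Anchoring on "SELINUX=" leaves the SELINUXTYPE= line untouched.
  sed -i -E "s/^SELINUX=.*/SELINUX=${mode}/" "$config"
}

# Usage (as root):
#   setenforce 0
#   set_selinux_mode disabled
```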
1.4 Set the hostname
hostnamectl set-hostname master7
1.5 Add hosts entries
vi /etc/hosts
10.101.10.6 master6
10.101.10.7 master7
10.101.10.8 master8
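Because the same entries go onto every node, it may help to add them idempotently rather than by hand. The sketch below appends an entry only if the exact line is not already present; add_host_entry is a hypothetical helper name.

```shell
# Append "<ip> <name>" to a hosts file unless that exact line already exists.
# Hypothetical helper for illustration.
add_host_entry() {
  local ip="$1" name="$2" hosts="${3:-/etc/hosts}"
  # -x matches whole lines, -F treats the entry as a fixed string.
  grep -qxF "$ip $name" "$hosts" || printf '%s %s\n' "$ip" "$name" >> "$hosts"
}

# Usage (as root, on every node):
#   add_host_entry 10.101.10.6 master6
#   add_host_entry 10.101.10.7 master7
#   add_host_entry 10.101.10.8 master8
```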
1.6 Configure IP forwarding
# Set
modprobe br_netfilter
# If net.ipv4.ip_forward is 0, pod traffic cannot be forwarded
sysctl -w net.ipv4.ip_forward=1
sysctl -w net.bridge.bridge-nf-call-iptables=1
sysctl -w net.bridge.bridge-nf-call-ip6tables=1
sysctl -p
# Verify
sysctl -a | grep net.ipv4.ip_forward
sysctl -a | grep net.bridge.bridge-nf-call-iptables
sysctl -a | grep net.bridge.bridge-nf-call-ip6tables
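Values set with sysctl -w and modules loaded with modprobe last only until the next reboot, and sysctl -p only reloads /etc/sysctl.conf. A sketch of making the settings above persistent follows. The file names k8s.conf and 99-k8s.conf and the helper name write_k8s_sysctl are arbitrary choices for this example.

```shell
# Write persistent module and sysctl settings under <root>/etc.
# Hypothetical helper; the file names are arbitrary choices for this sketch.
write_k8s_sysctl() {
  local root="${1:-}"
  mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d" || return 1
  # Load br_netfilter at boot so the bridge sysctl keys exist.
  echo br_netfilter > "$root/etc/modules-load.d/k8s.conf"
  cat > "$root/etc/sysctl.d/99-k8s.conf" <<'EOF'
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
}

# Usage (as root):
#   write_k8s_sysctl        # writes under /etc
#   modprobe br_netfilter
#   sysctl --system         # reload all sysctl configuration files
```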
1.7 Time synchronization
sudo dnf install chrony
sudo systemctl start chronyd
sudo systemctl enable chronyd
# Edit the configuration
vi /etc/chrony.conf
# Add the following lines
pool ntp1.aliyun.com iburst
pool ntp2.aliyun.com iburst
server ntp1.aliyun.com iburst
server ntp2.aliyun.com iburst
server ntp3.aliyun.com iburst
server ntp4.aliyun.com iburst
server ntp5.aliyun.com iburst
server ntp7.aliyun.com iburst
# Sync immediately
sudo chronyc -a makestep
# Check the time status
timedatectl status
1.8 Add the rancher user
useradd rancher
usermod -aG docker rancher
echo 123456 | passwd --stdin rancher
cat /etc/group | grep docker
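2 Source server
New nodes are added from the original master node, so the steps in this section run on the source server.
2.1 Passwordless SSH login
# Run on the original master node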
su - rancher
ssh-copy-id rancher@master7
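rke connects to each node over SSH as this user and talks to Docker there, so it can be worth checking both before running rke. The sketch below reports, per host, whether key-based login works and which Docker server version the rancher user can reach. check_nodes is a hypothetical helper, and the host names are the ones from the hosts file above.

```shell
# Report, for each host, whether key-based SSH works and which Docker
# server version the rancher user can reach. Hypothetical helper.
check_nodes() {
  local host version rc=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if version=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "rancher@$host" \
        "docker version --format '{{.Server.Version}}'"); then
      echo "$host ok docker=$version"
    else
      echo "$host FAILED"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (as the rancher user on the source server):
#   check_nodes master6 master7 master8
```

A FAILED line means either the SSH login or the docker call did not succeed; running the ssh command by hand with -v shows which one.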
2.2 Install the new rke
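# Note: get.rke2.io installs RKE2, which is a separate distribution; the rke binary used below (v1.6.5) is published on the rancher/rke GitHub releases page
curl -sfL https://get.rke2.io | sh -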
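2.2 Add nodes
rke manages adding and removing Kubernetes nodes: edit cluster.yml, then run rke up --update-only --config cluster.yml. Because this change adds etcd members, schedule it for an off-peak window. According to the RKE documentation, --update-only applies only worker-node changes, so adding etcd or control plane nodes may require running rke up without that flag.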
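Since rke up rewrites the cluster state and changes etcd membership, keeping a copy of the state files and an etcd snapshot first is a cheap safeguard. The sketch below copies the rke files into a timestamped directory. backup_rke_state and the snapshot name before-add-node are hypothetical; rke etcd snapshot-save is the RKE subcommand for one-off snapshots.

```shell
# Copy the rke state files from <dir> into <dir>/backup-<timestamp>
# and print the backup directory. Hypothetical helper.
backup_rke_state() {
  local dir="${1:-.}" dest f
  dest="$dir/backup-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dest" || return 1
  for f in cluster.yml cluster.rkestate kube_config_cluster.yml; do
    # Files that do not exist yet are simply skipped.
    if [ -f "$dir/$f" ]; then
      cp -a "$dir/$f" "$dest/" || return 1
    fi
  done
  echo "$dest"
}

# Usage (as rancher, in the directory that holds cluster.yml):
#   backup_rke_state .
#   rke etcd snapshot-save --config cluster.yml --name before-add-node
#   rke up --update-only --config cluster.yml
```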
2.3 Install kubectl
Install the kubectl version that matches the cluster.
curl -LO https://dl.k8s.io/release/v1.30.7/bin/linux/amd64/kubectl
chmod +x kubectl
cp -a kubectl /usr/bin
cd /root
mkdir .kube
cp /home/rancher/kube_config_cluster.yml /root/.kube/config
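The kubectl binary can be checked against its published SHA-256 digest before it is copied into /usr/bin. The sketch below follows the pattern from the Kubernetes install instructions, where the .sha256 file holds only the bare digest. verify_sha256 is a hypothetical helper name.

```shell
# Verify a downloaded file against a file holding its bare SHA-256 digest.
# Hypothetical helper; relies on sha256sum from GNU coreutils.
verify_sha256() {
  local file="$1" sumfile="$2"
  # sha256sum --check expects "<digest>  <path>" lines on stdin.
  echo "$(cat "$sumfile")  $file" | sha256sum --check --status
}

# Usage:
#   curl -LO https://dl.k8s.io/release/v1.30.7/bin/linux/amd64/kubectl
#   curl -LO https://dl.k8s.io/release/v1.30.7/bin/linux/amd64/kubectl.sha256
#   verify_sha256 kubectl kubectl.sha256 && install -m 0755 kubectl /usr/bin/kubectl
```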
3 Issues encountered
3.1 Docker version incompatibility
su - rancher
rke up --update-only --config cluster.yml
After running the command, the error below appears. The Rancher website also lists this error: Failed to set up SSH tunneling for host [xxx.xxx.xxx.xxx]: Can't retrieve Docker Info
WARN[0000] Failed to set up SSH tunneling for host [master6]: Can't retrieve Docker Info: error during connect: Get "http://%2Fvar%2Frun%2Fdocker.sock/v1.24/info": Unable to access node with address [master6:22] using SSH. Please check if you are able to SSH to the node using the specified SSH Private Key and if you have configured the correct SSH username. Error: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain
WARN[0000] Removing host [master6] from node lists
INFO[0000] [network] No hosts added existing cluster, skipping port check
However, the following command succeeds when run on the source server:
ssh -i ~/.ssh/id_rsa rancher@master6
Comparing the Docker versions suggests that the Docker version is the likely cause.
# Target server
[root@master6 ~]# docker --version
Docker version 27.4.0, build bde2b89
# Source server
[root@master1 ~]# docker --version
Docker version 19.03.8, build afacb8b
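The newest Docker is not always the best choice. The current rke release is v1.6.5, and the installation output below shows that it does not support Docker 27.4.1, so Docker has to be rolled back to an older version.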
[rancher@master8 ~]$ rke up --config cluster.yml
INFO[0000] Running RKE version: v1.6.5
INFO[0000] Initiating Kubernetes cluster
INFO[0000] [certificates] GenerateServingCertificate is disabled, checking if there are unused kubelet certificates
INFO[0000] [certificates] Generating Kubernetes API server certificates
INFO[0000] [certificates] Generating admin certificates and kubeconfig
INFO[0000] [certificates] Generating kube-etcd-master6 certificate and key
INFO[0000] [certificates] Generating kube-etcd-master7 certificate and key
INFO[0000] [certificates] Generating kube-etcd-master8 certificate and key
INFO[0000] Successfully Deployed state file at [./cluster.rkestate]
INFO[0000] Building Kubernetes cluster
INFO[0000] [dialer] Setup tunnel for host [master7]
INFO[0000] [dialer] Setup tunnel for host [master8]
INFO[0000] [dialer] Setup tunnel for host [master6]
FATA[0001] Unsupported Docker version found [27.4.1] on host [master8], supported versions are [1.13.x 17.03.x 17.06.x 17.09.x 18.06.x 18.09.x 19.03.x 20.10.x 23.0.x 24.0.x 25.0.x 26.0.x 26.1.x 27.0.x 27.1.x 27.2.x]
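The check that rke performs can be reproduced on a node before running rke up. The sketch below compares a Docker version against the supported series copied from the error message above. The list applies to rke v1.6.5 only, and docker_version_supported is a hypothetical helper, not part of rke.

```shell
# Supported series copied from the rke v1.6.5 error message above.
RKE_SUPPORTED_DOCKER="1.13.x 17.03.x 17.06.x 17.09.x 18.06.x 18.09.x 19.03.x 20.10.x 23.0.x 24.0.x 25.0.x 26.0.x 26.1.x 27.0.x 27.1.x 27.2.x"

# Return 0 if the given Docker version belongs to a supported series.
# Hypothetical helper for illustration.
docker_version_supported() {
  local version="$1" series
  # 27.4.1 -> 27.4.x: drop the patch component and compare the series.
  series="${version%.*}.x"
  case " $RKE_SUPPORTED_DOCKER " in
    *" $series "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage on a node:
#   docker_version_supported "$(docker version --format '{{.Server.Version}}')" \
#     || echo "downgrade Docker first"
```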
Reset the Docker environment
systemctl disable docker
sudo systemctl stop docker.socket
systemctl stop docker
dnf remove docker-ce docker-ce-cli containerd.io docker-compose-plugin -y
# Delete the Docker data
sudo rm -rf /var/lib/docker
sudo rm -rf /var/lib/containerd
rm -rf /home/docker
# Clean up leftover files; these two steps can be skipped when reinstalling
sudo rm -rf /etc/docker
sudo rm -rf /etc/systemd/system/docker.service.d
# List the available Docker versions
sudo yum list docker-ce --showduplicates | sort -r
# Install a specific Docker version
yum install docker-ce-27.2.1-1.el9 docker-ce-cli-27.2.1-1.el9 containerd.io -y
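# Change the Docker data path
vi /lib/systemd/system/docker.service
# Reload systemd so the edited unit file takes effect, then start Docker
systemctl daemon-reload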
systemctl start docker
systemctl enable docker
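Edits to /lib/systemd/system/docker.service can be overwritten when the docker-ce package is updated. Docker also accepts the data directory through the data-root key in /etc/docker/daemon.json, which survives updates. The sketch below assumes /home/docker is the intended data directory (suggested by the cleanup step above, but not stated) and refuses to touch an existing daemon.json. write_docker_data_root is a hypothetical helper.

```shell
# Create a daemon.json that sets Docker's data directory.
# Hypothetical helper; refuses to overwrite an existing config file.
write_docker_data_root() {
  local data_root="$1" config="${2:-/etc/docker/daemon.json}"
  if [ -e "$config" ]; then
    echo "$config already exists; add \"data-root\" to it by hand" >&2
    return 1
  fi
  mkdir -p "$(dirname "$config")" || return 1
  printf '{\n  "data-root": "%s"\n}\n' "$data_root" > "$config"
}

# Usage (as root; /home/docker is an assumed path):
#   write_docker_data_root /home/docker
#   systemctl restart docker
#   docker info --format '{{.DockerRootDir}}'
```

Set the path in only one place: if the unit file also passes a data-root flag to dockerd, the daemon refuses to start because the option is specified twice.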
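3.2 rke cannot download images
Even after /etc/docker/daemon.json has been changed, rke up --config cluster.yml still fails to download images. Pull the required images manually on every node, as shown below, and then rke up --config cluster.yml can proceed.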
docker pull rancher/rke-tools:v0.1.105
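My screenshot from this run shows that some Rancher images are fairly large, as much as 16.GB, while others were still downloading.
3.3 canal installation fails
calico-kube-controllers also fails to install, but fixing the problem below resolves both.
# Run this to see the detailed error events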
kubectl describe pod canal-5vznx -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 32m default-scheduler Successfully assigned kube-system/canal-5vznx to master7
Normal Pulling 27m (x4 over 32m) kubelet Pulling image "rancher/calico-cni:v3.28.1-rancher1"
Warning Failed 25m (x4 over 31m) kubelet Error: ErrImagePull
Warning Failed 24m (x7 over 31m) kubelet Error: ImagePullBackOff
Warning Failed 11m (x7 over 31m) kubelet Failed to pull image "rancher/calico-cni:v3.28.1-rancher1": rpc error: code = Canceled desc = context canceled
Normal BackOff 2m44s (x77 over 31m) kubelet Back-off pulling image "rancher/calico-cni:v3.28.1-rancher1"
# So pull the images manually
docker pull rancher/calico-cni:v3.28.1-rancher1
docker pull rancher/mirrored-calico-node:v3.28.1
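When pulls are slow or get cancelled, as in the events above, wrapping docker pull in a retry loop saves re-running the commands by hand. This is a minimal sketch. pull_with_retry and the RETRY_DELAY variable are hypothetical names, and the image list in the usage note is just the three images mentioned in these notes.

```shell
# Retry "docker pull" up to <attempts> times, sleeping between attempts.
# Hypothetical helper; RETRY_DELAY (seconds) is a made-up knob for this sketch.
pull_with_retry() {
  local image="$1" attempts="${2:-3}" delay="${RETRY_DELAY:-5}" i=1
  while :; do
    docker pull "$image" && return 0
    echo "pull of $image failed (attempt $i of $attempts)" >&2
    # Give up once the attempt budget is used.
    [ "$i" -ge "$attempts" ] && return 1
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage (on every node):
#   for image in rancher/rke-tools:v0.1.105 \
#                rancher/calico-cni:v3.28.1-rancher1 \
#                rancher/mirrored-calico-node:v3.28.1; do
#     pull_with_retry "$image" 5
#   done
```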
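3.5 kuboard installation fails
The events below show the same symptom, an image that cannot be pulled. Here the cause is that kuboard needs an image pull secret to download its image from the local Harbor registry.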
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 46s default-scheduler Successfully assigned kube-system/kuboard-559bccdc6-zf67z to master6
Normal BackOff 18s (x2 over 44s) kubelet Back-off pulling image "10.101.10.2:8081/mid/eipwork/kuboard:latest"
Warning Failed 18s (x2 over 44s) kubelet Error: ImagePullBackOff
Warning FailedToRetrieveImagePullSecret 3s (x5 over 46s) kubelet Unable to retrieve some image pull secrets (regcred); attempting to pull the image may not succeed.
Normal Pulling 3s (x3 over 45s) kubelet Pulling image "10.101.10.2:8081/mid/eipwork/kuboard:latest"
Warning Failed 3s (x3 over 45s) kubelet Failed to pull image "10.101.10.2:8081/mid/eipwork/kuboard:latest": Error response from daemon: unauthorized: unauthorized to access repository: mid/eipwork/kuboard, action: pull: unauthorized to access repository: mid/eipwork/kuboard, action: pull
Warning Failed 3s (x3 over 45s) kubelet Error: ErrImagePull
kubectl create secret docker-registry regcred \
--docker-server=<harbor-ip>:<port> \
--docker-username=<username> \
--docker-password=<password> \
--docker-email=<email> \
-n kube-system
Next, retrieve the kuboard token:
echo $(kubectl -n kube-system get secret $(kubectl -n kube-system get secret | grep kuboard-user | awk '{print $1}') -o go-template='{{.data.token}}' | base64 -d)
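3.6 kuboard token cannot be retrieved
In the past, running the command above was all it took, but this time kuboard did not create the corresponding secret, and it is not clear why.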
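One plausible explanation, not confirmed here: since Kubernetes v1.24, token Secrets are no longer created automatically for ServiceAccounts, and the kubectl downloaded earlier is v1.30.7. The sketch below prints a Secret manifest that asks the control plane to populate a long-lived token. It assumes a ServiceAccount named kuboard-user exists in kube-system (inferred from the grep in the token command above); token_secret_manifest and the -token suffix are hypothetical.

```shell
# Print a Secret manifest that asks Kubernetes to populate a long-lived
# token for an existing ServiceAccount. Hypothetical helper.
token_secret_manifest() {
  local sa="$1" ns="${2:-kube-system}"
  cat <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: ${sa}-token
  namespace: ${ns}
  annotations:
    kubernetes.io/service-account.name: ${sa}
type: kubernetes.io/service-account-token
EOF
}

# Usage (kuboard-user is an assumed ServiceAccount name):
#   token_secret_manifest kuboard-user | kubectl apply -f -
#   kubectl -n kube-system get secret kuboard-user-token \
#     -o go-template='{{.data.token}}' | base64 -d
# Or request a short-lived token without creating a Secret:
#   kubectl -n kube-system create token kuboard-user
```

Because the Secret name still contains kuboard-user, the original token command should find it once the control plane has filled in the token.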