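Contents
- I. Background
- II. Troubleshooting
- 1. Run join with -v=2 and read the log
- 2. Deal with the certificate problem
- 3. Reboot
- 4. Other method 1
- 5. Other method 2
- III. Summary
- References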
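I. Background
While deploying k8s, the master node came up fine and worker node 1 joined with kubeadm join, but on worker node 2 kubeadm join hung indefinitely at [preflight] Running pre-flight checks.
II. Troubleshooting
I tried everything I found online, including time synchronization and regenerating the token, and nothing helped.
kubeadm token list showed that the token had not expired.
Generating a new token with kubeadm token create --ttl 0 --print-join-command did not help either.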
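Before blaming the token itself, it can be worth ruling out a copy-and-paste error in the join command. The following is a minimal sketch, not part of kubeadm: valid_join_args is a hypothetical helper that only checks the shape of the two values (a bootstrap token of the form 6 characters, a dot, 16 characters; a hash of "sha256:" plus 64 lowercase hex digits, as printed in the join command). It does not check whether the token is still valid on the cluster.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the shape of kubeadm join arguments.
# valid_join_args is a hypothetical helper name, not a kubeadm command.

valid_join_args() {
  local token="$1" ca_hash="$2"
  # Bootstrap tokens look like <6 chars>.<16 chars>, lowercase letters and digits only.
  if ! printf '%s\n' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'; then
    echo "bad token format"
    return 1
  fi
  # The discovery hash is "sha256:" followed by 64 hex digits.
  if ! printf '%s\n' "$ca_hash" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
    echo "bad CA cert hash format"
    return 1
  fi
  echo "ok"
}
```

Called with the token and hash from the join command, it prints ok when both values are well formed, which at least rules out a truncated paste.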
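Redeploying the whole cluster with the steps below made no difference either.
# 0. Remove the node
kubectl get nodes
kubectl cordon w1 # mark unschedulable
kubectl drain w1 --ignore-daemonsets
kubectl delete node w1
# 1. Reset
kubeadm reset
rm -rf /etc/kubernetes/*
rm -rf ~/.kube
# 2. Run init again
# 3. Check the log printed by the new init
# 4. Redeploy the calico network plugin
# 5. Rejoin the worker nodes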
The key question: worker node 1 joins the cluster normally, so why can't worker node 2?
1. Run join with -v=2 and read the log
[root@w1 ~]# kubeadm join -v=2 192.168.56.100:6443 --token wvsok4.5kjxe1ts8kidll1b --discovery-token-ca-cert-hash sha256:e94113cc2b2fb1b9994c7e419c5f3b776493c7151377812672fe55163b3f97a5
I0703 09:09:38.169190 2029 join.go:367] [preflight] found NodeName empty; using OS hostname as NodeName
I0703 09:09:38.169794 2029 initconfiguration.go:105] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I0703 09:09:38.169865 2029 preflight.go:90] [preflight] Running general checks
I0703 09:09:38.170055 2029 checks.go:254] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0703 09:09:38.170069 2029 checks.go:292] validating the existence of file /etc/kubernetes/kubelet.conf
I0703 09:09:38.170078 2029 checks.go:292] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0703 09:09:38.170085 2029 checks.go:105] validating the container runtime
I0703 09:09:38.221649 2029 checks.go:131] validating if the service is enabled and active
I0703 09:09:38.262731 2029 checks.go:341] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0703 09:09:38.262898 2029 checks.go:341] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0703 09:09:38.262920 2029 checks.go:653] validating whether swap is enabled or not
I0703 09:09:38.262941 2029 checks.go:382] validating the presence of executable ip
I0703 09:09:38.263176 2029 checks.go:382] validating the presence of executable iptables
I0703 09:09:38.263554 2029 checks.go:382] validating the presence of executable mount
I0703 09:09:38.263659 2029 checks.go:382] validating the presence of executable nsenter
I0703 09:09:38.263669 2029 checks.go:382] validating the presence of executable ebtables
I0703 09:09:38.263680 2029 checks.go:382] validating the presence of executable ethtool
I0703 09:09:38.263688 2029 checks.go:382] validating the presence of executable socat
I0703 09:09:38.263696 2029 checks.go:382] validating the presence of executable tc
I0703 09:09:38.263703 2029 checks.go:382] validating the presence of executable touch
I0703 09:09:38.263718 2029 checks.go:524] running all checks
I0703 09:09:38.275230 2029 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I0703 09:09:38.275514 2029 checks.go:622] validating kubelet version
I0703 09:09:38.311281 2029 checks.go:131] validating if the service is enabled and active
I0703 09:09:38.316858 2029 checks.go:209] validating availability of port 10250
I0703 09:09:38.317624 2029 checks.go:292] validating the existence of file /etc/kubernetes/pki/ca.crt
I0703 09:09:38.317634 2029 checks.go:439] validating if the connectivity type is via proxy or direct
I0703 09:09:38.317653 2029 join.go:427] [preflight] Discovering cluster-info
I0703 09:09:38.317704 2029 token.go:200] [discovery] Trying to connect to API Server "192.168.56.100:6443"
I0703 09:09:38.318179 2029 token.go:75] [discovery] Created cluster-info discovery client, requesting info from "https://192.168.56.100:6443"
I0703 09:09:38.319099 2029 token.go:83] [discovery] Failed to request cluster info, will try again: [Get https://192.168.56.100:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 192.168.56.100:6443: connect: protocol not available]
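The verbose log is long, and the only line that matters here is the last one. As a small sketch, the filter below pulls the final error reason out of the "[discovery] Failed" lines. discovery_failure_reasons is a hypothetical helper name, and the sed pattern assumes the line format shown in the log above (reason after the last colon, closing bracket at the end of the line).

```shell
#!/usr/bin/env bash
# Sketch: reduce kubeadm -v=2 output to the distinct discovery failure reasons.
# discovery_failure_reasons is a hypothetical helper, not part of kubeadm.

discovery_failure_reasons() {
  # Keep only the discovery failure lines, then strip everything up to the
  # last ": " and the trailing "]" so that just the reason remains.
  grep -F '[discovery] Failed' \
    | sed -E 's/.*: ([^:]+)\]$/\1/' \
    | sort -u
}

# Possible usage (assumes the log lines go to stderr, hence 2>&1):
#   kubeadm join -v=2 <join arguments> 2>&1 | discovery_failure_reasons
```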
The log reports protocol not available. Running curl https://192.168.56.100:6443/api/v1/namespaces/kube-public/configmaps/cluster-info on the same node fails with protocol not available as well.
Running curl from the master node gives a different message:
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
However, opening the URL in a browser showed that the certificate of the HTTPS endpoint was the problem.
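The two curl results above are different kinds of failure: the worker never gets a connection, while the master connects but does not trust the certificate. A rough way to keep them apart is to look at curl's exit code. The sketch below maps a few documented curl exit codes to a short label; explain_curl_exit is a hypothetical helper, and the --cacert path in the usage comment assumes the default kubeadm location of the cluster CA on the master node.

```shell
#!/usr/bin/env bash
# Sketch: translate a curl exit code into a rough failure category.
# explain_curl_exit is a hypothetical helper; the codes are from the curl manual.

explain_curl_exit() {
  case "$1" in
    0)  echo "ok: connected and certificate verified" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "network: could not connect (socket-level failure)" ;;
    28) echo "network: operation timed out" ;;
    35) echo "tls: handshake failed" ;;
    60) echo "tls: server certificate not trusted by the CA bundle in use" ;;
    *)  echo "other: see the EXIT CODES section of the curl manual" ;;
  esac
}

# Possible usage on the master node (assumes the CA is at the default path):
#   curl -sS -o /dev/null --cacert /etc/kubernetes/pki/ca.crt \
#     https://192.168.56.100:6443/api/v1/namespaces/kube-public/configmaps/cluster-info
#   explain_curl_exit $?
```

If the certificate verifies once the cluster CA is passed with --cacert, the error 60 seen earlier only means that the CA is missing from the system bundle, which would not by itself explain a connection failure on the worker.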
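2. Deal with the certificate problem
After a lot of searching, I mainly followed these two posts:
https://www.cnblogs.com/hkgov/p/14959992.html
https://blog.csdn.net/u012375924/article/details/108832392
(1) Approach 1
Append the certificate string to the file /etc/pki/tls/certs/ca-bundle.crt.
(2) Approach 2
Download any valid certificate, upload the file to the directory /etc/pki/ca-trust/source/anchors/, change its extension to .crt, then run update-ca-trust extract.
Running the join command again still failed.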
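Approach 2 above can be scripted roughly as follows. This is only a sketch under assumptions: stage_ca_anchor is a hypothetical helper, and the usage comment assumes that the certificate you want the system to trust is the cluster CA at /etc/kubernetes/pki/ca.crt rather than an arbitrary downloaded certificate. It changes the system trust store used by tools such as curl.

```shell
#!/usr/bin/env bash
# Sketch: copy a PEM certificate into a trust-anchor directory with a .crt suffix.
# stage_ca_anchor is a hypothetical helper name.

stage_ca_anchor() {
  # usage: stage_ca_anchor SRC_CERT ANCHOR_DIR
  local src="$1" dir="$2" name
  # Refuse anything that does not look like a PEM certificate.
  if ! grep -q -- '-----BEGIN CERTIFICATE-----' "$src"; then
    echo "not a PEM certificate: $src" >&2
    return 1
  fi
  name="$(basename "$src")"
  name="${name%.*}.crt"   # use the .crt suffix, as in approach 2
  cp "$src" "$dir/$name" && echo "$dir/$name"
}

# Possible usage (as root, on a RHEL/CentOS-style system):
#   stage_ca_anchor /etc/kubernetes/pki/ca.crt /etc/pki/ca-trust/source/anchors \
#     && update-ca-trust extract
```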
3. Reboot
After rebooting the master node once more, the join command suddenly worked.
# Follow the kubelet log
journalctl -u kubelet -f
# Restart kubelet (if the node never becomes Ready)
systemctl restart kubelet && systemctl enable kubelet
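To see which nodes still need the kubelet restart above, the node list can be filtered by status. A minimal sketch, assuming the usual column order of kubectl get nodes (NAME first, STATUS second); not_ready_nodes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list nodes whose STATUS column does not start with "Ready".
# not_ready_nodes is a hypothetical helper; it reads node lines on stdin.

not_ready_nodes() {
  # Column 1 is NAME, column 2 is STATUS
  # (for example Ready, NotReady, or Ready,SchedulingDisabled).
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Possible usage on the master node:
#   kubectl get nodes --no-headers | not_ready_nodes
```

An empty result means every listed node reports Ready; a cordoned node shows Ready,SchedulingDisabled and is deliberately not reported.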
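4. Other method 1
https://blog.csdn.net/axin_123456/article/details/128961219
Possible cause: NetworkManager had been stopped by mistake earlier:
systemctl stop NetworkManager # stop it for the current boot
systemctl disable NetworkManager # keep it from starting at boot
The fix was to run the following:
systemctl start NetworkManager
systemctl start network.service # start the network service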
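5. Other method 2
# Install the ntpdate tool
yum -y install ntp ntpdate
timedatectl set-timezone Asia/Shanghai # set the system time zone to Asia/Shanghai
# Synchronize the system time with network time
ntpdate cn.pool.ntp.org
# Write the system time to the hardware clock
hwclock --systohc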
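After synchronizing, it is easy to confirm that the worker and the master actually agree on the time. The sketch below compares two epoch timestamps against a tolerance; clock_skew_ok is a hypothetical helper, and both the 5-second tolerance and the ssh access in the usage comment are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare two Unix timestamps and report the clock skew.
# clock_skew_ok is a hypothetical helper name.

clock_skew_ok() {
  # usage: clock_skew_ok LOCAL_EPOCH REMOTE_EPOCH MAX_SECONDS
  local diff=$(( $1 - $2 ))
  # Take the absolute value of the difference.
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  if [ "$diff" -le "$3" ]; then
    echo "ok: skew ${diff}s"
  else
    echo "skewed: ${diff}s"
    return 1
  fi
}

# Possible usage on the worker (assumes ssh access to the master):
#   clock_skew_ok "$(date +%s)" "$(ssh root@192.168.56.100 date +%s)" 5
```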
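III. Summary
Those are all the things I tried, and I cannot tell which of them actually took effect; in the end the join only worked after the master node was rebooted. Note: reboot the master node only. Rebooting both the master and the worker nodes still did not work.
I still do not know what the root cause was.
If it were not for the learning experience, I would never have bothered installing this by hand.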
References
https://www.cnblogs.com/hkgov/p/14959992.html
https://blog.csdn.net/u012375924/article/details/108832392