The role of the dummy NIC in Linux IPVS mode
1. Scenario:
When IPVS is used to load-balance a VIP, we sometimes create a dummy NIC on the Linux host and bind the VIP to it.
2. Example: the IPVS mode of the k8s kube-proxy component
In IPVS mode, kube-proxy creates a virtual NIC named kube-ipvs0 and binds the Service IPs to it:
[root@master140 ~]# ip addr
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
    link/ether d2:b0:08:01:3e:52 brd ff:ff:ff:ff:ff:ff
    inet 172.18.13.31/32 brd 172.18.13.31 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 172.18.13.1/32 brd 172.18.13.1 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 172.18.13.187/32 brd 172.18.13.187 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 172.18.13.113/32 brd 172.18.13.113 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 172.18.13.222/32 brd 172.18.13.222 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
It also adds a route for each VIP to the local routing table:
[root@master140 ~]# ip route show table local
local 172.18.13.1 dev kube-ipvs0 proto kernel scope host src 172.18.13.1
local 172.18.13.31 dev kube-ipvs0 proto kernel scope host src 172.18.13.31
local 172.18.13.113 dev kube-ipvs0 proto kernel scope host src 172.18.13.113
local 172.18.13.187 dev kube-ipvs0 proto kernel scope host src 172.18.13.187
local 172.18.13.222 dev kube-ipvs0 proto kernel scope host src 172.18.13.222
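The `local` table is what makes the kernel treat a Service IP as an address of this host. The following is a minimal C sketch of that lookup, assuming a deliberately simplified model in which the local table is just a list of /32 addresses parsed from lines like the ones above (the real kernel uses a FIB trie with typed routes). `struct local_table`, `add_local_route` and `is_local` are hypothetical names for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical, simplified model of the "local" routing table:
 * only a list of /32 host addresses (the kernel uses a FIB trie). */
#define MAX_LOCAL 64

struct local_table {
    struct in_addr addrs[MAX_LOCAL];
    int count;
};

/* Parse a line such as
 * "local 172.18.13.1 dev kube-ipvs0 proto kernel scope host src 172.18.13.1"
 * and record its address. Returns false for non-"local" or malformed lines. */
static bool add_local_route(struct local_table *t, const char *line)
{
    char type[16], ip[INET_ADDRSTRLEN];
    struct in_addr a;

    if (sscanf(line, "%15s %15s", type, ip) != 2 || strcmp(type, "local") != 0)
        return false;
    if (inet_pton(AF_INET, ip, &a) != 1)
        return false;
    assert(t->count < MAX_LOCAL);
    t->addrs[t->count++] = a;
    return true;
}

/* True if dst is one of this host's local addresses, i.e. a packet
 * for it would be delivered to the INPUT (LOCAL_IN) path. */
static bool is_local(const struct local_table *t, const char *dst)
{
    struct in_addr a;

    if (inet_pton(AF_INET, dst, &a) != 1)
        return false;
    for (int i = 0; i < t->count; i++)
        if (t->addrs[i].s_addr == a.s_addr)
            return true;
    return false;
}
```

After feeding it the five `local` lines above, `is_local` returns true for 172.18.13.113 and false for any address that is not bound to kube-ipvs0 (or another interface).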
It also creates the corresponding IPVS rules:
[root@master140 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  127.0.0.1:30001 rr
TCP  127.0.0.1:30002 rr
  -> 10.0.3.5:8080                Masq    1      0          0
  -> 10.0.3.7:8080                Masq    1      0          0
TCP  127.0.0.1:30094 rr
  -> 10.0.3.2:80                  Masq    1      0          0
TCP  172.17.0.1:30001 rr
TCP  172.17.0.1:30002 rr
  -> 10.0.3.5:8080                Masq    1      0          0
  -> 10.0.3.7:8080                Masq    1      0          0
TCP  172.17.0.1:30094 rr
  -> 10.0.3.2:80                  Masq    1      0          0
TCP  172.18.13.1:443 rr
  -> 192.168.204.142:6443         Masq    1      0          0
TCP  172.18.13.31:80 rr
TCP  172.18.13.113:8082 rr
  -> 10.0.3.5:8080                Masq    1      0          0
  -> 10.0.3.7:8080                Masq    1      0          0
TCP  172.18.13.187:3306 rr
  -> 10.0.3.4:3306                Masq    1      0          0
TCP  172.18.13.222:80 rr
  -> 10.0.3.2:80                  Masq    1      0          0
TCP  192.168.204.142:30001 rr
TCP  192.168.204.142:30002 rr
  -> 10.0.3.5:8080                Masq    1      0          0
  -> 10.0.3.7:8080                Masq    1      0          0
TCP  192.168.204.142:30094 rr
  -> 10.0.3.2:80                  Masq    1      0          0
TCP  10.0.1.0:30001 rr
TCP  10.0.1.0:30002 rr
  -> 10.0.3.5:8080                Masq    1      0          0
  -> 10.0.3.7:8080                Masq    1      0          0
TCP  10.0.1.0:30094 rr
  -> 10.0.3.2:80                  Masq    1      0          0
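Each `TCP VIP:port rr` line is a virtual service and each `->` line under it is a real server. The following is a minimal C sketch of what the `rr` (round-robin) scheduler does with such a table, assuming simplified, hypothetical structures (`struct virtual_service`, `find_service`, `rr_schedule`) rather than the kernel's own `struct ip_vs_service` and `struct ip_vs_dest`. Only the ClusterIP entries are copied from the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_RS 8

/* Hypothetical stand-ins for an IPVS virtual service and its real servers. */
struct real_server {
    const char *addr;   /* RIP:port, e.g. "10.0.3.5:8080" */
    int weight;
};

struct virtual_service {
    const char *vip_port;          /* VIP:port, e.g. "172.18.13.113:8082" */
    struct real_server rs[MAX_RS];
    int nr_rs;
    int next;                      /* round-robin cursor */
};

/* ClusterIP entries taken from the ipvsadm -ln output above. */
static struct virtual_service services[] = {
    { "172.18.13.1:443",    { { "192.168.204.142:6443", 1 } }, 1, 0 },
    { "172.18.13.31:80",    { { NULL, 0 } }, 0, 0 },
    { "172.18.13.113:8082", { { "10.0.3.5:8080", 1 }, { "10.0.3.7:8080", 1 } }, 2, 0 },
    { "172.18.13.187:3306", { { "10.0.3.4:3306", 1 } }, 1, 0 },
    { "172.18.13.222:80",   { { "10.0.3.2:80", 1 } }, 1, 0 },
};

/* Look up the virtual service for a destination VIP:port, or NULL. */
static struct virtual_service *find_service(const char *vip_port)
{
    for (size_t i = 0; i < sizeof(services) / sizeof(services[0]); i++)
        if (strcmp(services[i].vip_port, vip_port) == 0)
            return &services[i];
    return NULL;
}

/* Round robin: return the next real server with weight > 0, or NULL
 * when the service has no usable backend. */
static const char *rr_schedule(struct virtual_service *svc)
{
    assert(svc != NULL);
    for (int tries = 0; tries < svc->nr_rs; tries++) {
        struct real_server *rs = &svc->rs[svc->next];
        svc->next = (svc->next + 1) % svc->nr_rs;
        if (rs->weight > 0)
            return rs->addr;
    }
    return NULL;
}
```

Calling `rr_schedule` repeatedly on 172.18.13.113:8082 alternates between 10.0.3.5:8080 and 10.0.3.7:8080, while 172.18.13.31:80 has no real server, so nothing can be scheduled for it. The sketch only skips weight-0 servers; the kernel scheduler also takes other destination state into account.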
3. The role of the dummy NIC in IPVS mode
First, look at how IPVS forwards a request (NAT mode).
The steps:
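1. When a user request arrives at the Director Server, the packet first enters the PREROUTING chain in kernel space. At this point the source IP is the CIP and the destination IP is the VIP.
2. The routing decision after PREROUTING finds that the destination IP belongs to this host and passes the packet to the INPUT chain.
3. IPVS watches the packets reaching the INPUT chain and checks whether the requested service is a cluster (virtual) service. If it is, IPVS rewrites the destination IP to the IP of a backend server and sends the packet on to the POSTROUTING chain. Now the source IP is the CIP and the destination IP is the RIP.
4. After route selection, POSTROUTING sends the packet to the Real Server.
5. The Real Server sees that the destination is its own IP, builds a response and sends it back to the Director Server. Now the source IP is the RIP and the destination IP is the CIP.
6. Before replying to the client, the Director Server rewrites the source IP to its own VIP and then sends the response to the client. Now the source IP is the VIP and the destination IP is the CIP.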
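The address rewriting in steps 1 to 6 can be traced with a toy model. The following is a minimal C sketch, assuming a packet is reduced to its source and destination IPs (ports omitted) and that the director keeps one hypothetical connection entry (`struct conn`, only loosely analogous to an IPVS connection) recording the CIP, VIP and chosen RIP. The function names are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A packet reduced to the two header fields VS/NAT rewrites (ports omitted). */
struct pkt {
    char src[16];
    char dst[16];
};

/* Hypothetical connection entry the director keeps for one client flow. */
struct conn {
    char cip[16];
    char vip[16];
    char rip[16];
};

static void set_ip(char field[16], const char *ip)
{
    snprintf(field, 16, "%s", ip);
}

/* Step 3: a request CIP -> VIP gets its destination rewritten to the RIP. */
static bool director_dnat(struct pkt *p, const struct conn *c)
{
    assert(p != NULL && c != NULL);
    if (strcmp(p->src, c->cip) != 0 || strcmp(p->dst, c->vip) != 0)
        return false;
    set_ip(p->dst, c->rip);
    return true;
}

/* Step 5: the real server answers by swapping source and destination. */
static void real_server_reply(const struct pkt *req, struct pkt *reply)
{
    set_ip(reply->src, req->dst);
    set_ip(reply->dst, req->src);
}

/* Step 6: on the way back, the director restores the VIP as the source. */
static bool director_snat(struct pkt *p, const struct conn *c)
{
    assert(p != NULL && c != NULL);
    if (strcmp(p->src, c->rip) != 0 || strcmp(p->dst, c->cip) != 0)
        return false;
    set_ip(p->src, c->vip);
    return true;
}
```

For example, with a hypothetical client 192.168.1.10 calling VIP 172.18.13.113 and a connection scheduled to RIP 10.0.3.5, the request goes from 192.168.1.10 -> 172.18.13.113 to 192.168.1.10 -> 10.0.3.5, the reply 10.0.3.5 -> 192.168.1.10 leaves the director as 172.18.13.113 -> 192.168.1.10, and the client only ever sees the VIP.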
Why the extra NIC and routes are needed:
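Because IPVS hooks its DNAT into the INPUT chain (NF_INET_LOCAL_IN), the kernel must recognize the VIP as a local IP; only then does the packet traverse INPUT. Otherwise the routing decision after PREROUTING treats the packet as not addressed to this host and sends it down the FORWARD path instead, so IPVS never gets to DNAT it. k8s makes the VIP local by binding each Service cluster IP to the virtual NIC kube-ipvs0.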
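The dependency on the local address can be made concrete with a toy routing decision. The following is a minimal C sketch, assuming a simplified model of the IPv4 input path in which, after PREROUTING, a packet for a local address goes to LOCAL_IN (where IPVS's `ip_vs_remote_request4` hook handles new requests) and any other packet goes to FORWARD. `route_input`, `ipvs_can_dnat` and `enum path` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Where the simplified IPv4 input path sends a packet after PREROUTING. */
enum path { PATH_LOCAL_IN, PATH_FORWARD };

/* Hypothetical route lookup: destinations bound on this host (including
 * VIPs on a dummy NIC) are delivered locally, everything else is forwarded. */
static enum path route_input(const char *dst, const char *const *local_addrs, size_t n)
{
    assert(dst != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(dst, local_addrs[i]) == 0)
            return PATH_LOCAL_IN;
    return PATH_FORWARD;
}

/* In this model IPVS schedules and DNATs new requests only from its
 * LOCAL_IN hook, so it can act only if the packet is routed there. */
static bool ipvs_can_dnat(const char *dst, const char *const *local_addrs, size_t n)
{
    return route_input(dst, local_addrs, n) == PATH_LOCAL_IN;
}
```

With only the node address 192.168.204.142 configured, a packet for 172.18.13.113 is forwarded and IPVS never sees it; once 172.18.13.113 is also bound (as on kube-ipvs0), the same packet reaches LOCAL_IN and can be DNATed.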
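Taking flannel as the CNI as an example, the flow changes accordingly: with the dummy NIC in place, traffic to a Service IP can enter the INPUT chain, where the IPVS hook runs and performs the DNAT.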
IPVS hooks:
The concrete hook definitions (`ip_vs_ops[]` in the kernel's IPVS core, ip_vs_core.c):
static const struct nf_hook_ops ip_vs_ops[] = {
    /* After packet filtering, change source only for VS/NAT */
    {
        .hook     = ip_vs_reply4,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_LOCAL_IN,
        .priority = NF_IP_PRI_NAT_SRC - 2,
    },
    /* After packet filtering, forward packet through VS/DR, VS/TUN,
     * or VS/NAT(change destination), so that filtering rules can be
     * applied to IPVS. */
    {
        .hook     = ip_vs_remote_request4,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_LOCAL_IN,
        .priority = NF_IP_PRI_NAT_SRC - 1,
    },
    /* Before ip_vs_in, change source only for VS/NAT */
    {
        .hook     = ip_vs_local_reply4,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_LOCAL_OUT,
        .priority = NF_IP_PRI_NAT_DST + 1,
    },
    /* After mangle, schedule and forward local requests */
    {
        .hook     = ip_vs_local_request4,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_LOCAL_OUT,
        .priority = NF_IP_PRI_NAT_DST + 2,
    },
    /* After packet filtering (but before ip_vs_out_icmp), catch icmp
     * destined for 0.0.0.0/0, which is for incoming IPVS connections */
    {
        .hook     = ip_vs_forward_icmp,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_FORWARD,
        .priority = 99,
    },
    /* After packet filtering, change source only for VS/NAT */
    {
        .hook     = ip_vs_reply4,
        .pf       = NFPROTO_IPV4,
        .hooknum  = NF_INET_FORWARD,
        .priority = 100,
    },
    ...
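The `.priority` fields decide where these hooks run relative to iptables at each hook point, and a lower value runs first. The following is a minimal C sketch that orders the six hooks listed above per hook point, assuming the values `NF_IP_PRI_NAT_DST` = -100, `NF_IP_PRI_FILTER` = 0 and `NF_IP_PRI_NAT_SRC` = 100 from the kernel's `enum nf_ip_hook_priorities`. The "iptables filter" entries are hypothetical reference points added for comparison, and `struct hook` / `hooks_in_order` are illustrative names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed values from the kernel's enum nf_ip_hook_priorities. */
#define NF_IP_PRI_NAT_DST (-100)
#define NF_IP_PRI_FILTER  0
#define NF_IP_PRI_NAT_SRC 100

enum hooknum { HOOK_LOCAL_IN, HOOK_FORWARD, HOOK_LOCAL_OUT };

struct hook {
    const char *name;
    enum hooknum hooknum;
    int priority;
};

/* IPVS entries copied from ip_vs_ops[] above, plus iptables filter for reference. */
static struct hook hooks[] = {
    { "ip_vs_reply4",          HOOK_LOCAL_IN,  NF_IP_PRI_NAT_SRC - 2 },
    { "ip_vs_remote_request4", HOOK_LOCAL_IN,  NF_IP_PRI_NAT_SRC - 1 },
    { "ip_vs_local_reply4",    HOOK_LOCAL_OUT, NF_IP_PRI_NAT_DST + 1 },
    { "ip_vs_local_request4",  HOOK_LOCAL_OUT, NF_IP_PRI_NAT_DST + 2 },
    { "ip_vs_forward_icmp",    HOOK_FORWARD,   99 },
    { "ip_vs_reply4",          HOOK_FORWARD,   100 },
    { "iptables filter",       HOOK_LOCAL_IN,  NF_IP_PRI_FILTER },
    { "iptables filter",       HOOK_FORWARD,   NF_IP_PRI_FILTER },
    { "iptables filter",       HOOK_LOCAL_OUT, NF_IP_PRI_FILTER },
};
#define NR_HOOKS (sizeof(hooks) / sizeof(hooks[0]))

/* Sort by hook point, then by ascending priority (lower runs first). */
static int cmp_hook(const void *a, const void *b)
{
    const struct hook *x = a, *y = b;
    if (x->hooknum != y->hooknum)
        return (int)x->hooknum - (int)y->hooknum;
    return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Write the names of the hooks at point hn into out[] in execution order;
 * returns how many names were written. */
static size_t hooks_in_order(enum hooknum hn, const char **out, size_t max)
{
    size_t n = 0;
    qsort(hooks, NR_HOOKS, sizeof(hooks[0]), cmp_hook);
    for (size_t i = 0; i < NR_HOOKS && n < max; i++)
        if (hooks[i].hooknum == hn)
            out[n++] = hooks[i].name;
    return n;
}
```

At LOCAL_IN this gives iptables filter, then `ip_vs_reply4`, then `ip_vs_remote_request4`, which is what the "After packet filtering" comments refer to: INPUT filter rules still see the VIP as the destination before IPVS rewrites it. At LOCAL_OUT both IPVS hooks run before the filter table.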