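Affinity scheduling in k8s
For reasons such as efficient communication, it is sometimes necessary to place certain Pod objects close to one another (on the same node, rack, zone, or region), for example an application's Pods and the backend Pods that serve its data. Such Pods can be regarded as having an affinity relationship.
The ideal approach is to let the scheduler place the first Pod anywhere, and then have the other Pods that have an affinity or anti-affinity relationship with it arrange their positions dynamically based on that placement. This is what Pod affinity and anti-affinity scheduling are for. Inter-Pod affinity also comes in required (hard) and preferred (soft) forms, whose constraint semantics are similar to those of node affinity.
Pod affinity
Pod affinity (podAffinity) mainly answers the question of which Pods a Pod may share a topology domain with. A topology domain is defined by node labels and can be a single host or a group of hosts such as a cluster or a zone. Pod anti-affinity answers the opposite question: which Pods a Pod must not share a topology domain with. Both deal with the relationship between one Pod and other Pods.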
[root@k8s-01 ~]# kubectl explain deploy.spec.template.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution
KIND:     Deployment
VERSION:  apps/v1

RESOURCE: requiredDuringSchedulingIgnoredDuringExecution <[]Object>

DESCRIPTION:
     If the affinity requirements specified by this field are not met at
     scheduling time, the pod will not be scheduled onto the node. If the
     affinity requirements specified by this field cease to be met at some point
     during pod execution (e.g. due to a pod label update), the system may or
     may not try to eventually evict the pod from its node. When there are
     multiple elements, the lists of nodes corresponding to each podAffinityTerm
     are intersected, i.e. all terms must be satisfied.

     Defines a set of pods (namely those matching the labelSelector relative to
     the given namespace(s)) that this pod should be co-located (affinity) or
     not co-located (anti-affinity) with, where co-located is defined as running
     on a node whose value of the label with key <topologyKey> matches that of
     any node on which a pod of the set of pods is running

FIELDS:
   labelSelector        <Object>
     A label query over a set of resources, in this case pods.

   namespaceSelector    <Object>
     A label query over the set of namespaces that the term applies to. The term
     is applied to the union of the namespaces selected by this field and the
     ones listed in the namespaces field. null selector and null or empty
     namespaces list means "this pod's namespace". An empty selector ({})
     matches all namespaces. This field is beta-level and is only honored when
     PodAffinityNamespaceSelector feature is enabled.

   namespaces   <[]string>
     namespaces specifies a static list of namespace names that the term applies
     to. The term is applied to the union of the namespaces listed in this field
     and the ones selected by namespaceSelector. null or empty namespaces list
     and null namespaceSelector means "this pod's namespace"

   topologyKey  <string> -required-
     This pod should be co-located (affinity) or not co-located (anti-affinity)
     with the pods matching the labelSelector in the specified namespaces, where
     co-located is defined as running on a node whose value of the label with
     key topologyKey matches that of any node on which any of the selected pods
     is running. Empty topologyKey is not allowed.

[root@k8s-01 ~]#
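Inter-Pod affinity is defined in the spec.affinity.podAffinity field, and anti-affinity in the spec.affinity.podAntiAffinity field. Each supports both required and preferred constraints, and both use the following key fields.
- topologyKey <string>: the topology key, a node label used to partition the cluster into topology domains. Nodes that have the same value for the given key belong to the same topology domain. Required field.
- labelSelector <Object>: the label selector that picks out the Pods this rule is evaluated against.
- namespaces <[]string>: the namespaces in which labelSelector is applied. Defaults to the namespace of the Pod being scheduled.
Below is the YAML used for testing: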
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-affinity
  labels:
    app: pod-affinity
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pod-affinity
  template:
    metadata:
      labels:
        app: pod-affinity
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: nginxweb
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:  # hard (required) rule
          - labelSelector:
              matchExpressions:
              - key: logging
                operator: In
                values:
                - "true"  # quoted: label values must be strings, not YAML booleans
            topologyKey: kubernetes.io/hostname
Here topologyKey is kubernetes.io/hostname, so each node forms its own topology domain, and the replicas can only be placed on a node that already runs a Pod labeled logging=true.
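For the rule above to match anything, a Pod carrying the label logging=true must already be running in the same namespace. The original test does not show that Pod, so the manifest below is only a sketch of what it might look like. The name logging-agent, the busybox image, and the sleep command are all hypothetical placeholders; only the label matters to the affinity rule.

```yaml
# Hypothetical target Pod; not part of the original test setup.
apiVersion: v1
kind: Pod
metadata:
  name: logging-agent          # assumed name
  labels:
    logging: "true"            # quoted so YAML keeps it a string rather than a boolean
spec:
  containers:
  - name: agent
    image: busybox             # placeholder image
    command: ["sleep", "3600"] # keeps the container alive for the test
```

Whichever node this Pod lands on becomes the only node eligible for the pod-affinity replicas, since the topology domain is a single host.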
Listing the Pods shows that all of them landed on node k8s-03:
[root@k8s-01 ~]# kubectl get pods -o wide |grep pod-affinity
pod-affinity-64bc56d789-2bczb 1/1 Running 0 5m25s 10.244.165.213 k8s-03 <none> <none>
pod-affinity-64bc56d789-qgtkd 1/1 Running 0 5m25s 10.244.165.211 k8s-03 <none> <none>
pod-affinity-64bc56d789-w95dv 1/1 Running 0 5m25s 10.244.165.208 k8s-03 <none> <none>
[root@k8s-01 ~]#
Now suppose we modify part of the YAML as follows and change the replica count to 10:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values: ["nginx-readiness","nginx-test"]
            topologyKey: disk
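After applying the YAML, the Pods are spread across two nodes, k8s-02 and k8s-04. With topologyKey set to disk, all nodes that share the same value for the disk label form one topology domain, so a replica may land on any node in a domain that already hosts a matching Pod.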
[root@k8s-01 ~]# kubectl get pods -o wide |grep pod-affinity
pod-affinity-94b66f75b-2cxns 1/1 Running 0 107s 10.244.7.86 k8s-04 <none> <none>
pod-affinity-94b66f75b-6jfrv 1/1 Running 0 107s 10.244.7.87 k8s-04 <none> <none>
pod-affinity-94b66f75b-7bftn 1/1 Running 0 107s 10.244.179.15 k8s-02 <none> <none>
pod-affinity-94b66f75b-9tqgm 1/1 Running 0 107s 10.244.7.85 k8s-04 <none> <none>
pod-affinity-94b66f75b-dnph9 1/1 Running 0 107s 10.244.7.88 k8s-04 <none> <none>
pod-affinity-94b66f75b-fznzb 1/1 Running 0 107s 10.244.179.11 k8s-02 <none> <none>
pod-affinity-94b66f75b-q6lv2 1/1 Running 0 107s 10.244.179.13 k8s-02 <none> <none>
pod-affinity-94b66f75b-s7jj5 1/1 Running 0 107s 10.244.179.16 k8s-02 <none> <none>
pod-affinity-94b66f75b-tn4s4 1/1 Running 0 107s 10.244.179.10 k8s-02 <none> <none>
pod-affinity-94b66f75b-xpbnq 1/1 Running 0 107s 10.244.7.89 k8s-04 <none> <none>
[root@k8s-01 ~]#
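As this shows, inter-Pod affinity scheduling can keep closely related or heavily communicating applications in the same location, which reduces communication latency and the performance cost that comes with it. Note that if labels change at runtime (on a node or on a matched Pod) so that a Pod's affinity rule is no longer satisfied, the Pod keeps running on its node and is not rescheduled. Also, by default labelSelector only matches Pods in the same namespace as the Pod being scheduled; other namespaces can be targeted by adding the namespaces field.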
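As a sketch of the cross-namespace case, the term below adds a namespaces list so that labelSelector is evaluated against Pods in another namespace. The namespace name monitoring is a made-up example, and the selector simply reuses the logging=true label from the earlier test.

```yaml
# Fragment of spec.template.spec; the namespace name is hypothetical.
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: logging
          operator: In
          values:
          - "true"
      namespaces:
      - monitoring               # assumed namespace that holds the target Pods
      topologyKey: kubernetes.io/hostname
```

When namespaces is set and namespaceSelector is not, only the listed namespaces are searched, so the Pod's own namespace has to be listed explicitly if it should still be considered.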
Pod affinity also supports soft (preferred) rules, in the same way as node affinity, so the detailed test is not repeated here.
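Although the test is skipped, the shape of a soft rule is worth seeing, because it differs from the hard rule: each entry wraps the term in podAffinityTerm and carries a weight. The fragment below is a minimal sketch; the weight of 80 is an arbitrary value chosen for illustration (the allowed range is 1 to 100).

```yaml
# Fragment of spec.template.spec; soft version of the earlier logging=true rule.
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:  # soft (preferred) rule
    - weight: 80                 # arbitrary example value, range 1-100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: logging
            operator: In
            values:
            - "true"
        topologyKey: kubernetes.io/hostname
```

With a rule like this, nodes that satisfy the term score higher during scheduling, but a Pod can still be placed elsewhere if no such node is available.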
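Pod anti-affinity
Pod anti-affinity (podAntiAffinity) works the other way around: if a node already runs a certain Pod, the Pod being scheduled should not be placed on that node. To try it, we simply change podAffinity in the manifest above to podAntiAffinity.
Anti-affinity can achieve an effect similar to DaemonSet plus nodeSelector, but it is more flexible. With the former, if a node goes down there is one Pod fewer, and that Pod only comes back once the node recovers. With anti-affinity, the scheduler can pick any other node that satisfies the topologyKey constraint and start a replacement Pod there. For this reason, anti-affinity scheduling is typically used to spread Pods of the same application apart, and also to place Pods of different security levels in different zones, racks, or nodes.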
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-antiaffinity
  labels:
    app: pod-antiaffinity
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pod-antiaffinity
  template:
    metadata:
      labels:
        app: pod-antiaffinity
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
          name: nginxweb
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:  # hard (required) rule
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - pod-antiaffinity
            topologyKey: kubernetes.io/hostname
Each Pod ends up on a different node:
[root@k8s-01 ~]# kubectl get pods -o wide |grep pod-antiaffinity
pod-antiaffinity-86566d4dd5-bpspt 1/1 Running 0 23s 10.244.61.220 k8s-01 <none> <none>
pod-antiaffinity-86566d4dd5-ggbgc 1/1 Running 0 23s 10.244.179.2 k8s-02 <none> <none>
pod-antiaffinity-86566d4dd5-q5jl4 1/1 Running 0 23s 10.244.7.83 k8s-04 <none> <none>
[root@k8s-01 ~]#
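If the replica count is now raised to 5, one Pod stays in the Pending state. The output shows one replica on each of k8s-01 to k8s-04, so under the hard rule no eligible node remains for the fifth replica.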
[root@k8s-01 ~]# kubectl get pods -o wide |grep pod-antiaffinity
pod-antiaffinity-86566d4dd5-5h9h7 1/1 Running 0 59s 10.244.61.224 k8s-01 <none> <none>
pod-antiaffinity-86566d4dd5-fslqk 1/1 Running 0 59s 10.244.179.14 k8s-02 <none> <none>
pod-antiaffinity-86566d4dd5-n474x 1/1 Running 0 59s 10.244.165.222 k8s-03 <none> <none>
pod-antiaffinity-86566d4dd5-pcbhs 1/1 Running 0 59s 10.244.7.91 k8s-04 <none> <none>
pod-antiaffinity-86566d4dd5-vqvhv 0/1 Pending 0 59s <none> <none> <none> <none>
[root@k8s-01 ~]#
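Similarly, Pod anti-affinity also supports soft constraints. The scheduler tries to avoid placing mutually exclusive Pods in the same location, but when the constraint cannot be satisfied it may schedule in violation of the rule rather than leaving the Pod in the Pending state.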
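A soft version of the anti-affinity rule from the Deployment above might look like the following sketch. It was not part of the original test, and the weight of 100 is an assumed value picked to express the strongest preference.

```yaml
# Fragment of spec.template.spec; soft replacement for the hard anti-affinity rule.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:  # soft (preferred) rule
    - weight: 100                # assumed value, range 1-100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pod-antiaffinity
        topologyKey: kubernetes.io/hostname
```

Under this rule the five-replica case would be expected to schedule every Pod, with at least one node hosting two replicas, instead of leaving one Pending.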