Pod调度之亲和性
<h2 id="概述">概述</h2><p>官方文档:</p>
<ul>
<li>https://kubernetes.io/zh-cn/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity</li>
<li>https://kubernetes.io/zh-cn/docs/tasks/configure-pod-container/assign-pods-nodes-using-node-affinity</li>
</ul>
<p>在 Kubernetes(K8s)中,<strong>Pod 调度亲和性(Affinity)</strong> 是一种高级调度策略,用于控制 Pod 与节点(Node)或其他 Pod 之间的关联(亲和)或反关联(反亲和)关系。通过亲和性规则,管理员可以更精细地控制 Pod 的调度行为,满足业务的拓扑约束、资源局部性、高可用等需求。</p>
<p>亲和性主要有两类</p>
<ul>
<li>节点亲和性</li>
<li>Pod亲和性</li>
</ul>
<h2 id="节点亲和性">节点亲和性</h2>
<p>节点亲和性是 Kubernetes 中<strong>控制 Pod 调度到特定节点的核心机制</strong>,它比传统的 nodeSelector 更具灵活性,支持更复杂的匹配规则和优先级策略。</p>
<p>节点亲和性通过 <code>nodeAffinity</code> 字段定义,包含 <code>requiredDuringSchedulingIgnoredDuringExecution</code>(硬性)和 <code>preferredDuringSchedulingIgnoredDuringExecution</code>(软性)两种规则。</p>
<h3 id="节点亲和性的作用">节点亲和性的作用</h3>
<h4 id="精准匹配节点标签">精准匹配节点标签</h4>
<p>节点亲和性通过定义 标签匹配规则(如节点必须包含 / 不包含某个标签、标签值在指定范围内等),将 Pod 调度到符合条件的节点。</p>
<ul>
<li>
<p>硬性规则(requiredDuringScheduling):强制 Pod 只能调度到满足条件的节点,否则保持 Pending 状态。<br>
例:将数据库 Pod 强制调度到标记为 role=database 且 storage=ssd 的节点。</p>
</li>
<li>
<p>软性规则(preferredDuringScheduling):优先调度到满足条件的节点,若不满足则尝试其他节点(通过权重配置优先级)。<br>
例:优先将 Web 服务 Pod 调度到 region=east 的节点(权重 100),其次调度到 zone=az1 的节点(权重 70)。</p>
</li>
</ul>
<h4 id="精细化控制">精细化控制</h4>
<p>替代简单的 nodeSelector,支持复杂逻辑(如 “节点必须包含 A 标签且不包含 B 标签”)。</p>
<h4 id="灵活容错">灵活容错</h4>
<p>软性规则允许调度器在条件不满足时 “退而求其次”,避免 Pod 长时间处于 Pending。</p>
<h4 id="资源利用率优化">资源利用率优化</h4>
<p>通过标签分层(如环境、硬件、地域),实现集群资源的合理分配与负载均衡。</p>
<h3 id="节点亲和性实战-硬性规则匹配">节点亲和性实战-硬性规则匹配</h3>
<p>必须满足条件才能调度,否则 Pod 将处于 Pending 状态。调度成功后,即使节点标签变更导致规则不再满足,Pod 也不会被驱逐。</p>
<p>将 Pod 调度到同时满足以下条件的节点:</p>
<ul>
<li>标签 env=prod</li>
<li>标签 disk-type=ssd</li>
<li>标签 cpu-cores > 2</li>
</ul>
<p>示例:<br>
给node01节点打上标签</p>
<pre><code># kubectl label node node01 env=prod disk-type=ssd cpu-cores=3
node/node01 labeled
# 查看标签
# kubectl describe node node01 | grep Labels -A 5
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
cpu-cores=3
disk-type=ssd
env=prod
kubernetes.io/arch=amd64
</code></pre>
<p>创建Pod</p>
<pre><code># cat affinity-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-affinity
spec:
replicas: 5
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
affinity:
# 节点亲和性
nodeAffinity:
# 指定硬性规则
requiredDuringSchedulingIgnoredDuringExecution:
# 匹配规则
nodeSelectorTerms:
- matchExpressions:
- key: env
operator: In
values: ["prod"]
- key: disk-type
operator: In
values: ["ssd"]
- key: cpu-cores
operator: Gt
values: ["2"]# 注意:values 是字符串列表,内部会转为数字比较
containers:
- name: nginx
image: nginx
#创建
# kubectl apply -f affinity-deploy.yaml
deployment.apps/nginx-affinity created
</code></pre>
<p>查看Pod调度到哪个节点上了?<br>
发现Pod全部调度到node01节点上</p>
<pre><code># kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-affinity-556d5d5987-5bnjm 1/1 Running 0 94s 100.117.144.175 node01 <none> <none>
nginx-affinity-556d5d5987-bvmvh 1/1 Running 0 94s 100.117.144.173 node01 <none> <none>
nginx-affinity-556d5d5987-d7tkg 1/1 Running 0 94s 100.117.144.174 node01 <none> <none>
nginx-affinity-556d5d5987-frkkm 1/1 Running 0 94s 100.117.144.172 node01 <none> <none>
nginx-affinity-556d5d5987-ttlcr 1/1 Running 0 94s 100.117.144.176 node01 <none> <none>
</code></pre>
<p>如果标签不匹配,Pod会出现什么情况呢?</p>
<pre><code># 将node01节点上的标签删除一个
# kubectl label node node01 cpu-cores-
node/node01 unlabeled
# 查看标签
# kubectl describe node node01 | grep Labels -A 5
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
disk-type=ssd
env=prod
kubernetes.io/arch=amd64
kubernetes.io/hostname=node01
# 删除Pod让其重建
# kubectl get po | awk '{print $1}' | xargs kubectl delete po
pod "nginx-affinity-556d5d5987-5bnjm" deleted
pod "nginx-affinity-556d5d5987-bvmvh" deleted
pod "nginx-affinity-556d5d5987-d7tkg" deleted
pod "nginx-affinity-556d5d5987-frkkm" deleted
pod "nginx-affinity-556d5d5987-ttlcr" deleted
</code></pre>
<p>查看Pod,发现状态都是Pending状态</p>
<pre><code># kubectl get po
NAME READY STATUS RESTARTS AGE
nginx-affinity-556d5d5987-44qmc 0/1 Pending 0 27s
nginx-affinity-556d5d5987-4qtth 0/1 Pending 0 27s
nginx-affinity-556d5d5987-4tws5 0/1 Pending 0 27s
nginx-affinity-556d5d5987-bgm7n 0/1 Pending 0 27s
nginx-affinity-556d5d5987-kh555 0/1 Pending 0 27s
</code></pre>
<p>查看一下详细信息,发现是标签不匹配</p>
<pre><code>
# kubectl describe po nginx-affinity-556d5d5987-44qmc
Name: nginx-affinity-556d5d5987-44qmc
Events:
Type Reason Age From Message
---- ------ -------- -------
WarningFailedScheduling71s default-scheduler0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
WarningFailedScheduling70s default-scheduler0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) didn't match Pod's node affinity/selector. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
</code></pre>
<blockquote>
<p>总结一下,节点亲和性可以更加细腻指定Pod调度到某一个节点上,如果指定的标签不匹配,那么Pod会处于Pending状态</p>
</blockquote>
<h3 id="节点亲和性实战-软性规则匹配">节点亲和性实战-软性规则匹配</h3>
<p>软性规则匹配优先满足条件,但不强制。调度器会为每个满足条件的节点打分,选择分数最高的节点。</p>
<h4 id="评分机制">评分机制</h4>
<p>当存在多个满足软性规则的节点时,调度器会计算每个节点的得分:</p>
<ul>
<li>基础分:所有节点初始分为 0。</li>
<li>权重叠加:对每个 preferredDuringSchedulingIgnoredDuringExecution 规则:
<ul>
<li>若节点满足规则,得分为 weight 值。</li>
<li>若不满足,得分为 0。</li>
</ul>
</li>
<li>总分计算:节点最终得分是所有匹配规则的 weight 之和。</li>
</ul>
<p>示例:<br>
优先调度到以下节点:</p>
<ul>
<li>首选 region=east 的节点(权重 100)</li>
<li>其次 disk-type=ssd 的节点(权重 70)</li>
</ul>
<p>给node01节点打上<code>region=east</code>标签</p>
<pre><code># kubectl label node node01 region=east
node/node01 labeled
</code></pre>
<p>给node02节点打上<code>disk-type=ssd</code>标签</p>
<pre><code># kubectl label node node02 disk-type=ssd
node/node02 labeled
</code></pre>
<p>创建deploy</p>
<pre><code># cat affinity-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-affinity
spec:
replicas: 10
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
affinity:
# 节点亲和性
nodeAffinity:
# 指定软性匹配规则
preferredDuringSchedulingIgnoredDuringExecution:
# 权重范围 1-100,值越高优先级越高
- weight: 100
preference:
matchExpressions:
- key: region
operator: In
values: ["east"]
- weight: 70
preference:
matchExpressions:
- key: disk-type
operator: In
values: ["ssd"]
containers:
- name: nginx
image: nginx
# kubectl apply -f affinity-deploy.yaml
deployment.apps/nginx-affinity created
</code></pre>
<p>查看Pod调度到哪一个节点上</p>
<blockquote>
<p>发现调度到node01节点上的Pod居多,因为node01的权重是100,而node02节点上略少,权重为70</p>
</blockquote>
<pre><code># kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-affinity-65587946bd-84mv5 1/1 Running 0 30s 100.117.144.180 node01 <none> <none>
nginx-affinity-65587946bd-8lfwl 1/1 Running 0 30s 100.117.144.183 node01 <none> <none>
nginx-affinity-65587946bd-9fnhb 1/1 Running 0 30s 100.117.144.179 node01 <none> <none>
nginx-affinity-65587946bd-9rqt9 1/1 Running 0 30s 100.117.144.177 node01 <none> <none>
nginx-affinity-65587946bd-gsq4c 1/1 Running 0 30s 100.117.144.182 node01 <none> <none>
nginx-affinity-65587946bd-pf845 1/1 Running 0 30s 100.95.185.238 node02 <none> <none>
nginx-affinity-65587946bd-pvwps 1/1 Running 0 30s 100.95.185.237 node02 <none> <none>
nginx-affinity-65587946bd-qhhh7 1/1 Running 0 30s 100.95.185.239 node02 <none> <none>
nginx-affinity-65587946bd-tn54h 1/1 Running 0 30s 100.117.144.178 node01 <none> <none>
nginx-affinity-65587946bd-x64qs 1/1 Running 0 30s 100.117.144.181 node01 <none> <none>
</code></pre>
<h3 id="节点亲和性-混合使用硬性和软性规则">节点亲和性-混合使用硬性和软性规则</h3>
<p>示例:</p>
<pre><code>apiVersion: v1
kind: Pod
metadata:
name: mixed-affinity-pod
spec:
affinity:
nodeAffinity:
# 硬性限制
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: env
operator: In
values: ["prod"]
# 软性限制
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 80
preference:
matchExpressions:
- key: gpu
operator: Exists
containers:
- name: nginx
image: nginx
</code></pre>
<h2 id="pod亲和性">Pod亲和性</h2>
<p>Pod 亲和性(Pod Affinity)是 Kubernetes 中控制 Pod 调度的重要机制,其核心作用是根据其他 Pod 的位置(如节点、命名空间等)来影响当前 Pod 的调度决策,实现 Pod 之间的协同部署或反亲和(互斥部署)。这一机制通过标签匹配规则,将相关 Pod 「吸引」到同一区域(如节点、机架、可用区等)或「排斥」到不同区域,从而优化资源利用、提升服务性能或增强系统稳定性。</p>
<h3 id="pod亲和性的分类">Pod亲和性的分类</h3>
<p>Pod亲和性分为间亲和性和反亲和性。</p>
<p>亲和性是当第一个Pod调度到一个特定的拓扑域中时,后续的所有的Pod都会往该拓扑域调度。</p>
<blockquote>
<p>拓扑域理解为亲和性规则作用的「区域范围」,通常为节点标签(如 kubernetes.io/hostname 表示节点,kubernetes.io/zone 表示可用区)。</p>
</blockquote>
<p>反亲和性是保证一个拓扑域中最多只能有且仅有一个相同的pod,多余的pod处于pending状态</p>
<h3 id="pod间亲和性实战">Pod间亲和性实战</h3>
<p>Pod 间亲和性可以用于将相关 Pod 调度到同一拓扑层级(如同一节点或同一可用区),从而减少网络延迟,提高性能。例如:</p>
<ul>
<li>将前端和后端服务部署在同一节点或同一可用区:减少服务间通信的网络延迟。</li>
<li>将依赖的服务部署在同一节点:提高服务间的通信效率。</li>
</ul>
<p>Pod 间亲和性也分为硬性限制和软性限制,通过<code>requiredDuringSchedulingIgnoredDuringExecution</code>(硬性)和 <code>preferredDuringSchedulingIgnoredDuringExecution</code>(软性)来指定。</p>
<h4 id="硬性限制实战">硬性限制实战</h4>
<p>示例:</p>
<pre><code># 定义deploy
# cat affinity-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-affinity
spec:
replicas: 10
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
affinity:
# Pod亲和性
podAffinity:
# 指定硬性匹配规则
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
# 这里指定Pod的标签,而不是节点的标签
matchExpressions:
- key: app
operator: In
values: ["nginx"]
# 添加 topologyKey,topologyKey 决定了 Pod 亲和性或反亲和性规则在集群中的作用范围。
# 这里指定节点标签的key
topologyKey: "kubernetes.io/hostname"
containers:
- name: nginx
image: nginx
# kubectl apply -f affinity-deploy.yaml
deployment.apps/nginx-affinity created
</code></pre>
<blockquote>
<p>配置说明:<br>
labelSelector:标签选择器,在这里是选择Pod的标签,而不是选择节点的标签,因为Pod亲和性是Pod级别的调度<br>
topologyKey:该字段是Pod亲和性中一个很重要的字段,它的作用是定义 Pod 亲和性或反亲和性规则的作用范围,即在什么级别的拓扑结构中应用这些规则。这里指定的节点标签的key</p>
</blockquote>
<p>查看Pod的调度,发现Pod都在node02节点上</p>
<pre><code># kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-affinity-7f74fbb7c-2r9lm 1/1 Running 0 76s 100.95.185.249 node02 <none> <none>
nginx-affinity-7f74fbb7c-55bfw 1/1 Running 0 76s 100.95.185.241 node02 <none> <none>
nginx-affinity-7f74fbb7c-9h5qq 1/1 Running 0 76s 100.95.185.248 node02 <none> <none>
nginx-affinity-7f74fbb7c-bq9m9 1/1 Running 0 76s 100.95.185.245 node02 <none> <none>
nginx-affinity-7f74fbb7c-cdxpg 1/1 Running 0 76s 100.95.185.240 node02 <none> <none>
nginx-affinity-7f74fbb7c-j7vbf 1/1 Running 0 76s 100.95.185.242 node02 <none> <none>
nginx-affinity-7f74fbb7c-rnqfs 1/1 Running 0 76s 100.95.185.243 node02 <none> <none>
nginx-affinity-7f74fbb7c-sscsx 1/1 Running 0 76s 100.95.185.244 node02 <none> <none>
nginx-affinity-7f74fbb7c-v7jf2 1/1 Running 0 76s 100.95.185.246 node02 <none> <none>
nginx-affinity-7f74fbb7c-w249m 1/1 Running 0 76s 100.95.185.247 node02 <none> <none>
</code></pre>
<h4 id="软性限制实战略生产环境中使用的不多">软性限制实战(略,生产环境中使用的不多)</h4>
<h3 id="pod反亲和性">Pod反亲和性</h3>
<p>Pod 反亲和性(podAntiAffinity)是 Kubernetes 中的一种调度策略,与 Pod 亲和性(podAffinity)相对。它用于控制 Pod 的调度位置,确保满足特定条件的 Pod 不会被调度到同一拓扑层级(如同一节点、同一可用区或同一区域)上。Pod 反亲和性主要用于实现高可用性和资源隔离等目标。</p>
<p>Pod 反亲和性也分为硬性限制和软性限制,通过<code>requiredDuringSchedulingIgnoredDuringExecution</code>(硬性)和 <code>preferredDuringSchedulingIgnoredDuringExecution</code>(软性)来指定。</p>
<h4 id="pod反亲和性作用">Pod反亲和性作用</h4>
<ul>
<li>高可用性:
<ul>
<li>通过将多个副本 Pod 分布到不同的故障域(如不同的节点或可用区),确保系统的容错能力。例如,将多个副本 Pod 调度到不同的节点或可用区,避免单点故障导致所有副本同时不可用。</li>
</ul>
</li>
<li>资源隔离:
<ul>
<li>通过将某些 Pod 分布到不同的节点或可用区,避免它们相互竞争资源。例如,将不同租户的 Pod 分布到不同的节点或可用区,实现资源隔离。</li>
</ul>
</li>
<li>性能优化:
<ul>
<li>通过将 Pod 分布到不同的节点或可用区,减少单个节点的负载压力,提高整体性能。</li>
</ul>
</li>
</ul>
<h4 id="硬限制实战">硬限制实战</h4>
<pre><code># 创建deploy
# cat affinity-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: nginx-affinity
spec:
replicas: 10
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
affinity:
# Pod反亲和性
podAntiAffinity:
# 指定性匹配规则
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
# 这里指定Pod的标签,而不是节点的标签
matchExpressions:
- key: app
operator: In
values: ["nginx"]
# # 添加 topologyKey
topologyKey: "kubernetes.io/hostname"
containers:
- name: nginx
image: nginx
# kubectl apply -f affinity-deploy.yaml
deployment.apps/nginx-affinity created
</code></pre>
<p>查看一下Pod</p>
<blockquote>
<p>发现只有两个Pod处于Running状态,为什么呢?<br>
因为反亲和性是在每个拓扑域中调度一个Pod,当Pod的数量多余拓扑域时,那么剩余的Pod则无法完成调度,所以处于了Pending状态</p>
</blockquote>
<pre><code># kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-affinity-675c568f99-5brpf 1/1 Running 0 5s 100.117.144.184 node01 <none> <none>
nginx-affinity-675c568f99-68z9q 0/1 Pending 0 5s <none> <none> <none> <none>
nginx-affinity-675c568f99-7vbqb 0/1 Pending 0 5s <none> <none> <none> <none>
nginx-affinity-675c568f99-8tqps 0/1 Pending 0 5s <none> <none> <none> <none>
nginx-affinity-675c568f99-96cs2 0/1 Pending 0 5s <none> <none> <none> <none>
nginx-affinity-675c568f99-prpg4 0/1 Pending 0 5s <none> <none> <none> <none>
nginx-affinity-675c568f99-qsp8b 1/1 Running 0 5s 100.95.185.250 node02 <none> <none>
nginx-affinity-675c568f99-tf9qv 0/1 Pending 0 5s <none> <none> <none> <none>
nginx-affinity-675c568f99-x7k8p 0/1 Pending 0 5s <none> <none> <none> <none>
nginx-affinity-675c568f99-xbfn2 0/1 Pending 0 5s <none> <none> <none> <none>
</code></pre>
<h2 id="pod反亲和性和daemonset的区别">Pod反亲和性和DaemonSet的区别</h2>
<p>Pod反亲和性和DaemonSet感觉很类似,在不考虑污点的情况下,会在每一个节点上都会创建一个Pod,但是也有一些区别</p>
<blockquote>
<p>DaemonSet可以阅读这篇文章:K8s新手系列之DaemonSet资源</p>
</blockquote>
<table>
<thead>
<tr>
<th>特性</th>
<th>Pod 反亲和性</th>
<th>DaemonSet</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>定义</strong></td>
<td>调度策略,控制 Pod 的调度位置</td>
<td>控制器,确保每个节点上运行一个 Pod 的副本</td>
</tr>
<tr>
<td><strong>用途</strong></td>
<td>高可用性、资源隔离、性能优化</td>
<td>运行集群级别的守护进程,如日志收集、监控代理</td>
</tr>
<tr>
<td><strong>调度方式</strong></td>
<td>根据 <code>labelSelector</code> 和 <code>topologyKey</code> 调度 Pod</td>
<td>自动在每个节点上运行一个 Pod 的副本</td>
</tr>
<tr>
<td><strong>配置方式</strong></td>
<td>在 Pod 的 <code>spec.affinity.podAntiAffinity</code> 中配置</td>
<td>使用 DaemonSet 资源对象配置</td>
</tr>
<tr>
<td><strong>适用场景</strong></td>
<td>多副本应用,需要跨节点或可用区分布</td>
<td>集群级别的守护进程,每个节点都需要运行一个副本</td>
</tr>
<tr>
<td><strong>调度器角色</strong></td>
<td>调度器根据规则调度 Pod</td>
<td>DaemonSet 控制器自动管理 Pod 的生命周期</td>
</tr>
<tr>
<td><strong>Pod 数量</strong></td>
<td>根据副本数和调度规则动态调整</td>
<td>每个节点上运行一个 Pod,数量与节点数量一致</td>
</tr>
</tbody>
</table>
</div>
<div id="MySignature" role="contentinfo">
<p>本文来自博客园,作者:huangSir-devops,转载请注明原文链接:https://www.cnblogs.com/huangSir-devops/p/18859120,微信Vac6666666,欢迎交流</p><br><br>
来源:https://www.cnblogs.com/huangSir-devops/p/18859120
頁:
[1]