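Taints and Tolerations in K8s
<h2 id="概述">Overview</h2><p>Official documentation: https://kubernetes.io/zh-cn/docs/concepts/scheduling-eviction/taint-and-toleration/</p>
<p>Taints are set on the nodes of a K8s cluster (both worker and master nodes). Once a Node carries a taint, it repels Pods: Pods that do not tolerate the taint are refused by the node at scheduling time and, depending on the effect, Pods already running there can even be evicted.</p>
<p>Taints resemble Labels in that both are key/value metadata attached to a node, and their syntax is similar. The difference is that a taint also carries an effect and is used to repel Pods rather than to select them.<br>
For Labels, see this article: K8s新手系列之Label标签和Label选择器 (K8s Beginner Series: Labels and Label Selectors)</p>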
<h2 id="污点的组成结构">Structure of a Taint</h2>
<p>A taint consists of three parts:</p>
<pre><code>key=value:effect
</code></pre>
<ul>
<li>key: the taint's key (user-defined, e.g. node-type).</li>
<li>value: the taint's value (optional, e.g. special).</li>
<li>effect: what the taint does to Pods that do not tolerate it. Possible values:
<ul>
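<li>PreferNoSchedule: a soft rule. The scheduler tries to avoid placing Pods on the node, but still uses it if no other suitable node exists.</li>
<li>NoSchedule: Pods are not scheduled onto the node unless they have a matching toleration. Pods already running there are not affected.</li>
<li>NoExecute: in addition to blocking scheduling, Pods already on the node that do not tolerate the taint are evicted (useful for node maintenance and failure handling).</li>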
</ul>
</li>
</ul>
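<p>To make the <code>key=value:effect</code> format concrete, here is a minimal Python sketch that splits a taint string into its three parts. The <code>Taint</code> type and the <code>parse_taint</code> helper are hypothetical names introduced for illustration; this is not how kubectl itself parses the argument, and it performs none of kubectl's stricter validation.</p>

```python
from typing import NamedTuple, Optional

# The three effects described in the list above.
VALID_EFFECTS = {"PreferNoSchedule", "NoSchedule", "NoExecute"}


class Taint(NamedTuple):
    key: str
    value: Optional[str]  # None when the taint has no value
    effect: str


def parse_taint(spec: str) -> Taint:
    """Split 'key=value:effect' or 'key:effect' into its parts."""
    # The effect is whatever follows the last colon.
    head, sep, effect = spec.rpartition(":")
    if not sep or effect not in VALID_EFFECTS:
        raise ValueError(f"missing or unknown effect in {spec!r}")
    # The value is optional, so the '=' may be absent.
    key, eq, value = head.partition("=")
    if not key:
        raise ValueError(f"missing key in {spec!r}")
    return Taint(key, value if eq else None, effect)


print(parse_taint("node-type=special:NoSchedule"))
print(parse_taint("node-role.kubernetes.io/control-plane:NoSchedule"))
```

<p>The second call shows a taint without a value, in which case the sketch stores <code>None</code> for the value.</p>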
<p><img src="https://img2024.cnblogs.com/blog/3468887/202505/3468887-20250522200722794-1119990970.png" alt="image" loading="lazy"></p>
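<h2 id="污点的作用">What Taints Are For</h2>
<p>A taint is a node-level property that keeps certain Pods from being scheduled onto a node, making the node "repel" them. It is normally used together with tolerations to build finer-grained scheduling policies. The main uses are described below.</p>
<h3 id="隔离节点">Isolating Nodes</h3>
<p>Mark a node for a specific purpose (dedicated node, high-performance node) so that ordinary Pods are kept off it and critical workloads get the resources to themselves.</p>
<p>For example: taint GPU nodes or high-memory nodes so that only Pods of specific workloads (machine-learning jobs, databases) with a matching toleration can be scheduled there.</p>
<h3 id="驱逐非预期-pod">Evicting Unwanted Pods</h3>
<p>A NoExecute taint forcibly evicts Pods already on the node that do not tolerate it. This is commonly used for node maintenance, upgrades, or failure handling.</p>
<p>For example: before rebooting a node, add a NoExecute taint. Pods that do not tolerate it are evicted, and those managed by a controller such as a Deployment are recreated on other nodes.</p>
<h3 id="实现分层调度策略">Tiered Scheduling Policies</h3>
<p>Combined with tolerations, taints enable "node grouping + targeted Pod placement", which avoids the performance interference and security problems of mixing workloads.</p>
<p>For example: separate development, test, and production nodes and use taints to keep Pods of other environments off each group. Note that a toleration only permits placement; to force Pods onto a particular group, combine it with a nodeSelector or node affinity.</p>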
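<h2 id="k8s集群中默认存在的污点">Taints Present by Default in a K8s Cluster</h2>
<p>In a Kubernetes (K8s) cluster, control-plane nodes (Master nodes) are usually tainted automatically, for example by kubeadm, so that ordinary workload Pods are not scheduled onto them and the control-plane components (API Server, Scheduler, Controller Manager, and so on) keep their resources. The control plane also adds and removes several taints automatically according to node conditions.</p>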
<ul>
<li>
<p>node-role.kubernetes.io/control-plane:NoSchedule</p>
<ul>
<li>
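<p>This taint keeps ordinary Pods off control-plane nodes. K8s system components such as kube-apiserver and etcd still run there: in a kubeadm cluster they are typically static Pods started directly by the kubelet, which bypasses the scheduler, and other system Pods carry a matching toleration.</p>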
</li>
<li>
<p>Control-plane nodes should concentrate on cluster management, so ordinary workload Pods (web services, databases) should not consume their resources.</p>
</li>
</ul>
</li>
<li>
<p>node.kubernetes.io/not-ready:NoExecute</p>
<ul>
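<li>A dynamic, temporary taint managed by K8s itself. It is added automatically while the node is in the NotReady state (its Ready condition is False) and evicts Pods that do not tolerate it. Once the node recovers, the taint is removed automatically.</li>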
</ul>
</li>
<li>
<p>node.kubernetes.io/unreachable:NoExecute</p>
<ul>
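<li>A dynamic taint. When the control plane stops hearing from the node (network partition, node failure), the node's Ready condition becomes Unknown and this taint is added automatically. Pods are not evicted at once: by default each Pod receives a toleration for this taint (and for not-ready) with tolerationSeconds=300, so it is evicted after about 5 minutes. On older releases this delay was set by the kube-controller-manager flag --pod-eviction-timeout (default 5 minutes).</li>
<li>When the node reconnects, the taint is removed automatically. Pods that were already recreated on other nodes do not move back.</li>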
</ul>
</li>
<li>
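<p>node.kubernetes.io/out-of-disk (older releases only)</p>
<ul>
<li>
<p>A dynamic taint that older K8s releases added when a node ran out of disk space. The current official list of well-known taints no longer includes it; on current clusters low disk space is reported through the disk-pressure taint. The thresholds come from the kubelet parameter --eviction-hard (e.g. nodefs.available<10%).</p>
</li>
<li>
<p>When disk space runs low, the kubelet evicts Pods to free space, starting with those that consume the most disk (logs, temporary files).</p>
</li>
<li>
<p>DaemonSet Pods such as kube-proxy automatically receive tolerations for the node-condition taints; other critical Pods need suitable tolerations configured explicitly.</p>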
</li>
</ul>
</li>
<li>
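<p>node.kubernetes.io/memory-pressure:NoSchedule</p>
<ul>
<li>
<p>A dynamic taint added when the node is under memory pressure (available memory has dropped below the eviction threshold). Its effect is NoSchedule: it keeps new Pods off the node but does not itself evict anything.</p>
</li>
<li>
<p>Eviction under memory pressure is carried out by the kubelet, which first evicts Pods whose memory usage exceeds their requests, so low-QoS Pods (BestEffort) are usually the first to go.</p>
</li>
<li>
<p>The memory threshold can be adjusted with the kubelet parameter --eviction-hard (e.g. memory.available<5%).</p>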
</li>
</ul>
</li>
<li>
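<p>node.kubernetes.io/disk-pressure:NoSchedule</p>
<ul>
<li>
<p>A dynamic taint added when the node is under disk pressure (for example, the root partition or the container runtime's partition is running out of space).</p>
</li>
<li>
<p>It is raised while disk space is running low but not yet exhausted, so that new Pods stop landing on the node before the disk fills up. The kubelet evicts Pods separately to reclaim space.</p>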
</li>
</ul>
</li>
<li>
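<p>node.kubernetes.io/pid-pressure:NoSchedule</p>
<ul>
<li>
<p>A dynamic taint added when the node is running short of process IDs (PIDs), which limits the system's ability to create new processes.</p>
</li>
<li>
<p>It keeps new Pods off the node, while the kubelet evicts Pods to release PIDs so that the system does not fail from PID exhaustion.</p>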
</li>
</ul>
</li>
</ul>
<h2 id="污点的管理">Managing Taints</h2>
<h3 id="查看污点">Viewing Taints</h3>
<p>Syntax:</p>
<pre><code># Show the taints of all nodes; grep -C prints <int-num> lines of context around each match
kubectl describe node | grep -C <int-num> Taints
# Show the taints of one node
kubectl describe node <node-name> | grep -C <int-num> Taints
</code></pre>
<p>Example:</p>
<pre><code># Show the taints of the master node
# kubectl describe node master | grep Taints
Taints: node-role.kubernetes.io/control-plane:NoSchedule
</code></pre>
<h3 id="添加污点">Adding a Taint</h3>
<p>Syntax:</p>
<pre><code># "=value" may be omitted, which adds a taint without a value
kubectl taint node <node-name> <key>[=<value>]:<effect>
</code></pre>
<p>Example:</p>
<pre><code># Add a taint with a value to the master node
# kubectl taint node master name=huangsir:PreferNoSchedule
node/master tainted
# Add a taint without a value to the master node
# kubectl taint node master app:PreferNoSchedule
node/master tainted
# View the taints
# kubectl describe node master | grep -C 2 Taints
Taints: node-role.kubernetes.io/control-plane:NoSchedule # built-in taint of the master node
app:PreferNoSchedule # the taint added without a value
name=huangsir:PreferNoSchedule # the taint added with a value
</code></pre>
<h3 id="删除污点">Removing a Taint</h3>
<p>Syntax:</p>
<pre><code># A trailing "-" removes the taint
kubectl taint nodes <node-name> <key>[=<value>]:<effect>-
</code></pre>
<p>Example:</p>
<pre><code># Remove the taint app:PreferNoSchedule
# kubectl taint nodes master01 app:PreferNoSchedule-
node/master01 untainted
# Verify that it is gone
# kubectl describe node master01 | grep -C 2 -i taint
CreationTimestamp: Sat, 26 Apr 2025 14:02:33 +0800
Taints: node-role.kubernetes.io/control-plane:NoSchedule
name=huangsir:PreferNoSchedule
</code></pre>
<h3 id="修改污点">Modifying a Taint</h3>
<p>Modifying a taint here means removing the old taint and then adding the new one.</p>
<p>Example: change <code>name=huangsir:PreferNoSchedule</code> to <code>name=zhangsan:NoSchedule</code></p>
<pre><code># First remove name=huangsir:PreferNoSchedule
# kubectl taint nodes master01 name=huangsir:PreferNoSchedule-
node/master01 untainted
# Then add name=zhangsan:NoSchedule
# kubectl taint nodes master01 name=zhangsan:NoSchedule
node/master01 tainted
# Check
# kubectl describe node master01 | grep -C 2 Taint
Taints: name=zhangsan:NoSchedule # the modified taint
node-role.kubernetes.io/control-plane:NoSchedule
</code></pre>
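<h2 id="验证三个污点类型的调度">Verifying the Scheduling Behavior of the Three Effects</h2>
<h3 id="prefernoschedule">PreferNoSchedule</h3>
<p><code>PreferNoSchedule</code> is the weakest of the three restrictions. The scheduler tries to avoid placing Pods on the node, but this is not enforced: it looks for other nodes first and still uses the tainted node if nothing else is suitable.</p>
<p>Example: add a <code>PreferNoSchedule</code> taint to node01.</p>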
<pre><code># kubectl taint node node01 name=zhangsan:PreferNoSchedule
node/node01 tainted
# kubectl describe node node01 | grep Taints
Taints: name=zhangsan:PreferNoSchedule
</code></pre>
<p>node01 now carries the taint <code>name=zhangsan:PreferNoSchedule</code>. What happens when Pods are scheduled while this taint is in place?</p>
<p>Create 10 Pods with a Deployment and see:</p>
<pre><code># Define a Deployment with 10 Pods
# cat deploy-pod.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  # Create 10 replicas
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
# Create the Deployment
# kubectl apply -f deploy-pod.yaml
deployment.apps/nginx-deployment created
</code></pre>
<p>Check where the Pods were scheduled: all of them landed on node02, the only node without a taint.</p>
<pre><code># kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-654975c8cd-2kzzs 1/1 Running 0 108s 100.95.185.204 node02 <none> <none>
nginx-deployment-654975c8cd-5jbvk 1/1 Running 0 108s 100.95.185.203 node02 <none> <none>
nginx-deployment-654975c8cd-6g2jd 1/1 Running 0 108s 100.95.185.206 node02 <none> <none>
nginx-deployment-654975c8cd-8pfb7 1/1 Running 0 108s 100.95.185.209 node02 <none> <none>
nginx-deployment-654975c8cd-c7s6m 1/1 Running 0 108s 100.95.185.208 node02 <none> <none>
nginx-deployment-654975c8cd-dphzf 1/1 Running 0 108s 100.95.185.211 node02 <none> <none>
nginx-deployment-654975c8cd-kvllb 1/1 Running 0 108s 100.95.185.205 node02 <none> <none>
nginx-deployment-654975c8cd-mbdhc 1/1 Running 0 108s 100.95.185.210 node02 <none> <none>
nginx-deployment-654975c8cd-mnfkz 1/1 Running 0 108s 100.95.185.207 node02 <none> <none>
nginx-deployment-654975c8cd-psbtk 1/1 Running 0 108s 100.95.185.212 node02 <none> <none>
</code></pre>
<p>Next, add a <code>NoSchedule</code> taint to node02:</p>
<pre><code># kubectl taint node node02 app:NoSchedule
node/node02 tainted
# kubectl describe node node02 | grep Taints
Taints: app:NoSchedule
</code></pre>
<p>Create another 10 Pods with a second Deployment and see what happens:</p>
<pre><code># Define a Deployment with 10 Pods
# cat deploy-noschedule.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-noschedule
  labels:
    app: nginx
spec:
  # Create 10 replicas
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
# kubectl apply -f deploy-noschedule.yaml
deployment.apps/nginx-noschedule created
</code></pre>
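<p>Check where the Pods were scheduled: this time they all landed on node01. node02 now has a hard <code>NoSchedule</code> taint, so the scheduler falls back to node01, whose <code>PreferNoSchedule</code> taint is only a preference.</p>
<pre><code># kubectl get po -o wide | grep nginx-noschedule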
nginx-noschedule-654975c8cd-27529 1/1 Running 0 63s 100.117.144.143 node01 <none> <none>
nginx-noschedule-654975c8cd-472wh 1/1 Running 0 63s 100.117.144.144 node01 <none> <none>
nginx-noschedule-654975c8cd-56wbp 1/1 Running 0 63s 100.117.144.142 node01 <none> <none>
nginx-noschedule-654975c8cd-5vwvx 1/1 Running 0 63s 100.117.144.138 node01 <none> <none>
nginx-noschedule-654975c8cd-99ld7 1/1 Running 0 63s 100.117.144.146 node01 <none> <none>
nginx-noschedule-654975c8cd-brjlh 1/1 Running 0 63s 100.117.144.145 node01 <none> <none>
nginx-noschedule-654975c8cd-fkzwr 1/1 Running 0 63s 100.117.144.147 node01 <none> <none>
nginx-noschedule-654975c8cd-hmqkg 1/1 Running 0 63s 100.117.144.141 node01 <none> <none>
nginx-noschedule-654975c8cd-sxx2h 1/1 Running 0 63s 100.117.144.140 node01 <none> <none>
nginx-noschedule-654975c8cd-xbgkc 1/1 Running 0 63s 100.117.144.139 node01 <none> <none>
</code></pre>
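<p>The two placements above can be reproduced with a small Python model. It is a simplified sketch, not the real scheduler: it assumes a Pod with no tolerations, filters out nodes that carry a <code>NoSchedule</code> or <code>NoExecute</code> taint, and then prefers the node with the fewest <code>PreferNoSchedule</code> taints. The real scheduler combines many more factors when scoring nodes, and the <code>pick_node</code> helper is a hypothetical name.</p>

```python
from typing import Dict, List, Tuple

Taint = Tuple[str, str, str]  # (key, value, effect)
HARD_EFFECTS = {"NoSchedule", "NoExecute"}


def pick_node(nodes: Dict[str, List[Taint]]) -> str:
    """Choose a node for a Pod that has no tolerations (simplified model)."""
    # Filter step: a node with any hard taint is not feasible.
    feasible = [
        name for name, taints in nodes.items()
        if not any(effect in HARD_EFFECTS for _, _, effect in taints)
    ]
    if not feasible:
        raise RuntimeError("no feasible node: the Pod stays Pending")

    # Score step: fewer PreferNoSchedule taints is better.
    def soft_count(name: str) -> int:
        return sum(effect == "PreferNoSchedule" for _, _, effect in nodes[name])

    # Sorting first makes ties deterministic in this sketch.
    return min(sorted(feasible), key=soft_count)


cluster = {
    "master": [("node-role.kubernetes.io/control-plane", "", "NoSchedule")],
    "node01": [("name", "zhangsan", "PreferNoSchedule")],
    "node02": [],
}
print(pick_node(cluster))  # node02: the only untainted node

# After tainting node02 with app:NoSchedule, only node01 remains feasible.
cluster["node02"] = [("app", "", "NoSchedule")]
print(pick_node(cluster))  # node01
```

<p>If node01 also received a <code>NoSchedule</code> taint, the model would raise an error, which corresponds to Pods staying in the Pending state.</p>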
<h3 id="noschedule">NoSchedule</h3>
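<p><code>NoSchedule</code> prevents Pods from being scheduled onto the node (unless the Pod has a matching toleration), but it does not affect Pods that are already running on the node.</p>
<p>Example: add a <code>NoSchedule</code> taint to the worker nodes:</p>
<pre><code># kubectl taint node node01 app:NoSchedule
node/node01 tainted
# node02 already carries app:NoSchedule if you followed the previous section; run this only if it does not
# kubectl taint node node02 app:NoSchedule
node/node02 tainted
# View the taints: every node in the cluster now has a NoSchedule taint
# kubectl describe node | grep Taint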
Taints: node-role.kubernetes.io/control-plane:NoSchedule
Taints: app:NoSchedule
Taints: app:NoSchedule
</code></pre>
<p>Create a Deployment:</p>
<pre><code># Define the manifest
# cat test-schedule.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-noschedule
  labels:
    app: nginx
spec:
  # Create 10 replicas
  replicas: 10
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
# Create the Deployment
# kubectl apply -f test-schedule.yaml
deployment.apps/test-noschedule created
</code></pre>
<p>Check the Pods: all of them are stuck in the Pending state.</p>
<pre><code># kubectl get po -o wide | grep test
test-noschedule-654975c8cd-79hzm 0/1 Pending 0 79s <none> <none> <none> <none>
test-noschedule-654975c8cd-7hsr7 0/1 Pending 0 79s <none> <none> <none> <none>
test-noschedule-654975c8cd-8zf82 0/1 Pending 0 79s <none> <none> <none> <none>
test-noschedule-654975c8cd-bk6fh 0/1 Pending 0 79s <none> <none> <none> <none>
test-noschedule-654975c8cd-fq7hk 0/1 Pending 0 79s <none> <none> <none> <none>
test-noschedule-654975c8cd-htf66 0/1 Pending 0 79s <none> <none> <none> <none>
test-noschedule-654975c8cd-n7bsk 0/1 Pending 0 79s <none> <none> <none> <none>
test-noschedule-654975c8cd-nv5vh 0/1 Pending 0 79s <none> <none> <none> <none>
test-noschedule-654975c8cd-rq9th 0/1 Pending 0 79s <none> <none> <none> <none>
test-noschedule-654975c8cd-wkg7d 0/1 Pending 0 79s <none> <none> <none> <none>
</code></pre>
<p>Describe one of the Pods: the events show that the taints are the cause.</p>
<pre><code># kubectl describe po test-noschedule-654975c8cd-79hzm
Name: test-noschedule-654975c8cd-79hzm
## ... (rest of the output omitted)
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  2m7s  default-scheduler  0/3 nodes are available: 1 node(s) had untolerated taint {node-role.kubernetes.io/control-plane: }, 2 node(s) had untolerated taint {app: }. preemption: 0/3 nodes are available: 3 Preemption is not helpful for scheduling..
</code></pre>
<h3 id="noexecute">NoExecute</h3>
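<p>A <code>NoExecute</code> taint not only blocks scheduling but also evicts Pods already on the node that do not tolerate it (useful for node maintenance and failure handling).</p>
<h4 id="准备一下测试环境">Preparing the Test Environment</h4>
<p>Remove the taints on node01 and the Deployments created in the examples above:</p>
<pre><code># Remove the taints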
# kubectl taint node node01 app:NoSchedule-
node/node01 untainted
# kubectl taint node node01 name=zhangsan:PreferNoSchedule-
node/node01 untainted
# kubectl describe node node01 | grep Taint
Taints: <none>
# Delete the Deployments
# kubectl delete deploy nginx-noschedule
deployment.apps "nginx-noschedule" deleted
# kubectl delete deploy test-noschedule
deployment.apps "test-noschedule" deleted
# List the Pods on node02
# kubectl get po -o wide | grep node02
nginx-deployment-654975c8cd-5j2kn 1/1 Running 0 28m 100.95.185.220 node02 <none> <none>
nginx-deployment-654975c8cd-b44mb 1/1 Running 0 28m 100.95.185.218 node02 <none> <none>
nginx-deployment-654975c8cd-b9pg7 1/1 Running 0 28m 100.95.185.214 node02 <none> <none>
nginx-deployment-654975c8cd-dlwqc 1/1 Running 0 28m 100.95.185.213 node02 <none> <none>
nginx-deployment-654975c8cd-dvkhh 1/1 Running 0 28m 100.95.185.219 node02 <none> <none>
nginx-deployment-654975c8cd-kxlpc 1/1 Running 0 28m 100.95.185.215 node02 <none> <none>
nginx-deployment-654975c8cd-nv99z 1/1 Running 0 28m 100.95.185.221 node02 <none> <none>
nginx-deployment-654975c8cd-p79bz 1/1 Running 0 28m 100.95.185.216 node02 <none> <none>
nginx-deployment-654975c8cd-p84cj 1/1 Running 0 28m 100.95.185.217 node02 <none> <none>
nginx-deployment-654975c8cd-q4ll4 1/1 Running 0 28m 100.95.185.222 node02 <none> <none>
</code></pre>
<h4 id="验证禁止调度该步骤省略">Verifying That Scheduling Is Blocked (omitted)</h4>
<p>This step is the same as for <code>NoSchedule</code>, so it is omitted here.</p>
<h4 id="验证驱逐pod">Verifying Pod Eviction</h4>
<p>Add a <code>NoExecute</code> taint to node02:</p>
<pre><code># Add the taint
# kubectl taint node node02 app:NoExecute
node/node02 tainted
# Check
# kubectl describe node node02 | grep -C 2 Taint
Taints: app:NoExecute
app:NoSchedule
</code></pre>
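<p>Check the Pods: the ones on node02 were evicted, and the Deployment recreated all of them on node01 (note the new Pod names and ages).</p>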
<pre><code># kubectl get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-deployment-654975c8cd-29kbl 1/1 Running 0 73s 100.117.144.160 node01 <none> <none>
nginx-deployment-654975c8cd-55fmq 1/1 Running 0 72s 100.117.144.163 node01 <none> <none>
nginx-deployment-654975c8cd-7rq6t 1/1 Running 0 72s 100.117.144.166 node01 <none> <none>
nginx-deployment-654975c8cd-cb8hl 1/1 Running 0 73s 100.117.144.161 node01 <none> <none>
nginx-deployment-654975c8cd-fzblg 1/1 Running 0 73s 100.117.144.159 node01 <none> <none>
nginx-deployment-654975c8cd-m6mxw 1/1 Running 0 71s 100.117.144.167 node01 <none> <none>
nginx-deployment-654975c8cd-mw8st 1/1 Running 0 71s 100.117.144.169 node01 <none> <none>
nginx-deployment-654975c8cd-p2kcc 1/1 Running 0 71s 100.117.144.168 node01 <none> <none>
nginx-deployment-654975c8cd-x7k8p 1/1 Running 0 73s 100.117.144.158 node01 <none> <none>
nginx-deployment-654975c8cd-xx7t7 1/1 Running 0 73s 100.117.144.162 node01 <none> <none>
</code></pre>
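<h2 id="污点容忍">Tolerations</h2>
<p>To let a Pod be scheduled onto a tainted node, <strong>configure tolerations in the Pod's <code>spec.tolerations</code> field</strong>.</p>
<p>Fields of a toleration:</p>
<pre><code>tolerations:
- key: "env"            # key of the taint to match (may be empty only with operator Exists, which then matches every key)
  operator: "Equal"     # Equal (the default): the value must match; Exists: no value is needed
  value: "prod"         # value of the taint to match (only used with Equal)
  effect: "NoSchedule"  # effect of the taint to match (optional; if omitted, all effects match)
  # tolerationSeconds: <seconds>  # only valid with effect NoExecute: how long the Pod may stay on the node after the taint is added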
</code></pre>
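<p>As an illustration of <code>tolerationSeconds</code>, the following Python sketch computes how long a Pod may stay on a node after <code>NoExecute</code> taints appear. It assumes, as a simplification of the eviction logic, that a Pod with an untolerated <code>NoExecute</code> taint is evicted immediately, that a toleration without <code>tolerationSeconds</code> allows the Pod to stay indefinitely, and that the shortest bound wins when several matching tolerations set the field. The function name <code>eviction_delay</code> is hypothetical.</p>

```python
from typing import List, Optional


def eviction_delay(all_tolerated: bool,
                   matched_seconds: List[Optional[int]]) -> Optional[int]:
    """Seconds the Pod may stay on the node; None means indefinitely.

    all_tolerated   -- True if every NoExecute taint on the node is matched
                       by some toleration of the Pod.
    matched_seconds -- tolerationSeconds of each matching toleration,
                       with None where the field is unset.
    """
    if not all_tolerated:
        return 0  # assumption: untolerated NoExecute means immediate eviction
    limits = [s for s in matched_seconds if s is not None]
    if not limits:
        return None  # no time bound on any matching toleration
    # Assumption: the shortest bound wins; non-positive values mean "now".
    return max(0, min(limits))


print(eviction_delay(True, [None]))        # toleration without tolerationSeconds
print(eviction_delay(True, [3600, 300]))   # two bounded tolerations
print(eviction_delay(False, []))           # taint not tolerated at all
```

<p>With the default tolerations that K8s adds for the not-ready and unreachable taints (tolerationSeconds=300), this model gives a delay of 300 seconds before eviction.</p>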
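<h3 id="污点容忍规则">Toleration Rules</h3>
<p>A Pod has to tolerate every taint on a node that would otherwise block it. Taints and tolerations are matched many-to-many: each taint on the node is checked against all of the Pod's tolerations.</p>
<ul>
<li>If a node has several taints (e.g. taint1 and taint2), the Pod needs a toleration matching each of them before it can be scheduled onto the node.</li>
<li>If the Pod tolerates only some of them, the outcome depends on the effect of the untolerated taints: NoSchedule or NoExecute blocks scheduling (and NoExecute also evicts a Pod already running there), whereas an untolerated PreferNoSchedule taint only makes the node less preferred.</li>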
</ul>
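<h3 id="污点容忍匹配规则">Toleration Matching Rules:</h3>
<ul>
<li>key + operator + value + effect all specified: matches exactly that taint.</li>
</ul>
<pre><code>tolerations:
- key: "maintenance"
  operator: "Equal"
  value: "true"
  effect: "NoExecute"
</code></pre>
<ul>
<li>key + operator=Exists: matches every taint with that key, whatever its value and, when effect is omitted, whatever its effect.</li>
</ul>
<pre><code>tolerations:
- key: "maintenance"
  operator: "Exists"
</code></pre>
<ul>
<li>No key, operator=Exists: matches taints with any key. With an effect it tolerates every taint of that effect, as in the snippet that follows; with the effect omitted as well it tolerates every taint (use with care, since that bypasses all taint restrictions).</li>
</ul>
<pre><code>tolerations:
- effect: "NoSchedule"
  operator: "Exists"  # tolerate every NoSchedule taint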
</code></pre>
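<p>The matching rules above, together with the requirement that every blocking taint be tolerated, can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the Kubernetes implementation: the <code>Taint</code> and <code>Toleration</code> classes and the <code>tolerates</code> and <code>blocking_taints</code> helpers are hypothetical names, and a toleration with an empty key and operator Equal is simply treated as matching nothing.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key is only meaningful with Exists
    operator: str = "Equal"  # an omitted operator behaves like Equal
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """True if this single toleration matches this single taint."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # An empty key matches every key; otherwise the keys must be equal.
        return not tol.key or tol.key == taint.key
    # Equal: key and value must both match.
    return (tol.operator == "Equal" and bool(tol.key)
            and tol.key == taint.key and tol.value == taint.value)


def blocking_taints(taints: List[Taint],
                    tolerations: List[Toleration]) -> List[Taint]:
    """Untolerated NoSchedule/NoExecute taints that keep the Pod off the node."""
    return [
        t for t in taints
        if t.effect in ("NoSchedule", "NoExecute")
        and not any(tolerates(tol, t) for tol in tolerations)
    ]


node = [Taint("app", "NoSchedule", "web"), Taint("env", "NoSchedule", "prod")]
only_app = [Toleration(key="app", value="web", effect="NoSchedule")]
both = only_app + [Toleration(key="env", value="prod", effect="NoSchedule")]
any_noschedule = [Toleration(operator="Exists", effect="NoSchedule")]

print(blocking_taints(node, only_app))        # the env taint still blocks
print(blocking_taints(node, both))            # []
print(blocking_taints(node, any_noschedule))  # []
```

<p>An empty result means nothing in this model prevents the Pod from being scheduled onto the node; it does not mean the Pod will necessarily be placed there.</p>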
<p>Example: tolerate only the taint <code>app=web:NoSchedule</code> on a node</p>
<pre><code>apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "web"
    effect: "NoSchedule"
</code></pre>
<p>Example: explicitly tolerate several taints</p>
<pre><code>apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: "app"
    operator: "Equal"
    value: "web"
    effect: "NoSchedule"
  - key: "env"
    operator: "Equal"
    value: "prod"
    effect: "NoSchedule"
</code></pre>
</div>
<div id="MySignature" role="contentinfo">
<p>This article comes from cnblogs (博客园). Author: huangSir-devops. When reposting, please cite the original link: https://www.cnblogs.com/huangSir-devops/p/18891913</p>