K8s Affinity Deep Dive Best Practices

2026/05/11, 7214 characters, about a 21-minute read

Kubernetes Affinity Explained: Production Best Practices for NodeAffinity, PodAffinity, and PodAntiAffinity

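Context and Background

In production Kubernetes environments, we often need fine-grained control over where Pods are scheduled: for example, keeping a web application in the same availability zone as its cache where possible, spreading database primaries and replicas across different nodes, or placing compute-intensive jobs on nodes with specific resources. Affinity is the scheduling mechanism Kubernetes provides for this. It comes in two families: node affinity (NodeAffinity) and inter-Pod affinity (PodAffinity/PodAntiAffinity).

For senior DevOps/SRE engineers, understanding how affinity works and knowing which type fits which scenario is an essential skill for building highly available, high-performance clusters.

1. Affinity Overview

1.1 Three Types of Affinity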

flowchart TB
    A["Affinity"] --> B["Node affinity (NodeAffinity)"]
    A --> C["Pod affinity (PodAffinity)"]
    A --> D["Pod anti-affinity (PodAntiAffinity)"]

    B --> E["Pod→Node"]
    C --> F["Pod→Pod (together)"]
    D --> G["Pod→Pod (apart)"]

    H["Hard affinity: required"] --> I["Must be satisfied"]
    J["Soft affinity: preferred"] --> K["Satisfied when possible"]
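
| Type | Acts on | What it does | Typical use |
| --- | --- | --- | --- |
| NodeAffinity | Pod→Node | Selects where a Pod may be scheduled based on node labels | Scheduling onto nodes with specific resources (GPU or high-memory nodes) |
| PodAffinity | Pod→Pod | Places a Pod in the same topology domain as certain other Pods | Better performance and lower network latency (web and cache in the same AZ) |
| PodAntiAffinity | Pod→Pod | Keeps a Pod out of topology domains that already run certain other Pods | Higher availability and risk spreading (replicas not concentrated in one place) |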

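1.2 Affinity Strength

| Strength | Keyword | Scheduling behavior | Risk | Trade-off |
| --- | --- | --- | --- | --- |
| Hard | requiredDuringSchedulingIgnoredDuringExecution | Must be satisfied, otherwise the Pod stays Pending | Scheduling can fail | Strong guarantee, but the Pod may not schedule |
| Soft | preferredDuringSchedulingIgnoredDuringExecution | Satisfied when possible, not enforced | Weaker guarantee | Flexible, but placement is best-effort |

In both keywords, IgnoredDuringExecution means the rule is evaluated only at scheduling time: a Pod that is already running is not evicted if labels change later.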

2. Node Affinity (NodeAffinity)

2.1 Basic Configuration

apiVersion: v1
kind: Pod
metadata:
  name: node-affinity-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/os
            operator: In
            values:
            - linux
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - cn-beijing-1
            - cn-beijing-2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
      - weight: 50
        preference:
          matchExpressions:
          - key: team
            operator: In
            values:
            - platform
  containers:
  - name: test
    image: nginx:1.21
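
For a plain equality match, the simpler nodeSelector field expresses the same kind of hard constraint as a single In expression. The following is a minimal sketch, assuming nodes carry a disktype=ssd label as in the example above; the Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-selector-pod   # hypothetical name
spec:
  # Same effect as a required nodeAffinity term with
  # key: disktype, operator: In, values: [ssd]
  nodeSelector:
    disktype: ssd
  containers:
  - name: test
    image: nginx:1.21
```

If both nodeSelector and nodeAffinity are set on a Pod, a node has to satisfy both to be eligible.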

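2.2 Operators

| Operator | Meaning |
| --- | --- |
| In | The label value is one of the listed values |
| NotIn | The label value is not one of the listed values |
| Exists | The label key is present (no values are given) |
| DoesNotExist | The label key is absent (no values are given) |
| Gt | The label value, parsed as an integer, is greater than the given value |
| Lt | The label value, parsed as an integer, is less than the given value |

Gt and Lt take exactly one value and are available only in node affinity, not in Pod label selectors.

# Combining several operators (expressions within one term are ANDed)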
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
    - key: node-type
      operator: In
      values:
      - compute
    - key: kubernetes.io/os
      operator: In
      values:
      - linux
    - key: min-memory
      operator: Gt
      values:
      - "32"
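
The fragment above puts all expressions into one term, so every one of them must hold. As a complementary sketch, multiple entries under nodeSelectorTerms are ORed, and Exists/DoesNotExist take no values. The label keys gpu and maintenance are hypothetical and only serve as illustration.

```yaml
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  # Term 1: a GPU node that is not flagged for maintenance (both must hold)
  - matchExpressions:
    - key: gpu                 # hypothetical label key
      operator: Exists
    - key: maintenance         # hypothetical label key
      operator: DoesNotExist
  # Term 2 (ORed with term 1): a compute node outside cn-beijing-2
  - matchExpressions:
    - key: node-type
      operator: In
      values:
      - compute
    - key: topology.kubernetes.io/zone
      operator: NotIn
      values:
      - cn-beijing-2
```

A node is eligible as soon as it satisfies any one term in full.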

3. Pod Affinity (PodAffinity)

3.1 Basic Configuration

apiVersion: v1
kind: Pod
metadata:
  name: web-pod
  labels:
    app: web
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: redis-cache
        topologyKey: topology.kubernetes.io/zone
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: monitoring
          topologyKey: kubernetes.io/hostname
  containers:
  - name: web
    image: nginx:1.21
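
By default, a Pod affinity term only matches Pods in the same namespace as the Pod being scheduled. If the target Pods live elsewhere, the term can list namespaces explicitly. This is a sketch that assumes a hypothetical namespace named cache.

```yaml
affinity:
  podAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: app
          operator: In
          values:
          - redis-cache
      # Look for matching Pods in this namespace instead of the Pod's own
      namespaces:
      - cache                  # hypothetical namespace
      topologyKey: topology.kubernetes.io/zone
```

The fragment belongs under spec of the Pod (or the Pod template), in place of the podAffinity block shown above.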

3.2 Example: Web + Cache Architecture

flowchart TB
    subgraph az1["Zone cn-beijing-1"]
        A["Web Pod"] --> B["Redis Cache Pod"]
    end
    subgraph az2["Zone cn-beijing-2"]
        C["Web Pod"] --> D["Redis Cache Pod"]
    end

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-frontend
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web-frontend
  template:
    metadata:
      labels:
        app: web-frontend
    spec:
      affinity:
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: redis-cache
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: web
        image: nginx:1.21
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
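
The preference above only has an effect if Pods labeled app: redis-cache are already running in the zones. Below is a sketch of what the cache side might look like, spreading its replicas across zones with a soft anti-affinity. The Deployment name, replica count, and image tag are assumptions, not part of the original setup.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache          # assumed name
spec:
  replicas: 2                # assumed: one per zone in the diagram
  selector:
    matchLabels:
      app: redis-cache
  template:
    metadata:
      labels:
        app: redis-cache     # the label the web Pods' affinity term selects
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: redis-cache
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: redis
        image: redis:7       # assumed image tag
```

Because both rules are soft, the scheduler can still place Pods when a zone is short on capacity, at the cost of an occasional cross-zone hop.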

4. Pod Anti-Affinity (PodAntiAffinity)

4.1 Basic Configuration

apiVersion: v1
kind: Pod
metadata:
  name: db-pod
  labels:
    app: mysql
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: mysql
        topologyKey: kubernetes.io/hostname
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: mysql
          topologyKey: topology.kubernetes.io/zone
  containers:
  - name: mysql
    image: mysql:8.0

4.2 Example: Database High Availability

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      affinity:
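        podAntiAffinity:
          # Hard zone-level anti-affinity: with replicas: 3 this needs at least
          # three zones with schedulable nodes, or the extra replicas stay Pending.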
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: mysql
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: mysql
        image: mysql:8.0
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi

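5. Topology Domains (topologyKey)

5.1 Common Topology Keys

| topologyKey | Meaning | Isolation level |
| --- | --- | --- |
| kubernetes.io/hostname | Node hostname | Node |
| topology.kubernetes.io/zone | Availability zone | AZ |
| topology.kubernetes.io/region | Region | Region |
| node.kubernetes.io/instance-type | Instance type | Custom |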
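
A topologyKey is simply a node label key, so any label applied consistently to nodes can define a domain. The sketch below uses a hypothetical rack label, example.com/rack, which the cluster operator would have to set on the nodes.

```yaml
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: mysql
        # Nodes sharing the same value of this label form one domain
        topologyKey: example.com/rack   # hypothetical custom label
```

For predictable results every node should carry the label; the Kubernetes documentation warns that nodes missing the topologyKey label can lead to unintended behavior.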

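5.2 Topology Domains and Fault Isolation

flowchart TB
    subgraph hierarchy["Topology domain hierarchy"]
        A["Region"] --> B["Zone (availability zone)"]
        B --> C["Host (node)"]
    end

    D["Failure risk"] --> E["Node failure"]
    D --> F["AZ failure"]
    D --> G["Region failure"]

| Isolation level | How replicas are spread | Fault tolerance |
| --- | --- | --- |
| Hostname | Across nodes | Survives the failure of a single node |
| Zone | Across AZs | Survives the failure of a single AZ |
| Region | Across regions | Strictest level of disaster recovery |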

6. Production Best Practices

6.1 Combining Strategies

apiVersion: apps/v1
kind: Deployment
metadata:
  name: production-web
spec:
  replicas: 8
  selector:
    matchLabels:
      app: production-web
  template:
    metadata:
      labels:
        app: production-web
    spec:
      affinity:
        # 1. Node affinity: require Linux nodes, prefer SSD nodes
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: disktype
                operator: In
                values:
                - ssd
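        # 2. Pod anti-affinity: require different nodes, prefer different AZs
        #    (with replicas: 8, the hard rule needs at least 8 schedulable nodes)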
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: production-web
            topologyKey: kubernetes.io/hostname
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 80
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: production-web
              topologyKey: topology.kubernetes.io/zone
        # 3. Pod affinity: prefer the same AZ as the cache
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: redis-cache
              topologyKey: topology.kubernetes.io/zone
      containers:
      - name: web
        image: nginx:1.21
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "512Mi"
            cpu: "500m"
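
One side effect of the hard hostname anti-affinity above is worth noting: during a rolling update, surge Pods also need a node that has no production-web Pod yet. If the cluster has only as many eligible nodes as replicas, the rollout can stall with Pending Pods. The following sketch shows one way to avoid that, assuming a brief reduction in capacity during a rollout is acceptable.

```yaml
# Fragment of the Deployment spec (sibling of replicas and selector)
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 0         # do not create extra Pods that would need a free node
    maxUnavailable: 1   # remove one old Pod first, then schedule its replacement
```

The alternative is to keep spare nodes available so that surge Pods always have somewhere to land.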

6.2 Spreading Critical Workloads Across AZs

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: critical-database
spec:
  serviceName: critical-db
  replicas: 3
  selector:
    matchLabels:
      app: critical-database
  template:
    metadata:
      labels:
        app: critical-database
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: critical-database
            topologyKey: topology.kubernetes.io/zone
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: node-type
                operator: In
                values:
                - database
      tolerations:
      - key: "dedicated"
        operator: "Equal"
        value: "database"
        effect: "NoSchedule"
      containers:
      - name: postgres
        image: postgres:13.4
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
          limits:
            memory: "8Gi"
            cpu: "4"
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 500Gi

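6.3 Monitoring and Alerting

# Prometheus alert for Pods stuck in Pending (a possible sign of affinity failures)
groups:
- name: k8s_affinity_alerts
  rules:
  - alert: PodUnschedulableAffinity
    # Matches every Pending Pod, not only affinity failures; check the
    # Pod's FailedScheduling events to confirm the cause.
    expr: kube_pod_status_phase{phase="Pending"} > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been Pending for 10 minutes (possible affinity scheduling failure)"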
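
The phase-based expression above also fires for Pods that are Pending for other reasons, such as slow image pulls. A narrower sketch follows, assuming kube-state-metrics is deployed and exposes the kube_pod_status_unschedulable metric; the alert name is hypothetical.

```yaml
groups:
- name: k8s_affinity_alerts
  rules:
  - alert: PodUnschedulable          # hypothetical alert name
    # Non-zero while the scheduler reports the Pod as unschedulable
    expr: kube_pod_status_unschedulable > 0
    for: 10m
    labels:
      severity: warning
    annotations:
      summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been unschedulable for 10 minutes"
```

This still covers every kind of scheduling failure (insufficient resources, taints, affinity), so the Pod's FailedScheduling events remain the place to confirm an affinity cause.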

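7. Troubleshooting Common Problems

7.1 Pod Cannot Be Scheduled

# 1. Check Pod status and events
kubectl get pods
kubectl describe pod <pod-name>

# 2. Check node labels
kubectl get nodes --show-labels

# 3. Check how the related Pods are distributed
kubectl get pods -l app=<label> -o wide

# 4. Inspect the affinity configuration
kubectl get pod <pod-name> -o yaml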

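7.2 Tuning Affinity

flowchart TB
    A["Affinity too strict"] --> B["Pod cannot be scheduled"]
    B --> C["Relax the constraint"]
    C --> D["Change hard affinity to soft"]

    E["Affinity too loose"] --> F["Insufficient spreading"]
    F --> G["Tighten the constraint"]
    G --> H["Change topology key from hostname to zone"]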

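8. Interview Cheat Sheet

8.1 One-Minute Version

Kubernetes affinity comes in three types: node affinity (NodeAffinity, schedules Pods based on node labels), Pod affinity (PodAffinity, keeps Pods together, such as web and cache in the same AZ), and Pod anti-affinity (PodAntiAffinity, keeps Pods apart, such as spreading replicas across different nodes). Strength is either hard (required: must be satisfied, otherwise the Pod stays Pending) or soft (preferred: satisfied when possible). In production, Pod anti-affinity is commonly used to spread replicas for availability, and Pod affinity to improve performance.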

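8.2 Mnemonic

Affinity comes in three kinds: node affinity is NodeAffinity,
Pod affinity is PodAffinity, Pod anti-affinity is PodAntiAffinity.
Hard rules must be satisfied, soft rules are honored when possible.
Choose the topology key well: for node level use kubernetes.io/hostname,
for availability zones use topology.kubernetes.io/zone.
Spread replicas apart; co-locate when performance comes first.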

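8.3 Keyword Quick Reference

| Keyword | Meaning |
| --- | --- |
| NodeAffinity | Node affinity |
| PodAffinity | Pod affinity |
| PodAntiAffinity | Pod anti-affinity |
| requiredDuringScheduling | Hard affinity, must be satisfied |
| preferredDuringScheduling | Soft affinity, satisfied when possible |
| topologyKey | Topology domain key |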
