Controlling Kubernetes Pod Placement: Node Selector & Affinity – Overriding Default Scheduling (Part 3)

In the previous article we covered taints and tolerations, a method of controlling pod placement. You can check it out via this link:

Controlling Kubernetes Pod Placement: Taints & Toleration – Overriding Default Scheduling (Part 2)

In this article we will go through two more methods: a simple one, the node selector, and a more powerful one, affinity.

Node Selector

A node selector is basically a set of key/value pairs in the pod spec that must match labels (key/value pairs) assigned to a node, so the pod is only placed on nodes carrying those labels. We will implement this in two steps: one on the node and one on the pod.

Step 1: label the node

Let’s play with the GPU example: we have a specific node for machine learning workloads that require a high-performance GPU, so we label the node, either the imperative way as below

kubectl label nodes node-1 purpose=high-performance resource=gpu

Or the declarative way, through a Node definition file, as below

apiVersion: v1
kind: Node
metadata:
  name: node-1
  labels:
    purpose: high-performance
    resource: gpu

then apply the Node definition file

kubectl apply -f node-label.yaml
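
To confirm the labels were applied (with either approach), list the nodes by label or show the node’s labels directly:

kubectl get nodes -l purpose=high-performance,resource=gpu
kubectl get node node-1 --show-labels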

Step 2: apply the node selector on the pod side

Now, create a deployment for your ML application that uses the nodeSelector section to schedule the pods only on nodes with the purpose=high-performance and resource=gpu labels.

Use the declarative deployment YAML file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-app
  template:
    metadata:
      labels:
        app: ml-app
    spec:
      nodeSelector:
        purpose: high-performance
        resource: gpu
      containers:
      - name: ml-container
        image: ml-app-image:latest
        ports:
        - containerPort: 80

then save and apply the deployment

kubectl apply -f ml-deployment.yaml
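
Once the pods are up, you can verify they actually landed on the labeled node by checking the NODE column:

kubectl get pods -l app=ml-app -o wide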

You can do the same imperatively, as below: generate the Deployment YAML with --dry-run=client -o yaml, add the nodeSelector section with a local kubectl patch, and pipe the result to kubectl apply.

personally I don’t prefer that sophisticated way 😀

kubectl create deployment ml-app --image=ml-app-image:latest --replicas=2 \
  --dry-run=client -o yaml |
kubectl patch --local -f - --type merge -o yaml \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"purpose":"high-performance","resource":"gpu"}}}}}' |
kubectl apply -f -

The command does the following:

  1. Creates a YAML definition for the deployment with the --dry-run=client -o yaml flags.
  2. Adds the nodeSelector fields locally with kubectl patch, targeting nodes with the specific labels.
  3. Applies the resulting configuration.

Some cons of using Node Selector

  • Simple Matching Only: nodeSelector only supports exact key-value matching. If you need more nuanced scheduling, such as matching a range of values or adding OR/AND logic, nodeSelector won’t be enough.
  • Limited Control: nodeSelector doesn’t allow for preferences (example: “prefer nodes with label X but allow others”). It’s all-or-nothing, meaning that if no nodes match, the pod remains unscheduled (see the quick check after this list)!
  • No Rebalancing: Kubernetes won’t rebalance pods if resources become constrained or node labels change. You need to delete and recreate the pods for them to be rescheduled according to updated nodeSelector constraints.
  • Overloading Specific Nodes: If too many workloads require the same nodeSelector labels, it can lead to resource pressure on a few nodes while others remain underutilized!
  • Not for large clusters: In large clusters with varied node capabilities (e.g., GPU, memory, CPU), using nodeSelector alone may limit the flexibility of the scheduler.
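
As mentioned in the list above, when no node matches the selector the pods simply stay in Pending. A quick way to check (the pod name is a placeholder, and the exact event wording varies between Kubernetes versions):

kubectl get pods -l app=ml-app
kubectl describe pod <ml-app-pod-name>

The Events section of the describe output will typically show a FailedScheduling event explaining that no node matched the pod’s node selector.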

So let’s demonstrate a more effective method, which is node affinity. With node affinity you have more control, and it is usually combined with taints and tolerations.

Affinity

We have hard requirements that can be achieved using a node selector (pods will only run on matching nodes), but we also have soft preferences (pods prefer certain nodes but can run elsewhere if needed).

Affinity in Kubernetes is a set of rules that helps control pod placement in a cluster by defining preferences or requirements for specific nodes or proximity to other pods.

Affinity is mainly used in three forms:

  1. Node Affinity: Specify on which nodes a pod should (or should preferably) run, based on node labels
  2. Pod Affinity: Co-locate pods close to other pods, useful for applications that benefit from running together, like microservices with high network communication.
  3. Pod Anti-Affinity: Separate pods from certain other pods, ideal for increasing availability by spreading pods across multiple nodes to avoid single points of failure.

Node Affinity

Node affinity is applied on pods and gives more flexible control over placement on specifically labeled nodes. Let’s say you have nodes with the label node-type=high-memory for high-memory workloads, and you want to ensure that memory-intensive applications only run on these nodes.

Let’s apply the label on the node

kubectl label nodes node-1 node-type=high-memory

To do that, we add an affinity section to the pod definition file (or deployment file) as below

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-type
          operator: In
          values:
          - high-memory

Explanation

  • requiredDuringSchedulingIgnoredDuringExecution specifies that the pod must be scheduled on nodes with the label node-type=high-memory.
  • If no nodes have this label, the pod will remain unscheduled.
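
For context, here is a minimal sketch of a complete Pod that embeds this affinity block (the pod name and image are just placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: high-memory-pod        # placeholder name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: node-type
            operator: In
            values:
            - high-memory
  containers:
  - name: app                  # placeholder container
    image: nginx               # placeholder image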

But what if we only prefer that placement for high-performance workloads, and it is not mandatory? Then the affinity section will look like this

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: performance
          operator: In
          values:
          - high

Explanation:

  • preferredDuringSchedulingIgnoredDuringExecution adds a preference (soft constraint) for nodes labeled performance=high.
  • If a node with this label is available, Kubernetes will try to schedule the pod there. If not, it will place the pod on any available node.
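
You can also list several preferences with different weights; for every candidate node the scheduler sums the weights of the terms that node satisfies and favors the highest total. A minimal sketch, where the disktype=ssd label is just a hypothetical second preference:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 80                 # strong preference
      preference:
        matchExpressions:
        - key: performance
          operator: In
          values:
          - high
    - weight: 20                 # weaker, tie-breaking preference
      preference:
        matchExpressions:
        - key: disktype          # hypothetical label
          operator: In
          values:
          - ssd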

A different scenario: we can combine hard and soft rules together. You want pods to be scheduled only on nodes with region=us-west, but you prefer nodes that also have performance=high for better efficiency.

Here is the combination, and where the affinity section sits in the Deployment file (inside the pod template spec):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: combined-affinity-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: combined-app
  template:
    metadata:
      labels:
        app: combined-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: region
                operator: In
                values:
                - us-west
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: performance
                operator: In
                values:
                - high
      containers:
      - name: combined-container
        image: combined-app-image:latest

Node Affinity Section possible fields – reference

I copied all the possible fields under the nodeAffinity section from the Kubernetes documentation so that this article can serve as a single reference for the topic.

  • requiredDuringSchedulingIgnoredDuringExecution: This is a hard constraint. The pod will only be scheduled on nodes that match the defined conditions. It will not be placed on nodes without matching labels. Sub-fields in that section are as below:
    • nodeSelectorTerms: List of terms that each contain matchExpressions
      • matchExpressions: Array that specifies matching rules for node labels
        • key: Label key to match on (e.g., region).
        • operator: Specifies how the key/value relationship should be evaluated. Possible values:
          • In: Matches nodes with specified values for the key.
          • NotIn: Excludes nodes with specified values for the key.
          • Exists: Matches nodes with the specified key, regardless of value.
          • DoesNotExist: Matches nodes that don’t have the specified key.
          • Gt: Matches nodes where the label value is greater than a specified integer (see the sketch after this list).
          • Lt: Matches nodes where the label value is less than a specified integer.
        • values: List of values that apply when operator is In or NotIn.
  • preferredDuringSchedulingIgnoredDuringExecution: This is a soft preference. Kubernetes will try to schedule the pod on nodes that match these preferences, but if not available, the pod can still be scheduled on other nodes. sub fields:
    • weight: Integer from 1–100. Higher weight indicates a stronger preference.
    • preference: Contains matchExpressions, which define the label matching rules, similar to those in requiredDuringSchedulingIgnoredDuringExecution.
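
As an illustration of the Gt operator, here is a minimal sketch that requires nodes whose numeric label cpu-count (a hypothetical label) is greater than 8; Gt and Lt compare the label value as an integer and expect a single entry in values:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: cpu-count         # hypothetical numeric node label
          operator: Gt
          values:
          - "8"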

Pod Affinity

Pod affinity is used to schedule pods close to other pods with matching labels, often for communication efficiency: for example, microservices that need to be placed on the same node, or all of an application’s pods placed on nodes in the same geographical area.

In this example, we want to deploy a web application (web-app) that should be scheduled on the same node as a backend service (backend-service) for optimal performance. Look at the podAffinity section, where we define our constraint:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: backend-service
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-container
        image: nginx
        ports:
        - containerPort: 80

Explanation

  • podAffinity: Defines the affinity rules for the pod.
    • requiredDuringSchedulingIgnoredDuringExecution: Specifies a hard rule, meaning the pod must be scheduled according to this rule.
    • labelSelector: Looks for pods with the label app: backend-service.
    • topologyKey: Specifies the domain for colocation, set to kubernetes.io/hostname, meaning pods will be scheduled on the same node.

Pod Affinity Section possible fields – reference

I copied all the possible fields from the Kubernetes documentation so this can serve as a single reference.

  • requiredDuringSchedulingIgnoredDuringExecution: This is a hard requirement. Pods will only be scheduled on nodes where the specified affinity conditions are met. Sub-fields are:
    • labelSelector: Defines labels that must match on the other pod(s) for the affinity rule to apply.
      • matchLabels: Specifies key-value pairs of labels that must match exactly.
      • matchExpressions: Defines more complex matching rules, similar to those in nodeSelectorTerms.
    • topologyKey: The Kubernetes label (e.g., kubernetes.io/hostname or failure-domain.beta.kubernetes.io/zone) that specifies the domain within which the affinity rule should be applied.
  • preferredDuringSchedulingIgnoredDuringExecution: This is a soft preference. The scheduler tries to place pods close to other pods with matching labels, but if no suitable nodes are found, the pod can still be scheduled elsewhere. sub fields are:
    • weight: Integer from 1–100 indicating the strength of preference.
    • podAffinityTerm: Contains labelSelector and topologyKey as described above.
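
For completeness, here is a minimal sketch of the soft form, where the term sits under podAffinityTerm next to a weight (reusing the backend-service label from the example above):

affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: backend-service
        topologyKey: kubernetes.io/hostname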

Pod Anti-Affinity

Pod anti-affinity allows you to avoid scheduling pods close to certain other pods, so it is the opposite of pod affinity!

It is useful for high availability and avoiding single points of failure by ensuring pods are distributed across multiple nodes.

Suppose you have a web application with multiple replicas and want to avoid having more than one replica on the same node to increase fault tolerance. Take a look at the podAntiAffinity section in the deployment file below, where we define that constraint.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: web-app
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-container
        image: nginx
        ports:
        - containerPort: 80

Explanation

  • podAntiAffinity: Specifies anti-affinity rules for the pod.
    • requiredDuringSchedulingIgnoredDuringExecution: This is a hard rule, so Kubernetes will only schedule a pod if it can satisfy this constraint.
    • labelSelector: Looks for pods with the label app: web-app.
    • topologyKey: Defines the scope for anti-affinity, here set to kubernetes.io/hostname, meaning each replica will be scheduled on a different node (host).

Pod Anti-Affinity Section possible fields – reference

  • requiredDuringSchedulingIgnoredDuringExecution: This is a hard requirement. Pods will only be scheduled on nodes that do not have other pods matching the specified labels. Sub-fields are:
    • labelSelector: Specifies labels of the pods that should not be colocated with this pod.
    • topologyKey: Specifies the domain within which the anti-affinity rule is applied.
  • preferredDuringSchedulingIgnoredDuringExecution: This is a soft preference. Kubernetes will try to avoid placing the pod on nodes with certain other pods, but will allow it if there’s no better option.
    • weight: Integer from 1–100 indicating the preference strength.
    • podAffinityTerm: Contains labelSelector and topologyKey for defining matching conditions.
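
And a minimal sketch of the soft anti-affinity form, spreading web-app replicas across zones rather than hosts (assuming the nodes carry the standard topology.kubernetes.io/zone label):

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: web-app
        topologyKey: topology.kubernetes.io/zone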

Use Taints & Toleration with Affinity

Now you can leverage the power of both methods, taints & tolerations and affinity, by specifying node constraints (through taints and tolerations) together with preferences or requirements (through affinity).

Let’s have an example: deploying a resource-intensive app to dedicated high-memory nodes. Say you have nodes with a high memory capacity reserved for heavy applications. You want to:

  • Taint those nodes so that general-purpose pods are kept off them.
  • Tolerate the taint for your high-memory application’s pods, allowing them to be scheduled on those nodes.
  • Use node affinity to prefer or require that these pods be scheduled on nodes labeled with memory: high.

Steps to apply example Deployment with Taints, Tolerations, and Node Affinity

Step 1: add Taint to Nodes (for High-Memory Nodes)

On the desired high-memory nodes, apply this taint so that only pods with a matching toleration can be scheduled on them.

kubectl taint nodes high-mem-node high-memory=true:NoSchedule
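
The deployment in Step 2 also requires the memory: high label through node affinity, so label the same node and, optionally, confirm the taint took effect:

kubectl label nodes high-mem-node memory=high
kubectl describe node high-mem-node | grep Taints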

Step 2: Create a Deployment with Tolerations and Node Affinity

This deployment file tolerates that taint and requires the memory: high label, so only the dedicated high-memory nodes can host these pods.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: high-mem-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: high-mem-app
  template:
    metadata:
      labels:
        app: high-mem-app
    spec:
      tolerations:
        - key: "high-memory"
          operator: "Exists"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: "memory"
                    operator: "In"
                    values:
                      - "high"
      containers:
        - name: high-mem-container
          image: nginx
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
          ports:
            - containerPort: 80

Explanation

  • Tolerations: The toleration in this spec (key: high-memory) allows the pods to be scheduled on nodes with the high-memory:NoSchedule taint.
  • Node Affinity:
    • requiredDuringSchedulingIgnoredDuringExecution: specifies a hard constraint that the pod can only be scheduled on nodes labeled memory: high, ensuring that the deployment lands only on high-memory nodes.
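
As a quick check, the pods should end up only on the tainted and labeled node:

kubectl get pods -l app=high-mem-app -o wide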

Conclusion

Combining taints and tolerations with node affinity is a powerful way to enforce scheduling rules in Kubernetes. Taints and tolerations create boundaries, and node affinity ensures that workloads are placed on optimal nodes within those boundaries.

Despite that powerful combination, there are still some cons:

  • Complexity in Scheduling: Managing taints, tolerations, and affinities can complicate your scheduling policies, making it harder to debug scheduling issues or to quickly scale and deploy new workloads.
  • Inflexibility: Hard constraints (such as required node affinity) limit scheduling options, which can lead to scheduling failures if not enough suitable nodes are available.

Next

Now we have three methods to use, and perhaps combine together. The next topic will demonstrate resource requirements and limits.

  1. Controlling Kubernetes Pod Placement: Labels and Selectors – Overriding Default Scheduling (Part 1)
  2. Controlling Kubernetes Pod Placement: Taints & Toleration – Overriding Default Scheduling (Part 2)
  3. Controlling Kubernetes Pod Placement: Node Selector & Affinity – Overriding Default Scheduling (Part 3)
  4. Controlling Kubernetes Pod Placement: Requirements & Limits and Daemon Sets – Overriding Default Scheduling (Part 4)
