In the previous article we got to know taints and tolerations, a method to control pod placement. Check it out here:
Controlling Kubernetes Pod Placement: Taints & Toleration – Overriding Default Scheduling (Part 2)
In this article we will go through two more methods: a simple one, the node selector, and a more powerful one, affinity.
Node Selector
A node selector is basically a key/value pair assigned to a node, i.e. a label on the node, that restricts pod placement on that node to pods carrying the same key/value pairs! So we will implement two steps: one on the node and one on the pod.
Step 1: label the node
Let's play around with the GPU example: we have a specific node for machine learning workloads that require a high-performance GPU, so we label the node either the imperative way, as below,
kubectl label nodes node-1 purpose=high-performance resource=gpu
or the declarative way, through a Node definition file:
apiVersion: v1
kind: Node
metadata:
  name: node-1
  labels:
    purpose: high-performance
    resource: gpu
then apply the Node definition file:
kubectl apply -f node-label.yaml
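To confirm the labels landed on the node, you can filter the node list by them (node and label names as in the example above):
kubectl get nodes -l purpose=high-performance,resource=gpu --show-labels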
Step 2: apply node selector on pod side
Now, create a deployment for your ML application that uses the nodeSelector section to schedule the pods only on nodes with the purpose=high-performance and resource=gpu labels.
Use the declarative deployment YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: ml-app
  template:
    metadata:
      labels:
        app: ml-app
    spec:
      nodeSelector:
        purpose: high-performance
        resource: gpu
      containers:
      - name: ml-container
        image: ml-app-image:latest
        ports:
        - containerPort: 80
then save and apply the deployment:
kubectl apply -f ml-deployment.yaml
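You can then verify that the replicas actually landed on the labeled node; the NODE column should show node-1:
kubectl get pods -l app=ml-app -o wide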
You can get a similar result imperatively. Note that kubectl set has no subcommand for nodeSelector, so one option is to create the deployment with kubectl create and then patch the pod template with kubectl patch to add the nodeSelector fields.
personally I don’t prefer that sophisticated way 😀
kubectl create deployment ml-app --image=ml-app-image:latest --replicas=2
kubectl patch deployment ml-app --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"purpose":"high-performance","resource":"gpu"}}}}}'
These commands do the following:
- Create the deployment imperatively with kubectl create deployment.
- Patch the pod template with kubectl patch, setting the nodeSelector fields to target nodes with the specific labels.
Some cons of using Node Selector
- Simple matching only: nodeSelector only supports exact key/value matching. If you need more nuanced scheduling, such as matching a range of values or adding OR/AND logic, nodeSelector won’t be enough.
- Limited control: nodeSelector doesn’t allow for preferences (for example: “prefer nodes with label X but allow others”). It’s all-or-nothing, meaning that if no nodes match, the pod remains unscheduled!
- No rebalancing: Kubernetes won’t rebalance pods if resources become constrained or node labels change. You need to delete and recreate the pods for them to be rescheduled according to updated nodeSelector constraints.
- Overloading specific nodes: If too many workloads require the same nodeSelector labels, it can lead to resource pressure on a few nodes while others remain underutilized!
- Not for large clusters: In large clusters with varied node capabilities (e.g., GPU, memory, CPU), using nodeSelector alone may limit the flexibility of the scheduler.
So let’s demonstrate a more effective way: node affinity. With node affinity you have more control, and it is usually combined with taints and tolerations.
Affinity
We have hard requirements that can be achieved using a node selector (pods will only run on matching nodes), but we also have soft preferences (pods prefer certain nodes but can run elsewhere if needed).
Affinity in Kubernetes is a set of rules that helps control pod placement in a cluster by defining preferences or requirements for specific nodes or proximity to other pods.
Affinity is mainly used for:
- Node Affinity: Specify on which nodes a pod should (or should preferably) run, based on node labels
- Pod Affinity: Co-locate pods close to other pods, useful for applications that benefit from running together, like microservices with high network communication.
- Pod Anti-Affinity: Separate pods from certain other pods, ideal for increasing availability by spreading pods across multiple nodes to avoid single points of failure.
Node Affinity
Node affinity is applied on pods and gives more flexible control over placement on specific labeled nodes. Let’s say you have nodes with the label node-type=high-memory for high-memory workloads, and you want to ensure that memory-intensive applications only run on these nodes.
Let’s apply the label on the node:
kubectl label nodes node-1 node-type=high-memory
To do that, we add an affinity section in the pod definition file (or deployment file) as below:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: node-type
          operator: In
          values:
          - high-memory
Explanation
- requiredDuringSchedulingIgnoredDuringExecution specifies that the pod must be scheduled on nodes with the label node-type=high-memory.
- If no nodes have this label, the pod will remain unscheduled.
But what if we would prefer those high-performance workloads to land on certain nodes without making it mandatory? Then the affinity section looks like this:
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 1
      preference:
        matchExpressions:
        - key: performance
          operator: In
          values:
          - high
Explanation:
- preferredDuringSchedulingIgnoredDuringExecution adds a preference (soft constraint) for nodes labeled performance=high.
- If a node with this label is available, Kubernetes will try to schedule the pod there. If not, it will place the pod on any available node.
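You can also list several preferences with different weights; the scheduler sums the weights of the terms each node satisfies and favors the node with the highest total. A minimal sketch, assuming hypothetical performance and disk labels:
affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 80
      preference:
        matchExpressions:
        - key: performance
          operator: In
          values:
          - high
    - weight: 20
      preference:
        matchExpressions:
        - key: disk
          operator: In
          values:
          - ssd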
A different scenario: we can combine hard and soft rules together. Say you want pods to be scheduled only on nodes with region=us-west, but you prefer nodes that also have performance=high for better efficiency.
Here is the combination and the location of the affinity block in the Deployment file (inside the pod template spec):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: combined-affinity-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: combined-app
  template:
    metadata:
      labels:
        app: combined-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: region
                operator: In
                values:
                - us-west
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            preference:
              matchExpressions:
              - key: performance
                operator: In
                values:
                - high
      containers:
      - name: combined-container
        image: combined-app-image:latest
Node Affinity Section possible fields – reference
I copied all the possible fields under the nodeAffinity section from the Kubernetes documentation so this article can serve as a single reference for the topic; a short example using the less common operators follows the list.
- requiredDuringSchedulingIgnoredDuringExecution: this is a hard constraint. The pod will only be scheduled on nodes that match the defined conditions. It will not be placed on nodes without matching labels. Sub-fields in that section:
  - nodeSelectorTerms: list of terms that each contain matchExpressions.
    - matchExpressions: array that specifies matching rules for node labels.
      - key: label key to match on (e.g., region).
      - operator: specifies how the key/value relationship should be evaluated. Possible values:
        - In: matches nodes with specified values for the key.
        - NotIn: excludes nodes with specified values for the key.
        - Exists: matches nodes with the specified key, regardless of value.
        - DoesNotExist: matches nodes that don’t have the specified key.
        - Gt: matches nodes where the label value is greater than a specified integer.
        - Lt: matches nodes where the label value is less than a specified integer.
      - values: list of values that apply when operator is In or NotIn.
- preferredDuringSchedulingIgnoredDuringExecution: this is a soft preference. Kubernetes will try to schedule the pod on nodes that match these preferences, but if none are available, the pod can still be scheduled on other nodes. Sub-fields:
  - weight: integer from 1–100. Higher weight indicates a stronger preference.
  - preference: contains matchExpressions, which define the label matching rules, similar to those in requiredDuringSchedulingIgnoredDuringExecution.
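As a quick illustration of the less common operators, here is a minimal sketch (the gpu and cpu-count labels are hypothetical) that requires nodes carrying a gpu label at all, whatever its value, and whose cpu-count label value is greater than 8:
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: gpu
          operator: Exists
        - key: cpu-count
          operator: Gt
          values:
          - "8"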
Pod Affinity
Pod affinity is used to schedule pods close to other pods with matching labels, often for efficient communication: for example, microservices pods that have to be placed on the same node, or all application pods placed on nodes in the same geographical area.
In this example, we want to deploy a web application (web-app) that should be scheduled on the same node as a backend service (backend-service) for optimal performance, so look at the podAffinity section where we define our constraint:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: backend-service
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-container
        image: nginx
        ports:
        - containerPort: 80
Explanation
- podAffinity: defines the affinity rules for the pod.
- requiredDuringSchedulingIgnoredDuringExecution: specifies a hard rule, meaning the pod must be scheduled according to this rule.
- labelSelector: looks for pods with the label app: backend-service.
- topologyKey: specifies the domain for colocation, set to kubernetes.io/hostname, meaning the pods will be scheduled on the same node.
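To check the colocation, list both applications’ pods with their nodes; web-app replicas should share NODE values with backend-service pods (label names as in the example above):
kubectl get pods -o wide -l 'app in (web-app, backend-service)'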
Pod Affinity Section possible fields – reference
I copied all possible fields from the Kubernetes documentation so this article can serve as a single reference; a sketch of the soft variant follows the list.
- requiredDuringSchedulingIgnoredDuringExecution: this is a hard requirement. Pods will only be scheduled on nodes where the specified affinity conditions are met. Sub-fields:
  - labelSelector: defines labels that must match on the other pod(s) for the affinity rule to apply.
    - matchLabels: specifies key/value pairs of labels that must match exactly.
    - matchExpressions: defines more complex matching rules, similar to those in nodeSelectorTerms.
  - topologyKey: the Kubernetes label (e.g., kubernetes.io/hostname or failure-domain.beta.kubernetes.io/zone) that specifies the domain within which the affinity rule should be applied.
- preferredDuringSchedulingIgnoredDuringExecution: this is a soft preference. The scheduler tries to place pods close to other pods with matching labels, but if no suitable nodes are found, the pod can still be scheduled elsewhere. Sub-fields:
  - weight: integer from 1–100 indicating the strength of preference.
  - podAffinityTerm: contains labelSelector and topologyKey as described above.
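Putting those fields together, here is a minimal sketch of a soft pod affinity rule that prefers, with weight 50, running in the same zone as pods labeled app: cache (a hypothetical label):
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 50
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: cache
        topologyKey: "topology.kubernetes.io/zone"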
Pod Anti-Affinity
Pod anti-affinity allows you to avoid scheduling pods close to certain other pods, so it is the opposite of pod affinity!
It is useful for high availability and for avoiding single points of failure by ensuring pods are distributed across multiple nodes!
Suppose you have a web application with multiple replicas and want to avoid having more than one replica on the same node to increase fault tolerance. Take a look at the podAntiAffinity section in the deployment file below, where we define that constraint:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web-app
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-container
        image: nginx
        ports:
        - containerPort: 80
Explanation
- podAntiAffinity: specifies anti-affinity rules for the pod.
- requiredDuringSchedulingIgnoredDuringExecution: this is a hard rule, so Kubernetes will only schedule a pod if it can satisfy this constraint.
- labelSelector: looks for pods with the label app: web-app.
- topologyKey: defines the scope for anti-affinity, here set to kubernetes.io/hostname, meaning each replica will be scheduled on a different node (host).
Pod Anti Affinity Section possible fields – reference
- requiredDuringSchedulingIgnoredDuringExecution: this is a hard requirement. Pods will only be scheduled on nodes that do not have other pods matching the specified labels. Sub-fields:
  - labelSelector: specifies labels of the pods that should not be colocated with this pod.
  - topologyKey: specifies the domain within which the anti-affinity rule is applied.
- preferredDuringSchedulingIgnoredDuringExecution: this is a soft preference. Kubernetes will try to avoid placing the pod on nodes with certain other pods, but will allow it if there’s no better option. Sub-fields:
  - weight: integer from 1–100 indicating the preference strength.
  - podAffinityTerm: contains labelSelector and topologyKey for defining matching conditions.
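For completeness, a minimal sketch of the soft variant, which asks the scheduler to spread web-app replicas across nodes when possible but still schedules them if only one node fits (labels as in the example above):
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: web-app
        topologyKey: "kubernetes.io/hostname"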
Use Taints & Toleration with Affinity
Now you can leverage the power of both methods, taints & tolerations and affinity, by specifying node constraints (through taints and tolerations) and preferences or requirements (through affinity).
Let’s take an example: deploying a resource-intensive app to dedicated high-memory nodes. Say you have nodes with a high memory capacity reserved for heavy applications. You want to:
- Taint those nodes so that general pods are kept off them.
- Tolerate the taint in your high-memory application’s pods, allowing them to be scheduled on those nodes.
- Use node affinity to prefer or require that these pods be scheduled on nodes labeled with memory: high.
Steps to apply example Deployment with Taints, Tolerations, and Node Affinity
Step 1: add Taint to Nodes (for High-Memory Nodes)
On the desired high-memory nodes, apply this taint so that only pods with a matching toleration can be scheduled on them.
kubectl taint nodes high-mem-node high-memory=true:NoSchedule
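The node affinity in the next step also expects the memory=high label on those nodes, so label them as well (assuming the same example node name), and you can confirm the taint by describing the node:
kubectl label nodes high-mem-node memory=high
kubectl describe node high-mem-node | grep Taints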
Step 2: Create a Deployment with Tolerations and Node Affinity
This deployment file specifies that only nodes with the memory: high label, and whose taint the pods tolerate, can host these pods.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: high-mem-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: high-mem-app
  template:
    metadata:
      labels:
        app: high-mem-app
    spec:
      tolerations:
      - key: "high-memory"
        operator: "Exists"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "memory"
                operator: "In"
                values:
                - "high"
      containers:
      - name: high-mem-container
        image: nginx
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
        ports:
        - containerPort: 80
Explanation
- Tolerations: the toleration in this spec (key: high-memory) allows the pods to be scheduled on nodes carrying the high-memory:NoSchedule taint.
- Node affinity: requiredDuringSchedulingIgnoredDuringExecution specifies a hard constraint that the pod can only be scheduled on nodes labeled memory: high, ensuring that the deployment lands on the high-memory nodes.
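As before, a quick way to confirm the result is to list the pods with their nodes; every replica should be running on a tainted memory=high node:
kubectl get pods -l app=high-mem-app -o wide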
Conclusion
Combining taints and tolerations with node affinity is a powerful way to enforce scheduling rules in Kubernetes. Taints and tolerations create boundaries, and node affinity ensures that workloads are placed on optimal nodes within those boundaries.
Despite the power of that combination, there are still some cons:
- Complexity in Scheduling: Managing taints, tolerations, and affinities can complicate your scheduling policies, making it harder to debug scheduling issues or to quickly scale and deploy new workloads.
- Inflexibility: Hard constraints (such as required node affinity) limit scheduling options, which can lead to scheduling failures if not enough suitable nodes are available.
Next
Now we have three methods to use, and perhaps combine together. The next topic will demonstrate resource requirements and limits.
- Controlling Kubernetes Pod Placement: Labels and Selectors – Overriding Default Scheduling (Part 1)
- Controlling Kubernetes Pod Placement: Taints & Toleration – Overriding Default Scheduling (Part 2)
- Controlling Kubernetes Pod Placement: Node Selector & Affinity – Overriding Default Scheduling (Part 3)
- Controlling Kubernetes Pod Placement: Requirements & Limits and Daemon Sets – Overriding Default Scheduling (Part 4)