Commit 81ca355

Fix old plans to address evictions in this KEP
Signed-off-by: Itamar Holder <[email protected]>
1 parent e51fe82 commit 81ca355

1 file changed: +1 -70 lines changed

keps/sig-node/2400-node-swap/README.md

Lines changed: 1 addition & 70 deletions
@@ -18,10 +18,6 @@
1818
- [Swap as the default](#swap-as-the-default)
1919
- [Steps to Calculate Swap Limit](#steps-to-calculate-swap-limit)
2020
- [Example](#example)
21-
- [Swap-aware Evictions](#swap-aware-evictions)
22-
- [Background](#background)
23-
- [Defining &quot;accessible swap&quot;](#defining-accessible-swap)
24-
- [Changes to the eviction manager's memory pressure handling](#changes-to-the-eviction-managers-memory-pressure-handling)
2521
- [User Stories](#user-stories)
2622
- [Improved Node Stability](#improved-node-stability)
2723
- [Long-running applications that swap out startup memory](#long-running-applications-that-swap-out-startup-memory)
@@ -187,7 +183,6 @@ will be necessary to implement the third scenario.
187183
- Cluster administrators can enable and configure kubelet swap utilization on a
188184
per-node basis.
189185
- Use of swap memory for cgroupsv2.
190-
- Swap-aware eviction manager.
191186

192187
### Non-Goals
193188

@@ -205,6 +200,7 @@ will be necessary to implement the third scenario.
205200
- Supporting zram, zswap, or other memory types like SGX EPC. These could be
206201
addressed in a follow-up KEP, and are out of scope.
207202
- Use of swap for cgroupsv1.
203+
- Handling evictions, scheduling or pod-level APIs.
208204

209205
[swappiness]: https://en.wikipedia.org/wiki/Memory_paging#Swappiness
210206

@@ -315,67 +311,6 @@ In this example, Container A would have a swap limit of 19 GB, and Container B w
315311

316312
This approach allocates swap limits based on each container's memory request and adjusts the proportion based on the total swap memory available in the system. It ensures that each container gets a fair share of the swap space and helps maintain resource allocation efficiency.
317313

318-
### Swap-aware Evictions
319-
320-
As part of this KEP, the eviction manager will be enhanced to be swap-aware.
321-
This update will enable the eviction manager to account for swap usage in its decision-making process.
322-
By doing so, it will help prevent the system from exhausting swap space, thereby maintaining system stability and responsiveness.
323-
324-
#### Background
325-
326-
Before this KEP, kubelet's eviction manager completely overlooked swap memory, leading to several issues:
327-
* Inaccessible Swap: The memory eviction threshold is configured in such a way that swap is never triggered during node-level pressure,
328-
as eviction occurs before the node starts swapping memory.
329-
* Unfairness & Instability: The eviction manager may evict the "wrong" or innocent pods, failing to address the actual memory pressure.
330-
* Unexpected Behavior: Pods that exceed their memory limits (with regular and swap memory) are not evicted first,
331-
even though they would immediately get killed if swap were not used.
332-
333-
Here we present an extension to the eviction manager that will address these issues by becoming swap-aware.
334-
The proposed logic is fully backward compatible and requires no additional configuration, making it completely transparent to the user.
335-
336-
To achieve this, we recommend enhancing the eviction manager's memory pressure handling to account for swap memory, rather than adding a distinct swap signal.
337-
Memory and swap are inherently connected and should be addressed as a single issue.
338-
By integrating swap memory into the eviction manager's logic, we ensure a more accurate and efficient handling of system resources.
339-
For example, separating memory and swap memory is problematic because swap is not used until memory is full.
340-
However, with the approach suggested in this KEP memory will not be considered full until the accessible swap is also full.
341-
342-
#### Defining "accessible swap"
343-
344-
Let `accessible swap` be the amount of swap that is accessible by pods according to the [LimitedSwap swap behavior](#steps-to-calculate-swap-limit).
345-
Note that the amount of accessible swap changes in time according to the pods running on the node.
346-
347-
In addition, note that since only some of the Burstable QoS pods will have access to swap, the swap space will
348-
almost never be used in its entirety by workloads. In other words, this approach will effectively leave some of
349-
the swap space inaccessible for pods and reserved for system daemons and other system processes.
350-
351-
#### Changes to the eviction manager's memory pressure handling
352-
353-
When dealing with evictions, there are two main questions needed to be answered:
354-
how to identify when the node is under pressure and how to rank pods for eviction.
355-
356-
The eviction manager will become swap aware by making the following changes to its memory pressure handling:
357-
- **How to identify pressure**: The eviction manager will consider the total sum of all running pods' accessible swap as additional memory capacity.
358-
- **How to rank pods for eviction**: In the context of ranking pods for evictions, swap memory is considered as additional "regular" memory
359-
and accessible swap is considered as additional memory request.
360-
This is relevant for checking whether memory requests are exceeded [1] or for identifying which pods uses more memory [2].
361-
362-
In other words, the order of evictions documented [3] will have to change to the following:
363-
> ```
364-
> The kubelet uses the following parameters to determine the pod eviction order:
365-
> 1. Whether the pod's resource usage with swap (memory usage + swap usage) exceeds requests with swap (memory requests + swap requests).
366-
> 2. [Pod Priority](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/).
367-
> 3. The pod's resource usage (memory usage + swap usage) relative to requests (memory requests + swap requests).
368-
> ```
369-
370-
That is, in (1) and (3) swap is considered as an additional resource usage and memory request. Step (2) is unchanged.
371-
372-
On nodes with swap disabled, the accessible swap will equal to zero and pods won't be able to use swap,
373-
hence the eviction manager will behave the same as before.
374-
375-
[1] https://github.com/kubernetes/kubernetes/blob/d8093cc40394b8e25a864576fe6a38306730d3cb/pkg/kubelet/eviction/helpers.go#L684
376-
[2] https://github.com/kubernetes/kubernetes/blob/d8093cc40394b8e25a864576fe6a38306730d3cb/pkg/kubelet/eviction/helpers.go#L703
377-
[3] https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/#pod-selection-for-kubelet-eviction
378-
379314
### User Stories
380315

381316
#### Improved Node Stability
@@ -549,9 +484,6 @@ This can cause problems where workloads can use up all swap.
549484
If all swap is used up on a node, it can make the node go unhealthy.
550485
To avoid exhausting swap on a node, `UnlimitedSwap` was dropped from the API in beta2.
551486

552-
It was determined that the eviction manager should still be able to protect the node in case of swap memory pressure.
553-
In this case, we will teach the eviction manager to be aware of swap as a resource to avoid exhausting swap resource.
554-
555487
#### Security risk
556488

557489
Enabling swap on a system without encryption poses a security risk, as critical information, such as Kubernetes secrets, may be swapped out to the disk. If an unauthorized individual gains access to the disk, they could potentially obtain these secrets. To mitigate this risk, it is recommended to use encrypted swap. However, handling encrypted swap is not within the scope of kubelet; rather, it is a general OS configuration concern and should be addressed at that level. Nevertheless, it is essential to provide documentation that warns users of this potential issue, ensuring they are aware of the potential security implications and can take appropriate steps to safeguard their system.
@@ -601,7 +533,6 @@ We summarize the implementation plan as following:
601533
the CRI on the amount of swap to allocate to each container. The container
602534
runtime will then write the swap settings to the container level cgroup.
603535
1. Add node stats to report swap usage.
604-
1. Enhance eviction manager to protect against swap memory running out.
605536

606537
### Enabling swap as an end user
607538
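
For readers comparing against the text this commit removes, the swap-aware eviction ordering described in the deleted "Swap-aware Evictions" section can be sketched roughly as below. This is an illustration only, not kubelet code: the `PodStats` type, its field names, and the helper functions are hypothetical, and "usage relative to requests" is assumed here to mean usage minus requests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodStats:
    """Hypothetical per-pod snapshot; field names are illustrative only."""
    name: str
    priority: int
    memory_usage: int      # bytes of regular memory in use
    swap_usage: int        # bytes of swap in use
    memory_request: int    # bytes of memory requested
    accessible_swap: int   # swap limit, treated as an additional memory request


def usage_with_swap(pod: PodStats) -> int:
    # Swap usage counts as additional "regular" memory usage.
    return pod.memory_usage + pod.swap_usage


def request_with_swap(pod: PodStats) -> int:
    # Accessible swap counts as an additional memory request.
    return pod.memory_request + pod.accessible_swap


def eviction_order(pods: List[PodStats]) -> List[PodStats]:
    """Return pods sorted so that the first element is evicted first."""
    def key(pod: PodStats):
        overage = usage_with_swap(pod) - request_with_swap(pod)
        exceeds = overage > 0
        # 1. Pods exceeding requests (with swap) come first.
        # 2. Lower pod priority comes first.
        # 3. Larger usage relative to requests comes first (assumed: difference).
        return (not exceeds, pod.priority, -overage)
    return sorted(pods, key=key)


pods = [
    PodStats("a", priority=0, memory_usage=100, swap_usage=0,
             memory_request=200, accessible_swap=0),
    PodStats("b", priority=0, memory_usage=150, swap_usage=100,
             memory_request=200, accessible_swap=0),
    PodStats("c", priority=10, memory_usage=300, swap_usage=200,
             memory_request=200, accessible_swap=100),
]
ranked_names = [pod.name for pod in eviction_order(pods)]
```

With swap disabled, `swap_usage` and `accessible_swap` are both zero, so the ordering reduces to the memory-only ranking, which matches the backward-compatibility claim in the removed text.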
