-[Long-running applications that swap out startup memory](#long-running-applications-that-swap-out-startup-memory)
@@ -187,7 +183,6 @@ will be necessary to implement the third scenario.
 - Cluster administrators can enable and configure kubelet swap utilization on a
 per-node basis.
 - Use of swap memory for cgroupsv2.
-- Swap-aware eviction manager.
 
 ### Non-Goals
 
@@ -205,6 +200,9 @@ will be necessary to implement the third scenario.
 - Supporting zram, zswap, or other memory types like SGX EPC. These could be
 addressed in a follow-up KEP, and are out of scope.
 - Use of swap for cgroupsv1.
+- Add pod-level APIs to control swap configuration on a per-pod basis.
+- Add scheduling mechanisms to target nodes with swap enabled/disabled or a certain swap configuration.
+- Make the eviction manager swap-aware (memory evictions still work perfectly, although the eviction threshold may need tweaking in order to be optimized).
@@ -315,67 +313,6 @@ In this example, Container A would have a swap limit of 19 GB, and Container B w
 
 This approach allocates swap limits based on each container's memory request and adjusts the proportion based on the total swap memory available in the system. It ensures that each container gets a fair share of the swap space and helps maintain resource allocation efficiency.
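As a reading aid for the context line above, here is a minimal sketch of a proportional swap limit. It assumes the rule is "swap limit = memory request / node memory capacity × swap available to pods". The function name, its parameters, and the input figures are illustrative assumptions, not kubelet code; the inputs are picked only so that the result matches the 19 GB figure quoted in the hunk header.

```python
GIB = 1024 ** 3  # bytes per GiB


def container_swap_limit(memory_request: int,
                         node_memory_capacity: int,
                         pods_swap_available: int) -> int:
    """Hypothetical helper: scale the swap available to pods by the
    container's share of node memory (request / capacity). Values in bytes."""
    if node_memory_capacity <= 0:
        raise ValueError("node_memory_capacity must be positive")
    # Integer arithmetic avoids floating-point rounding on byte counts.
    return memory_request * pods_swap_available // node_memory_capacity


# Assumed inputs: a 20 GiB request on a 40 GiB node with 38 GiB of swap for pods.
limit = container_swap_limit(20 * GIB, 40 * GIB, 38 * GIB)
print(limit // GIB)  # 19
```

A container with no memory request gets a limit of zero under this rule, so it would not be able to use swap at all.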
 
-### Swap-aware Evictions
-
-As part of this KEP, the eviction manager will be enhanced to be swap-aware.
-This update will enable the eviction manager to account for swap usage in its decision-making process.
-By doing so, it will help prevent the system from exhausting swap space, thereby maintaining system stability and responsiveness.
-
-#### Background
-
-Before this KEP, kubelet's eviction manager completely overlooked swap memory, leading to several issues:
-* Inaccessible Swap: The memory eviction threshold is configured in such a way that swap is never triggered during node-level pressure,
-as eviction occurs before the node starts swapping memory.
-* Unfairness & Instability: The eviction manager may evict the "wrong" or innocent pods, failing to address the actual memory pressure.
-* Unexpected Behavior: Pods that exceed their memory limits (with regular and swap memory) are not evicted first,
-even though they would immediately get killed if swap were not used.
-
-Here we present an extension to the eviction manager that will address these issues by becoming swap-aware.
-The proposed logic is fully backward compatible and requires no additional configuration, making it completely transparent to the user.
-
-To achieve this, we recommend enhancing the eviction manager's memory pressure handling to account for swap memory, rather than adding a distinct swap signal.
-Memory and swap are inherently connected and should be addressed as a single issue.
-By integrating swap memory into the eviction manager's logic, we ensure a more accurate and efficient handling of system resources.
-For example, separating memory and swap memory is problematic because swap is not used until memory is full.
-However, with the approach suggested in this KEP memory will not be considered full until the accessible swap is also full.
-
-#### Defining "accessible swap"
-
-Let `accessible swap` be the amount of swap that is accessible by pods according to the [LimitedSwap swap behavior](#steps-to-calculate-swap-limit).
-Note that the amount of accessible swap changes in time according to the pods running on the node.
-
-In addition, note that since only some of the Burstable QoS pods will have access to swap, the swap space will
-almost never be used in its entirety by workloads. In other words, this approach will effectively leave some of
-the swap space inaccessible for pods and reserved for system daemons and other system processes.
-
-#### Changes to the eviction manager's memory pressure handling
-
-When dealing with evictions, there are two main questions needed to be answered:
-how to identify when the node is under pressure and how to rank pods for eviction.
-
-The eviction manager will become swap aware by making the following changes to its memory pressure handling:
-- **How to identify pressure**: The eviction manager will consider the total sum of all running pods' accessible swap as additional memory capacity.
-- **How to rank pods for eviction**: In the context of ranking pods for evictions, swap memory is considered as additional "regular" memory
-and accessible swap is considered as additional memory request.
-This is relevant for checking whether memory requests are exceeded [1] or for identifying which pods uses more memory [2].
-
-In other words, the order of evictions documented [3] will have to change to the following:
-> ```
-> The kubelet uses the following parameters to determine the pod eviction order:
-> 1. Whether the pod's resource usage with swap (memory usage + swap usage) exceeds requests with swap (memory requests + swap requests).
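For readers of this diff, the first ordering criterion in the removed text can be sketched as follows. This is a minimal illustration, not kubelet code: `PodStats`, its fields, and both helper functions are hypothetical names, and the "swap request" is assumed to equal the pod's accessible swap, as the removed text describes. Only the first criterion is modeled, because the rest of the list is cut off in this view.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodStats:
    """Hypothetical per-pod figures, all in bytes."""
    name: str
    memory_usage: int
    swap_usage: int
    memory_request: int
    accessible_swap: int  # assumed to play the role of the "swap request"


def exceeds_requests_with_swap(pod: PodStats) -> bool:
    # usage with swap    = memory usage + swap usage
    # requests with swap = memory request + accessible swap
    return (pod.memory_usage + pod.swap_usage
            > pod.memory_request + pod.accessible_swap)


def rank_for_eviction(pods: List[PodStats]) -> List[PodStats]:
    # Pods over their combined request sort first (False < True, so negate).
    # sorted() is stable, so pods within each group keep their input order.
    return sorted(pods, key=lambda p: not exceeds_requests_with_swap(p))


pods = [
    PodStats("within", memory_usage=300, swap_usage=50,
             memory_request=400, accessible_swap=100),
    PodStats("over", memory_usage=500, swap_usage=200,
             memory_request=400, accessible_swap=100),
]
print([p.name for p in rank_for_eviction(pods)])  # ['over', 'within']
```

Folding swap into both sides of the comparison mirrors the removed section's argument that memory and swap should be handled as a single resource rather than as two separate signals.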
@@ -549,9 +486,6 @@ This can cause problems where workloads can use up all swap.
 If all swap is used up on a node, it can make the node go unhealthy.
 To avoid exhausting swap on a node, `UnlimitedSwap` was dropped from the API in beta2.
 
-It was determined that the eviction manager should still be able to protect the node in case of swap memory pressure.
-In this case, we will teach the eviction manager to be aware of swap as a resource to avoid exhausting swap resource.
-
 #### Security risk
 
 Enabling swap on a system without encryption poses a security risk, as critical information, such as Kubernetes secrets, may be swapped out to the disk. If an unauthorized individual gains access to the disk, they could potentially obtain these secrets. To mitigate this risk, it is recommended to use encrypted swap. However, handling encrypted swap is not within the scope of kubelet; rather, it is a general OS configuration concern and should be addressed at that level. Nevertheless, it is essential to provide documentation that warns users of this potential issue, ensuring they are aware of the potential security implications and can take appropriate steps to safeguard their system.
@@ -601,7 +535,6 @@ We summarize the implementation plan as following:
 the CRI on the amount of swap to allocate to each container. The container
 runtime will then write the swap settings to the container level cgroup.
 1. Add node stats to report swap usage.
-1. Enhance eviction manager to protect against swap memory running out.