-
Notifications
You must be signed in to change notification settings - Fork 9.1k
YARN-11843. Fix potential deadlock when auto-correction of container allocation is enabled #7855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
…allocation is enabled
@shameersss1 @slfan1989 Could you please help to review this PR? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM +1
...main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/AbstractYarnScheduler.java
Outdated
Show resolved
Hide resolved
@TaoYang526 - It is good catch indeed. |
@shameersss1 Currently, we don't have a specific use case for the auto-correction feature. We have always been aware that YARN has an over-allocation issue, but considered it a minor concern, so no action was taken. When I discovered YARN-11702, I wanted to take a look and found that it could resolve most over-allocation issues, except for some in-scheduling cases. It's sufficient to reduce unnecessary consumption in the scheduling cycle and we may enable this feature in the future. |
Thanks for the details. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
💔 -1 overall
This message was automatically generated. |
completedContainer(rmContainer, SchedulerUtils.createAbnormalContainerStatus( | ||
rmContainer.getContainerId(), SchedulerUtils.EXPIRED_CONTAINER), | ||
RMContainerEventType.EXPIRE); | ||
asyncContainerRelease(rmContainer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it cause the message at the end of the container to be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the message will be updated from "Container expired since it was unused" to "Container released by application", which I believe is more appropriate.
@slfan1989 @aajisaka Could you please help review this PR when you have time? |
Details please refer to YARN-11843.
Description of PR
How was this patch tested?
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?