@lifubang
Member

When reading cpuacct.usage_all, we assume each line contains exactly three columns; lines with a different number of columns should be skipped.

However, a regression was introduced for lines with more than three columns. The 'n' parameter of strings.SplitN was changed to 3, so the split result never has more than three elements: any additional columns stay, unsplit, in the third field. Such lines are therefore no longer skipped, and the subsequent uint parsing of the third field fails.
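
To illustrate the behavior described above, here is a minimal standalone sketch. The helper names `splitThree` and `parsesAsUint` and the sample line are made up for illustration and are not taken from the patched code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitThree mimics the regressed call: strings.SplitN with n = 3.
// It returns the number of pieces and the last piece.
// (Hypothetical helper, for illustration only.)
func splitThree(line string) (int, string) {
	parts := strings.SplitN(line, " ", 3)
	return len(parts), parts[len(parts)-1]
}

// parsesAsUint reports whether s is a valid base-10 uint64.
func parsesAsUint(s string) bool {
	_, err := strconv.ParseUint(s, 10, 64)
	return err == nil
}

func main() {
	// Hypothetical data line with one extra (fourth) column.
	line := "0 100 200 300"

	n, last := splitThree(line)
	fmt.Printf("pieces=%d last=%q parses=%v\n", n, last, parsesAsUint(last))

	// strings.Fields splits on every run of whitespace, so it sees all columns.
	fmt.Printf("strings.Fields sees %d columns\n", len(strings.Fields(line)))
}
```

With n = 3 the remainder "200 300" ends up in the last piece, so a check for exactly three pieces passes while the integer parse fails.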

Fixes: #46
Reported-by: vimiix [email protected]

Signed-off-by: lifubang <[email protected]>
@lifubang
Member Author

Even though cgroupv1 is being deprecated, this regression should still be addressed.

Contributor

@kolyshkin kolyshkin left a comment

We could do better:

  1. Check if the first line (that we currently discard) starts with "cpu user system", and if not, bail out (return usageKernelMode, usageUserMode, nil).
  2. When reading, just ignore the 4th and subsequent values instead of discarding the whole line.

This way, if a newer kernel adds something (which I very much doubt, because cgroup v1 is over), we are ready for it.

Or, we can just fix the regression (using this patch as is).
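
A rough sketch of the two-step idea above, assuming the input is read from an `io.Reader`. The names `parseUsageAll` and `parseUsageAllString` are hypothetical and this is not the project's actual function; it only shows the header check plus ignoring the 4th and later values.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// parseUsageAll is a hypothetical helper sketching the proposal:
// validate the header, then use only the first three columns of each line.
func parseUsageAll(r io.Reader) ([]uint64, []uint64, error) {
	usageKernelMode := []uint64{}
	usageUserMode := []uint64{}

	scanner := bufio.NewScanner(r)

	// Step 1: bail out with empty stats if the header is not what we expect.
	if !scanner.Scan() || !strings.HasPrefix(scanner.Text(), "cpu user system") {
		return usageKernelMode, usageUserMode, scanner.Err()
	}

	// Step 2: read data lines, ignoring any 4th and subsequent values.
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 3 {
			continue // too short to contain cpu, user, system
		}
		user, err := strconv.ParseUint(fields[1], 10, 64)
		if err != nil {
			return nil, nil, fmt.Errorf("parsing user value: %w", err)
		}
		kernel, err := strconv.ParseUint(fields[2], 10, 64)
		if err != nil {
			return nil, nil, fmt.Errorf("parsing system value: %w", err)
		}
		usageUserMode = append(usageUserMode, user)
		usageKernelMode = append(usageKernelMode, kernel)
	}
	return usageKernelMode, usageUserMode, scanner.Err()
}

// parseUsageAllString is a convenience wrapper for trying the sketch on a string.
func parseUsageAllString(s string) ([]uint64, []uint64, error) {
	return parseUsageAll(strings.NewReader(s))
}

func main() {
	// Hypothetical file content from a kernel patched to add a fourth column.
	kernel, user, err := parseUsageAllString("cpu user system extra\n0 10 20 30\n1 11 21 31\n")
	fmt.Println(kernel, user, err)
}
```

If the header has an unexpected layout (for example, a column inserted before user and system), the sketch returns empty stats rather than guessing.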

@lifubang
Member Author

> 1. Check if the first line (that we currently discard) starts with "cpu user system"

I don't know if anyone has done localization for these strings.

> 2. When reading, just ignore the 4th and subsequent values instead of discarding the whole line.

However, what if someone adds extra columns before the user and system fields? In that case, assuming the first three columns are valid becomes unreliable.

So I think we should skip reading if there are more than 3 columns, because we can't trust these values.
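
For comparison, a minimal sketch of the stricter behavior argued for here: a line is used only if it has exactly three columns. `parseStrictLine` is a hypothetical name, and splitting with `strings.Fields` is an assumption made for this sketch, not necessarily what the patch does.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStrictLine is a hypothetical helper. ok is false when the line does
// not have exactly three columns, in which case the caller should skip it.
func parseStrictLine(line string) (user, system uint64, ok bool, err error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return 0, 0, false, nil // untrusted layout: skip the line
	}
	user, err = strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return 0, 0, false, err
	}
	system, err = strconv.ParseUint(fields[2], 10, 64)
	if err != nil {
		return 0, 0, false, err
	}
	return user, system, true, nil
}

func main() {
	// Hypothetical lines: one well-formed, one with an extra column.
	for _, line := range []string{"0 100 200", "0 100 200 300"} {
		user, system, ok, err := parseStrictLine(line)
		fmt.Println(user, system, ok, err)
	}
}
```

The trade-off against the header-check approach is that a file with extra columns yields no per-CPU stats at all, but never yields values from the wrong columns.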

@cyphar
Member

cyphar commented Oct 24, 2025

FWIW, if this kind of change breaks us we should report it as a kernel regression.

@kolyshkin
Contributor

> 1. Check if the first line (that we currently discard) starts with "cpu user system"
>
> I don't know if anyone has done localization for these strings.

Not talking about localization here. We've seen that someone patched the kernel to add more values. We only know about the first three, so instead of rejecting the file with more than 3 (which is what the current code as well as your PR does), we can check that the first three values are as expected, and use those if they are, or return empty stats if there's something unexpected in there.

> 2. When reading, just ignore the 4th and subsequent values instead of discarding the whole line.
>
> However, what if someone adds extra columns before the user and system fields? In that case, assuming the first three columns are valid becomes unreliable.

This is why I wanted to check that the first line starts with "cpu user system" (and not because of localization). If it does, we can probably use the first 3 numbers. If it doesn't, we return empty stats.

> So I think we should skip reading if there are more than 3 columns, because we can't trust these values.

@kolyshkin
Contributor

> FWIW, if this kind of change breaks us we should report it as a kernel regression.

It's not in the upstream kernel.

Development

Successfully merging this pull request may close these issues.

kubelet fails to start due to cgroups CPU stat parsing failed in k8s 1.34
