Detect a slow raidz child during reads #17227

robn · 2025-04-08T00:45:48Z

Motivation and Context

Replacing #16900, which was almost finished with review updates but has stalled. I've been asked to take it over.

See original PR for details.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
Documentation (a change to man pages or other documentation)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.

robn · 2025-04-08T00:48:52Z

Because there will be a little bouncing between two PRs, and because there are two different authors involved, I'll be pushing fixup commits to this branch. Once everyone is happy with the review, I will squash them down for merge.

I'll close out the remaining review comments on #16900, and would like it if new comments could be added here. Thanks all for your patience; I know it's a bit fiddly (it'd be nicer if GitHub would allow a PR to change branches, alas).

tonyhutter · 2025-04-17T23:05:45Z

I haven't looked into it, but I see all the FreeBSD runners have:

Tests with results other than PASS that are unexpected:
    SKIP events/setup (expected PASS)
    SKIP events/slow_vdev_sit_out (expected PASS)

tonyhutter · 2025-05-15T00:49:23Z

@robn this fixed the FreeBSD CI errors for me: tonyhutter@a491397. Try squashing that in and rebasing.

behlendorf

Thanks for picking up this work! It'd be great if we can refine this a bit more and get it integrated.

tests/zfs-tests/include/libtest.shlib

tests/zfs-tests/tests/functional/events/slow_vdev_sit_out.ksh

module/zfs/vdev_raidz.c

tests/zfs-tests/tests/functional/replacement/attach_resilver_sit_out.ksh

tests/zfs-tests/tests/functional/events/slow_vdev_sit_out.ksh

module/zfs/vdev_raidz.c

pcd1193182 · 2025-08-05T16:56:41Z

I've updated this branch to fix a few things. First, I've significantly reduced (hopefully eliminated almost entirely) the unnecessary sit-outs Brian was reporting. We reproduced them internally during our performance testing and the updated version doesn't display them at all. The changes here are to use the latency histogram stats instead of the EWMA as a better source of data. We also decrease the check frequency dramatically to reduce noise, decrease the number of outliers to compensate (along with adding a facility for extreme events to increase their outlier count more rapidly), increase the fence value significantly, and add a decay mechanism to prevent random noise from eventually causing a sit-out of healthy disks.
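For readers who want a concrete picture of the pieces listed above (a fence, an outlier count, faster counting for extreme events, and decay), here is a minimal Python sketch. It is not the code in module/zfs/vdev_raidz.c: the quartile-based fence, every constant, and both function names are assumptions chosen purely for illustration.

```python
from statistics import quantiles

# All names and constants below are hypothetical, chosen for illustration only.
FENCE_MULTIPLIER = 3.0      # how far above the upper quartile counts as an outlier
EXTREME_MULTIPLIER = 10.0   # "extreme" events raise the count faster
SIT_OUT_THRESHOLD = 4       # outlier count at which a child would sit out


def upper_fence(latencies, multiplier):
    """Return Q3 + multiplier * IQR for a list of per-child latencies."""
    q1, _, q3 = quantiles(latencies, n=4, method="inclusive")
    return q3 + multiplier * (q3 - q1)


def check_children(latencies, counts):
    """Run one periodic check.

    latencies: per-child latency estimate for this interval (for example,
               derived from a latency histogram).
    counts:    per-child outlier counts carried over from earlier checks;
               updated in place.
    Returns the index of a child that should sit out, or None.
    """
    fence = upper_fence(latencies, FENCE_MULTIPLIER)
    extreme = upper_fence(latencies, EXTREME_MULTIPLIER)
    for i, lat in enumerate(latencies):
        if lat > extreme:
            counts[i] += 2          # extreme event: count rises faster
        elif lat > fence:
            counts[i] += 1
        elif counts[i] > 0:
            counts[i] -= 1          # decay: isolated noise is forgotten
    worst = max(range(len(counts)), key=counts.__getitem__)
    return worst if counts[worst] >= SIT_OUT_THRESHOLD else None
```

The decay branch is what keeps a healthy disk that occasionally crosses the fence from slowly accumulating enough outliers to be sat out.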

Second, I've added an autosit property to vdevs. When set to on, it activates this code. This allows users to decide if they want this logic enabled or not. This is intended to work with the next change:

Third, I've made the sitout property writeable. This allows individual vdevs to be sat out from userland. This, in conjunction with the autosit property, allows the user to decide if they want no disk sit-outs, the kernel's automatic sit-outs, or to do something more complex. Using zpool iostat latency data, SMART stats, or any other data source they can think of, they could now create a userland daemon that monitors disk health and sits out disks that it feels are unhealthy.

Giving userland the capability to do this has a number of advantages: easier access to high-level languages and their rich libraries, safer and more rapid iteration on complex logic, and the ability to improve the logic using new developments and advanced approaches without requiring a kernel upgrade or downtime. The kernel functionality is left in place as a simple plug-and-play approach.
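As a rough illustration of the userland option, the following Python sketch shows the decision half of such a daemon. The property name `sitout` comes from the description above; the threshold `ratio`, both function names, the value `on`, and the `zpool set property=value pool vdev` command form are assumptions to verify against vdevprops(7) on the target system. The command is only built, never executed.

```python
from statistics import median


def pick_sitout_candidate(latency_ms, ratio=5.0):
    """Pick at most one disk to sit out.

    latency_ms: dict mapping vdev name -> recent average read latency,
                gathered by whatever means the daemon prefers.
    ratio:      hypothetical threshold; a disk qualifies when its latency
                exceeds `ratio` times the median of its peers.
    """
    if len(latency_ms) < 3:
        return None                      # too few peers to compare against
    name, worst = max(latency_ms.items(), key=lambda kv: kv[1])
    peers = [v for k, v in latency_ms.items() if k != name]
    return name if worst > ratio * median(peers) else None


def sitout_command(pool, vdev, value="on"):
    """Build (but do not run) a command that sets the sitout vdev property.

    The command form and the value "on" are assumptions, not confirmed
    by this PR discussion.
    """
    return ["zpool", "set", f"sitout={value}", pool, vdev]
```

A daemon built along these lines would presumably leave `autosit` off so that its own policy and the kernel's automatic one do not compete.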

amotin

I agree that histograms should indeed be a better source of data compared to EWMA, except that, as I mention below, we may want to take a closer look at when we update the previous state. At the same time, I wonder if histograms may actually give us even more statistical information about the distribution curves, so that we could better estimate the confidence interval on a small number of disks.

module/zfs/vdev.c

module/zfs/vdev_raidz.c

behlendorf

Given the amount of churn in this PR it'd be nice to squash the commits the next time it's updated.

module/zfs/vdev_raidz.c

include/os/linux/spl/sys/time.h

module/zfs/vdev_raidz.c

man/man4/zfs.4

include/os/freebsd/spl/sys/time.h

module/zfs/vdev_raidz.c

man/man7/vdevprops.7

include/sys/fm/fs/zfs.h

include/sys/vdev_impl.h

include/sys/vdev_raidz.h

module/zcommon/zpool_prop.c

man/man7/vdevprops.7

module/zfs/vdev_raidz.c

tests/zfs-tests/include/tunables.cfg

module/zfs/spa_misc.c

tests/zfs-tests/tests/functional/events/slow_vdev_degraded_sit_out.ksh

Signed-off-by: Paul Dagnelie <[email protected]>

A single slow responding disk can affect the overall read performance of a raidz group. When a raidz child disk is determined to be a persistent slow outlier, then have it sit out during reads for a period of time. The raidz group can use parity to reconstruct the data that was skipped.

Each time a slow disk is placed into a sit out period, its `vdev_stat.vs_slow_ios` count is incremented and a zevent class `ereport.fs.zfs.delay` is posted.

The length of the sit out period can be changed using the `raid_read_sit_out_secs` module parameter. Setting it to zero disables slow outlier detection.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Paul Dagnelie <[email protected]>
Contributions-by: Don Brady <[email protected]>
Contributions-by: Brian Behlendorf <[email protected]>
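The sit-out period described in this commit message can be pictured with a small Python sketch. It only models the timing: the class name, its fields, and the injectable clock are hypothetical, `slow_ios` merely stands in for the `vs_slow_ios` counter, and posting the zevent is left out.

```python
import time


class SitOutTimer:
    """Track a sit-out window for one raidz child (illustrative only)."""

    def __init__(self, sit_out_secs, clock=time.monotonic):
        self.sit_out_secs = sit_out_secs   # 0 disables sit-outs entirely
        self.clock = clock
        self.expires_at = None
        self.slow_ios = 0                  # stands in for vs_slow_ios

    def start(self):
        """Begin a sit-out period; returns False when the feature is disabled."""
        if self.sit_out_secs == 0:
            return False
        self.expires_at = self.clock() + self.sit_out_secs
        self.slow_ios += 1
        return True

    def sitting_out(self):
        """True while the child should be skipped for reads."""
        if self.expires_at is None:
            return False
        if self.clock() >= self.expires_at:
            self.expires_at = None         # period over; resume normal reads
            return False
        return True
```

While `sitting_out()` is true, a read would skip this child and reconstruct its data from parity, which is only possible while the group still has enough redundancy to spare.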

robn · 2025-09-11T00:50:25Z

@pcd1193182 thanks for picking up this often-dropped PR and getting it over the line. Thanks all for the feedback!

robn mentioned this pull request Apr 8, 2025

Detect a slow raidz child during reads #16900

Closed

13 tasks

robn requested review from amotin, behlendorf, tonyhutter and don-brady April 8, 2025 01:09

robn force-pushed the raidz-detect-slow-outlier branch from 905c466 to 4a1a3d3 Compare April 8, 2025 04:15

amotin added the Status: Code Review Needed Ready for review and testing label Apr 8, 2025

behlendorf reviewed May 16, 2025

amotin reviewed May 27, 2025

module/zfs/vdev_raidz.c (outdated, resolved)

pcd1193182 force-pushed the raidz-detect-slow-outlier branch 2 times, most recently from 83d0de9 to 6fa275b Compare July 28, 2025 23:59

amotin reviewed Aug 5, 2025

module/zfs/vdev.c (resolved)

module/zfs/vdev.c (resolved)

module/zfs/vdev_raidz.c (outdated, resolved)

module/zfs/vdev_raidz.c (outdated, resolved)

pcd1193182 force-pushed the raidz-detect-slow-outlier branch 2 times, most recently from e620961 to 9138823 Compare August 6, 2025 23:52

behlendorf reviewed Aug 7, 2025

pcd1193182 force-pushed the raidz-detect-slow-outlier branch 2 times, most recently from b4ad263 to 7013cb6 Compare August 7, 2025 21:50

amotin reviewed Aug 8, 2025

include/os/freebsd/spl/sys/time.h (outdated, resolved)

include/os/freebsd/spl/sys/time.h (outdated, resolved)

pcd1193182 force-pushed the raidz-detect-slow-outlier branch from 8b741cd to 0e3ae43 Compare August 8, 2025 21:50

behlendorf mentioned this pull request Aug 25, 2025

Add knob to disable slow io notifications #17477

Open

14 tasks

behlendorf reviewed Aug 25, 2025

module/zfs/vdev_raidz.c (resolved)

pcd1193182 force-pushed the raidz-detect-slow-outlier branch from 0e3ae43 to 56cc562 Compare August 26, 2025 16:38

behlendorf reviewed Aug 26, 2025

module/zfs/vdev_raidz.c (outdated, resolved)

tonyhutter reviewed Aug 26, 2025

man/man7/vdevprops.7 (resolved)

behlendorf reviewed Aug 26, 2025

pcd1193182 force-pushed the raidz-detect-slow-outlier branch from 713ac04 to afbd724 Compare August 27, 2025 23:45

behlendorf reviewed Aug 29, 2025

module/zfs/spa_misc.c (resolved)

pcd1193182 force-pushed the raidz-detect-slow-outlier branch 3 times, most recently from 11e6837 to 611d87b Compare September 2, 2025 20:31

behlendorf reviewed Sep 3, 2025

tests/zfs-tests/tests/functional/events/slow_vdev_degraded_sit_out.ksh (resolved)

pcd1193182 force-pushed the raidz-detect-slow-outlier branch 3 times, most recently from 397c375 to 4ed592e Compare September 3, 2025 23:06

behlendorf reviewed Sep 4, 2025

tests/zfs-tests/tests/functional/events/slow_vdev_degraded_sit_out.ksh (resolved)

pcd1193182 force-pushed the raidz-detect-slow-outlier branch 3 times, most recently from a33ff6c to 6b529ac Compare September 8, 2025 22:43

behlendorf approved these changes Sep 9, 2025

behlendorf requested review from amotin and tonyhutter September 9, 2025 16:18

behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Sep 9, 2025

Paul Dagnelie added 2 commits September 9, 2025 13:57

Remove RAIDZ reconstruct flags from debug defaults

d04a8ef

Signed-off-by: Paul Dagnelie <[email protected]>

pcd1193182 force-pushed the raidz-detect-slow-outlier branch from 6b529ac to 97393b1 Compare September 9, 2025 21:10

github-actions bot removed the Status: Accepted Ready to integrate (reviewed, tested) label Sep 9, 2025

behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Sep 10, 2025

behlendorf approved these changes Sep 10, 2025

behlendorf closed this in 0620c97 Sep 10, 2025
