oshogbo (Contributor) commented Jun 20, 2025

Motivation and Context

This knob allows disabling the slow_io check on a single vdev. Some users have reported that the check breaks their enterprise configurations when one or more vdevs use Fibre Channel with redundancy. Another reason to disable the check is when a vdev is damaged but we still want to force some resilvering from it.

Description

Add a knob to disable slow I/O event generation for a single vdev.

How Has This Been Tested?

A new test has been added.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

amotin added the "Status: Code Review Needed" label Jun 24, 2025

amotin (Member) commented Jun 24, 2025

I wonder if this feature partially duplicates slow_io_n/slow_io_t? Though I may not understand their meaning.

oshogbo force-pushed the oshogbo/n_knob branch 2 times, most recently from 3cdaf1b to a5a9411, July 9, 2025 09:57

oshogbo (Contributor, Author) commented Jul 9, 2025

> I wonder if this feature partially duplicates slow_io_n/slow_io_t? Though I may not understand their meaning.

I see this new feature as a supplement to slow_io_n/slow_io_t. Those settings serve two purposes: they let zpool status report how many slow I/Os have occurred, and they generate events for ZED/zfsd so it can decide whether to degrade the pool. However, sometimes you don't want the pool degraded but still want some accounting of slow I/O.

For example, in multipath environments (as has been pointed out multiple times on X/Twitter), pools have unexpectedly been degraded. From my point of view, this feature is also useful when you know a device is failing but still want to extract as much throughput as possible during recovery.

Simply raising slow_io_n/slow_io_t to very high values isn't a great solution; it is a workaround rather than a fix. How high is "high enough"? A single knob to control this behavior makes more sense in these scenarios.

oshogbo force-pushed the oshogbo/n_knob branch 3 times, most recently from 6b06023 to 0dcf677, August 25, 2025 08:42

behlendorf (Contributor) commented

We'll need to make sure this change and #17227 work well together. Depending on what order they end up being merged in, it looks like that will mainly entail making sure the new property covers disabling slow IO events from sitout events.

Introduce a new vdev property `VDEV_PROP_SLOW_IO_REPORTING` that
allows users to disable notifications for slow devices.
This prevents ZED and/or ZFSD from degrading the pool due to slow
I/O.

Signed-off-by: Mariusz Zaborski <[email protected]>