- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.9k
Add "zpool status -vv" #17502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add "zpool status -vv" #17502
Conversation
| On a first pass it looks good to me, thanks!. Though not really sure why the checks are not successful. Could you squash and re-push? | 
| Well, I think the "checkstyle" check is failing because I didn't update libzfs.abi. But I can find no instructions for how to do that. @ixhamza you were the last to do it. Could you please tell me how to update libzfs.abi due to a function prototype change? | 
| In addition to abi I see: You may get new abi here https://github.com/openzfs/zfs/actions/runs/16008660282 (see artifact, direct link to it https://github.com/openzfs/zfs/actions/runs/16008660282/artifacts/3443942581r ) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no critical objections, but few things I would do differently. Also please rebase it to the latest master and clean the commit history.
        
          
                cmd/zpool/zpool_main.c
              
                Outdated
          
        
      | if (cb->cb_verbosity < 2) { | ||
| errl[i] = safe_malloc(len); | ||
| zpool_obj_to_path(zhp, dsobj, | ||
| obj, errl[i++], len); | ||
| } else { | ||
| uint64_t lvl, blkid; | ||
|  | ||
| errnvl[i] = fnvlist_alloc(); | ||
| lvl = fnvlist_lookup_uint64(nv, | ||
| ZPOOL_ERR_LEVEL); | ||
| blkid = fnvlist_lookup_uint64( | ||
| nv, ZPOOL_ERR_BLKID); | ||
| zpool_obj_to_path(zhp, dsobj, | ||
| obj, pathbuf, len); | ||
| fnvlist_add_string(errnvl[i], | ||
| "path", pathbuf); | ||
| fnvlist_add_uint64(errnvl[i], | ||
| "level", lvl); | ||
| fnvlist_add_uint64(errnvl[i++], | ||
| "record", blkid); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we always do it the verbose way here to simplify the code, and just not print the additional information later? It does not look like very performance-sensitive code path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, because if verbosity is < 2, then nverrlist won't contain the "level" and "blockid" fields.  And I don't want for zpool_get_errlog to always supply those fields, because it could result in enormous nvlists if a single file has many corrupt records.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enormous nvlists if a single file has many corrupt records.
Along these same lines I'm a bit concerned with logging a line per block. That could be overwhelming.
Thinking about this from a user perspective, I really don't care about how ZFS decided to internally layout the file (objid, level, blkid). What is useful to me are the file offsets which are corrupt. That's a little more work to generate but shouldn't be too bad. Maybe something like:
errors: Permanent errors have been detected in the following files:
        /testpool/randfile.7 393216-524288,1048576-1179647,1966080-2097151
        /testpool/randfile.5 917504-1048575
        ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I don't want for
zpool_get_errlogto always supply those fields
I didn't mean to always supply all the ranges. Merely using the same data structure with nvlists, just with one vs many entries per file, if it allow to make code cleaner. But if no, I won't insist.
Specifying the verbose flag twice will display a list of all corrupt sectors within each corrupt file, as opposed to just the name of the file. Signed-off-by: Alan Somers <[email protected]> Sponsored by: ConnectWise
4560e4e    to
    b07f519      
    Compare
  
    | I applied your suggestions, rebased, and squashed, @amotin . | 
| @asomers You seem to address 1.5 of my comments. What's about other 1.5? | 
| 
 Do you mean the comment about "Couldn't we always do it the verbose way here to simplify the code"?  I explained why I thought that would be a bad idea.  Or do you mean that I didn't replace one  | 
| 
 I'm sorry if I lost it in context switches, but I don't remember. Could you point where? 
 This is also.  You've added error checks for two  | 
| 
 I was referring to #17502 (comment) . 
 Yes exactly.  I chose to use  | 
| 
 Right. I see my question, but not your response. Did you send it? ;) 
 No. I won't fight over it. | 
| 
 Ahh, that's it. It was stuck in the "Pending" state. It's posted now. | 
| errors_nvlist(zpool_handle_t *zhp, status_cbdata_t *cb, nvlist_t *item) | ||
| { | ||
| uint64_t nerr; | ||
| int verbosity = cb->cb_verbosity; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use cb->cb_verbosity throughout and remove the local variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only reason I created the local variable is so I didn't have to split a long line at 80 columns. IMHO it looks better this way. But I'll change it if you want me to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just like to make sure we're using either the local variable or cb->cb_verbosity consistently in this function.  I don't feel strongly about which one, so if you want to stick with the local variable we should update the other places it's used.
        
          
                cmd/zpool/zpool_main.c
              
                Outdated
          
        
      | if (cb->cb_verbosity < 2) { | ||
| errl[i] = safe_malloc(len); | ||
| zpool_obj_to_path(zhp, dsobj, | ||
| obj, errl[i++], len); | ||
| } else { | ||
| uint64_t lvl, blkid; | ||
|  | ||
| errnvl[i] = fnvlist_alloc(); | ||
| lvl = fnvlist_lookup_uint64(nv, | ||
| ZPOOL_ERR_LEVEL); | ||
| blkid = fnvlist_lookup_uint64( | ||
| nv, ZPOOL_ERR_BLKID); | ||
| zpool_obj_to_path(zhp, dsobj, | ||
| obj, pathbuf, len); | ||
| fnvlist_add_string(errnvl[i], | ||
| "path", pathbuf); | ||
| fnvlist_add_uint64(errnvl[i], | ||
| "level", lvl); | ||
| fnvlist_add_uint64(errnvl[i++], | ||
| "record", blkid); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
enormous nvlists if a single file has many corrupt records.
Along these same lines I'm a bit concerned with logging a line per block. That could be overwhelming.
Thinking about this from a user perspective, I really don't care about how ZFS decided to internally layout the file (objid, level, blkid). What is useful to me are the file offsets which are corrupt. That's a little more work to generate but shouldn't be too bad. Maybe something like:
errors: Permanent errors have been detected in the following files:
        /testpool/randfile.7 393216-524288,1048576-1179647,1966080-2097151
        /testpool/randfile.5 917504-1048575
        ...
| 
 Me too. That's why I don't want it to be the default, but only opted-into with "-vv" 
 I have two thoughts about this: 
 | 
| That's fair. Yeah, now that you point it out I don't see a great solution for getting the record size. You could imagine either extending or adding a new ioctl interface, but that's more complexity and compatibility code I'd really prefer to avoid. Reporting block IDs it is. Perhaps then something just a little more concise?  | 
* Use a local variable more consistently * Condense error reports into runs of contiguous blocks
| @behlendorf with the latest push, error reports look like this: Combining runs of contiguous blocks is probably good. But I'm not sure that I like combining discontiguous runs onto a single line. That means a file with many discontiguous errors could end up being printed as an extremely long line. | 
| @asomers I finally realized why this felt familiar. PR #9781 was working on adding exactly this same functionality but it unfortunately ended up stalling out. #9781 is unsurprisingly very similar to yours, but it has a few additions we should incorporate. For example, the output format which they settled on I think is quite nice. It collapses contiguous ranges, prints the range byte offsets and even a nice summary of the damaged blocks. Here's the example output: The original PR extends the ZFS_IOC_OBJ_TO_STATS ioctl to accomplish this.  Now we can't do that exactly because it's one of the legacy ioctl interface and we don't want to break the user/kernel ioctl ABI by adding fields to  | 
| @behlendorf whoa I totally forgot about that old PR (which ironically is actually an updated version of an even older PR #8902)). @asomers feel free to revive whatever you want from that PR.  I remember using the range tree was a nice way to collapse error ranges.  If you do use bits from the old PR, please credit the original author:  Along with that, since we now support JSON ( | 
| @tonyhutter I forgot to mention that the PR as-is already works with JSON output. It looks like this:       "error_count": "2",
      "errlist": [
        {
          "path": "POOL/DATASET@SNAPSHOT1:/FILE",
          "level": 0,
          "record": 867511
        },
        {
          "path": "POOL/DATASET@SNAPSHOT2:/FILE",
          "level": 0,
          "record": 867511
        }, | 
| @asomers I started taking another look at the code, and then made a few changes... and one thing lead to another and I ended up adding in the byte error ranges.  This new version will work with all  My branch is here (which also includes your commits, rebased on master): https://github.com/tonyhutter/zfs/tree/mystatusv Do you want me to create a new PR and get this over the finish line? Or would you like to pull my commit into this PR? | 
| Awesome @tonyhutter ! Go ahead and create a new PR. I will review it tomorrow, and test it on one of my corrupt systems. | 
Print the byte error ranges with 'zpool status -vv'. This works with the normal zpool status formatting flags (-p, -j, --json-int). In addition: - Move range_tree/btree to common userspace/kernel code. - Modify ZFS_IOC_OBJ_TO_STATS ioctl to optionally return "extended" object stats. - Let zinject corrupt zvol data. - Add test case. This commit takes code from these PRs: openzfs#17502 openzfs#9781 openzfs#8902 Signed-off-by: Tony Hutter <[email protected]> Co-authored-by:: Alan Somers <[email protected]> Co-authored-by: TulsiJain <[email protected]>
Print the byte error ranges with 'zpool status -vv'. This works with the normal zpool status formatting flags (-p, -j, --json-int). In addition: - Move range_tree/btree to common userspace/kernel code. - Modify ZFS_IOC_OBJ_TO_STATS ioctl to optionally return "extended" object stats. - Let zinject corrupt zvol data. - Add test case. This commit takes code from these PRs: openzfs#17502 openzfs#9781 openzfs#8902 Signed-off-by: Tony Hutter <[email protected]> Co-authored-by:: Alan Somers <[email protected]> Co-authored-by: TulsiJain <[email protected]>
Specifying the verbose flag twice will display a list of all corrupt sectors within each corrupt file, as opposed to just the name of the file.
Signed-off-by: Alan Somers [email protected]
Sponsored by: ConnectWise
Motivation and Context
Displays the record number of every corrupt record in every corrupt file. I find this is very useful when cleaning up the fallout from #16626.
Description
The kernel already tracks the blkid of every corrupt record, and already transmits that information to userland. But libzfs has always thrown it away, until now. This PR adds a
-vvoption tozpool status. When used, it will print the level and blkid of every corrupt record. It works in combination with-j, too.How Has This Been Tested?
Manually tested on about half a dozen production datasets that had on-disk corruption as a result of #16626 , in both L0 and L1 blocks.
Manually tested on a test dataset that I intentionally corrupted. That one had multiple corrupted records on multiple files.
Example output, in human readable mode:
Example output, in json mode
{ "output_version": { "command": "zpool status", "vers_major": 0, "vers_minor": 1 }, "pools": { "testpool": { "name": "testpool", "state": "ONLINE", "pool_guid": "10305967396160717712", "txg": "1523", "spa_version": "5000", "zpl_version": "5", "status": "One or more devices has experienced an error resulting in data\n\tcorruption. Applications may be affected.\n", "action": "Restore the file in question if possible. Otherwise restore the\n\tentire pool from backup.\n", "msgid": "ZFS-8000-8A", "moreinfo": "https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A", "scan_stats": { "function": "SCRUB", "state": "FINISHED", "start_time": "Tue Jul 1 12:11:14 2025", "end_time": "Tue Jul 1 12:11:14 2025", "to_examine": "56.0M", "examined": "56.0M", "skipped": "92K", "processed": "0B", "errors": "12", "bytes_per_scan": "0B", "pass_start": "1751393474", "scrub_pause": "-", "scrub_spent_paused": "0", "issued_bytes_per_scan": "55.9M", "issued": "55.9M" }, "vdevs": { "testpool": { "name": "testpool", "vdev_type": "root", "guid": "10305967396160717712", "class": "normal", "state": "ONLINE", "alloc_space": "56.0M", "total_space": "112M", "def_space": "112M", "read_errors": "0", "write_errors": "0", "checksum_errors": "0", "vdevs": { "/tmp/zfs.img": { "name": "/tmp/zfs.img", "vdev_type": "file", "guid": "1719526601577822810", "path": "/tmp/zfs.img", "class": "normal", "state": "ONLINE", "alloc_space": "56.0M", "total_space": "112M", "def_space": "112M", "rep_dev_size": "116M", "self_healed": "1.50K", "phys_space": "128M", "read_errors": "0", "write_errors": "0", "checksum_errors": "27", "slow_ios": "0" } } } }, "error_count": "12", "errlist": [ { "path": "/testpool/randfile.7", "level": 0, "record": 3 }, { "path": "/testpool/randfile.7", "level": 0, "record": 9 }, { "path": "/testpool/randfile.7", "level": 0, "record": 16 }, { "path": "/testpool/randfile.9", "level": 0, "record": 8 }, { "path": "/testpool/randfile.9", "level": 0, "record": 15 }, { "path": "/testpool/randfile.10", "level": 0, "record": 3 }, { "path": "/testpool/randfile.10", "level": 0, "record": 11 }, { "path": "/testpool/randfile.5", "level": 0, "record": 17 }, { "path": "/testpool/randfile.8", "level": 0, "record": 11 }, { "path": "/testpool/randfile.8", "level": 0, "record": 19 }, { "path": "/testpool/randfile.6", "level": 0, "record": 3 }, { "path": "/testpool/randfile.6", "level": 0, "record": 12 } ] } } }Types of changes
Checklist:
Signed-off-by.