Add support for anyraid vdevs #17567
Conversation
vdev_t *vd = mg->mg_vd;
if (B_FALSE) {
	weight = 2 * weight - (msp->ms_id * weight) / vd->vdev_ms_count;
	weight = MIN(weight, METASLAB_MAX_WEIGHT);
I am suspicious about this math:
- Unlike the seemingly linear space-based weight, segment-based weight is exponential. So by doubling the weight of the first metaslabs, you are actually increasing it exponentially. I.e., if the first metaslab has only one free segment of only 1MB, and so a weight with INDEX=20 and COUNT=1, doubling that will give INDEX=40 and COUNT=2, which would make it absolutely unbeatable for most metaslabs, since beating it would require a free segment of up to 1TB.
- All free metaslabs are identical and selected based on their offset, which is OK if that is expected, but they also don't go through this path, so they have very little chance of ever being used until all the earlier metaslabs are filled to the brim.
Hm, yeah, doubling is probably overkill for this. But we do need something that will interpolate nicely later into the vdev. Perhaps what we do is something like: add `3 - ((ms_id * 4) / vdev_ms_count)` to the index and add `20 - (((ms_id * 20) / (vdev_ms_count / 5)) % 20)` to the count. So for the first quarter we would add 3 to the index, then 2, 1, and 0. And within each quarter, we would add 19 to the first metaslab, 18 to the second, etc. It's not ideal, since larger vdevs will have large plateaus where the modifications are the same, but that's probably alright. We just want to try to concentrate writes generally earlier; a little mixing in adjacent metaslabs on large pools is probably fine. And with the largest index difference being 3, the last metaslab only needs to have segments 8x as large as the first one's to compete.
For purposes of spinning-disk performance we don't really need a smooth curve. Having just several "speed zones" would be enough to get most of the performance. I propose leaving the first zone as it is; for the second zone, subtracting 1 from the index while doubling the count to keep it equivalent; for the third zone, subtracting another 1 from the index and again doubling the count; etc. This logic may not be great beyond 3-4 zones, but IMO that should be enough to get most of the speed, and I think it could actually be applied independently of anyraid to any rotating vdevs. We may not want to apply this to metaslabs with an index below some threshold, since sequential speed there should not be worth much, and we should focus instead on lower fragmentation.
For purposes of anyraid's tile allocation, I think you only need to prefer already-used metaslabs over empty ones to a certain degree. Once some metaslabs are used, I am not sure (yet?) why you would really prefer one used metaslab to another, since a single used block would be enough for the tile to stay allocated no matter what. So I think it may be enough to just account free metaslabs not as one free segment (an index of the full size and a count of 1, which makes them unbeatable now), but to split it a few times in the way I described above for HDDs, down to a level just below (or equal to) the last speed zone of the used metaslabs. Free metaslabs do not need zones, since the current code already sorts identical metaslabs by offset.
I agree we probably don't need gradations beyond the few top-level speed zones. I think the simplest algorithm that satisfies our goals is to just add `N - 1` to the index in the first zone, `N - 2` in the second, etc., for `N` zones. That will prefer earlier metaslabs, and since untouched metaslabs won't hit this code, they will naturally end up with nothing added to the index and end up sorting with everything in the final zone (and then earlier ones will sort first, as you said). I see the idea behind decreasing the index and doubling the count to match, but I don't think that's actually important; we don't multiply the segment size and count together to get the available space anywhere, so we don't need to preserve that relationship.
This prefers earlier metaslabs for the general rotational performance bonus, and prefers already-used metaslabs for anyraid. We can also disable this logic if the index is below a critical value (24?) so that if we're very fragmented, we abandon this and focus only on the actual free space.
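A minimal sketch of that zone-based adjustment (hypothetical helper and names, not the actual metaslab weighting code; only the boost calculation is shown):
```c
#include <stdint.h>

/*
 * Hypothetical sketch: compute how much to add to the segment-size
 * index of a loaded metaslab based on its "speed zone".  ms_id,
 * ms_count, and nzones stand in for the real metaslab fields; the
 * caller would skip the boost entirely when the index is already
 * below the fragmentation threshold discussed above.
 */
static uint64_t
zone_index_boost(uint64_t ms_id, uint64_t ms_count, uint64_t nzones)
{
	/* Zone 0 holds the earliest metaslabs, zone nzones - 1 the last. */
	uint64_t zone = (ms_id * nzones) / ms_count;

	/* First zone gets +(N - 1), second +(N - 2), ..., last +0. */
	return (nzones - 1 - zone);
}
```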
We can also disable this logic if the index is below a critical value (24?) so that if we're very fragmented, we abandon this and only focus on the actual free space.
Yea. In an unrelated context I was also recently thinking that we could do more when we reach some fragmentation or capacity threshold.
One annoying caveat about changing the weighting algorithm dynamically is that the assertions that the weight not decrease while a metaslab is unloaded mean we have to be a little careful to design/test it with those constraints in mind.
if (vd->vdev_parent->vdev_ops == &vdev_anyraid_ops) {
	vdev_anyraid_write_map_sync(vd, zio, txg, good_writes, flags,
	    status);
I suppose this will write all the maps every time? In addition to the 2 one-sector uberblock writes per leaf vdev, we now write up to 64MB per one?
Yes, we now write out the whole map every TXG. In theory that could be 64MiB, but in practice it's usually on the order of kilobytes; an anyraid vdev with 32 disks, an average of 256 tiles each, and 80% of them mapped would use about 26KB. We have 64MiB here because that's the maximum the mapping could possibly reach, not because we expect it to be anywhere close to that in practice.
We could add an optimization that doesn't update the map if nothing changed in a given txg; we'd still want to update the header, but the mapping itself could be left unmodified.
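A quick sanity check on that estimate, assuming each mapped tile costs roughly 4 bytes in the map (which is what 64 MiB divided by the 2^24 maximum tiles works out to; the real on-disk entry format may differ):
```c
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	/* Assumed entry size: 64 MiB map / 2^24 maximum tiles = 4 bytes. */
	uint64_t entry_bytes = (64ULL << 20) >> 24;
	uint64_t disks = 32, tiles_per_disk = 256;
	uint64_t mapped = disks * tiles_per_disk * 8 / 10;	/* ~80% mapped */

	printf("%llu entries x %llu B = ~%llu KB per map write\n",
	    (unsigned long long)mapped, (unsigned long long)entry_bytes,
	    (unsigned long long)(mapped * entry_bytes / 1024));
	return (0);
}
```
That lands in the mid-20s of kilobytes, consistent with the 26KB figure above.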
void *buf = abd_borrow_buf(map_abd, SPA_MAXBLOCKSIZE);

rw_enter(&var->vd_lock, RW_READER);
anyraid_tile_t *cur = avl_first(&var->vd_tile_map);
As I understand it, you have only one copy of the tile map, which covers all TXGs. That may be OK as long as you don't need precise accounting and are never going to free tiles. But what is not OK, I suppose, is that for writes done in open context (such as Direct I/O and ZIL), the maps will not be written until the end of the next committed TXG. Direct I/O might not care, but it may be impossible to replay the ZIL after a crash if its blocks, or blocks they reference, were written to new tiles that are not yet synced.
That's not correct; there are 4 copies per vdev, which rotate per TXG. So there are plenty of copies of the map for each TXG, and there are 4 TXGs' worth of maps.
As for the ZIL issue, that is a good point. We need to prevent ZIL blocks from ending up in unmapped tiles. While this probably wouldn't actually cause a problem in practice (when you import the pool again the tile would get mapped in the same way as before, since the vdev geometry hasn't changed), it definitely could in theory (if you had multiple new tiles in the same TXG, and they got remapped in a different order than they were originally).
The best fix I came up with is to prevent ZIL writes from being allocated to unmapped tiles in the first place. I also considered trying to stall those writes until the TXG synced, but that's slow and also technically annoying. I also considered having a journal of tile mappings that we would update immediately, but that adds a lot of complexity. Preventing the allocation nicely solves the problem, and if they can't find a place to allocate in all the mapped tiles, we already have logic to force the ZIL to fall back to txg_wait_synced.
Overall this is a really nice feature! I haven't looked at the code yet, but did kick the tires a little and have some comments/questions.
How did you arrive at the 16GiB min tile size? (Forgive me if this is mentioned in the code comments.) I ask since it would be nice to have a smaller tile size to accommodate smaller vdevs (and give more free space, since it's rounded to tile-sized boundaries).
That way there's no ambiguity if the anyraid TLD is a mirror or raidz flavor. It also opens the path to anyraidz1, which was mentioned in the Anyraid announcement: "With ZFS AnyRaid, we will see at least two new layouts added: AnyRaid-Mirror and AnyRaid-Z1. The AnyRaid-Mirror feature will come first, and will allow users to have a pool of more than two disks of varying sizes while ensuring all data is written to two different disks. The AnyRaid-Z1 feature will apply the same concepts of ZFS RAID-Z1, but while supporting mixed size disks." https://hexos.com/blog/introducing-zfs-anyraid-sponsored-by-eshtek
I also noticed that the anyraid TLD names don't include the parity level. They all just say "anyraid":
We should have it match the raidz TLD convention where the parity level is included:
16GiB was selected mostly because that would make the minimum line up with the standard fraction (1/64th) at a 1TiB disk. That's a nice round number, and a pretty reasonable size for "a normal size disk" these days; anything less than 1TiB is definitely on the smaller side. The other effect of this value is that with this tile size, you can have any disk up to 1PiB in size and still be able to use all the space; any disk that's more than 2^16 tiles can't all be used. It is possible to have smaller tile sizes; we do it in the test suite a bunch. There is a tunable,
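Just to spell out the arithmetic behind that (only the numbers from this comment and the PR description, shown as a quick check):
```c
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	uint64_t tib = 1ULL << 40;
	uint64_t tile = tib / 64;			/* 1 TiB / 64 = 16 GiB */
	uint64_t max_disk = tile * (1ULL << 16);	/* 2^16 tiles per disk */

	printf("tile = %llu GiB, fully usable disk limit = %llu TiB\n",
	    (unsigned long long)(tile >> 30),
	    (unsigned long long)(max_disk >> 40));
	return (0);
}
```
This prints a 16 GiB tile and a 1024 TiB (1 PiB) per-disk limit.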
That's fair, we could have a better error message for this case. I can work on that.
I think autoexpand works like normal? It doesn't affect the tile size or anything, because the tile size is locked in immediately when the vdev is created, but it should affect the disk sizes like normal. Maybe the tile capacity doesn't change automatically? But that's probably a bug, if so. Did you run into this in your testing?
Interesting, I will investigate why that happened. Those should be able to mix for sure.
I'm open to new naming options. My vague plan was to use
Good point, I will fix that too.
This is a nit, but in the description I believe there is a typo: "Anyraid works by diving each of the disks that makes up the vdev..." I believe the intent was for dividing.
I prefer the
I'm guessing the first two vdevs report a
static int
log_10(uint64_t v) {
	char buf[32];
	snprintf(buf, sizeof (buf), "%llu", (u_longlong_t)v);
	return (strlen(buf));
}
Could you use `log10()` from math.h?
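For reference, a digit-count helper along those lines might look like this (a sketch only; it needs -lm, and double precision near exact powers of ten for very large 64-bit values would be worth double-checking, which may be a reason to keep the snprintf version):
```c
#include <math.h>
#include <stdint.h>

/* Sketch: count decimal digits of v using log10() instead of snprintf(). */
static int
log_10(uint64_t v)
{
	if (v == 0)
		return (1);
	return ((int)floor(log10((double)v)) + 1);
}
```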
#define ZPOOL_CONFIG_DRAID_NGROUPS "draid_ngroups"

/* ANYRAID configuration */
#define ZPOOL_CONFIG_ANYRAID_PARITY_TYPE "parity_type"
nit: should the string be "anyraid_parity_type", just so it's clear if someone is printing the config?
}

zfeature_register(SPA_FEATURE_ANYRAID,
    "com.klarasystems:anyraid", "anyraid", "Support for anyraid VDEV",
I like that the overarching name for anymirror+anyraid is called "anyraid". For this feature flag though, do we need to be more specific and call it "anymirror", and then add another feature flag for the actual raid part follow-on ("anyraid")? Or can we use the same feature flag for both anymirror and the future anyraid?
 * Initialize private VDEV specific fields from the nvlist.
 */
static int
vdev_anyraid_init(spa_t *spa, nvlist_t *nv, void **tsd)
Should this be?:
- vdev_anyraid_init(spa_t *spa, nvlist_t *nv, void **tsd)
+ vdev_anyraid_init(spa_t *spa, nvlist_t *nv, vdev_anyraid_t **tsd)
{
	echo $(zpool iostat -v $1 | awk '(NR > 4) {print $1}' | \
	grep -vEe '^-----' -e "^(mirror|raidz[1-3]|draid[1-3]|spare|log|cache|special|dedup)|\-[0-9]$")
	grep -vEe '^-----' -e "^(mirror|raidz[1-3]|anyraid|draid[1-3]|spare|log|cache|special|dedup)|\-[0-9]$")
Should be anymirror[0-3]?
log_must zpool create -f $TESTPOOL anymirror$parity $disks special mirror $sdisks
log_must poolexists $TESTPOOL

log_must dd if=/dev/urandom of=/$TESTPOOL/file.bin bs=1M count=128
You'll want to write two files, with one file's writes being small enough to land on special, and the other's large enough to land on anymirror. You can set `special_small_blocks` to threshold the size of writes for the special device. You can also write much less than 128MB here to make the test run faster.
# Verify a variety of AnyRAID pools with a special VDEV AnyRAID.
#
# STRATEGY:
# 1. Create an AnyRAID pool with a special VDEV AnyRAID.
This is very similar to anyraid_special_vdev_001_pos.ksh; you could probably combine the two tests. Also, you could try with a dedup device as well.
if [[ $vdev != "" && \
	$vdev != "mirror" && \
	$vdev != "raidz" && \
	$vdev != "anyraid" && \
"anymirror" ?
log_mustnot zpool add -f $TESTPOOL $disk0

for type in "" "mirror" "raidz" "draid" "spare" "log" "dedup" "special" "cache"
for type in "" "mirror" "raidz" "anyraid" "draid" "spare" "log" "dedup" "special" "cache"
"anymirror"?
"feature@redaction_list_spill" | ||
"feature@dynamic_gang_header" | ||
"feature@physical_rewrite" | ||
"feature@anyraid" |
"feature@anymirror"?
for type in "mirror" "anymirror1"; do
	log_must zpool create -f $TESTPOOL $type $DISK1 $DISK2
	if [[ "$type" == "anymirror1" ]]; then
		log_must dd if=/dev/urandom of=/$TESTPOOL/f1 bs=1M count=2k
Is this `dd` necessary? 1:52 of the total 1:55 test time is the `dd`:
19:33:02.54 SUCCESS: zpool create -f testpool anymirror1 loop0 loop1
19:34:54.98 SUCCESS: dd if=/dev/urandom of=/testpool/f1 bs=1M count=2k
progress="$(initialize_progress $TESTPOOL $DISK1)" | ||
[[ -z "$progress" ]] && log_fail "Initializing did not start" | ||
log_mustnot eval "initialize_prog_line $TESTPOOL $DISK1 | grep suspended" | ||
if [[ "$type" =~ "anyraid" ]]; then |
if [[ "$type" =~ "anymirror" ]]; then
	log_must zpool create -f $TESTPOOL $type $DISK1 $DISK2
else
	log_must zpool create -f $TESTPOOL $type $DISK1 $DISK2 $DISK3
	log_must dd if=/dev/urandom of=/$TESTPOOL/f1 bs=1M count=400
/dev/urandom isn't that fast:
19:35:40.29 SUCCESS: zpool create -f testpool anymirror1 loop0 loop1 loop2
19:35:51.26 SUCCESS: dd if=/dev/urandom of=/testpool/f1 bs=1M count=400
Consider using `file_write -d R ...` instead.
log_must zpool list -v
log_must zpool create -f $TESTPOOL $type $DISK1 $DISK2 $DISK3
if [[ "$type" == "anymirror2" ]]; then
	log_must dd if=/dev/urandom of=/$TESTPOOL/f1 bs=1M count=2k
`dd` here is slow, consider alternatives.
19:36:10.46 SUCCESS: zpool create -f testpool anymirror2 loop0 loop1 loop2
19:38:23.81 SUCCESS: dd if=/dev/urandom of=/testpool/f1 bs=1M count=2k
log_must zpool create -f $TESTPOOL $type $DISK1 $DISK2 $DISK3
status_check_all $TESTPOOL "uninitialized"
if [[ "$type" == "anymirror1" ]]; then
	log_must dd if=/dev/urandom of=/$TESTPOOL/f1 bs=1M count=2k
Slow `dd`, consider alternatives.
log_must zfs set primarycache=none $TESTPOOL

# Write initial data
log_must dd if=/dev/urandom of=/$TESTPOOL/file1.bin bs=1M count=$(( DEVSIZE / 2 / 1024 / 1024 ))
slow dd
18:45:08.30 SUCCESS: zfs set primarycache=none testpool
18:45:33.54 SUCCESS: dd if=/dev/urandom of=/testpool/file1.bin bs=1M count=2048

# Read initial data, write new data
dd if=/$TESTPOOL/file1.bin of=/dev/null bs=1M count=$(( DEVSIZE / 2 / 1024 / 1024 ))
log_must dd if=/dev/urandom of=/$TESTPOOL/file1.bin bs=1M count=$(( DEVSIZE / 2 / 1024 / 1024 ))
slow dd
Sponsored by: Eshtek, creators of HexOS; Klara, Inc.
Motivation and Context
For industry/commercial use cases, the existing redundancy solutions in ZFS (mirrors and RAIDZ) work great. They provide high performance, reliable, efficient storage options. For enthusiast users, however, they have a drawback. RAIDZ and mirrors will use the size of the smallest drive that is part of the vdev as the size of every drive, so they can provide their reliability guarantees. If you can afford to buy a new box of drives for your pool, like large-scale enterprise users, that's fine. But if you already have a mix of hard drives, of various sizes, and you want to use all of the space they have available while still benefiting from ZFS's reliability and featureset, there isn't currently a great solution for that problem.
Description
The goal of Anyraid is to fill that niche. Anyraid allows devices of mismatched sizes to be combined together into a single top-level vdev. In the current version, Anyraid only supports mirror-type parity, but raidz-type parity is planned for the near future.
Anyraid works by dividing each of the disks that make up the vdev into tiles. These tiles are the same size across all disks within a given anyraid vdev. The size of a tile is 1/64th of the size of the smallest disk present at creation time, or 16GiB, whichever is larger. These tiles are then combined to form the logical vdev that anyraid presents, with sets of tiles from different disks acting as mini-mirrors, allowing the reliability guarantees to be preserved. Tiles are allocated on demand; when a write comes into a part of the logical vdev that doesn't have backing tiles yet, the Anyraid logic picks the `nparity + 1` disks with the most unallocated tiles, and allocates one tile from each of them. These physical tiles are combined into one logical tile, which is used to store data for that section of the logical vdev.
One important note with this design is that we need to understand this mapping from logical offset to tiles (and therefore to actual physical disk locations) in order to read anything from the pool. As a result, we cannot store the mapping in the MOS, since that would result in a bootstrap problem. To solve this issue, we allocate a region at the start of each disk where we store the Anyraid tile map. The tile map is made up of 4 copies of all the data necessary to reconstruct it. These copies are updated in rotating order, like uberblocks. In addition, each disk has a full copy of all 4 maps, ensuring that as long as any drive's copy survives, the tile map for a given TXG can be read successfully. The size of one copy of the tile map is 64MiB; that size determines the maximum number of tiles an anyraid vdev can have, which is 2^24. This is made up of up to 2^8 disks and up to 2^16 tiles per disk. This does mean that the largest device that can be fully used by an anyraid vdev is 1024 times the size of the smallest disk that was present at vdev creation time. This was considered an acceptable tradeoff, though it is a limit that could be alleviated in the future if needed; the primary difficulty is that either the tile map needs to grow substantially, or logic needs to be added to handle/prevent the tile map filling up.
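As a rough illustration of the on-demand allocation rule described above (a self-contained sketch, not the actual vdev_anyraid code; the array-based bookkeeping and function name are hypothetical):
```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch of the rule described above: pick the nparity + 1
 * disks with the most unallocated tiles and take one tile from each.
 * free_tiles[] stands in for the real per-child accounting; assumes
 * ndisks > nparity.
 */
static void
allocate_logical_tile(uint64_t *free_tiles, int ndisks, int nparity,
    int *chosen)
{
	for (int copy = 0; copy < nparity + 1; copy++) {
		int best = -1;
		for (int d = 0; d < ndisks; d++) {
			bool already_chosen = false;
			for (int c = 0; c < copy; c++) {
				if (chosen[c] == d)
					already_chosen = true;
			}
			if (already_chosen)
				continue;
			if (best == -1 || free_tiles[d] > free_tiles[best])
				best = d;
		}
		/* One physical tile comes from each selected disk. */
		chosen[copy] = best;
		free_tiles[best]--;
	}
}
```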
Anyraid vdevs support all the operations that normal vdevs do. They can be resilvered, removed, and scrubbed. They also support expansion; new drives can be attached to the anyraid vdev, and their tiles will be used in future allocations. There is currently no support for rebalancing tiles onto new devices, although that is also planned. VDEV Contraction is also planned for the future.
New ZDB functionality was added to print out information about the anyraid mapping, to aid in debugging and understanding. A number of tests were also added, and ztest support for the new type of vdev was implemented.
How Has This Been Tested?
In addition to the tests added to the test suite and zloop runs, I also ran many manual tests of unusual configurations to verify that the tile layout behaves correctly. There was also some basic performance testing to verify that nothing was obviously wrong. Performance is not the primary design goal of anyraid, however, so in-depth analysis was not performed.