
Conversation

pcd1193182
Contributor

@pcd1193182 pcd1193182 commented Jul 25, 2025

Sponsored by: Eshtek, creators of HexOS; Klara, Inc.

Motivation and Context

For industry/commercial use cases, the existing redundancy solutions in ZFS (mirrors and RAIDZ) work great. They provide high-performance, reliable, and efficient storage options. For enthusiast users, however, they have a drawback: RAIDZ and mirrors treat every drive in the vdev as having the size of the smallest drive, so that they can provide their reliability guarantees. If you can afford to buy a new box of drives for your pool, like large-scale enterprise users, that's fine. But if you already have a mix of hard drives of various sizes, and you want to use all of the space they have available while still benefiting from ZFS's reliability and feature set, there isn't currently a great solution for that problem.

Description

The goal of Anyraid is to fill that niche. Anyraid allows devices of mismatched sizes to be combined together into a single top-level vdev. In the current version, Anyraid only supports mirror-type parity, but raidz-type parity is planned for the near future.

Anyraid works by dividing each of the disks that make up the vdev into tiles. These tiles are the same size across all disks within a given anyraid vdev. The size of a tile is 1/64th of the size of the smallest disk present at creation time, or 16GiB, whichever is larger. These tiles are then combined to form the logical vdev that anyraid presents, with sets of tiles from different disks acting as mini-mirrors, allowing the reliability guarantees to be preserved. Tiles are allocated on demand; when a write comes into a part of the logical vdev that doesn't have backing tiles yet, the Anyraid logic picks the nparity + 1 disks with the most unallocated tiles and allocates one tile from each of them. These physical tiles are combined into one logical tile, which is used to store data for that section of the logical vdev.
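
For illustration only (this is not code from the PR): a minimal C sketch of the allocation rule just described. The names pick_tile_disks, disk_free_tiles, and children are hypothetical; it simply picks the nparity + 1 disks with the most unallocated tiles to back one logical tile.

#include <stdint.h>

static void
pick_tile_disks(const uint32_t *disk_free_tiles, int children,
    int nparity, int *out_disks)
{
    int picked[256] = { 0 };    /* anyraid allows at most 2^8 disks */

    for (int copy = 0; copy < nparity + 1; copy++) {
        int best = -1;
        for (int d = 0; d < children; d++) {
            if (picked[d])
                continue;
            if (best == -1 ||
                disk_free_tiles[d] > disk_free_tiles[best])
                best = d;
        }
        if (best == -1)
            break;    /* fewer disks than nparity + 1 */
        out_disks[copy] = best;
        picked[best] = 1;
    }
}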

One important note with this design is that we need to understand this mapping from logical offset to tiles (and therefore to actual physical disk locations) in order to read anything from the pool. As a result, we cannot store the mapping in the MOS, since that would result in a bootstrap problem. To solve this issue, we reserve a region at the start of each disk where we store the Anyraid tile map. This region holds 4 copies of all the data necessary to reconstruct the mapping, and the copies are updated in rotating order, like uberblocks. In addition, each disk has a full copy of all 4 maps, ensuring that as long as any drive's copy survives, the tile map for a given TXG can be read successfully. The size of one copy of the tile map is 64MiB; that size determines the maximum number of tiles an anyraid vdev can have, which is 2^24: up to 2^8 disks, with up to 2^16 tiles per disk. This does mean that the largest device that can be fully used by an anyraid vdev is 1024 times the size of the smallest disk that was present at vdev creation time. This was considered an acceptable tradeoff, though it is a limit that could be alleviated in the future if needed; the primary difficulty is that either the tile map needs to grow substantially, or logic needs to be added to handle/prevent the tile map filling up.
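
For illustration only, the limits above restated as constants (the names are hypothetical, not the PR's): one 64MiB map copy addresses up to 2^24 tiles, spread across at most 2^8 disks with at most 2^16 tiles each, and 4 copies rotate per TXG like uberblocks.

#define ANYRAID_MAX_DISKS           (1ULL << 8)    /* disks per anyraid vdev */
#define ANYRAID_MAX_TILES_PER_DISK  (1ULL << 16)   /* tiles per disk */
#define ANYRAID_MAX_TILES           (ANYRAID_MAX_DISKS * ANYRAID_MAX_TILES_PER_DISK)
#define ANYRAID_MAP_COPY_SIZE       (64ULL << 20)  /* 64MiB per map copy */
#define ANYRAID_MAP_COPIES          4              /* rotated like uberblocks */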

Anyraid vdevs support all the operations that normal vdevs do. They can be resilvered, removed, and scrubbed. They also support expansion; new drives can be attached to the anyraid vdev, and their tiles will be used in future allocations. There is currently no support for rebalancing tiles onto new devices, although that is planned, as is vdev contraction.

New ZDB functionality was added to print out information about the anyraid mapping, to aid in debugging and understanding. A number of tests were also added, and ztest support for the new type of vdev was implemented.

How Has This Been Tested?

In addition to the tests added to the test suite and zloop runs, I also ran many manual tests of unusual configurations to verify that the tile layout behaves correctly. There was also some basic performance testing to verify that nothing was obviously wrong. Performance is not the primary design goal of anyraid, however, so in-depth analysis was not performed.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Design Review Needed Architecture or design is under discussion label Jul 25, 2025
@pcd1193182 pcd1193182 force-pushed the anyraid branch 4 times, most recently from d8526e8 to c3b8110 Compare August 7, 2025 20:05
vdev_t *vd = mg->mg_vd;
if (B_FALSE) {
    weight = 2 * weight - (msp->ms_id * weight) / vd->vdev_ms_count;
    weight = MIN(weight, METASLAB_MAX_WEIGHT);
Member

I am suspicious about this math:

  1. Unlike the space-based weight, which seems to be linear, the segment-based weight is exponential. So by doubling the weight of the first metaslabs, you actually increase it exponentially. I.e., if the first metaslab has only one free segment of only 1MB, and therefore a weight with INDEX=20 and COUNT=1, doubling that will give INDEX=40 and COUNT=2, which would make it absolutely unbeatable for most metaslabs, since beating it would require a free segment of up to 1TB (see the sketch after this list).
  2. All free metaslabs are identical and are selected based on their offset, which is OK if that is expected; but they also don't go through this path, so they have very little chance of ever being used until all the earlier metaslabs are filled to the brim.
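
A rough illustration of point 1, assuming a segment-based weight that packs a 6-bit index above a 54-bit segment count (flag bits omitted); this is a standalone sketch, not code from the PR.

#include <stdint.h>
#include <stdio.h>

#define WEIGHT(idx, cnt)    (((uint64_t)(idx) << 54) | (uint64_t)(cnt))
#define WEIGHT_INDEX(w)     ((int)((w) >> 54))
#define WEIGHT_COUNT(w)     ((w) & ((1ULL << 54) - 1))

int
main(void)
{
    uint64_t w = WEIGHT(20, 1);    /* one free ~1MB (2^20) segment */
    uint64_t doubled = 2 * w;

    /* Prints "index 40 count 2": a competitor now needs a ~1TB segment. */
    printf("index %d count %llu\n", WEIGHT_INDEX(doubled),
        (unsigned long long)WEIGHT_COUNT(doubled));
    return (0);
}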

Contributor Author

Hm, yeah, doubling is probably overkill for this. But we do need something that will interpolate nicely later into the vdev. Perhaps what we do is something like: add 3 - ((ms_id * 4) / vdev_ms_count) to the index, and add 20 - (((ms_id * 20) / (vdev_ms_count / 5)) % 20) to the count? So for the first quarter we would add 3 to the index, then 2, 1, and 0. And within each quarter, we would add 19 to the count of the first metaslab, 18 to the second, etc. It's not ideal, since larger vdevs will have large plateaus where the modifications are the same, but that's probably alright. We just want to try to concentrate writes generally earlier; a little mixing in adjacent metaslabs on large pools is probably fine. And with the largest index difference being 3, the last metaslab only needs to have segments 8x as large as the first one to compete.

Member

@amotin amotin Aug 8, 2025

For purposes of spinning disk performance we don't really need a smooth curve. Having just several "speed zones" would be enough to get most of the performance. I propose to leave the first zone as it is; for the second zone, subtract 1 from the index while doubling the count to keep it equivalent; for the third zone, subtract another 1 from the index and again double the count; and so on. This logic may not be great beyond 3-4 zones, but IMO that should be enough to get most of the speed, and I think it could actually be applied independently of anyraid to any rotating vdevs. We may not want to apply this to metaslabs with an index below some threshold, since sequential speed there is not worth much, and we should instead focus on lower fragmentation.

For purposes of anyraid's tile allocation, I think you only need to prefer already-used metaslabs to empty ones to a certain degree. Once some metaslabs are used, I am not sure (yet?) why you would really prefer one used metaslab to another, since a single used block would be enough for the tile to stay allocated no matter what. So I think it may be enough to just account free metaslabs not as one free segment (an index for the full size and a count of 1, which makes them unbeatable now), but split them a few times in the way I described above for HDDs, to a level just below (or equal to) the last speed zone of used metaslabs. Free metaslabs do not need zones, since the current code already sorts identical metaslabs by offset.

Contributor Author

I agree we probably don't need gradations beyond the few top-level speed zones. I think the simplest algorithm that satisfies our goals is to just add N - 1 to the index in the first zone, N - 2 in the second, etc., for N zones. That will prefer earlier metaslabs, and since untouched metaslabs won't hit this code, they will naturally end up with nothing added to the index and will sort with everything in the final zone (and then earlier ones will sort first, as you said). I see the idea behind decreasing the index and doubling the count to match, but I don't think that's actually important; we don't multiply the segment size and count together to get the available space anywhere, so we don't need to preserve that relationship.

This prefers earlier metaslabs for the general rotational performance bonus, and prefers already used metaslabs for anyraid. We can also disable this logic if the index is below a critical value (24?) so that if we're very fragmented, we abandon this and only focus on the actual free space.
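
A minimal sketch of that scheme (the names and the zone count are hypothetical, not the PR's code): boost the weight index for earlier metaslabs in a few coarse zones, unless the metaslab is already too fragmented to bother.

#include <stdint.h>

#define WEIGHT_ZONES        4
#define MIN_BOOST_INDEX     24    /* below this, focus on free space only */

static uint64_t
zone_boost_index(uint64_t index, uint64_t ms_id, uint64_t ms_count)
{
    if (index < MIN_BOOST_INDEX)
        return (index);

    /* Zone 0 gets +(N - 1), zone 1 gets +(N - 2), ..., the last zone +0. */
    uint64_t zone = (ms_id * WEIGHT_ZONES) / ms_count;
    return (index + (WEIGHT_ZONES - 1 - zone));
}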

Member

We can also disable this logic if the index is below a critical value (24?) so that if we're very fragmented, we abandon this and only focus on the actual free space.

Yeah. In an unrelated context I was also recently thinking that we could do more when we reach some fragmentation or capacity threshold.

Contributor Author

One annoying caveat about changing the weighting algorithm dynamically is that the assertion that the weight must not decrease while a metaslab is unloaded means we have to be a little careful to design/test it with those constraints in mind.

Comment on lines +1880 to +1884
if (vd->vdev_parent->vdev_ops == &vdev_anyraid_ops) {
    vdev_anyraid_write_map_sync(vd, zio, txg, good_writes, flags,
        status);
Member

@amotin amotin Aug 8, 2025

I suppose this will write all the maps every time? In addition to the 2 one-sector uberblock writes per leaf vdev, we now write up to 64MB per one?

Contributor Author

Yes, we now write out the whole map every TXG. In theory that could be 64MiB, but in practice it's usually on the order of kilobytes; an anyraid vdev with 32 disks, an average of 256 tiles each, and 80% of them mapped would use about 26KB. We use 64MB here because that's the maximum the mapping could possibly reach, not because we expect it to be anywhere close to that in practice.
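
For scale, those numbers imply roughly 4 bytes per mapped tile entry (an inference for illustration, not the on-disk format): 32 disks × 256 tiles × 0.8 ≈ 6,554 mapped tiles, and 6,554 × 4 bytes ≈ 26KB, versus the 64MiB worst case for 2^24 entries.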

We could add an optimization that doesn't update the map if nothing changed in a given txg; we'd still want to update the header, but the mapping itself could be left unmodified.
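
A minimal sketch of that optimization, with hypothetical field and function names (not the PR's code): skip rewriting the map body when nothing changed in the TXG, while still rotating the small header.

#include <stdint.h>

typedef struct anyraid_map_state {
    int ms_map_dirty;    /* set whenever a tile mapping changes */
} anyraid_map_state_t;

/* Hypothetical stand-ins for the real map and header writers. */
static void write_map_body(anyraid_map_state_t *ms, uint64_t txg) { (void)ms; (void)txg; }
static void write_map_header(anyraid_map_state_t *ms, uint64_t txg) { (void)ms; (void)txg; }

static void
anyraid_map_sync(anyraid_map_state_t *ms, uint64_t txg)
{
    /* Only rewrite the potentially large body if something changed. */
    if (ms->ms_map_dirty) {
        write_map_body(ms, txg);
        ms->ms_map_dirty = 0;
    }
    /* The small header still rotates every TXG, like an uberblock. */
    write_map_header(ms, txg);
}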

void *buf = abd_borrow_buf(map_abd, SPA_MAXBLOCKSIZE);

rw_enter(&var->vd_lock, RW_READER);
anyraid_tile_t *cur = avl_first(&var->vd_tile_map);
Member

@amotin amotin Aug 8, 2025

As I understand it, you have only one copy of the tile map, which covers all TXGs. That may be OK as long as you don't need precise accounting and are never going to free tiles. But what is not OK, I suppose, is that for writes done in open context (such as Direct I/O and ZIL), maps will not be written until the end of the next committed TXG. Direct I/O might not care, but it may be impossible to replay the ZIL after a crash if its blocks, or blocks they reference, were written to new tiles that are not yet synced.

Contributor Author

That's not correct; there are 4 copies per vdev, which rotate per TXG. So there are plenty of copies of the map for each TXG, and there are 4 TXGs' worth of maps.

As for the ZIL issue, that is a good point. We need to prevent ZIL blocks from ending up in unmapped tiles. While this probably wouldn't actually cause a problem in practice (when you import the pool again, the tile would get mapped the same way as before, since the vdev geometry hasn't changed), it definitely could in theory (if you had multiple new tiles in the same TXG, and they got remapped in a different order than they were originally).

The best fix I came up with is to prevent ZIL writes from being allocated to unmapped tiles in the first place. I also considered trying to stall those writes until the TXG synced, but that's slow and also technically annoying. I also considered having a journal of tile mappings that we would update immediately, but that adds a lot of complexity. Preventing the allocation nicely solves the problem, and if they can't find a place to allocate in all the mapped tiles, we already have logic to force the ZIL to fall back to txg_wait_synced.
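
Sketch only, with hypothetical names (tile_is_mapped, tile_size); not the PR's code. The point is just that an allocation-time check keeps open-context ZIL writes inside already-mapped tiles, and the existing txg_wait_synced fallback handles the rest.

#include <stdint.h>
#include <errno.h>

static int
zil_alloc_check(uint64_t offset, uint64_t tile_size,
    int (*tile_is_mapped)(uint64_t tile_id))
{
    uint64_t tile_id = offset / tile_size;

    /* Refuse open-context allocations in tiles with no synced mapping. */
    if (!tile_is_mapped(tile_id))
        return (ENOSPC);    /* caller falls back to txg_wait_synced */
    return (0);
}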

@amotin amotin added the Status: Revision Needed Changes are required for the PR to be accepted label Aug 8, 2025
@tonyhutter
Contributor

Overall this is a really nice feature! I haven't looked at the code yet, but did kick the tires a little and have some comments/questions.

  1. Regarding:

The size of a tile is 1/64th of the size of the smallest disk present at creation time, or 16GiB, whichever is larger.

How did you arrive at the 16GiB min tile size? (Forgive me if this is mentioned in the code comments.) I ask since it would be nice to have a smaller tile size to accommodate smaller vdevs (and give more free space, since space is rounded to tile-sized boundaries).

  2. We should tell the user the minimum anyraid vdev size if they pass too small a vdev. Currently the error is:
$ sudo ./zpool create tank anyraid ./8gb_file1 ./8gb_file2
cannot create 'tank': one or more devices is out of space
  3. We should document that autoexpand=on|off is ignored by anyraid, to mitigate any confusion/ambiguity.

  4. I was able to create an anyraid1 pool with an anyraid1 special device, which is nice. However, I could not create an anyraid1 pool with a mirror special device, even though they're the same redundancy level (special devices must have the same redundancy level as the pool). We should update the checks to allow mirror/raidz/anyraid/anyraidz equivalent redundancy levels with special vdevs.

  5. This PR uses anyraid, anyraid0, anyraid1, anyraid2 naming for the TLD type. What if we copied the current "mirror"/"raidz" naming convention, like:

anymirror, anymirror0, anymirror1, anymirror2

anyraidz, anyraidz1

That way there's no ambiguity if the anyraid TLD is a mirror or raidz flavor. It also opens the path to anyraidz1, which was mentioned in the Anyraid announcement:

"With ZFS AnyRaid, we will see at least two new layouts added: AnyRaid-Mirror and AnyRaid-Z1. The AnyRaid-Mirror feature will come first, and will allow users to have a pool of more than two disks of varying sizes while ensuring all data is written to two different disks. The AnyRaid-Z1 feature will apply the same concepts of ZFS RAID-Z1, but while supporting mixed size disks."

https://hexos.com/blog/introducing-zfs-anyraid-sponsored-by-eshtek

@tonyhutter
Contributor

I also noticed that the anyraid TLD names don't include the parity level. They all just say "anyraid":

	  anyraid-0                 ONLINE       0     0     0

We should have it match the raidz TLD convention where the parity level is included:

	  raidz1-0                  ONLINE       0     0     0
	  raidz2-0                  ONLINE       0     0     0
	  raidz3-0                  ONLINE       0     0     0

@pcd1193182
Contributor Author

pcd1193182 commented Aug 27, 2025

  1. Regarding:

The size of a tile is 1/64th of the size of the smallest disk present at creation time, or 16GiB, whichever is larger.

How did you arrive at the 16GiB min tile size? (Forgive me if this is mentioned in the code comments.) I ask since it would be nice to have a smaller tile size to accommodate smaller vdevs (and give more free space, since space is rounded to tile-sized boundaries).

16GiB was selected mostly because that makes the minimum line up with the standard fraction (1/64th) at a 1TiB disk. That's a nice round number, and a pretty reasonable size for "a normal size disk" these days; anything less than 1TiB is definitely on the smaller side. The other effect of this value is that with this tile size, you can have any disk up to 1PiB in size and still be able to use all the space; any disk that would need more than 2^16 tiles can't be fully used.

It is possible to have smaller tile sizes; we do it in the test suite a bunch. There is a tunable, zfs_anyraid_min_tile_size, that controls this.
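
Spelled out: 1TiB / 64 = 16GiB, so the floor only matters for disks smaller than 1TiB; and with the per-disk cap of 2^16 tiles, 2^16 × 16GiB = 1PiB of usable space on a single disk.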

  2. We should tell the user the minimum anyraid vdev size if they pass too small a vdev. Currently the error is:
$ sudo ./zpool create tank anyraid ./8gb_file1 ./8gb_file2
cannot create 'tank': one or more devices is out of space

That's fair, we could have a better error message for this case. I can work on that.

  3. We should document that autoexpand=on|off is ignored by anyraid, to mitigate any confusion/ambiguity.

I think autoexpand works like normal? It doesn't affect the tile size or anything, because the tile size is locked in immediately when the vdev is created, but it should affect the disk sizes like normal. Maybe the tile capacity doesn't change automatically? But that's probably a bug, if so. Did you run into this in your testing?

  4. I was able to create an anyraid1 pool with an anyraid1 special device, which is nice. However, I could not create an anyraid1 pool with a mirror special device, even though they're the same redundancy level (special devices must have the same redundancy level as the pool). We should update the checks to allow mirror/raidz/anyraid/anyraidz equivalent redundancy levels with special vdevs.

Interesting, I will investigate why that happened. Those should be able to mix for sure.

  5. This PR uses anyraid, anyraid0, anyraid1, anyraid2 naming for the TLD type. What if we copied the current "mirror"/"raidz" naming convention, like:
anymirror, anymirror0, anymirror1, anymirror2

anyraidz, anyraidz1

That way there's no ambiguity if the anyraid TLD is a mirror or raidz flavor. It also opens the path to anyraidz1, which was mentioned in the Anyraid announcement:

...

I'm open to new naming options. My vague plan was to use anyraidz{1,2,3} for the RAID-Z-style parity when that support is added. But having mirror-parity have a clearer name does probably make sense. I'm open to anymirror; I was also thinking about anyraidm as a possibility.

I also noticed that the anyraid TLD names don't include the parity level. They all just say "anyraid":

	  anyraid-0                 ONLINE       0     0     0

Good point, I will fix that too.

@junkbustr

junkbustr commented Aug 30, 2025

This is a nit, but in the description I believe there is a typo:

"Anyraid works by diving each of the disks that makes up the vdev..."

I believe the intent was for dividing.

@github-actions github-actions bot removed the Status: Revision Needed Changes are required for the PR to be accepted label Sep 2, 2025
@pcd1193182 pcd1193182 force-pushed the anyraid branch 3 times, most recently from 64b9223 to ae25ac3 Compare September 5, 2025 18:16
@pcd1193182 pcd1193182 force-pushed the anyraid branch 4 times, most recently from 6e96d25 to 7c87ca3 Compare September 10, 2025 22:35
@tonyhutter
Contributor

tonyhutter commented Sep 15, 2025

  1. The validation logic will need to be tweaked to allow differing numbers of vdevs per anyraid TLD:
$ truncate -s 30G file1_30g
$ truncate -s 40G file2_40g
$ truncate -s 20G file3_20g
$ truncate -s 35G file4_35g
$ truncate -s 35G file5_35g
$ sudo ./zpool create tank anyraid ./file1_30g ./file2_40g ./file3_20g anyraid ./file4_35g ./file5_35g
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: both 3-way and 2-way anyraid vdevs are present
  2. The anyraid TLD type string needs checks as well:
$ ./zpool create tank anyraid-this_should_not_work ./file1_30g
$ sudo ./zpool status
  pool: tank
 state: ONLINE
config:

	NAME                            STATE     READ WRITE CKSUM
	tank                            ONLINE       0     0     0
	  anyraid0-0                    ONLINE       0     0     0
	    /home/hutter/zfs/file1_30g  ONLINE       0     0     0

errors: No known data errors
  3. Regarding:

This PR uses anyraid, anyraid0, anyraid1, anyraid2 naming for the TLD type. What if we copied the current "mirror"/"raidz" naming convention, like? anymirror, anymirror0, anymirror1, anymirror2, anyraidz, anyraidz1

I'm open to new naming options. My vague plan was to use anyraidz{1,2,3} for the RAID-Z-style parity when that support is added. But having mirror-parity have a clearer name does probably make sense. I'm open to anymirror; I was also thinking about anyraidm as a possibility.

I prefer the anymirror name over anyraidm, just to keep convention with mirror. Same with my preference for the future anyraidz name for the same reasons.

  4. I don't know if this has anything to do with this PR, but I noticed the rep_dev_size values in the JSON were a little weird. Here I create an anyraid pool with 30GB, 40GB, and 20GB vdevs:
$ sudo ./zpool status -j | jq
 ...
              "vdevs": {
                "/home/hutter/zfs/file1_30g": {
                  "name": "/home/hutter/zfs/file1_30g",
                  "vdev_type": "file",
                  "guid": "2550367119017510955",
                  "path": "/home/hutter/zfs/file1_30g",
                  "class": "normal",
                  "state": "ONLINE",
                  "rep_dev_size": "16.3G",
                  "phys_space": "30G",
...
                },
                "/home/hutter/zfs/file2_40g": {
                  "name": "/home/hutter/zfs/file2_40g",
                  "vdev_type": "file",
                  "guid": "17589174087940051454",
                  "path": "/home/hutter/zfs/file2_40g",
                  "class": "normal",
                  "state": "ONLINE",
                  "rep_dev_size": "16.3G",
                  "phys_space": "40G",
...
                },
                "/home/hutter/zfs/file3_20g": {
                  "name": "/home/hutter/zfs/file3_20g",
                  "vdev_type": "file",
                  "guid": "6265258539420333029",
                  "path": "/home/hutter/zfs/file3_20g",
                  "class": "normal",
                  "state": "ONLINE",
                  "rep_dev_size": "261M",
                  "phys_space": "20G",
...

I'm guessing the first two vdevs report a rep_dev_size of 16.3G due to tile alignment. What I don't get is the 261M value for the 3rd vdev. I would have expected a 16.3G value there.

Paul Dagnelie added 12 commits October 3, 2025 10:28
Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Signed-off-by: Paul Dagnelie <[email protected]>
Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Signed-off-by: Paul Dagnelie <[email protected]>
Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Signed-off-by: Paul Dagnelie <[email protected]>
Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Signed-off-by: Paul Dagnelie <[email protected]>
Sponsored-by: Eshtek, creators of HexOS
Sponsored-by: Klara, Inc.
Signed-off-by: Paul Dagnelie <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Signed-off-by: Paul Dagnelie <[email protected]>
Comment on lines +9330 to +9335
static int
log_10(uint64_t v) {
    char buf[32];
    snprintf(buf, sizeof (buf), "%llu", (u_longlong_t)v);
    return (strlen(buf));
}
Contributor

Could you use log10() from math.h?

#define ZPOOL_CONFIG_DRAID_NGROUPS "draid_ngroups"

/* ANYRAID configuration */
#define ZPOOL_CONFIG_ANYRAID_PARITY_TYPE "parity_type"
Contributor

nit: should the string be "anyraid_parity_type", just so it's clear if someone is printing the config?

}

zfeature_register(SPA_FEATURE_ANYRAID,
"com.klarasystems:anyraid", "anyraid", "Support for anyraid VDEV",
Contributor

I like that the overarching name for anymirror+anyraid is called "anyraid". For this feature flag though, do we need to be more specific and call it "anymirror", and then add another feature flag for the actual raid part follow-on ("anyraid")? Or can we use the same feature flag for both anymirror and the future anyraid?

* Initialize private VDEV specific fields from the nvlist.
*/
static int
vdev_anyraid_init(spa_t *spa, nvlist_t *nv, void **tsd)
Contributor

Should this be?:

- vdev_anyraid_init(spa_t *spa, nvlist_t *nv, void **tsd)
+ vdev_anyraid_init(spa_t *spa, nvlist_t *nv, vdev_anyraid_t **tsd)

{
echo $(zpool iostat -v $1 | awk '(NR > 4) {print $1}' | \
grep -vEe '^-----' -e "^(mirror|raidz[1-3]|draid[1-3]|spare|log|cache|special|dedup)|\-[0-9]$")
grep -vEe '^-----' -e "^(mirror|raidz[1-3]|anyraid|draid[1-3]|spare|log|cache|special|dedup)|\-[0-9]$")
Contributor

@tonyhutter tonyhutter Oct 3, 2025

Should be anymirror[0-3]?

log_must zpool create -f $TESTPOOL anymirror$parity $disks special mirror $sdisks
log_must poolexists $TESTPOOL

log_must dd if=/dev/urandom of=/$TESTPOOL/file.bin bs=1M count=128
Contributor

You'll want to write two files, with one file's writes being small enough to land on the special vdev, and the other's large enough to land on the anymirror vdev. You can set special_small_blocks to threshold the size of writes for the special device. You can also write much less than 128MB here to make the test run faster.

# Verify a variety of AnyRAID pools with a special VDEV AnyRAID.
#
# STRATEGY:
# 1. Create an AnyRAID pool with a special VDEV AnyRAID.
Contributor

This is very similar to anyraid_special_vdev_001_pos.ksh; you could probably combine the two tests. Also, you could try with a dedup device as well.

if [[ $vdev != "" && \
$vdev != "mirror" && \
$vdev != "raidz" && \
$vdev != "anyraid" && \
Contributor

"anymirror" ?

log_mustnot zpool add -f $TESTPOOL $disk0

for type in "" "mirror" "raidz" "draid" "spare" "log" "dedup" "special" "cache"
for type in "" "mirror" "raidz" "anyraid" "draid" "spare" "log" "dedup" "special" "cache"
Contributor

"anymirror"?

"feature@redaction_list_spill"
"feature@dynamic_gang_header"
"feature@physical_rewrite"
"feature@anyraid"
Contributor

"feature@anymirror"?

for type in "mirror" "anymirror1"; do
log_must zpool create -f $TESTPOOL $type $DISK1 $DISK2
if [[ "$type" == "anymirror1" ]]; then
log_must dd if=/dev/urandom of=/$TESTPOOL/f1 bs=1M count=2k
Contributor

Is this dd necessary? 1:52 of the total 1:55 test time is the dd:

19:33:02.54 SUCCESS: zpool create -f testpool anymirror1 loop0 loop1             
19:34:54.98 SUCCESS: dd if=/dev/urandom of=/testpool/f1 bs=1M count=2k 

progress="$(initialize_progress $TESTPOOL $DISK1)"
[[ -z "$progress" ]] && log_fail "Initializing did not start"
log_mustnot eval "initialize_prog_line $TESTPOOL $DISK1 | grep suspended"
if [[ "$type" =~ "anyraid" ]]; then
Contributor

if [[ "$type" =~ "anymirror" ]]; then

log_must zpool create -f $TESTPOOL $type $DISK1 $DISK2
else
log_must zpool create -f $TESTPOOL $type $DISK1 $DISK2 $DISK3
log_must dd if=/dev/urandom of=/$TESTPOOL/f1 bs=1M count=400
Contributor

/dev/urandom isn't that fast:

19:35:40.29 SUCCESS: zpool create -f testpool anymirror1 loop0 loop1 loop2       
19:35:51.26 SUCCESS: dd if=/dev/urandom of=/testpool/f1 bs=1M count=400  

Consider using file_write -d R ... instead.

log_must zpool list -v
log_must zpool create -f $TESTPOOL $type $DISK1 $DISK2 $DISK3
if [[ "$type" == "anymirror2" ]]; then
log_must dd if=/dev/urandom of=/$TESTPOOL/f1 bs=1M count=2k
Contributor

dd here is slow, consider alternatives.

19:36:10.46 SUCCESS: zpool create -f testpool anymirror2 loop0 loop1 loop2       
19:38:23.81 SUCCESS: dd if=/dev/urandom of=/testpool/f1 bs=1M count=2k  

log_must zpool create -f $TESTPOOL $type $DISK1 $DISK2 $DISK3
status_check_all $TESTPOOL "uninitialized"
if [[ "$type" == "anymirror1" ]]; then
log_must dd if=/dev/urandom of=/$TESTPOOL/f1 bs=1M count=2k
Contributor

Slow dd, consider alternatives.

log_must zfs set primarycache=none $TESTPOOL

# Write initial data
log_must dd if=/dev/urandom of=/$TESTPOOL/file1.bin bs=1M count=$(( DEVSIZE / 2 / 1024 / 1024 ))
Contributor

slow dd

18:45:08.30 SUCCESS: zfs set primarycache=none testpool                          
18:45:33.54 SUCCESS: dd if=/dev/urandom of=/testpool/file1.bin bs=1M count=2048  


# Read initial data, write new data
dd if=/$TESTPOOL/file1.bin of=/dev/null bs=1M count=$(( DEVSIZE / 2 / 1024 / 1024 ))
log_must dd if=/dev/urandom of=/$TESTPOOL/file1.bin bs=1M count=$(( DEVSIZE / 2 / 1024 / 1024 ))
Contributor

slow dd
