Sunday, 15 May 2022

The journey to ZFS raidz1 with different sized disks (On NetBSD) (Wheelbarrow optional)

The joy of having a redundant remote backup machine is that if it dies you do not immediately lose any data, just some redundancy.

On the other hand, sufficient remoteness can make the process of rebuilding annoying enough to encourage a burning desire to avoid repeating it.

So... I find myself wanting to set up a NetBSD machine with a ZFS raidz1 - but without enough equally sized large disks, or enough drive bays to pack in many smaller disks. In the world of ZFS, that's a pretty big liability.

So, what are our assets?

  • Five drive bays
  • One 8TB disk
  • Two 6TB disks
  • A seemingly unlimited number of 2TB and smaller disks

That's it? Impossible. If I had a month to plan, maybe I could come up with something. But this... I mean, if we only had a wheelbarrow, that would be something.

Now the question of how to use different sized disks with ZFS comes up reasonably often, and the answer is invariably "You don't", quite often followed by:

  • Various ways in which you would not be able to do it
  • One way in which you could do it, but everything pretends to be the size of the smallest disk, so you have much less space (rough sums just after this list)
  • Some options to build a stack of ZFS on something on ZFS on something on ... with a note that you may want to supply the disk data blocks with good reading material and detailed maps in case they get lost on the long trek to and from the disk
  • (and my personal favourite) Split up all the devices into the largest divisible unit common to all disks, determine just the right way to stack and assemble all the units (I think that would come to 12 separate units in my case), wrap the result up in a holocaust cloak, set it on fire and push it towards a castle gate
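
For that second option, the rough sums (a hypothetical command using whole-disk partitions, with sizes rounded shamelessly):

# zpool create -f demo raidz1 wd1a wd3a wd4a wd0a wd2a

Every member gets clipped to the 2TB runt, so five disks come to about 10TB raw and roughly 8TB usable after parity - which is rather less than this pile of disks ought to manage.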

These all seemed either not to give the desired result, or to be the wrong type (*) of complexity (or quite often both), so I thought I'd give it a go.

*: Criticising other solutions as too complex while I contemplate my own particular combination of cogs, pulleys, and string, would be... unsporting

Anyway, I did consider using RAIDframe to bolt together each 6 & 2 disk pair to pretend to be an 8, but then... I recalled ccd - a delightful little artisanal driver from the last couple of decades of the 1900s. ccd can be used to stripe together multiple devices, concatenate them, or even a combination (stripe until one runs out, then just use the rest of the other).
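
For the curious, the main moving part is ccdconfig's interleave argument - non-zero stripes, zero concatenates. A sketch with made-up component partitions, not the real layout:

# ccdconfig ccd0 32 0 /dev/wd2a /dev/wd3a
# ccdconfig -u ccd0
# ccdconfig ccd0 0 0 /dev/wd2a /dev/wd3a

The first form stripes with a 32 sector interleave, the second tears it down again, and the third concatenates.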

Since I'm just trying to pretend to ZFS that "No, no, really you have three identically sized disks. Just Trust Me Here", I elected to concatenate them.

And as a final gust of pretending this Might All Be A Serious Attempt At Making Something Useful, I decided to use labelled wedges for both the ccd and the ZFS elements, so that when something inevitably fails and devices just go away, or everything gets randomly crammed into another box, the system might have a vague chance of determining how to fit together whatever pieces remain.
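
Concretely, "labelled wedges" just means handing gpt a -l label and letting devpubd populate /dev/wedges with stable names. Something like this, using an illustrative disk and label rather than one of the real ones (and assuming the disk already has a GPT and devpubd is running):

# gpt add -a 1m -t fbsd-zfs -l demo0 wd9
# dkctl wd9 listwedges
# ls -l /dev/wedges/

... after which /dev/wedges/demo0 should exist regardless of which bay or box the disk ends up in.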

So, before the script (there always has to be a script, encoding the necessary magic with cryptic, misleading and occasionally completely incorrect comments for the edification of one's future self), did it work?

It seems to work fine, taking:
wd0: 1863 GB, 3876021 cyl, 16 head, 63 sec, 512 bytes/sect x 3907029168 sectors
wd1: 7452 GB, 15504021 cyl, 16 head, 63 sec, 512 bytes/sect x 15628053168 sectors
wd2: 1863 GB, 3876021 cyl, 16 head, 63 sec, 512 bytes/sect x 3907029168 sectors
wd3: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors
wd4: 5589 GB, 11628021 cyl, 16 head, 63 sec, 512 bytes/sect x 11721045168 sectors

... into

# zpool list -v  
NAME               SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
onyx              21.8T  5.94M  21.7T         -     0%     0%  1.00x  ONLINE  -
  raidz1          21.8T  3.83M  21.7T         -     0%     0%
    wedges/onyx0      -      -      -         -      -      -
    wedges/onyx1      -      -      -         -      -      -
    wedges/onyx2      -      -      -         -      -      -
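
In case that 21.8T looks suspiciously generous: zpool list shows raw raidz capacity, and each of the three members is the size of the 8TB disk's wedge, so (roughly, using the whole-disk sector count rather than the slightly smaller wedge):

# expr 15628053168 \* 512
8001563222016

... about 7.28 TiB each, and three of those lands at 21.8T. raidz1 will hand one member's worth back as parity when it comes to actual usable space.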

Right now it's booted off an old USB key which appears to think it's a QL microdrive, so syncthing is busy maxing out IOPS to that while it downloads the data to rebuild the backup, but that's a problem for another day.

Oh, yes - the script. Primarily for my own benefit, I suspect, so that when something goes horribly wrong in three years' time and I have no memory of how I set things up, there's at least this to refer back to:

#!/bin/sh -e
# Set up a raidz1 pool from one large disk and two pairs of smaller disks using ccd
# - everything created in labelled wedges  
# - vdev size is capped to the first large disk
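# - disk roles: DISK0 is the lone 8TB disk; each DISKNa/DISKNb pair bolts a
#   6TB onto a 2TB to impersonate another 8TB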

DISK0=wd1
DISK1a=wd3
DISK1b=wd2
DISK2a=wd4
DISK2b=wd0
DISKS="$DISK0 $DISK1a $DISK1b $DISK2a $DISK2b"

calculate_needed()
{
disk1_size="$(gptsize $1)"
disk2_size="$(gptsize $2)"
# Add 4096 for ccd header
expr $disk1_size - $disk2_size + 4096
}
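# Worked example for the above, with illustrative (not actual) wedge sizes:
# 8TB wedge of 15628050432 sectors, 6TB wedge of 11721043968 sectors
# => 15628050432 - 11721043968 + 4096 = 3907010560 sectors needed from the 2TB disk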

create_ccd()
{
wedge=$1
ccd=$2
diska=$3
diskb=$4
ccdconfig $ccd 0 0 NAME=$diska NAME=$diskb
# gpt destroy falls over if reusing with different ccd size
(yes y | fdisk -i $ccd) > /dev/null
gpt destroy $ccd || true
gpt create -f $ccd
gpt add -a 1m -t fbsd-zfs -l $wedge $ccd
}
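# Once create_ccd returns, devpubd should (eventually) surface the new wedge as
# /dev/wedges/<wedge>, which is what the zpool create below consumes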

gptsize()
{
gpt show -i 1 $1 | awk '/Size:/{print $2}'
}
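# (gpt show -i 1 reports Start/Size in sectors - which is what the arithmetic in
# calculate_needed and the "* 512" near the end rely on)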

zpool destroy onyx || true
ccdconfig -u ccd1 || true
ccdconfig -u ccd2 || true
for disk in $DISKS; do
  gpt destroy $disk || true
  gpt create $disk
done
gpt add -a 1m -t fbsd-zfs -l onyx0 $DISK0

gpt add -a 1m -t ccd -l onyx1a $DISK1a
extrasize="$(calculate_needed $DISK0 $DISK1a)"
gpt add -a 1m -t ccd -l onyx1b -s $extrasize $DISK1b
create_ccd onyx1 ccd1 onyx1a onyx1b

gpt add -a 1m -t ccd -l onyx2a $DISK2a
extrasize="$(calculate_needed $DISK0 $DISK2a)"
gpt add -a 1m -t ccd -l onyx2b -s $extrasize $DISK2b
create_ccd onyx2 ccd2 onyx2a onyx2b

sleep 1 # Allow devpubd to catch up
zpool create onyx raidz1 /dev/wedges/onyx0 /dev/wedges/onyx1 /dev/wedges/onyx2
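# At this point "zpool status onyx" should show the three wedges sitting under a
# single raidz1 vdev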

# Calculate ccd size, remembering 4096 header
# Cannot use "ccdconfig -g ccd1 ccd2" as it loses wedge names
disk0_size=$(gptsize $DISK0)
ccd_size=$(expr \( $disk0_size + 4096 \) \* 512)

cat > /etc/ccd.conf <<END
ccd1            0      none  NAME=onyx1a NAME=onyx1b
ccd2            0      none  NAME=onyx2a NAME=onyx2b
END
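
(One assumption worth flagging for future-me: I'm relying on /etc/rc.d/ccd reassembling ccd1 and ccd2 from the /etc/ccd.conf above at boot, and on ZFS being enabled in rc.conf so the pool gets imported and mounted afterwards - roughly:

# echo zfs=YES >> /etc/rc.conf

If either of those turns out not to hold, that's the first place to dig.)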