I might make a post about my recent endeavours into Open Media Vault and iSCSI, but this post will focus on a “weird” one. I must advise that following my instructions blindly, and without a proper backup, will very likely result in the total loss of your ZFS pool. Full disclosure: I’m also not certain whether there is a better way.
I recently set up a FreeNAS box after trying to get ZFS on Linux to run appropriately. I have an SA120 with two SAS links to the FreeNAS box, which give me additional active/active SAS paths to my drives. Unfortunately, when I installed FreeNAS I was under the impression it would set up my multipathing for me. While that may be the case for a normal install, I had an existing ZFS pool in my SA120.
The issue is that my ZFS pool was created with direct access to the drives, and it seems to have chosen paths at random: some devices are on one path and some on the other. I assume that if either of the two SAS links were to go down it would be game over for my data. I did some digging around the FreeNAS and FreeBSD forums, combined that with some pre-existing ZFS knowledge I already had, and compiled the information below.
The utility for manipulating multipathing in FreeBSD is gmultipath. It can be used in manual (create) or automatic (label) mode. The problem with label is that it writes metadata to the drive, and that metadata is stored in the same place that GPT stores its metadata and, unfortunately, also where ZFS stores its metadata. Using label to set up your multipaths is therefore discouraged; use the create argument instead.
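For reference, the two invocations look like this (example0, daX and daY are placeholders rather than devices from my setup, and the label form is shown only so you can recognise it, not so you can run it against disks that already hold data):
FreeNAS# gmultipath label -v example0 /dev/daX /dev/daY    # automatic mode: writes metadata to the last sector of each disk
FreeNAS# gmultipath create -v example0 /dev/daX /dev/daY   # manual mode: no on-disk metadata is written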
A missing element in the forum posts I found is that they don’t address setting the multipath device up for active/active operation. Without setting this it will default to failover, i.e. active/passive. Active/active may not be appropriate for everyone, but I’m using SAS drives in a SAS JBOD behind a SAS host bus adapter, and only one machine writes to the drives at a time, so I’m comfortable setting the links to active/active for increased throughput.
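As far as I can tell from the gmultipath man page, a device that was already created in the default failover mode can also be switched afterwards with the configure verb; a quick sketch, reusing the disk0 name that appears later in this post:
FreeNAS# gmultipath configure -A disk0        # switch an existing multipath device to Active/Active
FreeNAS# gmultipath list disk0 | grep Mode    # confirm the mode took effect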
The last issue that needs to be addressed is that my ZFS drives are live and in production. Obviously I’m taking some risk by manipulating the devices underlying the ZFS pool, but I have faith in my assumption that I can effectively upgrade my ZFS pool live, for the following reasons: one, the pool is live and has redundant connections; two, ZFS continuously checks every bit of data that is read and will report immediately if there is an error; three, I had this pool imported the same way on the Linux incarnation prior to installing FreeNAS; and four, I am exporting ZFS snapshots to another ZFS volume as well as taking backups locally on the hypervisors.
To begin, we need to find which /dev/da* devices are paired. If you have more than two active SAS paths you will likely have triplets or more, i.e. one /dev/da entry per SAS drive per link. In manual mode you have to discover the paths yourself. Fortunately this is usually made easier by the fact that each link is enumerated in order, as is each LUN on each link. My SA120 has two active paths to my server, which means I have /dev/da0 through /dev/da23, so /dev/da0 is likely the same physical disk as /dev/da12. To verify this, run the following command…
FreeNAS# camcontrol inquiry da0 -S
3SJ1BE73 00009038RZKW
FreeNAS# camcontrol inquiry da12 -S
3SJ1BE73 00009038RZKW
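Rather than checking drives one at a time, a small loop can dump every serial number so the pairs line up next to each other. This is only a sketch that assumes the 24 da devices (da0 through da23) of my setup; adjust the range for yours:
FreeNAS# for i in $(seq 0 23); do printf 'da%s: ' "$i"; camcontrol inquiry "da$i" -S; done | sort -k 2
Devices that share a serial number are the same physical disk seen over different SAS links.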
In the camcontrol output above we see that da0 and da12 have the same serial number, so we can use these two device entries to create a new multipath target with gmultipath. The only issue is that one or the other is currently in use by the ZFS pool. We need to figure out which one and remove it from the pool. To do that, issue a zpool status or similar command as follows to get the details:
FreeNAS# zpool status
  pool: sa120_0
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        sa120_0       ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da23      ONLINE       0     0     0
            da19      ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            da14      ONLINE       0     0     0
            da13      ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            da9       ONLINE       0     0     0
            da12      ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            da16      ONLINE       0     0     0
            da20      ONLINE       0     0     0
          mirror-4    ONLINE       0     0     0
            da15      ONLINE       0     0     0
            da17      ONLINE       0     0     0
        spares
          da18        AVAIL
The output shows that, in mirror-2, the da12 device is in use. The scary part begins now: we need to remove da12 from mirror-2, leaving da9 as the only remaining device in that mirror. To do that we issue a detach command. The following command removes the device, and the subsequent zpool status gives output that doesn’t immediately make sense but is actually quite logical.
FreeNAS# zpool detach sa120_0 da12
FreeNAS# zpool status
  pool: sa120_0
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        sa120_0       ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da23      ONLINE       0     0     0
            da19      ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            da14      ONLINE       0     0     0
            da13      ONLINE       0     0     0
          da9         ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            da16      ONLINE       0     0     0
            da20      ONLINE       0     0     0
          mirror-4    ONLINE       0     0     0
            da15      ONLINE       0     0     0
            da17      ONLINE       0     0     0
        spares
          da18        AVAIL
Do not be alarmed that the output no longer shows a mirror-2. The reason is that the remaining drive, da9, is now effectively a single-member mirror, so ZFS lists it as a plain top-level device. This is the scariest part of the process: until we create our multipath device and re-attach it, if da9 is ejected from the array the entire ZFS pool will be lost. For my build a resilver is very fast, taking about 40 minutes to complete. I have a lot of faith in ZFS, as it is designed to verify the data on every read, so the likelihood of a failure is probably low; however, that is no reason to skip making a backup. As stated before, I have a ZFS snapshot as well as local, non-ZFS-based backups. If anything were to go wrong I could restore a backup until the ZFS pool could be rebuilt.
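For anyone wanting a quick ZFS-level safety net before this step, a recursive snapshot sent to another pool is one option. This is only a sketch: backup_pool and the snapshot name are placeholders, not the exact commands from my own backup setup.
FreeNAS# zfs snapshot -r sa120_0@pre-multipath
FreeNAS# zfs send -R sa120_0@pre-multipath | zfs recv -u backup_pool/sa120_0-copy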
The next step is to create the multipath device. As mentioned previously, we will use gmultipath to create it, then use the newly created device entry to re-add the physical disk to the ZFS pool using the following commands.
FreeNAS# gmultipath create -A -v disk0 /dev/da0 /dev/da12
Done.
FreeNAS# zpool status
  pool: sa120_0
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        sa120_0       ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            da23      ONLINE       0     0     0
            da19      ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            da14      ONLINE       0     0     0
            da13      ONLINE       0     0     0
          da9         ONLINE       0     0     0
          mirror-3    ONLINE       0     0     0
            da16      ONLINE       0     0     0
            da20      ONLINE       0     0     0
          mirror-4    ONLINE       0     0     0
            da15      ONLINE       0     0     0
            da17      ONLINE       0     0     0
        spares
          da18        AVAIL

errors: No known data errors
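With the multipath device created (and the pool so far unchanged, as the status above shows), the device can be attached. The zpool attach arguments are the pool, the existing device and the new device, so da9 gains multipath/disk0 as its mirror partner: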
FreeNAS# zpool attach sa120_0 da9 /dev/multipath/disk0
FreeNAS# zpool status
  pool: sa120_0
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Jun 19 11:08:34 2017
        1.81G scanned out of 318G at 88.5M/s, 1h1m to go
        370M resilvered, 0.57% done
config:

        NAME                 STATE     READ WRITE CKSUM
        sa120_0              ONLINE       0     0     0
          mirror-0           ONLINE       0     0     0
            da23             ONLINE       0     0     0
            da19             ONLINE       0     0     0
          mirror-1           ONLINE       0     0     0
            da14             ONLINE       0     0     0
            da13             ONLINE       0     0     0
          mirror-2           ONLINE       0     0     0
            da9              ONLINE       0     0     0
            multipath/disk0  ONLINE       0     0     0  (resilvering)
          mirror-3           ONLINE       0     0     0
            da16             ONLINE       0     0     0
            da20             ONLINE       0     0     0
          mirror-4           ONLINE       0     0     0
            da15             ONLINE       0     0     0
            da17             ONLINE       0     0     0
        spares
          da18               AVAIL
Here we can see mirror-2 is back in the list, with da9 joined by multipath/disk0, which is resilvering. It is important to note that the -A flag indicates that all paths for disk0 should be active at once. The following command shows the configuration and status of your multipath devices.
FreeNAS# gmultipath list
Geom name: disk0
Type: MANUAL
Mode: Active/Active
UUID: (null)
State: OPTIMAL
Providers:
1. Name: multipath/disk0
   Mediasize: 300000000000 (279G)
   Sectorsize: 512
   Mode: r1w1e1
   State: OPTIMAL
Consumers:
1. Name: da0
   Mediasize: 300000000000 (279G)
   Sectorsize: 512
   Mode: r2w2e2
   State: ACTIVE
2. Name: da12
   Mediasize: 300000000000 (279G)
   Sectorsize: 512
   Mode: r2w2e2
   State: ACTIVE
We can see disk0 has been created, its type is MANUAL and its mode is Active/Active. Its state is OPTIMAL, the provider is the new device we created, and the consumers are the constituent device paths that make up the multipath.
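To convince yourself that both paths really carry I/O once the mode is Active/Active, watching the two consumers during a resilver or scrub should show traffic on both. A sketch, assuming the da0/da12 pair from above:
FreeNAS# iostat -x -w 1 da0 da12    # per-device statistics; both paths should show activity under load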
The process can be repeated until every drive in the array sits behind a multipath device. There are two approaches to replacing every drive in the ZFS pool: work through the drives sequentially, da0 through da11, until they’re all done, or work per mirror so that at any given moment each mirror could be resilvering. I highly recommend the slow and easy approach and replace one device at a time.
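Whichever order you choose, a quick read-only check between rounds shows how far along you are; a sketch (it only reads status and modifies nothing):
FreeNAS# gmultipath status                            # each multipath device and the da paths behind it
FreeNAS# zpool status sa120_0 | grep -c 'multipath/'  # how many vdev entries in the pool are already on multipath devices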