I can check on Monday, but the system will probably panic... which
doesn't really help :-)
Am I right in thinking failmode=wait is still the default? If so,
that should be how it's set, as this testing was done on a clean
install of snv_106. From what I've seen, I don't think this is a
problem with the zfs failmode. It's more an issue of what happens
in the period *before* zfs realises there's a problem and applies the
failmode setting.
This time there was just a window of a couple of minutes while
commands would continue. In the past I've managed to stretch that
window out considerably longer.
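Incidentally, if anyone wants to double-check the setting on their own
system, the failmode property can be read and changed per pool (usbtest
is just the name of my test pool from the message below):

# zpool get failmode usbtest
# zpool set failmode=wait usbtest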
To me the biggest problems are:
- ZFS accepting writes that don't happen (from both before and after
the drive is removed)
- No logging or warning of this in zpool status (see the note just
below on where else to look)
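On the logging point, the only other places I can think of to look for
any record of this are the overall pool health summary and the FMA
error log, e.g.:

# zpool status -x
# fmdump -e

I haven't yet checked whether either of those shows anything for the
writes that were silently dropped in this test.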
I appreciate that if you're using a cache, some data loss is pretty
much inevitable when a pool fails, but that should be a few seconds'
worth of data at worst, not minutes or hours' worth.
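(My reasoning for "a few seconds" is the transaction group sync
interval: as I understand it, dirty data should be flushed to disk
roughly every 30 seconds at most. If I've remembered the variable name
correctly, the current timeout can be checked with mdb:

# echo zfs_txg_timeout/D | mdb -k

so anything written more than one sync interval before the failure
should already be safely on disk.)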
Also, if a pool fails completely and there's data in the cache that
hasn't been committed to disk, it would be great if Solaris could:
- immediately dump the cache to any (or all?) working storage
- prompt the user to fix the pool, or to save the cache, before
powering down the system
On Fri, Feb 6, 2009 at 5:49 PM, Richard Elling <richard.elling@xxxxxxxxx> wrote:
> Ross, this is a pretty good description of what I would expect when
> failmode=continue. What happens when failmode=panic?
> -- richard
> Ross wrote:
>> Ok, it's still happening in snv_106:
>> I plugged a USB drive into a freshly installed system, and created a
>> single disk zpool on it:
>> # zpool create usbtest c1t0d0
>> I opened the (nautilus?) file manager in gnome, and copied the /etc/X11
>> folder to it. I then copied the /etc/apache folder to it, and at 4:05pm,
>> disconnected the drive.
>> At this point there are *no* warnings on screen, or any indication that
>> there is a problem. To check that the pool was still working, I created
>> duplicates of the two folders on that drive. That worked without any
>> errors, although the drive was physically removed.
>> I ran zpool status and the pool is actually showing as unavailable, so at
>> least that has happened faster than my last test.
>> The folder is still open in gnome, however any attempt to copy files to or
>> from it just hangs the file transfer operation window.
>> /usbtest is still visible in gnome
>> Also, I can still open a console and use the folder:
>> # cd usbtest
>> # ls
>> X11 X11 (copy) apache apache (copy)
>> I also tried:
>> # mv X11 X11-test
>> That hung, but I saw the X11 folder disappear from the graphical file
>> manager, so the system still believes something is working with this pool.
>> The main GUI is actually a little messed up now. The gnome file manager
>> window looking at the /usbtest folder has hung. Also, right-clicking the
>> desktop to open a new terminal hangs, leaving the right-click menu on screen.
>> The main menu still works though, and I can still open a new terminal.
>> Commands such as ls are finally hanging on the pool.
>> At this point I tried to reboot, but it appears that isn't working. I
>> used system monitor to kill everything I had running and tried again, but
>> that didn't help.
>> I had to physically power off the system to reboot.
>> After the reboot, as expected, /usbtest still exists (even though the
>> drive is disconnected). I removed that folder and connected the drive.
>> ZFS detects the insertion and automounts the drive, but although the pool
>> is showing as online and the filesystem shows as mounted at /usbtest, the
>> /usbtest directory doesn't exist.
>> I had to export and import the pool to get it available, but as expected,
>> I've lost data:
>> # cd usbtest
>> # ls
>> Even worse, ZFS is completely unaware of this:
>> # zpool status -v usbtest
>>   pool: usbtest
>>  state: ONLINE
>>  scrub: none requested
>> config:
>>
>>         NAME        STATE     READ WRITE CKSUM
>>         usbtest     ONLINE       0     0     0
>>           c1t0d0    ONLINE       0     0     0
>>
>> errors: No known data errors
>> So in summary, there are a good few problems here, many of which I've
>> already reported as bugs:
>> 1. ZFS still accepts read and write operations for a faulted pool, causing
>> data loss that isn't necessarily reported by zpool status.
>> 2. Even after writes start to hang, it's still possible to continue
>> reading data from a faulted pool.
>> 3. A faulted pool causes unwanted side effects in the GUI, making the
>> system hard to use, and impossible to reboot.
>> 4. After a hard reset, ZFS does not recover cleanly. Unused mountpoints
>> are left behind.
>> 5. Automatic mounting of pools doesn't seem to work reliably.
>> 6. zpool status doesn't report any problems mounting the pool.