Brent Jones wrote:
A crash dump from the receiving server with the stuck receives would be highly
useful, if you can get it. Running reboot -d would be best, but it might just hang.
You can try savecore -L.
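
For reference, the two options above could be wrapped roughly as follows. This is only a sketch: the function names and the DRY_RUN switch are made up for illustration, it has to run as root on the receiving server, and savecore -L only works when a dedicated dump device has been configured with dumpadm.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are hypothetical, not part of any tool.

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# capture_dump live   -> dump of the running system (savecore -L)
# capture_dump reboot -> force a crash dump, then reboot (reboot -d)
capture_dump() {
    mode=$1
    case "$mode" in
        live|reboot) ;;
        *) echo "usage: capture_dump live|reboot" >&2; return 2 ;;
    esac
    # Show the configured dump device and savecore directory first.
    run dumpadm || return 1
    if [ "$mode" = "live" ]; then
        run savecore -L
    else
        run reboot -d
    fi
}
```

Calling DRY_RUN=1 capture_dump live prints the commands without touching the system, which is a safe way to check what would be run.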
I had been running snv_106 for about 3 or 4 months on a pair of X4540s.
I would ship snapshots from the primary server to the secondary server
nightly, which was working really well.
However, since I upgraded to 2009.06, my replication scripts appear
to "hang" when performing zfs send/recv.
When one zfs send/recv process hangs, you cannot send any other
snapshots from any other filesystem to the remote host.
I have about 20 file systems that I snapshot and replicate nightly.
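
To make the setup concrete, one nightly replication step might look roughly like the sketch below. This is not the actual script. The ssh transport, the incremental send (-i), the host name secondary.example.com and the source name srcpool/data/fs01 are all assumptions. Only the recv flags (-vFd), the target pool pdxfilu02 and the snapshot name format are taken from the output in this message.

```shell
#!/bin/sh
# Hypothetical sketch of one replication step; not the script referred to here.
REMOTE_HOST="${REMOTE_HOST:-secondary.example.com}"  # assumed host name
REMOTE_POOL="${REMOTE_POOL:-pdxfilu02}"              # pool seen in the ps listing

# Snapshot name in the format seen in 'zfs list', e.g. 20090605-00:30:00.
snapshot_name() {
    date +%Y%m%d-%H:%M:%S
}

# replicate_fs FILESYSTEM PREVIOUS_SNAP NEW_SNAP
# Takes the new snapshot and sends the increment since the previous one.
# With DRY_RUN=1 the commands are printed instead of executed.
replicate_fs() {
    fs=$1; prev=$2; new=$3
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "zfs snapshot ${fs}@${new}"
        echo "zfs send -i ${fs}@${prev} ${fs}@${new} | ssh ${REMOTE_HOST} /sbin/zfs recv -vFd ${REMOTE_POOL}"
        return 0
    fi
    zfs snapshot "${fs}@${new}" || return 1
    # Exit status is that of the receiving side (last command in the pipe).
    zfs send -i "${fs}@${prev}" "${fs}@${new}" |
        ssh "$REMOTE_HOST" /sbin/zfs recv -vFd "$REMOTE_POOL"
}
```

A nightly job would call replicate_fs once per file system, passing the previous night's snapshot name and "$(snapshot_name)" as the new one. Because recv is run with -d, the target name is derived from the sent snapshot's path below the source pool.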
The script I use to perform the snapshots is here:
On the remote side, I end up with many "hung" processes, like this:
bjones 11676 11661 0 01:30:03 ? 0:00 /sbin/zfs recv -vFd pdxfilu02
bjones 11673 11660 0 01:30:03 ? 0:00 /sbin/zfs recv -vFd pdxfilu02
bjones 11664 11653 0 01:30:03 ? 0:00 /sbin/zfs recv -vFd pdxfilu02
bjones 13727 13722 0 14:21:20 ? 0:00 /sbin/zfs recv -vFd pdxfilu02
And so on, one for each file system.
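
A listing like the one above can be reduced to just the PIDs with a small filter. This is a sketch that assumes the usual ps -ef column order (UID, PID, PPID, ...), and the function name is made up.

```shell
#!/bin/sh
# Sketch: print the PID of every 'zfs recv' process found in 'ps -ef' output
# read from standard input. The function name is hypothetical.
stuck_recv_pids() {
    # Skip the awk process itself, whose command line also contains the pattern.
    awk '/zfs recv/ && !/awk/ { print $2 }'
}

# Typical use on the receiving server:
#   ps -ef | stuck_recv_pids
```

The resulting PIDs are the processes that, as described below, do not respond to kill -9.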
On the receiving end, 'zfs list' shows one filesystem attempting to
receive a snapshot, but I cannot stop it:
$ zfs list
NAME                                     USED  AVAIL  REFER  MOUNTPOINT
pdxfilu02/data/fs01/%20090605-00:30:00  1.74G  27.2T   208G
On the sending side, I CAN kill the ZFS send process, but the remote
side leaves its processes going, and I CANNOT kill -9 them. I also
cannot reboot the receiving system: at init 6, the system just
hangs trying to unmount the file systems.
I have to physically cut power to the server, but a couple days later,
this issue will occur again.
If I boot to my snv_106 BE, everything works fine; this issue has
never occurred on that version.
zfs-discuss mailing list