On Sun, Jun 7, 2009 at 3:50 AM, Ian Collins<ian@xxxxxxxxxxxx> wrote:
> Ian Collins wrote:
>> Tim Haley wrote:
>>> Brent Jones wrote:
>>>> On the sending side, I CAN kill the ZFS send process, but the remote
>>>> side leaves its processes going, and I CANNOT kill -9 them. I also
>>>> cannot reboot the receiving system, at init 6, the system will just
>>>> hang trying to unmount the file systems.
>>>> I have to physically cut power to the server, but a couple days later,
>>>> this issue will occur again.
>>> A crash dump from the receiving server with the stuck receives would be
>>> highly useful, if you can get it. Reboot -d would be best, but it might just
>>> hang. You can try savecore -L.
>> I tried a reboot -d (I even had kmem_flags=0xf set), but it did hang. I
>> didn't try savecore.
>> One thing I didn't try was scat on the running system. What should I look
>> for (with scat) if this happens again?
> I now have a system with a hanging zfs receive, any hints on debugging it?
I haven't found a way to identify the problem yet; I'm still trying to
find a 100% reliable way to reproduce it.
It seems that the more snapshots I send at once, the more likely this
is to happen, but correlation is not causation :)
I might try to open a support case with Sun (I have a support contract),
but OpenSolaris doesn't seem to be well understood by the support
folks yet, so I'm not sure how far it will get.