Eric Schrock wrote:
> On Tue, Dec 12, 2006 at 02:08:57PM -0500, James F. Hranicky wrote:
>> Sure, but that's what I want to avoid. The FMA agent should do this by
>> itself, but it's not, so I guess I'm just wondering why, or if there's
>> a good way to get it to do so. If this happens in the middle of the night I
>> don't want to have to run the commands by hand.
>
> Yes, the FMA agent should do this. Can you run 'fmdump -v' and see if
> the DE correctly identified the faulted devices?

Here you go:

# fmdump -v
TIME                 UUID                                 SUNW-MSG-ID
Nov 29 16:29:12.1947 e50198f2-2eb9-c58b-d7c5-87aaae5cb935 ZFS-8000-D3
  100%  fault.fs.zfs.device

        Problem in: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c
           Affects: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c
               FRU: -

Nov 30 10:31:48.8844 1a44a780-05c0-cb6e-d44f-f1d8999f40e5 ZFS-8000-D3
  100%  fault.fs.zfs.device

        Problem in: zfs://pool=51f1caf6cad1aa2f/vdev=769276842b0efd54
           Affects: zfs://pool=51f1caf6cad1aa2f/vdev=769276842b0efd54
               FRU: -

Dec 11 14:04:57.8803 c46d21e0-200d-43a1-e5db-ae9c9ebf3482 ZFS-8000-D3
  100%  fault.fs.zfs.device

        Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=52070de44ec80c15
           Affects: zfs://pool=2646e20c1cb0a9d0/vdev=52070de44ec80c15
               FRU: -

Dec 11 14:42:32.1271 1319464e-7a8c-e65b-962e-db386e90f7f2 ZFS-8000-D3
  100%  fault.fs.zfs.device

        Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
           Affects: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
               FRU: -

I'm not really sure what it means.
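
The listing is at least regular enough to pick apart with a script: each entry is a timestamp/UUID/message-ID line, a certainty and fault class, and a "Problem in" line naming a pool and vdev. Below is a rough Python sketch that turns output of this shape into one record per entry. It assumes only the line layout shown above; parse_fmdump and the field names are made up for illustration and are not part of FMA or fmdump.

```python
import re

# Assumed layout of an entry's first line: "Mon DD HH:MM:SS.ffff <uuid> <msg-id>"
EVENT_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+\s+[\d:.]+)\s+"
    r"(?P<uuid>[0-9a-f-]{36})\s+(?P<msgid>\S+)$"
)
# Assumed layout of the resource name on the "Problem in:" line
FMRI_RE = re.compile(r"zfs://pool=(?P<pool>[0-9a-f]+)/vdev=(?P<vdev>[0-9a-f]+)")


def parse_fmdump(text):
    """Split 'fmdump -v'-style text into a list of dicts, one per entry."""
    events = []
    for raw in text.splitlines():
        line = raw.strip()
        m = EVENT_RE.match(line)
        if m:
            # Start a new record; detail lines that follow fill it in.
            events.append(dict(m.groupdict(), fault=None, pool=None, vdev=None))
            continue
        if not events:
            continue  # header line, or anything before the first entry
        current = events[-1]
        if line.startswith("Problem in:"):
            f = FMRI_RE.search(line)
            if f:
                current["pool"] = f.group("pool")
                current["vdev"] = f.group("vdev")
        elif "%" in line.split(" ", 1)[0]:
            # e.g. "100% fault.fs.zfs.device": keep the fault class
            current["fault"] = line.split()[-1]
    return events


SAMPLE = """\
TIME UUID SUNW-MSG-ID
Nov 29 16:29:12.1947 e50198f2-2eb9-c58b-d7c5-87aaae5cb935 ZFS-8000-D3
100% fault.fs.zfs.device
Problem in: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c
Affects: zfs://pool=8e63f0b8e4263e71/vdev=9272c0973ecdb27c
FRU: -
Dec 11 14:42:32.1271 1319464e-7a8c-e65b-962e-db386e90f7f2 ZFS-8000-D3
100% fault.fs.zfs.device
Problem in: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
Affects: zfs://pool=2646e20c1cb0a9d0/vdev=724c128cdbc17745
FRU: -
"""

if __name__ == "__main__":
    for ev in parse_fmdump(SAMPLE):
        print(ev["time"], ev["msgid"], ev["fault"], ev["pool"], ev["vdev"])
```

Grouping the records by pool would show, for example, that the two Dec 11 entries name the same pool but different vdevs.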
>> For instance, the zpool command hanging or the system hanging trying to
>> reboot normally.
>
> If the SCSI commands hang forever, then there is nothing that ZFS can
> do, as a single write will never return. The more likely case is that
> the commands are continually timing out with very long response times,
> and ZFS will continue to talk to them forever. The future FMA
> integration I mentioned will solve this problem. In the meantime, you
> should be able to 'zpool offline' the affected devices by hand.
Well, as long as I know which device is affected :-> If "zpool status"
doesn't return it may be difficult to figure out.
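
A possible stopgap for the "doesn't return" case is to wrap the status check in a timeout, so that a hung command at least gets noticed instead of blocking whatever called it. Here is a rough Python sketch; run_with_timeout is a made-up helper name, and the sketch assumes the caller is allowed to simply abandon the child, since a process stuck inside the kernel may not actually die when killed.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return (returncode, output), or None if it did not finish in time."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # Best effort only: we deliberately do not wait for the child here,
        # because a process blocked on a hung device may never exit.
        proc.kill()
        return None
    return proc.returncode, out


if __name__ == "__main__":
    # Harmless stand-in commands so the sketch runs anywhere.
    print(run_with_timeout([sys.executable, "-c", "print('ok')"], 10))
    print(run_with_timeout([sys.executable, "-c", "import time; time.sleep(30)"], 0.5))
```

On the real system the command would be something like ["zpool", "status", "-x"] with a timeout of a minute or so, and a None result would be the cue to go look at the fmdump output instead.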
Do you know if the SATA controllers in a Thumper can better handle this
problem?
> There is also associated work going on to better handle asynchronous
> response times across devices. Currently, a single slow device will slow
> the entire pool to a crawl.
Do you have an idea as to when this might be available?
Thanks for all your input,
Jim
_______________________________________________
zfs-discuss mailing list
zfs-discuss@xxxxxxxxxxxxxxx
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss