> Hello Michael,
>
> On 24.1.2007 at 14:36, Michael Schuster wrote:
>
>>> --------------------------------------------------------------
>>> root@newponit # zpool status
>>> pool: pool0
>>> state: ONLINE
>>> scrub: none requested
>>> config:
>>
>> [...]
>>
>>> Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=30000e81600:
>>> Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to
>>> lack of DiskSuite state
>>> Jan 23 18:51:38 newponit database replicas. Fewer than 50% of the total
>>> were available,
>>> Jan 23 18:51:38 newponit so panic to ensure data integrity.
>>
>> this message shows (and the rest of the stack proves) that your panic
>> happened in SVM. It has NOTHING to do with zfs. So either you pulled the
>> wrong disk, or the disk you pulled also contained SVM volumes (next to
>> ZFS).
>
> I noticed that the panic was in SVM and I'm wondering, why the machine
> was hanging. SVM is only running on the internal disks (c0) and I pulled
> a disk from the D1000:
So the device that was affected had nothing to do with SVM at all.
Fine ... I have the exact same config here: internal SVM, and
then external ZFS on two disk arrays on two controllers.
> Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING:
> /pci@1f,4000/scsi@5/sd@9,0 (sd50):
> Jan 23 17:24:14 newponit SCSI transport failed: reason 'incomplete':
> retrying command
> Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING:
> /pci@1f,4000/scsi@5/sd@9,0 (sd50):
> Jan 23 17:24:14 newponit disk not responding to selection
> Jan 23 17:24:18 newponit scsi: [ID 107833 kern.warning] WARNING:
> /pci@1f,4000/scsi@5/sd@9,0 (sd50):
> Jan 23 17:24:18 newponit disk not responding to selection
>
> This is clearly the disk with ZFS on it: SVM has nothing to do with this
> disk. A minute later, the troubles started with the internal disks:
Okay .. so are we back to looking at ZFS, or ZFS and the SVM components, or
some interaction between these kernel modules? At this point I have to be
careful not to fall into a pit of blind ignorance as I grope for the
answer. Perhaps some data would help. Was there a core file in
/var/crash/newponit ?
> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3
> (glm0):
> Jan 23 17:25:26 newponit Cmd (0x60000a3ed10) dump for Target 0 Lun 0:
> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3
> (glm0):
> Jan 23 17:25:26 newponit cdb=[ 0x28 0x0 0x0 0x78 0x6 0x30 0x0
> 0x0 0x10
> 0x0 ]
> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3
> (glm0):
> Jan 23 17:25:26 newponit pkt_flags=0x4000 pkt_statistics=0x60
> pkt_state=0x7
> Jan 23 17:25:26 newponit scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3
> (glm0):
> Jan 23 17:25:26 newponit pkt_scbp=0x0 cmd_flags=0x860
> Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING:
> /pci@1f,4000/scsi@3 (glm0):
> Jan 23 17:25:26 newponit Disconnected tagged cmd(s) (1) timeout for
> Target 0.0
So a pile of SCSI noise above there .. one would expect that from a
suddenly missing SCSI device.
> Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
> fault detected in device; service still available
> Jan 23 17:25:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
> Disconnected tagged cmd(s) (1) timeout for Target 0.0
NCR SCSI controllers .. what OS revision is this? Solaris 10 U3?
Solaris Nevada snv_55b?
> Jan 23 17:25:26 newponit glm: [ID 401478 kern.warning] WARNING:
> ID[SUNWpd.glm.cmd_timeout.6018]
> Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING:
> /pci@1f,4000/scsi@3 (glm0):
> Jan 23 17:25:26 newponit got SCSI bus reset
> Jan 23 17:25:26 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
> fault detected in device; service still available
>
> SVM and ZFS disks are on a separate SCSI bus, so theoretically there
> should not be any impact on the SVM disks when I pull out a ZFS disk.
I still feel that you hit a bug in ZFS somewhere. Under no circumstances
should a Solaris server panic and crash simply because you pulled out a
single disk that was totally mirrored. In fact .. I will reproduce those
conditions here and then see what happens for me.
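
To check the separate-bus observation against the quoted log, a small sketch
like the following could group the SCSI warnings by controller path and report
when each controller first complained. It is only an illustration: the LOG
excerpt is copied from the messages above with the mail client's line wraps
rejoined, and the names PATTERN, controller_of and
first_warning_per_controller are made up for this example.

```python
import re

# Excerpt of the quoted syslog lines, rejoined where the mail client wrapped them.
LOG = """\
Jan 23 17:24:14 newponit scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@5/sd@9,0 (sd50):
Jan 23 17:24:18 newponit scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@5/sd@9,0 (sd50):
Jan 23 17:25:26 newponit scsi: [ID 107833 kern.warning] WARNING: /pci@1f,4000/scsi@3 (glm0):
"""

# Matches "<timestamp> <host> scsi: [...] WARNING: <device path> (<driver instance>):"
PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) \S+ scsi: \[[^\]]*\] WARNING: "
    r"(?P<path>/\S+) \((?P<inst>\w+)\):$"
)


def controller_of(path):
    """Return the controller part of a device path, e.g. /pci@1f,4000/scsi@5."""
    m = re.match(r"(/pci@[^/]+/scsi@[^/]+)", path)
    return m.group(1) if m else path


def first_warning_per_controller(log):
    """Map each controller path to the (timestamp, instance) of its first warning."""
    first = {}
    for line in log.splitlines():
        m = PATTERN.match(line)
        if m:
            key = controller_of(m.group("path"))
            # setdefault keeps the earliest entry, since the log is in time order.
            first.setdefault(key, (m.group("ts"), m.group("inst")))
    return first


if __name__ == "__main__":
    for ctrl, (ts, inst) in first_warning_per_controller(LOG).items():
        print(f"{ctrl}: first warning at {ts} from {inst}")
```

On this excerpt the warnings fall on two different controllers, scsi@5 (the
D1000 disk, sd50) and scsi@3 (glm0, the internal bus), with the scsi@3
timeout about a minute after the first scsi@5 warning. That matches the
sequence described above but says nothing about the cause.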
Dennis
_______________________________________________
zfs-discuss mailing list
zfs-discuss@xxxxxxxxxxxxxxx
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss