zfs-discuss@opensolaris.org
[Top] [All Lists]

[zfs-discuss] panic with zfs

Subject: [zfs-discuss] panic with zfs
From: Ihsan Dogan
Date: Wed, 24 Jan 2007 14:29:45 +0100
Hello,

We're setting up a new mailserver infrastructure and decided, to run it
on zfs. On a E220R with a D1000, I've setup a storage pool with four
mirrors:

--------------------------------------------------------------
root@newponit # zpool status
  pool: pool0
 state: ONLINE
 scrub: none requested
config:

        NAME         STATE     READ WRITE CKSUM
        pool0        ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t0d0   ONLINE       0     0     0
            c5t8d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t1d0   ONLINE       0     0     0
            c5t9d0   ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t2d0   ONLINE       0     0     0
            c5t10d0  ONLINE       0     0     0
          mirror     ONLINE       0     0     0
            c1t3d0   ONLINE       0     0     0
            c5t11d0  ONLINE       0     0     0

errors: No known data errors
--------------------------------------------------------------

Before we start to install any software on it, we've got to the idea, to
see how zfs behaves when something goes wrong. So we pulled out a disk,
while a mkfile was running. What happened then, was not that, what we
expected. The system was hanging for more than an hour and finally it
paniced:

--------------------------------------------------------------
Jan 23 18:49:26 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
got SCSI bus reset
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3
(glm0):
Jan 23 18:50:36 newponit        Cmd (0x60000a3ed10) dump for Target 1 Lun 0:
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3
(glm0):
Jan 23 18:50:36 newponit                cdb=[ 0x2a 0x0 0x2 0x1b 0x2c 0x93 0x0 
0x0 0x1
0x0 ]
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3
(glm0):
Jan 23 18:50:36 newponit        pkt_flags=0xc000 pkt_statistics=0x60 
pkt_state=0x7
Jan 23 18:50:36 newponit scsi: [ID 365881 kern.info] /pci@1f,4000/scsi@3
(glm0):
Jan 23 18:50:36 newponit        pkt_scbp=0x0 cmd_flags=0x1860
Jan 23 18:50:36 newponit scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/scsi@3 (glm0):
Jan 23 18:50:36 newponit        Disconnected tagged cmd(s) (1) timeout for
Target 1.0
Jan 23 18:50:36 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
fault detected in device; service still available
Jan 23 18:50:36 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
Disconnected tagged cmd(s) (1) timeout for Target 1.0
Jan 23 18:50:36 newponit glm: [ID 401478 kern.warning] WARNING:
ID[SUNWpd.glm.cmd_timeout.6018]
Jan 23 18:50:36 newponit scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/scsi@3 (glm0):
Jan 23 18:50:36 newponit        got SCSI bus reset
Jan 23 18:50:36 newponit genunix: [ID 408822 kern.info] NOTICE: glm0:
fault detected in device; service still available
Jan 23 18:50:36 newponit genunix: [ID 611667 kern.info] NOTICE: glm0:
got SCSI bus reset
Jan 23 18:50:36 newponit scsi: [ID 107833 kern.warning] WARNING:
/pci@1f,4000/scsi@3/sd@1,0 (sd0):
Jan 23 18:50:36 newponit        SCSI transport failed: reason 'timeout': giving 
up
Jan 23 18:50:36 newponit md: [ID 312844 kern.warning] WARNING: md: state
database commit failed
Jan 23 18:50:36 newponit last message repeated 1 time
Jan 23 18:51:38 newponit unix: [ID 836849 kern.notice]
Jan 23 18:51:38 newponit ^Mpanic[cpu2]/thread=30000e81600:
Jan 23 18:51:38 newponit unix: [ID 268973 kern.notice] md: Panic due to
lack of DiskSuite state
Jan 23 18:51:38 newponit  database replicas. Fewer than 50% of the total
were available,
Jan 23 18:51:38 newponit  so panic to ensure data integrity.
Jan 23 18:51:38 newponit unix: [ID 100000 kern.notice]
Jan 23 18:51:38 newponit genunix: [ID 723222 kern.notice]
000002a1003c1230 md:mddb_commitrec_wrapper+a8 (a, 30000e81600, 18e9250,
12ecc00, 18e9000, 1)
Jan 23 18:51:38 newponit genunix: [ID 179002 kern.notice]   %l0-3:
0000000000000030 0000000000000000 0000000000000002 0000060000a8e6c8
Jan 23 18:51:38 newponit   %l4-7: 0000000000000000 00000000012ecf48
0000000000000002 00000000012ecc00
Jan 23 18:51:39 newponit genunix: [ID 723222 kern.notice]
000002a1003c12e0 md_mirror:mirror_mark_resync_region+290 (0, 0,
600008dacc0, 600008da980, 0, 1)
Jan 23 18:51:39 newponit genunix: [ID 179002 kern.notice]   %l0-3:
0000000000000000 00000600008e9e80 0000000000000001 0000000000000000
Jan 23 18:51:39 newponit   %l4-7: 0000000000000001 0000000000000000
000000000183d400 0000000000000002
Jan 23 18:51:39 newponit genunix: [ID 723222 kern.notice]
000002a1003c1390 md_mirror:mirror_write_strategy+5c0 (60000885108, 0, 0,
0, 600008dad20, 0)
Jan 23 18:51:39 newponit genunix: [ID 179002 kern.notice]   %l0-3:
0000000000000000 00000300000c33b8 00000600008f5e08 0000000000008000
Jan 23 18:51:39 newponit   %l4-7: 00000000018e9a28 00000600008da980
0000000000000000 0000000000000000
Jan 23 18:51:39 newponit genunix: [ID 723222 kern.notice]
000002a1003c1440 md:mdstrategy+d4 (60000885108, 1, 18e9800, 2000101,
2000101, 18e4400)
Jan 23 18:51:39 newponit genunix: [ID 179002 kern.notice]   %l0-3:
00000000018e4800 00000000018e4888 0000000002000101 000000000130f500
Jan 23 18:51:39 newponit   %l4-7: 00000000018eafa8 00000600008c3dc8
0000000000000008 0000000000000001
Jan 23 18:51:39 newponit genunix: [ID 723222 kern.notice]
000002a1003c14f0 ufs:ldl_strategy+1a0 (1, 60002d92bc0, 12382a0, 18bcb58,
0, 0)
Jan 23 18:51:39 newponit genunix: [ID 179002 kern.notice]   %l0-3:
00000300003e3e00 0000060000883f20 0000000000000400 0000000000000370
Jan 23 18:51:39 newponit   %l4-7: 0000000000000000 000002a1003c15d0
000002a1003c15d8 0000060000885108
Jan 23 18:51:40 newponit genunix: [ID 723222 kern.notice]
000002a1003c15e0 ufs:push_dirty_bp+c (60000859840, 60002d92bc0, 0, 0, 0,
30000e8e000)
Jan 23 18:51:40 newponit genunix: [ID 179002 kern.notice]   %l0-3:
0000000000000000 000000000006dc00 000000000000036e 0000000000000000
Jan 23 18:51:40 newponit   %l4-7: 0000030000e8e000 0000000000000000
000000000183d400 00000000018aa698
Jan 23 18:51:40 newponit genunix: [ID 723222 kern.notice]
000002a1003c1690 ufs:logmap_commit+8c (a, 32, 33, 16, 600008595c0,
60000859840)
Jan 23 18:51:40 newponit genunix: [ID 179002 kern.notice]   %l0-3:
0000000000001770 0000000000000000 00000000018a6800 0000000045b636bd
Jan 23 18:51:40 newponit   %l4-7: 0000000045b636bc 00000000018a5800
0000000000000000 0000000000000000
Jan 23 18:51:40 newponit genunix: [ID 723222 kern.notice]
000002a1003c17b0 ufs:top_end_sync+e4 (0, 2a1003c193c, 60000859960, 32,
60000859840, 600008596a0)
Jan 23 18:51:40 newponit genunix: [ID 179002 kern.notice]   %l0-3:
00000600008595c0 0000000000000000 00000600008596b0 0000000000000000
Jan 23 18:51:40 newponit   %l4-7: 0000000000000000 0000000000000033
0000000000000003 0000000000000001
Jan 23 18:51:40 newponit genunix: [ID 723222 kern.notice]
000002a1003c1880 ufs:ufs_update+2f4 (0, 60000a8e6c8, 1883a68,
300003e3e00, 1d, 2a1003c193c)
Jan 23 18:51:41 newponit genunix: [ID 179002 kern.notice]   %l0-3:
0000000000000030 0000000000000000 0000000000000002 0000060000a8e6c8
Jan 23 18:51:41 newponit   %l4-7: 000000000000ffbf 0000000000000000
0000060001b02000 0000000000000000
Jan 23 18:51:41 newponit genunix: [ID 723222 kern.notice]
000002a1003c1940 ufs:ufs_sync+2c (0, 1, 0, 1, 185de00, 1000000000000)
Jan 23 18:51:41 newponit genunix: [ID 179002 kern.notice]   %l0-3:
0000000000001497 000000000002ecfb 0000000000000000 000000000014c2e1
Jan 23 18:51:41 newponit   %l4-7: 00000000008bbccc 000000000000213f
00000000008bde0b 0000000000000001
Jan 23 18:51:41 newponit genunix: [ID 723222 kern.notice]
000002a1003c1a00 genunix:fsflush+4e0 (2, 18a5800, 1864f20, 1080,
185de28, 8000)
Jan 23 18:51:41 newponit genunix: [ID 179002 kern.notice]   %l0-3:
000000000185dda8 0000000000000bb8 0000000000000001 00000000018ad2a8
Jan 23 18:51:41 newponit   %l4-7: 000000000185dea8 000000000185ee28
0000000000000100 0000000000000100
Jan 23 18:51:41 newponit unix: [ID 100000 kern.notice]
Jan 23 18:51:41 newponit genunix: [ID 672855 kern.notice] syncing file
systems...
Jan 23 18:51:41 newponit genunix: [ID 904073 kern.notice]  done
Jan 23 18:51:42 newponit genunix: [ID 111219 kern.notice] dumping to
/dev/md/dsk/d1, offset 215220224, content: kernel
Jan 23 18:52:10 newponit genunix: [ID 409368 kern.notice] ^M100% done:
51733 pages dumped, compression ratio 9.67,
Jan 23 18:52:10 newponit genunix: [ID 851671 kern.notice] dump succeeded
--------------------------------------------------------------

After the system booted up, there was one disk on zfs. But really
intersting was this:

--------------------------------------------------------------
root@newponit # metastat
d3: Mirror
    Submirror 0: d13
      State: Needs maintenance
    Submirror 1: d23
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 4198392 blocks (2.0 GB)

d13: Submirror of d3
    State: Needs maintenance
    Invoke: metasync d3
    Size: 4198392 blocks (2.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s3          0     No            Okay   Yes


d23: Submirror of d3
    State: Needs maintenance
    Invoke: metasync d3
    Size: 4198392 blocks (2.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s3          0     No            Okay   Yes


d1: Mirror
    Submirror 0: d11
      State: Needs maintenance
    Submirror 1: d21
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 2101552 blocks (1.0 GB)

d11: Submirror of d1
    State: Needs maintenance
    Invoke: metasync d1
    Size: 2101552 blocks (1.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s1          0     No            Okay   Yes


d21: Submirror of d1
    State: Needs maintenance
    Invoke: metasync d1
    Size: 2101552 blocks (1.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s1          0     No            Okay   Yes


d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 29035344 blocks (13 GB)

d10: Submirror of d0
    State: Needs maintenance
    Invoke: metasync d0
    Size: 29035344 blocks (13 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t0d0s0          0     No            Okay   Yes


d20: Submirror of d0
    State: Needs maintenance
    Invoke: metasync d0
    Size: 29035344 blocks (13 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c0t1d0s0          0     No            Okay   Yes


Device Relocation Information:
Device   Reloc  Device ID
c0t1d0   Yes    id1,sd@SSEAGATE_ST318305LSUN18G_3JKQ57TD000023076LHH
c0t0d0   Yes    id1,sd@SSEAGATE_ST318404LSUN18G_3BT2PJ950000220439SK
--------------------------------------------------------------

Why was the system hanging for such a long time and why I've got a
problem with the internal disk? And why the system paniced?


Full messages: http://ihsan.dogan.ch/files/messages

format:
--------------------------------------------------------------
       0. c0t0d0 <SEAGATE-ST318404LSUN18G-4203 cyl 7506 alt 2 hd 19 sec 248>
          /pci@1f,4000/scsi@3/sd@0,0
       1. c0t1d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
          /pci@1f,4000/scsi@3/sd@1,0
       2. c1t0d0 <SEAGATE-ST318305LSUN18G-0641-16.87GB>
          /pci@1f,2000/scsi@1/sd@0,0
       3. c1t1d0 <SEAGATE-ST318305LSUN18G-0641-16.87GB>
          /pci@1f,2000/scsi@1/sd@1,0
       4. c1t2d0 <SEAGATE-ST318305LSUN18G-0641-16.87GB>
          /pci@1f,2000/scsi@1/sd@2,0
       5. c1t3d0 <SEAGATE-ST318305LSUN18G-0641-16.87GB>
          /pci@1f,2000/scsi@1/sd@3,0
       6. c5t8d0 <SEAGATE-ST318305LSUN18G-0641-16.87GB>
          /pci@1f,4000/scsi@5/sd@8,0
       7. c5t9d0 <SEAGATE-ST318305LSUN18G-0641-16.87GB>
          /pci@1f,4000/scsi@5/sd@9,0
       8. c5t10d0 <SEAGATE-ST318305LSUN18G-0641-16.87GB>
          /pci@1f,4000/scsi@5/sd@a,0
       9. c5t11d0 <SEAGATE-ST318305LSUN18G-0641-16.87GB>
          /pci@1f,4000/scsi@5/sd@b,0
--------------------------------------------------------------




Ihsan




-- 
ihsan@xxxxxxxx          http://ihsan.dogan.ch/
                        http://gallery.dogan.ch/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@xxxxxxxxxxxxxxx
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

<Prev in Thread] Current Thread [Next in Thread>