netbsd-bugs@netbsd.org

Re: kern/40569: Faild RAIDframe parity rewrite prevents system shutdown

Subject: Re: kern/40569: Faild RAIDframe parity rewrite prevents system shutdown
From: Greg Oster
Date: Sat, 07 Feb 2009 00:50:18 UTC
Newsgroups: fa.netbsd.bugs

The following reply was made to PR kern/40569; it has been noted by GNATS.

From: Greg Oster <oster@xxxxxxxxxxx>
To: gnats-bugs@xxxxxxxxxx
Cc: 
Subject: Re: kern/40569: Faild RAIDframe parity rewrite prevents system shutdown
Date: Fri, 06 Feb 2009 18:47:48 -0600

 tron@xxxxxxxxxxxxx writes:
 > >Number:         40569
 > >Category:       kern
 > >Synopsis:       Faild RAIDframe parity rewrite prevents system shutdown
 > >Confidential:   no
 > >Severity:       serious
 > >Priority:       medium
 > >Responsible:    kern-bug-people
 > >State:          open
 > >Class:          sw-bug
 > >Submitter-Id:   net
 > >Arrival-Date:   Fri Feb 06 23:05:00 +0000 2009
 > >Originator:     Matthias Scheler
 > >Release:        NetBSD 5.0_RC1 2009-02-03 sources
 > >Organization:
 > Matthias Scheler                                  http://zhadum.org.uk/
 > >Environment:
 > System: NetBSD colwyn.zhadum.org.uk 5.0_RC1 NetBSD 5.0_RC1 (COLWYN.64) #0: Fri Feb 6 17:59:15 GMT 2009 tron@xxxxxxxxxxxxxxxxxxxx:/src/sys/compile/COLWYN.64 amd64
 > Architecture: x86_64
 > Machine: amd64
 > >Description:
 > One of the SATA disks in my server had a few write errors and was ejected
 > from a RAIDframe RAID 1 a few days ago. When I finally noticed this
 > morning I initiated a parity rewrite with "raidctl -R /dev/wd2e raid1".
 > The rebuild failed unfortunately:
 > 
 > raid1: initiating in-place reconstruction on column 0
 > wd2e: error writing fsbn 268435392 of 268435392-268435519 (wd2 bn 268435455; cn 266305 tn 0 sn 15), retrying
 > [...]
 > wd2e: error writing fsbn 268435392 of 268435392-268435519 (wd2 bn 268435455; cn 266305 tn 0 sn 15)
 > wd2: (id not found)
 > raid1: Recon write failed!
 > raid1: reconstruction failed.
 > 
 > I retried the parity rewrite but it was rejected by "raidctl" because of
 > an invalid I/O control. 
 
 Do you have a bit more info on exactly what you tried here and what 
 the error was?  A parity rewrite shouldn't have bumped 
 reconInProgress.
 
 > The reconstruction was not tried again. When
 > I later tried to shut down the system (to check the cabling) the kernel
 > stopped while unmounting the file systems with this message:
 > 
 > unmounting file systems...raid1: Waiting for reconstruction to stop...
 > 
 > I had to remove the power hard at this point.
 > 
 > >How-To-Repeat:
 > Use "raidctl -R /dev/<x> raid<y>" and try to shut down the system afterwards.
 
 I suspect the reconstruction also needs to fail, and you may need to 
 attempt some other operation afterwards... but I'm not sure yet... 
 (I can't see how reconInProgress is non-zero in rf_driver.c unless 
 there really is a reconstruction going on... From what you describe 
 here there wasn't an active reconstruction going on, and so I have no 
 clue how it could get into that state... :( )
 
 Later...
 
 Greg Oster
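
 The symptom under discussion (shutdown printing "Waiting for
 reconstruction to stop..." and never returning) is what one would expect
 from any shutdown path that waits on an in-progress counter which never
 returns to zero. The following is a minimal, hypothetical sketch of that
 general pattern, not the RAIDframe code from rf_driver.c. The struct,
 the function names, and the leak_on_failure switch are all invented for
 illustration, and the wait loop is bounded only so that the example
 terminates. It does not claim to identify the actual cause, which the
 reply above leaves open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-array state; not the real RAIDframe struct. */
struct raid_sketch {
    int recon_in_progress;  /* modelled on the reconInProgress counter named above */
};

static void recon_begin(struct raid_sketch *r)
{
    r->recon_in_progress++;
}

static void recon_end(struct raid_sketch *r)
{
    assert(r->recon_in_progress > 0);
    r->recon_in_progress--;
}

/*
 * One reconstruction attempt. If leak_on_failure is set, the failure
 * path returns without calling recon_end(), which is the kind of
 * imbalance that would leave the counter stuck (an assumption made
 * for illustration only).
 */
static bool recon_run(struct raid_sketch *r, bool write_fails,
                      bool leak_on_failure)
{
    recon_begin(r);
    if (write_fails) {
        if (!leak_on_failure)
            recon_end(r);
        return false;
    }
    recon_end(r);
    return true;
}

/*
 * Shutdown-side wait, bounded to max_polls so the sketch terminates.
 * Returns true if the counter reached zero, false if the caller
 * would still be waiting.
 */
static bool shutdown_wait(const struct raid_sketch *r, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (r->recon_in_progress == 0)
            return true;
        /* a real kernel would sleep here until woken */
    }
    return false;
}
```

 With a balanced failure path, shutdown_wait() succeeds after a failed
 recon_run(). With leak_on_failure set, the counter stays at 1 and the
 wait never completes, matching the hang described in the PR.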
 
 
