netbsd-bugs@netbsd.org

Re: kern/39297: mfi calls tsleep() from mfi_intr()

Subject: Re: kern/39297: mfi calls tsleep() from mfi_intr()
From: Greg Oster
Date: Wed, 13 Aug 2008 22:25:19 UTC
Newsgroups: fa.netbsd.bugs

The following reply was made to PR kern/39297; it has been noted by GNATS.

From: Greg Oster <oster@xxxxxxxxxxx>
To: gnats-bugs@xxxxxxxxxx
Cc: 
Subject: Re: kern/39297: mfi calls tsleep() from mfi_intr() 
Date: Wed, 13 Aug 2008 16:21:32 -0600

 Greg Oster writes:
 > The following reply was made to PR kern/39297; it has been noted by GNATS.
 > 
 > From: Greg Oster <oster@xxxxxxxxxxx>
 > To: gnats-bugs@xxxxxxxxxx
 > Cc: 
 > Subject: Re: kern/39297: mfi calls tsleep() from mfi_intr() 
 > Date: Fri, 08 Aug 2008 13:58:01 -0600
 > 
 >  oster@xxxxxxxxxx writes:
 >  > >Number:         39297
 >  > >Category:       kern
 >  > >Synopsis:       mfi driver calls tsleep() from mfi_intr()
 >  > >Confidential:   no
 >  > >Severity:       critical
 >  > >Priority:       high
 >  > >Responsible:    kern-bug-people
 >  > >State:          open
 >  > >Class:          sw-bug
 >  > >Submitter-Id:   net
 >  > >Arrival-Date:   Tue Aug 05 17:25:00 +0000 2008
 >  > >Originator:     Greg Oster
 >  > >Release:        NetBSD 4.99.71
 >  > >Organization:
 >  > >Environment:
>  > System: NetBSD hapi 4.99.71 NetBSD 4.99.71 (GENERIC) #0: Thu Jul 31 11:15:42 CST 2008 root@hapi:/u1/builds/build247/src/sys/arch/amd64/compile/obj/GENERIC amd64
 >  > Architecture: amd64
 >  > Machine: amd64
 >  > >Description:
 >  > 
>  >   Running 4.99.71 (and some earlier revisions) on a machine using
>  > the mfi driver will eventually lock the machine up.  Breaking
>  > into ddb yields the following:
 >  > 
 >  > login: fatal breakpoint trap in supervisor mode
>  > trap type 1 code 0 rip ffffffff804dba45 cs 8 rflags 202 cr2 ffff8000720a8000
>  > cpl 8 rsp ffff800062c4b7f8
 >  > Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x5:  leave
 >  > db{0}> tr
 >  > breakpoint() at netbsd:breakpoint+0x5
 >  > comintr() at netbsd:comintr+0x53a
 >  > Xintr_ioapic_edge6() at netbsd:Xintr_ioapic_edge6+0xef
 >  > --- interrupt ---
 >  > mutex_spin_retry() at netbsd:mutex_spin_retry+0x5a
 >  > ltsleep() at netbsd:ltsleep+0xe5
 >  > mfi_mgmt() at netbsd:mfi_mgmt+0xe1
 >  > mfi_scsipi_request() at netbsd:mfi_scsipi_request+0x331
 >  > scsipi_run_queue() at netbsd:scsipi_run_queue+0x16e
 >  > mfi_intr() at netbsd:mfi_intr+0xc0
 >  > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
 >  > Xintr_ioapic_level2() at netbsd:Xintr_ioapic_level2+0xf7
>  [snip]
 >  > 
 >  > >How-To-Repeat:
 >  > 
 >  >   Boot -current on a Dell PowerEdge 2950.
 >  >   Extract a tar file.  
 >  >   Or attempt a build.sh.  
 >  >   Or just wait.
>  >   Observe that the system is completely locked up.
 >  >   Enter ddb.
 >  >   Observe that ltsleep() has been called from mfi_intr().
 >  > 
 >  > >Fix:
 >  >   Figure out a different way of doing whatever mfi_mgmt() thinks
 >  > needs to be done by sleeping?
 >  
 >  For now, the following patch is sufficient to allow the machine to 
 >  run for more than a few minutes -- it's actually been able to do 4 
 >  ./build.sh's in a row without locking up hard... 
 >  
>  It's not a great fix, but it at least makes the box usable...
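The Fix field asks for a way to do mfi_mgmt()'s work without sleeping. The patch mentioned above is not reproduced in this archive, so what follows is only a userspace sketch of one common approach, not the actual change: when the caller may be in interrupt context, wait for the command by bounded polling of the controller status instead of sleeping. Every name here (struct sim_cmd, sim_mgmt_wait() and the other sim_* helpers) is hypothetical, and the "hardware" is simulated by a countdown of status reads.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not mfi(4) code: a command whose completion
 * the simulated controller reports after a fixed number of status reads.
 */
struct sim_cmd {
	int  reads_left;   /* status reads until the controller reports done */
	bool done;         /* set once the simulated status says "complete" */
	bool slept;        /* true if completion was awaited by "sleeping" */
};

/* Stand-in for reading the controller's completion status register. */
static bool
sim_status_done(struct sim_cmd *c)
{
	if (c->reads_left > 0)
		c->reads_left--;
	if (c->reads_left == 0)
		c->done = true;
	return c->done;
}

/* Stand-in for a tsleep()-style wait: only legal when the caller may block. */
static void
sim_sleep_until_done(struct sim_cmd *c, bool in_intr)
{
	assert(!in_intr);  /* sleeping from an interrupt handler is the bug in this PR */
	c->slept = true;
	while (!sim_status_done(c))
		;          /* a real kernel would block here instead of spinning */
}

/* Bounded busy-wait: returns 0 on completion, -1 if max_reads is exhausted. */
static int
sim_poll_until_done(struct sim_cmd *c, int max_reads)
{
	for (int i = 0; i < max_reads; i++) {
		if (sim_status_done(c))
			return 0;
		/* a driver would insert a short DELAY() between reads here */
	}
	return -1;
}

/* Wait for a management command in whichever way the calling context allows. */
static int
sim_mgmt_wait(struct sim_cmd *c, bool in_intr, int max_reads)
{
	if (in_intr)
		return sim_poll_until_done(c, max_reads);
	sim_sleep_until_done(c, in_intr);
	return 0;
}
```

For example, with a command initialized as `struct sim_cmd cmd = { .reads_left = 3 };`, `sim_mgmt_wait(&cmd, true, 5)` returns 0 without setting `cmd.slept`, while a budget of only 2 reads returns -1. The bound matters: polling spins inside the interrupt handler, so a real driver would also need a sensible timeout and error path. Handing the command to a thread that is allowed to sleep is the other usual option.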
 
 FWIW, the reason this issue is seen more often is that I've enabled WAPBL
 on the filesystems, and WAPBL tells the underlying device to "flush
 its cache"... that ends up issuing SCSI_SYNCHRONIZE_CACHE_10 to mfi,
 which triggers the problem noted in this PR...
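The path described here can be modeled roughly as follows. This is a hypothetical userspace sketch, not NetBSD code: enqueue(), run_queue(), dispatch() and controller_intr() only loosely stand in for the filesystem queuing a flush, scsipi_run_queue(), mfi_scsipi_request()/mfi_mgmt() and mfi_intr() from the backtrace. It shows that the same cache-flush request can be dispatched either from ordinary thread context or, once the interrupt handler restarts the queue, from interrupt context, so the dispatch code has to know which one it is in before deciding to sleep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the call path in this PR; these are not NetBSD APIs. */
enum req_kind { REQ_READ, REQ_SYNC_CACHE };
enum wait_how { WAIT_NONE, WAIT_SLEPT, WAIT_POLLED };

#define QLEN 8

static int intr_depth;                      /* > 0 while inside the simulated interrupt */
static enum req_kind queue[QLEN];           /* pending requests, ring buffer */
static size_t qhead, qtail;
static enum wait_how last_wait = WAIT_NONE; /* how the last flush was awaited */

static bool
in_interrupt(void)
{
	return intr_depth > 0;
}

/* Filesystem side: queue a request (e.g. the cache flush WAPBL asks for). */
static void
enqueue(enum req_kind k)
{
	assert(qtail - qhead < QLEN);       /* the model does not handle overflow */
	queue[qtail++ % QLEN] = k;
}

/* Driver entry point: only the flush is treated as a synchronous command. */
static void
dispatch(enum req_kind k)
{
	if (k != REQ_SYNC_CACHE)
		return;                     /* ordinary I/O completes asynchronously */
	/* Sleeping is only allowed outside interrupt context; otherwise poll. */
	last_wait = in_interrupt() ? WAIT_POLLED : WAIT_SLEPT;
}

/* Loosely like scsipi_run_queue(): hand every pending request to the driver. */
static void
run_queue(void)
{
	while (qhead != qtail)
		dispatch(queue[qhead++ % QLEN]);
}

/* Loosely like mfi_intr(): finish completed work, then restart the queue. */
static void
controller_intr(void)
{
	intr_depth++;
	run_queue();
	intr_depth--;
}
```

Queuing a REQ_SYNC_CACHE and then calling controller_intr() leaves `last_wait == WAIT_POLLED`, while draining the same request with run_queue() from ordinary context leaves `WAIT_SLEPT`. The interrupt-depth counter is an assumption of the model; the kernel has its own ways of answering "may I sleep here?", which this sketch does not try to reproduce.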
 
 Later...
 
 Greg Oster
 
 
