netbsd-bugs@netbsd.org

Re: kern/39297: mfi calls tsleep() from mfi_intr()

Subject: Re: kern/39297: mfi calls tsleep() from mfi_intr()
From: Greg Oster
Date: Fri, 8 Aug 2008 20:00:08 +0000 UTC
The following reply was made to PR kern/39297; it has been noted by GNATS.

From: Greg Oster <oster@xxxxxxxxxxx>
To: gnats-bugs@xxxxxxxxxx
Cc: 
Subject: Re: kern/39297: mfi calls tsleep() from mfi_intr() 
Date: Fri, 08 Aug 2008 13:58:01 -0600

 This is a multipart MIME message.
 
 --==_Exmh_1218225451_220690
 Content-Type: text/plain; charset=us-ascii
 
 oster@xxxxxxxxxx writes:
 > >Number:         39297
 > >Category:       kern
 > >Synopsis:       mfi driver calls tsleep() from mfi_intr()
 > >Confidential:   no
 > >Severity:       critical
 > >Priority:       high
 > >Responsible:    kern-bug-people
 > >State:          open
 > >Class:          sw-bug
 > >Submitter-Id:   net
 > >Arrival-Date:   Tue Aug 05 17:25:00 +0000 2008
 > >Originator:     Greg Oster
 > >Release:        NetBSD 4.99.71
 > >Organization:
 > >Environment:
 > System: NetBSD hapi 4.99.71 NetBSD 4.99.71 (GENERIC) #0: Thu Jul 31 11:15:42 CST 2008  root@hapi:/u1/builds/build247/src/sys/arch/amd64/compile/obj/GENERIC amd64
 > Architecture: amd64
 > Machine: amd64
 > >Description:
 > 
 >      Running 4.99.71 (and some earlier revisions) on a machine
 > using the mfi driver will result in the machine eventually locking up.
 > Breaking into ddb yields the following:
 > 
 > login: fatal breakpoint trap in supervisor mode
 > trap type 1 code 0 rip ffffffff804dba45 cs 8 rflags 202 cr2  ffff8000720a8000 cpl 8 rsp ffff800062c4b7f8
 > Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x5:  leave
 > db{0}> tr
 > breakpoint() at netbsd:breakpoint+0x5
 > comintr() at netbsd:comintr+0x53a
 > Xintr_ioapic_edge6() at netbsd:Xintr_ioapic_edge6+0xef
 > --- interrupt ---
 > mutex_spin_retry() at netbsd:mutex_spin_retry+0x5a
 > ltsleep() at netbsd:ltsleep+0xe5
 > mfi_mgmt() at netbsd:mfi_mgmt+0xe1
 > mfi_scsipi_request() at netbsd:mfi_scsipi_request+0x331
 > scsipi_run_queue() at netbsd:scsipi_run_queue+0x16e
 > mfi_intr() at netbsd:mfi_intr+0xc0
 > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
 > Xintr_ioapic_level2() at netbsd:Xintr_ioapic_level2+0xf7
 [snip]
 > 
 > >How-To-Repeat:
 > 
 >      Boot -current on a Dell PowerEdge 2950.
 >      Extract a tar file.  
 >      Or attempt a build.sh.  
 >      Or just wait.
 >         Observe system is completely locked up.
 >      Enter ddb.
 >      Observe that ltsleep() has been called from mfi_intr().
 > 
 > >Fix:
 >      Figure out a different way of doing whatever mfi_mgmt() thinks
 > needs to be done by sleeping?
 
 For now, the following patch is sufficient to allow the machine to 
 run for more than a few minutes -- it's actually been able to do 4 
 ./build.sh's in a row without locking up hard... 
 
 It's not a great fix, but it at least makes the box usable... 
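
 As a rough illustration of the "different way" asked for under Fix, the
 userland sketch below picks a wait strategy based on context: a bounded
 poll when called from interrupt context, and a sleeping wait only when a
 thread context is available. Every name in it (fake_cmd, wait_polling,
 wait_sleeping, wait_for_cmd) is a hypothetical stand-in and none of it
 is taken from mfi.c; it only models the control flow, not the hardware.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins, not taken from mfi.c. */

enum wait_result { WAIT_DONE, WAIT_TIMEOUT };

/* Simulated command: completes after `ticks_needed` status reads. */
struct fake_cmd {
	int  ticks_needed;
	int  ticks_seen;
	bool slept;		/* set if the sleeping path was taken */
};

/* Simulated status read; returns true once the command has completed. */
static bool
fake_cmd_done(struct fake_cmd *c)
{
	return ++c->ticks_seen >= c->ticks_needed;
}

/* Stand-in for a tsleep()-style wait; only legal with a thread context. */
static enum wait_result
wait_sleeping(struct fake_cmd *c, bool in_interrupt)
{
	assert(!in_interrupt);	/* sleeping in interrupt context is the bug */
	c->slept = true;
	c->ticks_seen = c->ticks_needed;	/* pretend we were woken on completion */
	return WAIT_DONE;
}

/* Busy-wait with a bounded number of status reads; never sleeps. */
static enum wait_result
wait_polling(struct fake_cmd *c, int max_polls)
{
	for (int i = 0; i < max_polls; i++)
		if (fake_cmd_done(c))
			return WAIT_DONE;
	return WAIT_TIMEOUT;
}

/* Choose the wait strategy from the calling context. */
static enum wait_result
wait_for_cmd(struct fake_cmd *c, bool in_interrupt, int max_polls)
{
	if (in_interrupt)
		return wait_polling(c, max_polls);
	return wait_sleeping(c, in_interrupt);
}
```

 Polling from an interrupt handler has its own cost, since it holds the
 CPU at raised priority for the duration, so deferring the cache flush to
 a thread context may well be the better long-term answer.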
 
 Later...
 
 Greg Oster
 
 
 --==_Exmh_1218225451_220690
 Content-Type: text/plain ; name="mfi.c.diff"; charset=us-ascii
 Content-Description: mfi.c.diff
 
 Index: mfi.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ic/mfi.c,v
 retrieving revision 1.18
 diff -c -p -r1.18 mfi.c
 *** mfi.c      24 Jun 2008 10:08:43 -0000      1.18
 --- mfi.c      8 Aug 2008 17:37:00 -0000
 *************** mfi_scsipi_request(struct scsipi_channel
 *** 1007,1013 ****
 --- 1007,1015 ----
        struct scsipi_rw_10     *rwb;
        uint32_t                blockno, blockcnt;
        uint8_t                 target;
 + #if 0
        uint8_t                 mbox[MFI_MBOX_SIZE];
 + #endif
        int                     s;
   
        switch (req) {
 *************** mfi_scsipi_request(struct scsipi_channel
 *** 1072,1077 ****
 --- 1074,1083 ----
                }
                break;
   
 + #if 0
 +              /* mfi_mgmt() calls tsleep, and this routine gets called
 +               * from mfi_intr(), so we can't do this here!!! */
 + 
        case SCSI_SYNCHRONIZE_CACHE_10:
                mfi_put_ccb(ccb); /* we don't need this */
   
 *************** mfi_scsipi_request(struct scsipi_channel
 *** 1086,1092 ****
                splx(s);
                return;
                /* NOTREACHED */
 ! 
        /* hand it of to the firmware and let it deal with it */
        case SCSI_TEST_UNIT_READY:
                /* save off sd? after autoconf */
 --- 1092,1098 ----
                splx(s);
                return;
                /* NOTREACHED */
 ! #endif
        /* hand it of to the firmware and let it deal with it */
        case SCSI_TEST_UNIT_READY:
                /* save off sd? after autoconf */
 
 --==_Exmh_1218225451_220690--
 
 
