
Re: kern/43274: re(4) crash on ultra10 - uncorrectable DMA error

Subject: Re: kern/43274: re(4) crash on ultra10 - uncorrectable DMA error
From: Takeshi Nakayama
Date: Fri, 7 May 2010 22:55:01 +0000 UTC
The following reply was made to PR kern/43274; it has been noted by GNATS.

From: Takeshi Nakayama <[email protected]>
To: [email protected], [email protected]
Cc: [email protected], [email protected],
 [email protected]
Subject: Re: kern/43274: re(4) crash on ultra10 - uncorrectable DMA error
Date: Sat, 08 May 2010 06:04:40 +0900 (JST)

 >>> [email protected] wrote
 
 >      ultra10 crashed earlier today with this on the console:
 > 
 >      login: psycho0: uncorrectable DMA error AFAR 11b8450 AFSR 0x410000ff40800000<BLK,P_DTE,P_DRD>
 >      psycho0: IOVA c0114000 IOTTE 3fc84012
 >      Stopped in pid 0.3 (system) at  netbsd:cpu_Debugger+0x4:        nop
 >      db{0}> bt
 >      sparc_interrupt(ffffffffffffffe0, 20, 1000000, 6, 4, 3aa6840) at netbsd:sparc_interrupt+0x1e8
 >      _bus_dmamap_unload(1819140, 2f36000, 0, 5ea, 8, 7fffffffffffffff) at netbsd:_bus_dmamap_unload+0x74
 >      iommu_dvmamap_unload(2df5880, 2f36000, 6000, 5ea, 8, 0) at netbsd:iommu_dvmamap_unload+0x28
 >      re_txeof(c57a000, c, c17364c, 3fc84000, 0, 5ea) at netbsd:re_txeof+0x108
 >      re_intr(c57a000, 42d2e70, 5ea, 0, 5, 401) at netbsd:re_intr+0x134
 >      intr_biglock_wrapper(2df4a00, 0, e0017ed0, 10, 114b0e0, c173668) at netbsd:intr_biglock_wrapper+0x10
 >      sparc_interrupt(0, 42d2e70, 1f4, 0, 2, 0) at netbsd:sparc_interrupt+0x1e8
 >      ifq_enqueue(c57a008, 0, 2, 2, c1739a2, 1000000) at netbsd:ifq_enqueue+0xa8
 >      ether_output(0, 42d2e70, 3c19a20, 3a97650, 2810, 3aa6840) at netbsd:ether_output+0x6bc
 >      ip_output(14, 0, 3c19a20, c57a008, 3c08a00, 4326810) at netbsd:ip_output+0xfa4
 >      ip_forward(42d86a0, 1, c4dac08, 0, c4dac08, ac101837) at netbsd:ip_forward+0x158
 >      ip_input(5dc, 0, 0, c050e00, 114b0e0, c053b70) at netbsd:ip_input+0xb84
 >      ipintr(1879c00, 0, c053740, 6, 34, de) at netbsd:ipintr+0x34
 >      softint_thread(c02e230, c053740, 0, c050e00, 1296780, c052bf0) at netbsd:softint_thread+0x64
 >      lwp_trampoline(f0067458, fffa9cf8, 111800, 110728, fffa9df8, 1) at netbsd:lwp_trampoline+0x8
 >      db{0}> c
 
 I see a similar problem with tlp(4) on a Netra X1, so please try
 this workaround.
 
 
 Index: sys/arch/sparc64/dev/iommu.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/sparc64/dev/iommu.c,v
 retrieving revision 1.98
 diff -u -d -r1.98 iommu.c
 --- sys/arch/sparc64/dev/iommu.c       11 Mar 2010 03:54:56 -0000      1.98
 +++ sys/arch/sparc64/dev/iommu.c       7 May 2010 14:07:08 -0000
 @@ -358,8 +358,10 @@
                 * eliminating the next line, but the page is mapped
                 * until the next iommu_enter call.
                 */
 +#if 0 /* XXX */
                is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)] &= ~IOTTE_V;
                membar_storestore();
 +#endif
                bus_space_write_8(is->is_bustag, is->is_iommu,
                        IOMMUREG(iommu_flush), va);
                va += PAGE_SIZE;
 
 
 As I noted in the comment in iommu.c just above this workaround, it
 seems that unmapping an IOMMU page that a device is still using
 causes an uncorrectable DMA error.
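
To make the suspected failure mode concrete, here is a minimal toy model in C. It is only a sketch under stated assumptions, not the code from iommu.c: the `toy_` names, the 8-entry table, the 8 KB page size, and the position of the valid bit are all made up for illustration. It shows that once the valid bit of a translation entry is cleared, a device access through that entry can no longer be translated, whereas leaving the entry intact (as the `#if 0` does) lets a late access still resolve.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model only: names, sizes, and bit layout are illustrative
 * assumptions, not the definitions used by the real sparc64 IOMMU code.
 */
#define TOY_PAGE_SHIFT  13                    /* assume 8 KB I/O pages */
#define TOY_TSB_ENTRIES 8                     /* tiny table for the demo */
#define TOY_IOTTE_V     (UINT64_C(1) << 63)   /* assumed "valid" bit */

struct toy_iommu {
	uint64_t tsb[TOY_TSB_ENTRIES];        /* one entry per I/O page */
};

/* Pick the table slot for an I/O virtual address. */
static size_t
toy_slot(uint64_t iova)
{
	return (size_t)((iova >> TOY_PAGE_SHIFT) % TOY_TSB_ENTRIES);
}

/* Map an I/O virtual page to a physical page and mark the entry valid. */
static void
toy_enter(struct toy_iommu *is, uint64_t iova, uint64_t pa)
{
	assert((pa & TOY_IOTTE_V) == 0);      /* pa must not collide with V */
	is->tsb[toy_slot(iova)] = pa | TOY_IOTTE_V;
}

/*
 * Unmap a page.  clear_valid == true mimics the two lines the patch
 * disables; clear_valid == false mimics the workaround, where the entry
 * stays in place until it is overwritten by a later toy_enter().
 */
static void
toy_remove(struct toy_iommu *is, uint64_t iova, bool clear_valid)
{
	if (clear_valid)
		is->tsb[toy_slot(iova)] &= ~TOY_IOTTE_V;
}

/*
 * Model a device access through the IOMMU.  Returns false when the
 * entry is not valid, standing in for the DMA error on the console.
 */
static bool
toy_dma_access(const struct toy_iommu *is, uint64_t iova, uint64_t *pa)
{
	uint64_t tte = is->tsb[toy_slot(iova)];

	if ((tte & TOY_IOTTE_V) == 0)
		return false;
	*pa = tte & ~TOY_IOTTE_V;
	return true;
}
```

In this model, calling `toy_remove()` with `clear_valid` set to false and then `toy_dma_access()` still succeeds, while the same sequence with `clear_valid` set to true fails. That matches the trade-off described in the existing comment: the page stays mapped until the next enter call. It does not explain why the device is still touching the page after unload.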
 
 I could not find a fix for the problem other than this workaround.
 
 -- Takeshi Nakayama
 
