netbsd-bugs@netbsd.org

Subject: Re: kern/43274: re(4) crash on ultra10 - uncorrectable DMA error
From: Takeshi Nakayama
Date: Sat, 08 May 2010 06:04:40 +0900 JST
>>> mrg@xxxxxxxxxxxxx wrote

>       ultra10 crashed earlier today with this on the console:
> 
>       login: psycho0: uncorrectable DMA error AFAR 11b8450 AFSR 0x410000ff40800000<BLK,P_DTE,P_DRD>
>       psycho0: IOVA c0114000 IOTTE 3fc84012
>       Stopped in pid 0.3 (system) at  netbsd:cpu_Debugger+0x4:        nop
>       db{0}> bt
>       sparc_interrupt(ffffffffffffffe0, 20, 1000000, 6, 4, 3aa6840) at netbsd:sparc_interrupt+0x1e8
>       _bus_dmamap_unload(1819140, 2f36000, 0, 5ea, 8, 7fffffffffffffff) at netbsd:_bus_dmamap_unload+0x74
>       iommu_dvmamap_unload(2df5880, 2f36000, 6000, 5ea, 8, 0) at netbsd:iommu_dvmamap_unload+0x28
>       re_txeof(c57a000, c, c17364c, 3fc84000, 0, 5ea) at netbsd:re_txeof+0x108
>       re_intr(c57a000, 42d2e70, 5ea, 0, 5, 401) at netbsd:re_intr+0x134
>       intr_biglock_wrapper(2df4a00, 0, e0017ed0, 10, 114b0e0, c173668) at netbsd:intr_biglock_wrapper+0x10
>       sparc_interrupt(0, 42d2e70, 1f4, 0, 2, 0) at netbsd:sparc_interrupt+0x1e8
>       ifq_enqueue(c57a008, 0, 2, 2, c1739a2, 1000000) at netbsd:ifq_enqueue+0xa8
>       ether_output(0, 42d2e70, 3c19a20, 3a97650, 2810, 3aa6840) at netbsd:ether_output+0x6bc
>       ip_output(14, 0, 3c19a20, c57a008, 3c08a00, 4326810) at netbsd:ip_output+0xfa4
>       ip_forward(42d86a0, 1, c4dac08, 0, c4dac08, ac101837) at netbsd:ip_forward+0x158
>       ip_input(5dc, 0, 0, c050e00, 114b0e0, c053b70) at netbsd:ip_input+0xb84
>       ipintr(1879c00, 0, c053740, 6, 34, de) at netbsd:ipintr+0x34
>       softint_thread(c02e230, c053740, 0, c050e00, 1296780, c052bf0) at netbsd:softint_thread+0x64
>       lwp_trampoline(f0067458, fffa9cf8, 111800, 110728, fffa9df8, 1) at netbsd:lwp_trampoline+0x8
>       db{0}> c

I see a similar problem with tlp(4) on a Netra X1, so please try
this workaround.


Index: sys/arch/sparc64/dev/iommu.c
===================================================================
RCS file: /cvsroot/src/sys/arch/sparc64/dev/iommu.c,v
retrieving revision 1.98
diff -u -d -r1.98 iommu.c
--- sys/arch/sparc64/dev/iommu.c        11 Mar 2010 03:54:56 -0000      1.98
+++ sys/arch/sparc64/dev/iommu.c        7 May 2010 14:07:08 -0000
@@ -358,8 +358,10 @@
                 * eliminating the next line, but the page is mapped
                 * until the next iommu_enter call.
                 */
+#if 0 /* XXX */
                is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)] &= ~IOTTE_V;
                membar_storestore();
+#endif
                bus_space_write_8(is->is_bustag, is->is_iommu,
                        IOMMUREG(iommu_flush), va);
                va += PAGE_SIZE;


As I noted in a comment in iommu.c near this workaround, it seems
that unmapping an IOMMU page that a device is still using causes an
uncorrectable DMA error.
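
To illustrate the suspected failure mode, here is a minimal toy model in
C. It is not NetBSD code: every toy_* name, the table size, and the
value used for the valid bit are assumptions made for this sketch. It
only shows why clearing the valid bit of a translation entry while a
device may still touch the page leads to a failed DMA access, and why
leaving the entry valid until it is next overwritten avoids that.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for this sketch only; not taken from iommureg.h. */
#define TOY_IOTTE_V      0x8000000000000000ULL  /* "entry valid" bit */
#define TOY_PAGE_SHIFT   13                     /* 8 KB pages */
#define TOY_TSB_ENTRIES  8

/* Hypothetical stand-in for the IOMMU translation table. */
struct toy_iommu {
	uint64_t tsb[TOY_TSB_ENTRIES];
	unsigned dma_errors;            /* failed device accesses */
};

/* Map an I/O virtual address to a table slot. */
size_t
toy_slot(uint64_t iova)
{
	return (size_t)((iova >> TOY_PAGE_SHIFT) % TOY_TSB_ENTRIES);
}

/* Install a valid translation, overwriting whatever was there. */
void
toy_enter(struct toy_iommu *is, uint64_t iova, uint64_t pa)
{
	is->tsb[toy_slot(iova)] = pa | TOY_IOTTE_V;
}

/*
 * Tear down a translation.  With clear_valid set this mirrors the
 * lines the patch disables; without it the entry stays valid until
 * the next toy_enter() on the same slot.
 */
void
toy_remove(struct toy_iommu *is, uint64_t iova, bool clear_valid)
{
	if (clear_valid)
		is->tsb[toy_slot(iova)] &= ~TOY_IOTTE_V;
	/* A real driver would flush the IOMMU TLB entry here. */
}

/* A device access succeeds only if the entry is still valid. */
bool
toy_device_dma(struct toy_iommu *is, uint64_t iova)
{
	if ((is->tsb[toy_slot(iova)] & TOY_IOTTE_V) == 0) {
		is->dma_errors++;
		return false;
	}
	return true;
}
```

In this model, a late toy_device_dma() after toy_remove(..., true)
fails, while the same access after toy_remove(..., false) still
succeeds.  The cost, as the iommu.c comment says, is that the page
stays mapped until the next iommu_enter call.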

I could not figure out a solution to the problem other than this
workaround.
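
The IOTTE value in the quoted console output (3fc84012) can be checked
against this theory with a small decoder.  This is only a sketch: the
SK_* masks below are my assumption of the sparc64 IOTTE layout (valid
bit in bit 63, physical page address in the middle, cacheable and
writable flags in the low bits) and should be verified against
iommureg.h before relying on the result.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed IOTTE layout; verify against the real iommureg.h. */
#define SK_IOTTE_V       0x8000000000000000ULL  /* entry valid */
#define SK_IOTTE_PAMASK  0x000007ffffffe000ULL  /* physical page address */
#define SK_IOTTE_C       0x0000000000000010ULL  /* cacheable */
#define SK_IOTTE_W       0x0000000000000002ULL  /* writable */

/* Hypothetical container for the decoded fields. */
struct sk_iotte_fields {
	bool     valid;
	uint64_t pa;
	bool     cacheable;
	bool     writable;
};

/* Split a raw IOTTE into the fields defined by the masks above. */
struct sk_iotte_fields
sk_iotte_decode(uint64_t tte)
{
	struct sk_iotte_fields f;

	f.valid     = (tte & SK_IOTTE_V) != 0;
	f.pa        = tte & SK_IOTTE_PAMASK;
	f.cacheable = (tte & SK_IOTTE_C) != 0;
	f.writable  = (tte & SK_IOTTE_W) != 0;
	return f;
}
```

Under these assumed masks, sk_iotte_decode(0x3fc84012) reports the
valid bit as clear and a physical address of 0x3fc84000, which is the
same value that appears as the fourth argument of re_txeof in the
trace.  That would be consistent with the entry having been
invalidated while the device was still using it.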

-- Takeshi Nakayama
