
re: kern/43274: re(4) crash on ultra10 - uncorrectable DMA error

Subject: re: kern/43274: re(4) crash on ultra10 - uncorrectable DMA error
From: matthew green
Date: Sat, 8 May 2010 07:50:04 +0000 UTC
The following reply was made to PR kern/43274; it has been noted by GNATS.

From: matthew green <[email protected]>
To: Takeshi Nakayama <[email protected]>
Cc: [email protected], [email protected],
    [email protected], [email protected]
Subject: re: kern/43274: re(4) crash on ultra10 - uncorrectable DMA error
Date: Sat, 08 May 2010 17:46:20 +1000

    
    >   ultra10 crashed earlier today with this on the console:
    > 
    >   login: psycho0: uncorrectable DMA error AFAR 11b8450 AFSR 0x410000ff40800000<BLK,P_DTE,P_DRD>
    >   psycho0: IOVA c0114000 IOTTE 3fc84012
    [ .. ]
    
    I see a similar problem on tlp(4) on Netra X1.  So please try this
    workaround.
    
    
    Index: sys/arch/sparc64/dev/iommu.c
    ===================================================================
    RCS file: /cvsroot/src/sys/arch/sparc64/dev/iommu.c,v
    retrieving revision 1.98
    diff -u -d -r1.98 iommu.c
    --- sys/arch/sparc64/dev/iommu.c    11 Mar 2010 03:54:56 -0000      1.98
    +++ sys/arch/sparc64/dev/iommu.c    7 May 2010 14:07:08 -0000
    @@ -358,8 +358,10 @@
                 * eliminating the next line, but the page is mapped
                 * until the next iommu_enter call.
                 */
    +#if 0 /* XXX */
                is->is_tsb[IOTSBSLOT(va,is->is_tsbsize)] &= ~IOTTE_V;
                membar_storestore();
    +#endif
                bus_space_write_8(is->is_bustag, is->is_iommu,
                        IOMMUREG(iommu_flush), va);
                va += PAGE_SIZE;
    
    
    As I noted in a comment in iommu.c near this workaround, it seems
    that unmapping an IOMMU page that is still in use by a device causes
    an uncorrectable DMA error.
    
    I could not find any fix for the problem other than this workaround.
 
 i noticed that OpenSolaris never clears the valid bit in the
 IOTTEs.  i think we should commit the #if 0 or just remove that
 code entirely...
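
To make the failure mode concrete, here is a minimal user-space sketch of the two unmap behaviours under discussion. It is a toy model, not the iommu.c code: the `toy_*` names, the table size, and the return convention are hypothetical, and the position of the valid bit (bit 63) is an assumption about the IOTTE layout. The only point it illustrates is that a device access arriving after the valid bit has been cleared fails to translate, while leaving the bit set keeps the stale mapping usable until the slot is reused.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IOTTE_V  (1ULL << 63)   /* assumed position of the IOTTE valid bit */
#define TOY_SLOTS    8              /* hypothetical, tiny translation table */

static uint64_t toy_tsb[TOY_SLOTS]; /* stands in for is->is_tsb[] */

/* Map a page: store the physical address and mark the entry valid. */
static void
toy_enter(unsigned slot, uint64_t pa)
{
	assert(slot < TOY_SLOTS);
	toy_tsb[slot] = pa | TOY_IOTTE_V;
}

/*
 * Unmap a page.  clear_valid != 0 models the code inside the new #if 0
 * (the entry is invalidated); clear_valid == 0 models the workaround
 * (the entry stays valid until the slot is entered again).  The real
 * code also writes the iommu_flush register, which is not modelled here.
 */
static void
toy_remove(unsigned slot, int clear_valid)
{
	assert(slot < TOY_SLOTS);
	if (clear_valid)
		toy_tsb[slot] &= ~TOY_IOTTE_V;
}

/*
 * A device access through the slot: 0 if the entry translates,
 * -1 if it does not (the toy stand-in for the DMA error).
 */
static int
toy_dma(unsigned slot)
{
	assert(slot < TOY_SLOTS);
	return (toy_tsb[slot] & TOY_IOTTE_V) ? 0 : -1;
}
```

With this model, `toy_enter(0, 0x3fc84000ULL); toy_remove(0, 1);` followed by `toy_dma(0)` returns -1, whereas the same sequence with `toy_remove(0, 0)` returns 0. The trade-off is the one the existing comment in the diff already names: without the clear, the page stays mapped until the next iommu_enter call.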
 
 
 .mrg.
 
