
Re: [PATCH, i386]: Vectorize copysign for x86

Subject: Re: [PATCH, i386]: Vectorize copysign for x86
From: Uros Bizjak
Date: Wed, 15 Jul 2009 17:38:41 +0200

On 07/14/2009 07:17 PM, Uros Bizjak wrote:

> Attached patch enables vectorization of copysign function for x86 SSE targets.

Actually, the vectorized inverted sign-bit mask created with ix86_build_signbit_mask is exactly the bitwise inverse of the vectorized sign-bit mask that is already available in a register, i.e. for SFmode (four 32-bit elements):

~0x80000000800000008000000080000000 = 0x7FFFFFFF7FFFFFFF7FFFFFFF7FFFFFFF

so we can reuse the non-inverted value by using the and-not SSE instruction.
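
To illustrate the identity on a single element, here is a minimal scalar sketch in C. It is not part of the patch; the helper names (float_bits, bits_float, copysign_andnot) are made up for this example, and the only thing taken from the message is the 0x80000000 sign-bit mask. The point is that one mask serves both operations: and-not keeps the magnitude of the first operand, and a plain and extracts the sign of the second.

```c
#include <stdint.h>
#include <string.h>

/* Sign-bit mask for one 32-bit float element (2147483648 decimal). */
#define SIGN_MASK UINT32_C(0x80000000)

/* Hypothetical helper: read a float's bit pattern without aliasing issues. */
static uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Hypothetical helper: turn a bit pattern back into a float. */
static float bits_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Magnitude of a, sign of b, using only the non-inverted mask:
   (~mask & a) corresponds to andnps, (mask & b) to andps, | to orps. */
static float copysign_andnot(float a, float b)
{
    uint32_t magnitude = ~SIGN_MASK & float_bits(a);
    uint32_t sign      =  SIGN_MASK & float_bits(b);
    return bits_float(magnitude | sign);
}
```

The vectorized code does the same thing on four elements at once, so no second constant holding the inverted mask has to be loaded.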

The copysign loop then vectorizes to:

    movaps    .LC0(%rip), %xmm1
    movaps    %xmm1, %xmm0
    movaps    b(%rax), %xmm2
    andnps    a(%rax), %xmm0
    andps    %xmm1, %xmm2
    orps    %xmm2, %xmm0
    movaps    %xmm0, r(%rax)
    addq    $16, %rax
    cmpq    $64, %rax
    jne    .L3

    .section    .rodata.cst16,"aM",@progbits,16
    .align 16
    .long    2147483648
    .long    2147483648
    .long    2147483648
    .long    2147483648
    .align 16
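
For reference, a source loop of roughly the following shape would be expected to produce the listing above when vectorization is enabled. The actual test case is not included in this message, so treat this as an assumption: the array names a, b and r are taken from the assembly, the element count of 16 is inferred from the 64-byte loop bound (64 / sizeof(float)), and the function name is hypothetical.

```c
#include <math.h>

#define N 16  /* assumed: 64 bytes / 4 bytes per float, from "cmpq $64, %rax" */

float a[N], b[N], r[N];

/* Hypothetical function name. Each r[i] receives the magnitude of a[i]
   and the sign of b[i], matching andnps on a and andps on b. */
void copysign_loop(void)
{
    for (int i = 0; i < N; i++)
        r[i] = copysignf(a[i], b[i]);
}
```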

2009-07-15  Uros Bizjak <[email protected]>

    * config/i386/sse.md (copysign<mode>3): Use "and-not" SSE instruction
    instead of "and" with inverted sign bit mask value.  Use
    "nonimmediate_operand" for operand 1 and operand 2 predicate.
    Allocate registers only for operand 4 and operand 5.

Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}. Patch was committed to mainline SVN.


Attachment: p.diff.txt
Description: Text document
