On 07/14/2009 07:17 PM, Uros Bizjak wrote:
> Attached patch enables vectorization of copysign function for x86 SSE
> targets.
Actually, the vectorized inverted sign bit mask created with
ix86_build_signbit_mask is exactly the bitwise inverse of the vectorized
sign bit mask that is already available in a register, i.e. for each
SFmode element:
~0x80000000 = 0x7FFFFFFF
so we can reuse the non-inverted value by using the and-not SSE
instruction. The copysign loop then vectorizes to:
movaps .LC0(%rip), %xmm1
.L3:
movaps %xmm1, %xmm0
movaps b(%rax), %xmm2
andnps a(%rax), %xmm0
andps %xmm1, %xmm2
orps %xmm2, %xmm0
movaps %xmm0, r(%rax)
addq $16, %rax
cmpq $64, %rax
jne .L3
.section .rodata.cst16,"aM",@progbits,16
.align 16
.LC0:
.long 2147483648
.long 2147483648
.long 2147483648
.long 2147483648
.align 16
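For readers who prefer C to assembly, the per-element computation in the
loop above can be sketched as follows. This is only an illustration, not
code from the patch: it assumes IEEE-754 binary32 floats, the helper name
copysign_bits and the macros N and SIGN_MASK are hypothetical, and the
array names a, b and r are simply taken from the symbols in the assembly.

```c
#include <stdint.h>
#include <string.h>

#define N 16                   /* assumption: 64 bytes / 4-byte float, per the cmpq $64 bound */
#define SIGN_MASK 0x80000000u  /* same value as each .long 2147483648 in .LC0 */

/* Hypothetical helper: copysign for one float using only the
   non-inverted sign bit mask, mirroring andnps/andps/orps. */
static float copysign_bits(float mag, float sgn)
{
    uint32_t m, s, bits;
    float out;

    memcpy(&m, &mag, sizeof m);    /* reinterpret the float bits safely */
    memcpy(&s, &sgn, sizeof s);

    bits = (~SIGN_MASK & m)        /* andnps: clear the sign bit of the magnitude */
         | ( SIGN_MASK & s);       /* andps:  keep only the sign bit of the sign source */

    memcpy(&out, &bits, sizeof out);
    return out;                    /* orps result */
}

float a[N], b[N], r[N];

/* Scalar form of the loop; the vectorized code handles four elements per iteration. */
void copysign_loop(void)
{
    for (int i = 0; i < N; i++)
        r[i] = copysign_bits(a[i], b[i]);
}
```

The point of the sketch is that only SIGN_MASK is ever loaded; the
inverted mask is never materialized as a second constant.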
2009-07-15 Uros Bizjak <ubizjak@xxxxxxxxx>
* config/i386/sse.md (copysign<mode>3): Use "and-not" SSE instruction
instead of "and" with inverted sign bit mask value. Use
"nonimmediate_operand" for operand 1 and operand 2 predicate.
Allocate registers only for operand 4 and operand 5.
The patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
{,-m32} and committed to mainline SVN.
Uros.
p.diff.txt
Description: Text document