On Wed, Nov 11, 2009 at 10:41:42AM +0000, Richard Earnshaw wrote:
> On Tue, 2009-11-10 at 16:24 -0500, Daniel Jacobowitz wrote:
> > This isn't obviously a win, even for -mvectorize-with-neon-quad.
> > Should I limit vdup to the non-constant case instead? I had hoped to
> > avoid the constant pool entry by using movw / movt / vdup, but GCC
> > doesn't realize (or doesn't agree) that such a sequence is cheaper
> > than a constant pool load.
> movw/movt will need a core register, so that's going to generate a
> 4-instruction sequence in most cases (with an instruction to move the
> result to the Neon reg bank); I would have thought that was unlikely to
> be a win.

No, just three (I hope):

   0:   e3000001    movw    r0, #1      ; 0x1
   4:   e3400002    movt    r0, #2      ; 0x2
   8:   eea00b10    vdup.32 q0, r0

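To spell out what those three instructions compute: movw writes the low 16 bits of r0 and clears the top half, movt fills in the top 16 bits, and vdup.32 copies the resulting 32-bit value into all four lanes of q0. Below is a minimal portable C sketch of that effect. The helper names movw, movt and vdup32 are hypothetical models written for illustration; they are not NEON intrinsics and not anything from the patch.

```c
#include <stdint.h>

/* Hypothetical model of "movw rd, #imm16":
   writes the low halfword and zeroes the high halfword. */
static uint32_t movw(uint16_t imm)
{
    return (uint32_t)imm;
}

/* Hypothetical model of "movt rd, #imm16":
   replaces the high halfword and keeps the low halfword. */
static uint32_t movt(uint32_t rd, uint16_t imm)
{
    return (rd & 0xFFFFu) | ((uint32_t)imm << 16);
}

/* Hypothetical model of "vdup.32 qd, rn":
   copies one core register into all four 32-bit lanes. */
static void vdup32(uint32_t qd[4], uint32_t rn)
{
    for (int lane = 0; lane < 4; lane++)
        qd[lane] = rn;
}
```

With these, vdup32(q0, movt(movw(1), 2)) leaves 0x00020001 in every lane of q0, which is the constant that would otherwise be loaded from the constant pool.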
I've checked this in as-is. If someone wants to rip out that
particular part of the patch, I won't feel in the least offended :-)