------- Comment #5 from jacob at math dot jussieu dot fr 2006-12-13 20:22
Nope... with -O3 -ffast-math I get 1.9 seconds on average (this is a laptop
with CPU frequency scaling, so it's difficult to get precise numbers). Adding
-funroll-loops in addition to -ffast-math doesn't seem to make a difference.
We're very far from the 0.3 seconds I get with -DUNROLL.
Also, trying -O3 -funroll-loops again, I get 1.9 seconds again, so I think
-funroll-loops didn't make any difference and I had been fooled by CPU
frequency scaling.
The problem with the multiplication is not important to me; it's just something
I used in this example. I could just as well have written:
  for( int i = 0; i < 3; i++ )
    for( int j = 0; j < 3; j++ )
      (*this)(i, j) = (i == j) ? factor : 0;
But this turns out to be even slower. I presume that's because, since the two
loops don't both get unrolled, the i == j test in the ?: expression becomes a
branch at run time.
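
To make the point concrete, here is a minimal sketch. It assumes a hypothetical
3x3 type `Mat3` with column-major storage, which is not the class from the
actual test case, and the member names are made up for illustration. The first
member is the nested loop from above. The second is what that loop should
reduce to if both loops were fully unrolled and every i == j test were folded
at compile time: nine plain stores and no branches.

```cpp
// Hypothetical 3x3 matrix, column-major storage (an assumption for this
// sketch; not the class from the original test case).
struct Mat3 {
    double data[9];

    double& operator()(int i, int j) { return data[i + 3 * j]; }
    double operator()(int i, int j) const { return data[i + 3 * j]; }

    // Loop form: unless both loops are fully unrolled, the i == j test
    // is still evaluated at run time for each element.
    void setScaledIdentityLoop(double factor) {
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                (*this)(i, j) = (i == j) ? factor : 0;
    }

    // Hand-unrolled form: every i == j test is resolved by hand, so only
    // nine stores remain.
    void setScaledIdentityUnrolled(double factor) {
        data[0] = factor; data[1] = 0;      data[2] = 0;
        data[3] = 0;      data[4] = factor; data[5] = 0;
        data[6] = 0;      data[7] = 0;      data[8] = factor;
    }
};
```

Both members store the same nine values; the sketch makes no claim about
timings, which depend on the compiler version and flags.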
Anyway, thanks for being supportive and for looking into my problem. May I ask
again: can I hope for a g++ that fully unrolls nested loops in the near future?