------- Comment #4 from tkoenig at gcc dot gnu dot org 2008-08-23 13:18 -------
Created an attachment (id=16134)
Actually, the test cases were a bit unfair, because
the middle-end decided not to calculate the
values of c that were never used.
Attached is a better test case.
Timings on x86_64-unknown-linux-gnu:
matmul: 12.840802 s
subroutine without explicit interface: 0.88805580 s
subroutine with explicit interface: 0.87605572 s
inline with sum: 2.0721283 s
While the inline version with sum is still about six times faster than
matmul, a hand-rolled 3*3 subroutine is faster again, by more than a
factor of two over the inline version, which I find a bit surprising.
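
For readers without the attachment, here is a rough sketch of the kind of comparison described above. It is not the attached test case: the program name, the iteration count n, the helper mm3 and the accumulator acc are all made up for illustration, and timing calls are left out. The point it shows is that every computed c feeds into a printed value, so the middle-end cannot drop the multiplication as unused.

```fortran
! Hypothetical sketch, NOT the attached test case (id=16134).
! All names and the iteration count are assumptions for illustration.
program matmul3_sketch
  implicit none
  integer, parameter :: n = 1000000   ! assumed iteration count
  real :: a(3,3), b(3,3), c(3,3)
  real :: acc
  integer :: i

  call random_number(a)
  call random_number(b)

  ! Hand-rolled 3*3 product called through an explicit interface.
  acc = 0.0
  do i = 1, n
     a(1,1) = real(i) * 1.0e-6             ! vary an input each iteration
     call mm3(a, b, c)
     acc = acc + c(1,1) + c(2,2) + c(3,3)  ! consume the result
  end do
  print *, 'hand-rolled:', acc

  ! Same loop using the matmul intrinsic.
  acc = 0.0
  do i = 1, n
     a(1,1) = real(i) * 1.0e-6
     c = matmul(a, b)
     acc = acc + c(1,1) + c(2,2) + c(3,3)
  end do
  print *, 'matmul:     ', acc

contains

  ! Fixed-size 3*3 matrix product with the inner sum written out.
  subroutine mm3(a, b, c)
    real, intent(in)  :: a(3,3), b(3,3)
    real, intent(out) :: c(3,3)
    integer :: i, j
    do j = 1, 3
       do i = 1, 3
          c(i,j) = a(i,1)*b(1,j) + a(i,2)*b(2,j) + a(i,3)*b(3,j)
       end do
    end do
  end subroutine mm3

end program matmul3_sketch
```

Wrapping each loop in cpu_time or system_clock calls would give per-variant timings; the numbers quoted above come from the attachment, not from this sketch.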