On Fri, 21 Jan 2005, Daniel Berlin wrote:
>
>
> On Fri, 21 Jan 2005, Richard Guenther wrote:
>
> > On Thu, 20 Jan 2005, Zdenek Dvorak wrote:
> >
> >>>> The right fix seems to be to add the second SRA pass in the middle of
> >>>> loop
> >>>> optimizations (just immediately after cunroll). You would also need to
> >>>> schedule constant propagation pass there (which should just work)
> >>>> and preferably also cfg_cleanup (the variation from tcb branch that
> >>>> preserves loop structures).
> >>>
> >>> Yes, I tried this - actually just adding SRA and redphi after cunroll,
> >>> but this caused verify failures about not the right ssa form or so. So
> >>> I guessed SRA may be not ready to preserve invariants the loop
> >>> optimizers need.
> >>
> >> you probably need to rerun the loop closed ssa form creation afterwards
> >> (rewrite_into_loop_closed_ssa).
> >
> > Ok, tried this again (see proof of concept patch below). With
> > -O2 -funroll-loops this solves the original testcase of PR18754,
> > but fails on the C++ testcase verifying the ssa form:
> >
> > scalar_loops.cpp: In function 'void foo(const Array<2>&, const
> > Array<2>&)':
> > scalar_loops.cpp:32: internal compiler error: tree check: expected
> > ssa_name, have var_decl in verify_ssa, at tree-ssa.c:690
> > Please submit a full bug report,
> > with preprocessed source if appropriate.
> > See <URL:http://gcc.gnu.org/bugs.html> for instructions.
> >
> > Any ideas what is going wrong? This doesn't change, if I remove
> > the rename_ssa_copies() call.
> >
> > Thanks,
> > Richard.
> >
> rename_ssa_copies coalesces SSA variables; it does not rename them.
> Since you don't have a valid ssa form at that point, it can't possibly
> work right :)
>
> Call rewrite_into_ssa (false);

Ah, OK, that fixes the ICE. However, ivopts still does not optimize
the SRA'ed code, and SRA does not catch everything it could. Complete
unrolling seems to leave a lot of optimization opportunities behind;
this is also why, with early loop unrolling, only adding a dominator
pass after it exposes them. Scheduling a CCP pass before SRA helps
somewhat, but putting DOM there segfaults the compiler (probably
because it alters the CFG, and the loop optimizer is not happy about
that).

One problem is (or may be) that after cunroll we are left with code like

  <L13>:;
  D.1999_171 = I.data[i_204];
  D.2000_174 = dX.data[i_204];
  D.2001_176 = D.1999_171 + D.2000_174;
  res.data[i_204] = D.2001_176;
  i_178 = i_204 + 1;
  ivtmp.27_163 = ivtmp.27_196 - 1;
  if (0) goto <L60>; else goto <L16>;

  <L60>:;
  goto <bb 3> (<L13>);

  Invalid sum of incoming frequencies 10000, should be 5000
  <L16>:;

i.e. the BBs are not merged and the loop exit test is still there. So,
until we get a cfg_cleanup that preserves loop information, scheduling
SRA after cunroll and before ivopts doesn't help very much.
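
To make the point concrete, here is a toy model in Python of what a
cleanup would have to do with the dump above: fold the constant
`if (0)`, drop the now-unreachable <L60>, and merge <L13> with <L16>.
This is not GCC code. The block representation and the `successors`
and `cleanup` helpers are invented for illustration, and the sketch
deliberately ignores loop structures.

```python
# Toy model only: block names mirror the dump, nothing here is a GCC API.

def successors(term):
    """Return the successor block names of a terminator tuple."""
    if term[0] == "goto":
        return [term[1]]
    if term[0] == "cond":
        return [term[2], term[3]]
    return []  # ("exit",) has no successors


def cleanup(cfg, entry):
    """cfg maps block name -> (statements, terminator).

    Terminators: ("goto", target), ("cond", value, then, else), ("exit",).
    Returns a new cfg with constant conditions folded, unreachable
    blocks removed and single-predecessor successors merged.
    """
    # 1. Fold branches whose condition is a known constant.
    folded = {}
    for name, (stmts, term) in cfg.items():
        if term[0] == "cond" and isinstance(term[1], bool):
            term = ("goto", term[2] if term[1] else term[3])
        folded[name] = (list(stmts), term)

    # 2. Keep only the blocks reachable from the entry.
    reachable, work = set(), [entry]
    while work:
        name = work.pop()
        if name not in reachable:
            reachable.add(name)
            work.extend(successors(folded[name][1]))
    live = {n: b for n, b in folded.items() if n in reachable}

    # 3. Merge A with B when A ends in "goto B" and B has no other predecessor.
    changed = True
    while changed:
        changed = False
        npreds = {n: 0 for n in live}
        for _, term in live.values():
            for succ in successors(term):
                npreds[succ] += 1
        for name, (stmts, term) in list(live.items()):
            if (term[0] == "goto" and term[1] not in (name, entry)
                    and npreds[term[1]] == 1):
                succ_stmts, succ_term = live.pop(term[1])
                live[name] = (stmts + succ_stmts, succ_term)
                changed = True
                break
    return live


# The situation from the dump; the contents of <L16> are not shown there,
# so it is modelled as an empty block that leaves the region.
cfg = {
    "L13": (["D.2001_176 = D.1999_171 + D.2000_174", "i_178 = i_204 + 1"],
            ("cond", False, "L60", "L16")),
    "L60": ([], ("goto", "L13")),
    "L16": ([], ("exit",)),
}
result = cleanup(cfg, "L13")
print(result)
```

In this toy, only a single block <L13> remains, with the body statements
and no exit test. The hard part, which the sketch skips, is keeping the
loop structures up to date while blocks and edges disappear.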

Zdenek, I remember you posted a patch for a loop-preserving cfg_cleanup
some time ago. Is it suitable for 4.0? I also remember some other
ivopts patches that may be suitable now, since we're back to regular
stage 3.

Thanks,
Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/