gcc-patches@gcc.gnu.org

Subject: Re: [PATCH] Fix PR18754: add early loop pass, 2nd try
From: Richard Guenther
Date: Fri, 21 Jan 2005 16:04:34 +0100 CET
On Fri, 21 Jan 2005, Daniel Berlin wrote:

>
>
> On Fri, 21 Jan 2005, Richard Guenther wrote:
>
> > On Thu, 20 Jan 2005, Zdenek Dvorak wrote:
> >
> >>>> The right fix seems to be to add the second SRA pass in the middle
> >>>> of loop optimizations (just immediately after cunroll).  You would
> >>>> also need to schedule constant propagation pass there (which should
> >>>> just work) and preferably also cfg_cleanup (the variation from tcb
> >>>> branch that preserves loop structures).
> >>>
> >>> Yes, I tried this - actually just adding SRA and redphi after cunroll,
> >>> but this caused verify failures about not the right ssa form or so.  So
> >>> I guessed SRA may be not ready to preserve invariants the loop
> >>> optimizers need.
> >>
> >> you probably need to rerun the loop closed ssa form creation afterwards
> >> (rewrite_into_loop_closed_ssa).
> >
> > Ok, tried this again (see proof of concept patch below).  With
> > -O2 -funroll-loops this solves the original testcase of PR18754,
> > but fails on the C++ testcase verifying the ssa form:
> >
> > scalar_loops.cpp: In function 'void foo(const Array<2>&, const Array<2>&)':
> > scalar_loops.cpp:32: internal compiler error: tree check: expected ssa_name, have var_decl in verify_ssa, at tree-ssa.c:690
> > Please submit a full bug report,
> > with preprocessed source if appropriate.
> > See <URL:http://gcc.gnu.org/bugs.html> for instructions.
> >
> > Any ideas what is going wrong?  This doesn't change if I remove
> > the rename_ssa_copies() call.
> >
> > Thanks,
> > Richard.
> >
> rename_ssa_copies coalesces SSA variables; it does not rename them.
> Since you don't have a valid SSA form at that point, it can't possibly
> work right :)
>
> Call rewrite_into_ssa (false);

Ah ok, yes, that fixes the ICE.  I still do not get ivopts to
optimize the SRA'ed code, though, and SRA doesn't catch everything
it could.  It seems complete unrolling leaves us with lots of
optimization opportunities.  This is also why, with early loop
unrolling, those opportunities only get exposed once a dominator
pass is added after it.  Scheduling a ccp pass before sra helps
somewhat; putting dom there segfaults the compiler (probably because
it alters the CFG and the loop optimizer is not happy about this).

One problem may be that we are left with code like

<L13>:;
  D.1999_171 = I.data[i_204];
  D.2000_174 = dX.data[i_204];
  D.2001_176 = D.1999_171 + D.2000_174;
  res.data[i_204] = D.2001_176;
  i_178 = i_204 + 1;
  ivtmp.27_163 = ivtmp.27_196 - 1;
  if (0) goto <L60>; else goto <L16>;

<L60>:;
  goto <bb 3> (<L13>);

Invalid sum of incoming frequencies 10000, should be 5000
<L16>:;


after cunroll, i.e. the basic blocks are not merged and the loop exit
test is still there (now as a constant "if (0)").  So, until we get a
cfg_cleanup that preserves loop information, scheduling SRA after
cunroll and before ivopts doesn't help very much.
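
To illustrate what such a cleanup would have to do with the dump
above, here is a toy sketch in Python.  It is not GCC code and does
not model loop structures at all; the block representation, the
cleanup_cfg name and the empty "entry" block are made up for the
example.  It only shows the three generic steps: fold the constant
branch, drop the now-unreachable <L60>, and merge the remaining
straight-line blocks.

```python
def cleanup_cfg(blocks, entry):
    """Toy CFG cleanup: fold constant branches, drop unreachable
    blocks, merge straight-line chains.  Returns a new dict."""
    # Work on a copy so the caller's CFG is left untouched.
    blocks = {name: {"stmts": list(b["stmts"]), "term": b["term"]}
              for name, b in blocks.items()}

    # 1. Fold conditional jumps whose condition is a known constant.
    for b in blocks.values():
        term = b["term"]
        if term[0] == "cond" and isinstance(term[1], bool):
            _, value, if_true, if_false = term
            b["term"] = ("goto", if_true if value else if_false)

    def successors(b):
        term = b["term"]
        if term[0] == "goto":
            return [term[1]]
        if term[0] == "cond":
            return [term[2], term[3]]
        return []  # ("exit",) has no successors

    # 2. Remove blocks that are no longer reachable from the entry.
    reachable, work = set(), [entry]
    while work:
        name = work.pop()
        if name not in reachable:
            reachable.add(name)
            work.extend(successors(blocks[name]))
    blocks = {n: b for n, b in blocks.items() if n in reachable}

    # 3. Merge a block into its predecessor when the edge between them
    #    is the only way out of one and the only way into the other.
    changed = True
    while changed:
        changed = False
        preds = {n: [] for n in blocks}
        for n, b in blocks.items():
            for s in successors(b):
                preds[s].append(n)
        for n, b in blocks.items():
            if b["term"][0] != "goto":
                continue
            succ = b["term"][1]
            if succ != n and succ != entry and preds[succ] == [n]:
                b["stmts"] += blocks[succ]["stmts"]
                b["term"] = blocks[succ]["term"]
                del blocks[succ]
                changed = True
                break  # predecessor lists are stale, recompute them
    return blocks


# The shape of the dump above; "entry" is a hypothetical placeholder
# for whatever precedes <L13>.
cfg = {
    "entry": {"stmts": [], "term": ("goto", "L13")},
    "L13": {"stmts": ["D.2001_176 = D.1999_171 + D.2000_174",
                      "res.data[i_204] = D.2001_176"],
            "term": ("cond", False, "L60", "L16")},
    "L60": {"stmts": [], "term": ("goto", "L13")},
    "L16": {"stmts": [], "term": ("exit",)},
}
cleaned = cleanup_cfg(cfg, "entry")
```

Before folding, <L13> has two predecessors (the entry and <L60>), so
nothing can be merged.  After folding, <L60> is dead and everything
collapses into a single block, which is the form later passes would
want to see.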

Zdenek - I remember you posted a patch for loop cfg_cleanup
some time ago; is it suitable for 4.0?  I also remember some
other ivopts patches that may be suitable now, as we're back
to regular stage 3.

Thanks,
Richard.

--
Richard Guenther <richard dot guenther at uni-tuebingen dot de>
WWW: http://www.tat.physik.uni-tuebingen.de/~rguenth/
