gcc-patches@gcc.gnu.org

Re: [PATCH] Fix PR18754: add early loop pass, 2nd try

From: Zdenek Dvorak
Date: Fri, 21 Jan 2005 16:14:19 +0100

Hello,

> > >>>> The right fix seems to be to add the second SRA pass in the middle of
> > >>>> loop optimizations (just immediately after cunroll).  You would also need to
> > >>>> schedule a constant propagation pass there (which should just work)
> > >>>> and preferably also cfg_cleanup (the variant from the tcb branch that
> > >>>> preserves loop structures).
> > >>>
> > >>> Yes, I tried this (actually just adding SRA and redphi after cunroll),
> > >>> but this caused verifier failures complaining that the SSA form was
> > >>> not valid.  So I guessed SRA may not be ready to preserve the
> > >>> invariants the loop optimizers need.
> > >>
> > >> You probably need to rerun the loop-closed SSA form creation afterwards
> > >> (rewrite_into_loop_closed_ssa).
> > >
> > > Ok, tried this again (see proof of concept patch below).  With
> > > -O2 -funroll-loops this solves the original testcase of PR18754,
> > > but fails on the C++ testcase verifying the ssa form:
> > >
> > > scalar_loops.cpp: In function 'void foo(const Array<2>&, const Array<2>&)':
> > > scalar_loops.cpp:32: internal compiler error: tree check: expected ssa_name, have var_decl in verify_ssa, at tree-ssa.c:690
> > > Please submit a full bug report,
> > > with preprocessed source if appropriate.
> > > See <URL:http://gcc.gnu.org/bugs.html> for instructions.
> > >
> > > Any ideas what is going wrong?  This doesn't change if I remove
> > > the rename_ssa_copies() call.
> > >
> > > Thanks,
> > > Richard.
> > >
> > rename_ssa_copies coalesces SSA variables; it does not rename them.
> > Since you don't have a valid SSA form at that point, it can't possibly
> > work right :)
> >
> > Call rewrite_into_ssa (false);
> 
> Ah OK, yes, that fixes the ICE.  Now I still cannot get ivopts
> to optimize the SRA'ed code, and SRA doesn't catch everything it
> could.  It seems complete unrolling leaves us with lots of
> optimization opportunities; this is also why, with early loop
> unrolling, the optimization opportunities are exposed only by adding
> a dominator pass after it.  Scheduling a ccp pass before SRA helps
> somewhat; putting dom there segfaults the compiler (probably it alters
> the CFG, and the loop optimizer is not happy about this).
> 
> One problem is (or may be) that we have code like
> 
> <L13>:;
>   D.1999_171 = I.data[i_204];
>   D.2000_174 = dX.data[i_204];
>   D.2001_176 = D.1999_171 + D.2000_174;
>   res.data[i_204] = D.2001_176;
>   i_178 = i_204 + 1;
>   ivtmp.27_163 = ivtmp.27_196 - 1;
>   if (0) goto <L60>; else goto <L16>;
> 
> <L60>:;
>   goto <bb 3> (<L13>);
> 
> Invalid sum of incoming frequencies 10000, should be 5000
> <L16>:;
> 
> 
> after cunroll, i.e. the BBs are not merged and we still have
> the loop exit test there.  So, until we get a cfg_cleanup that
> preserves loop information, scheduling SRA after cunroll and before
> ivopts doesn't help very much.
> 
> Zdenek, I remember you posted a patch for a loop-preserving cfg_cleanup
> some time ago; is it suitable for 4.0?

I don't know.  It is definitely pretty important for cunroll, but
probably does not fit the stage 3 criteria.  There is a version of the
patch committed to the tcb branch, so at least for your experiments you
may use it; cfgcleanup + ccp after cunroll should basically get you as
far as you can currently get.  dom should not help that much (it would
be necessary to play a bit with the jump threading inside it to make it
preserve loop structures).
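In the dump quoted above, complete unrolling has left a constant exit test (`if (0) goto <L60>; else goto <L16>;`) and a chain of unmerged blocks, which is what a CFG cleanup after cunroll would remove. As a rough sketch only, here is a toy C version of three steps such a cleanup performs: folding constant branches, deleting unreachable blocks, and merging a block into its sole predecessor. The CFG representation and every name in it (`block_t`, `cfg_t`, `cleanup_cfg`, `make_unrolled_example`, ...) are hypothetical and unrelated to GCC's internals; in particular it does not maintain loop structures at all, which is exactly what the tcb-branch cleanup has to do.

```c
#include <stdbool.h>

#define MAX_BLOCKS 16
#define NO_BLOCK (-1)

/* Hypothetical toy CFG for illustration only; not GCC's basic_block/edge. */
typedef struct {
    bool live;      /* false once the block is deleted or merged away */
    bool is_cond;   /* ends in "if (cond) goto T; else goto F;" */
    int cond_value; /* -1: unknown at compile time; 0 or 1: constant */
    int succ_true;  /* T, or the target of a plain jump; NO_BLOCK = exit */
    int succ_false; /* F for a conditional, otherwise NO_BLOCK */
    int n_stmts;    /* ordinary (non-control) statements */
} block_t;

typedef struct { block_t b[MAX_BLOCKS]; int n, entry; } cfg_t;

/* "if (0) goto T; else goto F;" becomes "goto F;", "if (1) ..." becomes "goto T;". */
static bool fold_constant_branches(cfg_t *g) {
    bool changed = false;
    for (int i = 0; i < g->n; i++) {
        block_t *bb = &g->b[i];
        if (!bb->live || !bb->is_cond || bb->cond_value < 0)
            continue;
        if (bb->cond_value == 0)
            bb->succ_true = bb->succ_false;
        bb->succ_false = NO_BLOCK;
        bb->is_cond = false;
        changed = true;
    }
    return changed;
}

/* Delete every live block that is not reachable from the entry block. */
static bool remove_unreachable(cfg_t *g) {
    bool seen[MAX_BLOCKS] = {false}, changed = false;
    int stack[MAX_BLOCKS], sp = 0;
    seen[g->entry] = true;
    stack[sp++] = g->entry;
    while (sp > 0) {
        const block_t *bb = &g->b[stack[--sp]];
        int succs[2] = {bb->succ_true, bb->succ_false};
        for (int k = 0; k < 2; k++)
            if (succs[k] != NO_BLOCK && !seen[succs[k]]) {
                seen[succs[k]] = true;
                stack[sp++] = succs[k];
            }
    }
    for (int i = 0; i < g->n; i++)
        if (g->b[i].live && !seen[i]) {
            g->b[i].live = false;
            changed = true;
        }
    return changed;
}

static int count_preds(const cfg_t *g, int target) {
    int n = 0;
    for (int i = 0; i < g->n; i++)
        if (g->b[i].live)
            n += (g->b[i].succ_true == target) + (g->b[i].succ_false == target);
    return n;
}

/* Merge B into A when A ends in a plain jump to B and A is B's only
   predecessor: A absorbs B's statements and takes over B's terminator. */
static bool merge_blocks(cfg_t *g) {
    for (int i = 0; i < g->n; i++) {
        block_t *a = &g->b[i];
        int j = a->succ_true;
        if (!a->live || a->is_cond || j == NO_BLOCK || j == i
            || j == g->entry || count_preds(g, j) != 1)
            continue;
        int n_stmts = a->n_stmts + g->b[j].n_stmts;
        *a = g->b[j];          /* copy B's terminator and successors */
        a->n_stmts = n_stmts;
        g->b[j].live = false;
        return true;           /* the CFG changed; the caller rescans */
    }
    return false;
}

/* Iterate to a fixed point; every successful step removes a conditional
   branch or a block, so this terminates. */
static void cleanup_cfg(cfg_t *g) {
    bool changed = true;
    while (changed) {
        changed = fold_constant_branches(g);
        changed |= remove_unreachable(g);
        changed |= merge_blocks(g);
    }
}

/* Models the dump above: entry -> <L13> (6 statements, then "if (0)"),
   <L60> jumps back to <L13>, <L16> is the exit.  The one-statement
   counts for entry and <L16> are made up for the example. */
static cfg_t make_unrolled_example(void) {
    cfg_t g = {.n = 4, .entry = 0};
    g.b[0] = (block_t){true, false, -1, 1, NO_BLOCK, 1};        /* entry */
    g.b[1] = (block_t){true, true, 0, 2, 3, 6};                 /* <L13> */
    g.b[2] = (block_t){true, false, -1, 1, NO_BLOCK, 0};        /* <L60> */
    g.b[3] = (block_t){true, false, -1, NO_BLOCK, NO_BLOCK, 1}; /* <L16> */
    return g;
}
```

Running `cleanup_cfg` on `make_unrolled_example()` collapses the four blocks into one straight-line block of 8 statements. If `b[1].cond_value` is set to -1 (condition unknown), nothing is folded and the loop through `<L60>` survives untouched. A real cleanup would also remove empty forwarder blocks such as `<L60>` and update loop information, both of which this sketch leaves out.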

Zdenek
