> > >>>> The right fix seems to be to add the second SRA pass in the middle of
> > >>>> the loop optimizations (immediately after cunroll). You would also need
> > >>>> to schedule a constant propagation pass there (which should just work)
> > >>>> and preferably also cfg_cleanup (the variation from the tcb branch that
> > >>>> preserves loop structures).
> > >>>
> > >>> Yes, I tried this - actually just adding SRA and redphi after cunroll,
> > >>> but this caused verify failures about the SSA form not being right or
> > >>> so. So I guessed SRA may not be ready to preserve the invariants the
> > >>> loop optimizers need.
> > >>
> > >> You probably need to rerun the loop-closed SSA form creation afterwards
> > >> (rewrite_into_loop_closed_ssa).
> > >
> > > Ok, tried this again (see proof of concept patch below). With
> > > -O2 -funroll-loops this solves the original testcase of PR18754,
> > > but fails on the C++ testcase verifying the ssa form:
> > >
> > > scalar_loops.cpp: In function 'void foo(const Array<2>&, const Array<2>&)':
> > > scalar_loops.cpp:32: internal compiler error: tree check: expected ssa_name, have var_decl in verify_ssa, at tree-ssa.c:690
> > > Please submit a full bug report,
> > > with preprocessed source if appropriate.
> > > See <URL:http://gcc.gnu.org/bugs.html> for instructions.
> > >
> > > Any ideas what is going wrong? This doesn't change if I remove
> > > the rename_ssa_copies() call.
> > >
> > > Thanks,
> > > Richard.
> > >
> > rename_ssa_copies coalesces SSA variables; it does not rename them.
> > Since you don't have a valid SSA form at that point, it can't possibly
> > work right :)
> > Call rewrite_into_ssa (false);
> Ah ok, yes, that fixes the ICE. Now I still do not get ivopts
> to optimize the SRA'ed stuff, and SRA doesn't catch everything it
> could. It seems complete unrolling leaves us with lots of
> optimization opportunities. This is also why, with early loop
> unrolling, those opportunities are only exposed once a dominator
> pass is added after it. Scheduling a ccp pass before SRA helps somewhat;
> putting dom there segfaults the compiler (probably it alters the cfg,
> and the loop optimizer is not happy about this).
> One problem may be that we have stuff like
> D.1999_171 = I.data[i_204];
> D.2000_174 = dX.data[i_204];
> D.2001_176 = D.1999_171 + D.2000_174;
> res.data[i_204] = D.2001_176;
> i_178 = i_204 + 1;
> ivtmp.27_163 = ivtmp.27_196 - 1;
> if (0) goto <L60>; else goto <L16>;
> goto <bb 3> (<L13>);
> Invalid sum of incoming frequencies 10000, should be 5000
> after cunroll - i.e. the BBs are not merged and we still have
> the loop exit test there. So, until we get a cfg_cleanup that
> preserves loop information, scheduling SRA after cunroll and before
> ivopts doesn't help very much.
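The situation in the dump above (a branch on a constant condition left behind, and the blocks on the taken path not merged) can be illustrated with a toy model. The sketch below is not GCC code: `struct toy_block`, `toy_cleanup` and all other names are made up for illustration, and it ignores loop structures entirely, which is exactly the part a loop-preserving cfg_cleanup has to get right. It only shows the three generic steps: fold constant branches, drop unreachable blocks, merge a block with its sole successor.

```c
#include <assert.h>
#include <stdbool.h>

#define NONE (-1)
#define MAX_BLOCKS 16   /* arbitrary limit for this toy */

enum term_kind { TERM_EXIT, TERM_JUMP, TERM_COND, TERM_COND_CONST };

/* Hypothetical toy basic block; block 0 is assumed to be the entry. */
struct toy_block {
    bool live;
    int nstmts;             /* number of statements in the block */
    enum term_kind term;    /* how the block ends */
    bool const_value;       /* condition value, TERM_COND_CONST only */
    int succ[2];            /* succ[0]: jump/"then" target, succ[1]: "else" */
};

static int num_succs(const struct toy_block *b)
{
    if (b->term == TERM_EXIT)
        return 0;
    return b->term == TERM_JUMP ? 1 : 2;
}

/* Turn `if (<constant>) goto A; else goto B;` into a plain jump. */
static void fold_constant_branches(struct toy_block *bb, int n)
{
    for (int i = 0; i < n; i++) {
        if (!bb[i].live || bb[i].term != TERM_COND_CONST)
            continue;
        bb[i].succ[0] = bb[i].const_value ? bb[i].succ[0] : bb[i].succ[1];
        bb[i].succ[1] = NONE;
        bb[i].term = TERM_JUMP;
    }
}

/* Mark every block not reachable from the entry as dead. */
static void remove_unreachable(struct toy_block *bb, int n)
{
    bool reached[MAX_BLOCKS] = { false };
    int stack[MAX_BLOCKS], top = 0;

    assert(n > 0 && n <= MAX_BLOCKS);
    reached[0] = true;
    stack[top++] = 0;
    while (top > 0) {
        const struct toy_block *b = &bb[stack[--top]];
        for (int s = 0; s < num_succs(b); s++) {
            int t = b->succ[s];
            if (!reached[t]) {
                reached[t] = true;
                stack[top++] = t;
            }
        }
    }
    for (int i = 0; i < n; i++)
        if (!reached[i])
            bb[i].live = false;
}

static int count_preds(const struct toy_block *bb, int n, int target)
{
    int preds = 0;
    for (int i = 0; i < n; i++)
        for (int s = 0; bb[i].live && s < num_succs(&bb[i]); s++)
            if (bb[i].succ[s] == target)
                preds++;
    return preds;
}

/* Merge a block into its predecessor when the edge between them is the
   only way out of the one and the only way into the other. */
static void merge_straight_line(struct toy_block *bb, int n)
{
    for (bool changed = true; changed; ) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (!bb[i].live || bb[i].term != TERM_JUMP)
                continue;
            int s = bb[i].succ[0];
            /* Never merge the entry block or a self-loop. */
            if (s == 0 || s == i || count_preds(bb, n, s) != 1)
                continue;
            bb[i].nstmts += bb[s].nstmts;
            bb[i].term = bb[s].term;
            bb[i].const_value = bb[s].const_value;
            bb[i].succ[0] = bb[s].succ[0];
            bb[i].succ[1] = bb[s].succ[1];
            bb[s].live = false;
            changed = true;
        }
    }
}

/* Run the three steps and return the number of blocks left. */
static int toy_cleanup(struct toy_block *bb, int n)
{
    int live = 0;
    fold_constant_branches(bb, n);
    remove_unreachable(bb, n);
    merge_straight_line(bb, n);
    for (int i = 0; i < n; i++)
        live += bb[i].live;
    return live;
}
```

With a four-block input shaped like the dump (a body ending in `if (0)`, a stale back-edge block, a forwarder block and a continuation), `toy_cleanup` would leave a single straight-line block, which is the form in which later scalar passes can see everything at once.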
> Zdenek - I remember you posted a patch for loop cfg_cleanup
> some time ago. Is it suitable for 4.0?
I don't know. It is definitely pretty important for cunroll, but
probably does not fit the stage 3 criteria. There is a version of the
patch committed to the tcb branch, so at least for your experiments you may
use it; cfgcleanup + ccp after cunroll should basically get you as far
as it goes, dom should not help that much (it would be necessary
to play a bit with the jump threading inside it to make it preserve