> We are not sure how much there is to clean up at such an early stage,
> and it does not seem to justify transforming to tree-ssa.
I mostly have CFG cleanup and dead code elimination in mind. These
two optimizations alone do a lot to function body sizes, especially if
you have a language with higher abstraction, so they will considerably
modify the inlining decisions. We also need to profile the functions
and/or estimate the profile, which itself needs nontrivial modification of
The prettiest way would be to use our existing SSA optimizers to do the
job; however, if this turns out to be a disaster performance-wise (it
should not: building SSA without aliasing is cheap, as is throwing it
away afterwards), we might implement a specialized pass similar to what
delete_trivially_dead_insns does on RTL.
We might also want to do constant propagation, though.
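To illustrate why even trivial cleanup changes inlining decisions, here
is a minimal sketch on a toy statement list. Everything in it (Stmt,
remove_dead, small_enough_to_inline, counting statements as the size
estimate) is hypothetical and only stands in for the real GIMPLE/RTL
machinery; it handles straight-line code only.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical three-address statement: dest = op(uses...).
struct Stmt {
    std::string dest;               // name defined here ("" if none)
    std::vector<std::string> uses;  // names read by this statement
    bool has_side_effects;          // calls, stores, returns: never removed
};

// Single backward sweep over straight-line code. A statement is kept if
// it has side effects or if its result is read by a later kept statement.
std::vector<Stmt> remove_dead(const std::vector<Stmt>& body) {
    std::set<std::string> live;
    std::vector<Stmt> kept_reversed;
    for (auto it = body.rbegin(); it != body.rend(); ++it) {
        if (it->has_side_effects || live.count(it->dest) != 0) {
            live.erase(it->dest);  // this definition satisfies later uses
            for (const auto& u : it->uses) live.insert(u);
            kept_reversed.push_back(*it);
        }
    }
    return std::vector<Stmt>(kept_reversed.rbegin(), kept_reversed.rend());
}

// Assumed size heuristic: statement count against a fixed limit.
bool small_enough_to_inline(const std::vector<Stmt>& body, std::size_t limit) {
    return body.size() <= limit;
}
```

A body that is over the limit before the sweep can fall under it
afterwards, which is exactly the effect on the inliner described above.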
> > I believe cloning and inliner should be the last IPO passes to run,
> > cloning before inlining, or simultaneously (dunno what works, or if
> > it makes a difference). You definitely want to be able to inline
> > a clone, of course.
> We agree, Inlining does not affect IPA, but :
Unfortunately, my understanding of Kenneth's vision is that it does
affect IPA in an important way. I can easily imagine this in a Java/C++
environment. For instance, you have a function that takes an object as
an argument and calls its virtual method. It is likely that a specific
inline clone is passed a specific instance of the object, so you might
devirtualize the call in the inlined clone (but can't do that
globally), opening up more IPA possibilities.
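A small C++ sketch of the situation, with made-up names (Shape, Square,
scaled_area, caller). The last function is written by hand to show what
the inlined clone could look like once the dynamic type is known; it is
not the output of any particular compiler.

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

struct Square final : Shape {
    explicit Square(int side) : side_(side) {}
    int area() const override { return side_ * side_; }
    int side_;
};

// Takes an object and calls its virtual method. Looked at in isolation,
// the dynamic type of `s` is unknown, so the call must stay virtual.
int scaled_area(const Shape& s, int factor) { return factor * s.area(); }

// This call site always passes a Square.
int caller() {
    Square sq(3);
    return scaled_area(sq, 2);
}

// Hand-written equivalent of caller() after inlining scaled_area() and
// devirtualizing the call in that inlined copy only.
int caller_after_inlining_and_devirtualization() {
    Square sq(3);
    return 2 * sq.Square::area();  // direct, non-virtual call
}
```

The direct call to Square::area() is only visible after inlining, so any
IPA pass that ran earlier never saw that call graph edge.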
If you do devirtualization before inlining, you will end up with missed
opportunities. If you do it after inlining, you will introduce
possibilities for the inliner that won't be taken.
One answer is probably to do devirtualization first, then inlining,
then devirtualization again and inlining again. This can still be done
with my scheme (if devirtualization can cope with clones properly), but
indeed it is not very nice.
The other approach would probably involve making the inliner smart about
these kinds of scenarios. I was thinking about the easier case of a
function being passed as an argument (for instance to for_each_rtx), and
it is not at all difficult to deal with in the inliner.
Devirtualization is not much trickier, but of course this scheme will
restrict us to several specific cases of this more general problem. The
question is whether catching the few most common cases is enough or not.
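The function-pointer case can be sketched as follows. for_each_node is a
hypothetical stand-in for a walker in the style of for_each_rtx (it is
not that function's real signature), and negatives_specialized is a
hand-written picture of what an inliner aware of the constant callback
argument could produce.

```cpp
#include <cstddef>

using visit_fn = int (*)(int* node, void* data);

// Hypothetical walker: calls f on every element and stops early if f
// returns nonzero. The call through f is indirect here.
int for_each_node(int* nodes, std::size_t n, visit_fn f, void* data) {
    for (std::size_t i = 0; i < n; ++i) {
        int r = f(&nodes[i], data);
        if (r != 0) return r;
    }
    return 0;
}

// Callback: counts negative elements into *data, never stops the walk.
int count_negative(int* node, void* data) {
    if (*node < 0) ++*static_cast<int*>(data);
    return 0;
}

// The callback is a known constant at this call site.
int negatives(int* nodes, std::size_t n) {
    int count = 0;
    for_each_node(nodes, n, count_negative, &count);
    return count;
}

// Hand-written result of inlining the walker, turning the indirect call
// into a direct one, and then inlining the callback as well.
int negatives_specialized(int* nodes, std::size_t n) {
    int count = 0;
    for (std::size_t i = 0; i < n; ++i)
        if (nodes[i] < 0) ++count;
    return count;
}
```

The second inlining step only becomes possible once the first one has
exposed the direct call, which is the ordering problem discussed above.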
> Jan's framework seems reasonable to us for whole program:
> - Gathering intraprocedural properties (on low gimple).
> - Performing IPA (recording its decisions/results).
> - Making inlining decisions.
> - Transformation function by function, starting with materializing the
> clones and inlining.
> We'll send you pointers to relevant papers shortly.
I am looking forward to the papers. I went through some of the
implementation documents and sources I was able to find.
They seem to be a mixture of both possible approaches, none of them
seriously covering the scalability issues of the approach that requires
all functions in RAM (though there are optimizers whose memory
management is able to swap a function body out to disk or compress it in
memory, showing that they ran into this kind of limitation).