[email protected]
[Top] [All Lists]

RE: Patch to Avoid Bad Prefetching

Subject: RE: Patch to Avoid Bad Prefetching
From: "Shobaki, Ghassan"
Date: Thu, 4 Jun 2009 12:24:18 -0700
I agree with Zdenek that the first heuristic is imprecise and needs 
improvement. At least it has to distinguish between simple integer arithmetic 
ops and expensive FP ops. I am planning on doing this very soon. As I explained 
in the previous email, that will require computing an 
instruction-by-instruction time estimate using the estimates that GCC already 
computes but adapting them to properly account for the costs of cache missing 
memory ops.

I ran stream and it turns out that Zdenek's guess was right. The patched 
compiler caused regressions on two out of four scores (Add and Triad). However, 
the regressions went away when I set the prefetch_min_insn_to_mem_ratio 
parameter to 3 instead of the current default value of 4. Here are the scores I 
got. The numbers below are improvements in avg. time relative to no 

                    Copy Scale Add Triad 
Current Prefetcher  28%  29%   31%  31%
Patch with param=4  27%  28%   1%   1% 
Patch with param=3  28%  29%   31%  31%

Since a parameter value of 3 worked well on stream, I tested CPU2006 with a 
parameter value of 3 instead of 4. The results were the same on INT2006 but the 
improvement was slightly less on FP2006. Nevertheless, the patch still gave a 
significant improvement relative to the current GCC prefercher with no 
significant regressions. Here are the improvement numbers relative to the 
current GCC prefetcher:

Patch with param = 4:  INT2006 3.62%   FP2006 8.72%
Patch with param = 3:  INT2006 3.66%   FP2006 7.59%

Here are the benchmarks that are significantly affected by this parameter. The 
numbers below are percentage improvements relative to no prefetching:

Benchmark    current_prefetch  patch_param=4     patch_param=3
Hmmer              -29%             0%                2%
Zeusmp             -10%            -2%               -5%
Leslie             -14%             0%               -6%
Calculix           -11%            -1%               -4%
Tonto              -9%             -1%               -7%

All other benchmarks had the same improvements relative to the current 
prefetcher with both param=3 and param=4. So, the results for those are the 
same as those reported in my first email.   

In conclusion, the patch with param=3 gives good results on both benchmarks and 
does cause any regressions relative to what is currently in trunk. So, I 
suggest that I check in this patch with a default value of 3 for the parameter 
prefetch_min_insn_to_mem_ratio and then work on enhancing the first heuristic 
in a subsequent patch. 

I have fixed the formatting error caught by Zdeneatk and added entries for the 
two parameters in doc/invoke.texi. The patch bootstrapped successfully using 
the latest trunk (checked out last night). I am currently running "make check" 
with the patch. I will send the slightly revised patch after it passes. 


-----Original Message-----
From: Richard Guenther [mailto:[email protected]] 
Sent: Wednesday, June 03, 2009 7:20 AM
To: Zdenek Dvorak
Cc: Shobaki, Ghassan; [email protected]
Subject: Re: Patch to Avoid Bad Prefetching

On Wed, Jun 3, 2009 at 2:54 PM, Zdenek Dvorak <[email protected]> wrote:
> Hi,
> could you please test the patch on the stream benchmark?  It exhibits
> several opportunities for prefetching/movnt instructions that we would
> not like to miss, and I somewhat suspect that the first heuristic could
> cause us to.  The patch is OK assuming the results are fine.  It would
> be nice to improve the heuristics later, especially the first one looks
> a bit too imprecise to me.
>> +      if(insn_to_prefetch_ratio < MIN_INSN_TO_PREFETCH_RATIO)
>> +     return false;
>> +      return true;
> space after if; or better, return insn_to_prefetch_ratio >= 
>> Index: params.def
>> ===================================================================
>> --- params.def        (revision 145634)
>> +++ params.def        (working copy)
>>         "max basic blocks number in loop for loop invariant motion",
>>         10000, 0, 0)
>> +       "min-insn-to-prefetch-ratio",
>> +       "min. ratio of insns to prefetches to enable prefetching for "
>> +          "a loop with an unknown trip count",
>> +       10, 0, 0)
>> +
>> +       "prefetch-min-insn-to-mem-ratio",
>> +       "min. ratio of insns to mem ops to enable prefetching in a loop",
>> +       4, 0, 0)
>> +
> You also need to document the params in doc/invoke.texi.

Indeed.  The patch also needs to pass bootstrap & regtest on the
current trunk.


<Prev in Thread] Current Thread [Next in Thread>