> On Sun 2008-08-24 11:03, Thomas M. DuBuisson wrote:
> > Yay, the multicore version pays off when the workload is non-trivial.
> > CPU utilization is still rather low for the -N2 case (70%). I think the
> > Haskell threads have an affinity for certain OS threads (and thus a
> > CPU). Perhaps it results in a CPU having both tokens of work and the
> > other having none?
> This must be obvious to everyone, but the original thread-ring cannot
> possibly be faster with multiple OS threads, since a thread can only be
> running if it has the token; otherwise it is just blocked on the token.
> If there are threads executing simultaneously, the token must at least
> be written to the shared cache, if not to main memory. With the
> single-threaded runtime, the token may never leave L1. The difference
> between -threaded -N1 and the non-threaded runtime may be influenced by
> the effectiveness of
> prefetching the next thread (since presumably not all 503 threads can
> reside in L1).
Simon Marlow sez:
The thread-ring benchmark needs careful scheduling to get a speedup
on multiple CPUs. I was only able to get a speedup by explicitly
locking half of the ring onto each CPU. You can do this using
GHC.Conc.forkOnIO in GHC 6.8.x, and you'll also need +RTS -qm -qw.
Also make sure that you're not using the main thread for any part of
the main computation, because the main thread is a bound thread and
runs in its own OS thread, so communication between the main thread
and any other thread is slow.
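
For illustration, here is a minimal sketch of the pinning described above.
It is not Simon's actual benchmark code. It assumes a newer GHC where
GHC.Conc.forkOnIO is available as Control.Concurrent.forkOn; the names
runRing and worker are made up for this example. The first half of the
ring is forked onto capability 0 and the second half onto capability 1,
and the main thread only injects the token and waits for the result.

```haskell
import Control.Concurrent (forkOn)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_)

-- | Pass a token around a ring of n threads (n >= 1), decrementing it on
-- every hop, and return the 1-based index of the thread that receives it
-- when it has reached 0. (Hypothetical helper, for illustration only.)
runRing :: Int -> Int -> IO Int
runRing n hops = do
  boxes <- mapM (const newEmptyMVar) [1 .. n]  -- one mailbox per ring member
  done  <- newEmptyMVar                        -- reports the final holder
  let half = n `div` 2
      -- First half of the ring on capability 0, second half on capability 1.
      -- forkOn takes the capability number modulo the number of
      -- capabilities, so with -N1 everything simply lands on capability 0.
      capFor i = if i <= half then 0 else 1
      outboxes = tail boxes ++ [head boxes]    -- thread i sends to thread i+1
  forM_ (zip3 [1 ..] boxes outboxes) $ \(i, inbox, outbox) ->
    forkOn (capFor i) (worker i inbox outbox done)
  -- The main thread does no ring work: it injects the token and waits.
  putMVar (head boxes) hops
  takeMVar done

-- | One ring member: block on the inbox, then either finish or pass on.
worker :: Int -> MVar Int -> MVar Int -> MVar Int -> IO ()
worker i inbox outbox done = loop
  where
    loop = do
      t <- takeMVar inbox
      if t == 0
        then putMVar done i
        else putMVar outbox (t - 1) >> loop

main :: IO ()
main = runRing 503 1000 >>= print
```

Build with -threaded and run with +RTS -N2 plus the migration flags
mentioned above (-qm, and -qw where the RTS still accepts it; this varies
by GHC version). With this split the token crosses between CPUs only twice
per lap instead of potentially on every hop.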
Haskell-Cafe mailing list