
## [Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)

Subject: [Haskell-cafe] Re: Code and Perf. Data for Prime Finders (was: Genuine Eratosthenes sieve)

From: Will Ness, Tue, 29 Dec 2009 03:09:09 +0000 UTC
```
apfelmus <...quantentunnel.de> writes:

>
> Dave Bayer wrote:
> > What I'm calling a "venturi"
> >
> >    venturi :: Ord a => [[a]] -> [a]
> >
> > merges an infinite list of infinite lists into one list, under the
> > assumption that each list, and the heads of the lists, are in
> > increasing order.
> >
> > I wrote this as an experiment in coding data structures in Haskell's
> > lazy evaluation model, rather than as explicit data. The majority of the
> > work done by this code is done by "merge"; the multiples of each prime
> > percolate up through a tournament consisting of a balanced tree of
> > suspended "merge" function calls. In order to create an infinite lazy
> > balanced tree of lists, the datatype
> >
> >    data List a = A a (List a) | B [a]
> >
> > is used as scaffolding. One thinks of the root of the infinite tree as
> > starting at the leftmost child, and crawling up the left spine as
> > necessary.
>
> After some pondering, the List a data structure for merging is really
> ingenious! :) Here's a try to explain how it works:
>
> The task is to merge a number of sorted and infinite lists when it's
> known that their heads are in increasing order. In particular, we want
> to write
>
>    primes = (2:) $ diff [3..] $ venturi $ map multiple primes
>
> Thus, we have a (maybe infinite) list
>
>    xss = [xs1, xs2, xs3, ...]
>
> of infinite lists with the following properties
>
>    all sorted xss
>    sorted (map head xss)
>
> where sorted is a function that returns True if the argument is a
> sorted list. A first try to implement the merging function is
>
>    venturi xss = foldr1 merge xss
>                = xs1 `merge` (xs2 `merge` (xs3 `merge` ...
>
> where merge is the standard function to merge two sorted lists.
>
> However, there are two problems. The first problem is that this doesn't
> work for infinite lists since merge is strict in both arguments. But
> the property head xs1 < head xs2 < head xs3 < ... we missed to exploit
> yet can now be used in the following way
>
>    venturi xss = foldr1 merge' xss
>
>    merge' (x:xt) ys = x : merge xt ys
>
> In other words, merge' is biased towards the left element
>
>    merge' (x:_|_) _|_ = x : _|_
>
> which is correct since we know that (head xs < head ys).
>
> The second problem is that we want the calls to merge to be arranged
> as a balanced binary tree since that gives an efficient heap. It's not
> so difficult to find a good shape for the infinite tree, the real
> problem is to adapt merge' to this situation since it's not associative:
>
> ......
>
> The problem is that the second form doesn't realize that y is also
> smaller than the third argument. In other words, the second form has to
> treat more than one element as "privileged", namely x1,x2,... and y.
> This can be done with the aforementioned list data structure
>
>    data People a = VIP a (People a) | Crowd [a]
>
> The people (VIPs and crowd) are assumed to be _sorted_. Now, we can
> start to implement
>
>    merge' :: Ord a => People a -> People a -> People a

Hi,

... replying to a two-year-old post here :) and after consulting the full
"VIP" version at haskellwiki/Prime_numbers#Implicit_Heap ...

This is indeed the major problem with code that removes merged multiples
(similar to Richard Bird's code from Melissa O'Neill's JFP article): the
linear nature of foldr, which requires a merge function of type a->b->b.
To make the merge freely composable, so that the list can be rearranged
into a tree of arbitrary shape, it must first be type-uniform (a->a->a),
and second, associative.

The structure of the folded tree should then be chosen to better suit the
production of prime multiples. I estimate the total cost as

   Sum (1/p)*d

over all leaves, where p is the generating prime at a leaf (its multiples
make up a 1/p fraction of the numbers being sieved) and d is the leaf's
depth, i.e. the number of merge nodes each of its multiples must pass
through on its way to the top.
The structure used in your VIP code, 1+(2+(4+(8+...))), can actually be
improved upon with another one, (2+4)+((4+8)+((8+16)+...)), for which the
estimated cost is about 10%-12% lower. This can be expressed concisely as
follows:

  primes :: () -> [Integer]
  primes () = 2:primes'
    where
      primes'   = [3,5] ++ drop 2 [3,5..] `minus` comps
      mults     = map (\p-> fromList [p*p,p*p+2*p..]) $ primes'
      (comps,_) = tfold mergeSP (pairwise mergeSP mults)
      fromList (x:xs) = ([x],xs)

  tfold f (a: ~(b: ~(c:xs)))
                = (a `f` (b `f` c)) `f` tfold f (pairwise f xs)

  pairwise f (x:y:ys) = f x y : pairwise f ys

  mergeSP (a,b) ~(c,d) = let (bc,b') = spMerge b c
                         in (a ++ bc, merge b' d)
    where
      spMerge u@(x:xs) w@(y:ys) = case compare x y of
          LT -> (x:c,d) where (c,d) = spMerge xs w
          EQ -> (x:c,d) where (c,d) = spMerge xs ys
          GT -> (y:c,d) where (c,d) = spMerge u ys
      spMerge u [] = ([], u)
      spMerge [] w = ([], w)

with merge and minus defined in the usual way, for instance:

  merge xs@(x:xt) ys@(y:yt) = case compare x y of
          LT -> x : merge xt ys
          EQ -> x : merge xt yt
          GT -> y : merge xs yt
  merge xs [] = xs
  merge [] ys = ys

  minus xs@(x:xt) ys@(y:yt) = case compare x y of
          LT -> x : minus xt ys
          EQ -> minus xt yt
          GT -> minus xs yt
  minus xs _ = xs

Its run times are indeed 10%-12% better than those of the VIP code from
the haskellwiki page, as tested by running the code, interpreted, inside
GHCi.

The ordered "split pairs" used here, which represent an ordered list as a
pair of a known, finite (so far) prefix and the rest of the list, form a
_monoid_ under mergeSP.
Or with a wheel:

  primes :: () -> [Integer]
  primes () = 2:3:5:7:primes'
    where
      primes'   = [11,13] ++ drop 2 (rollFrom 11) `minus` comps
      mults     = map (\p-> fromList $ map (p*) $ rollFrom p) $ primes'
      (comps,_) = tfold mergeSP (pairwise mergeSP mults)
      fromList (x:xs) = ([x],xs)
      rollFrom n = let x     = (n-11) `mod` 210
                       (y,_) = span (< x) wheelSums
                   in roll n $ drop (length y) wheel
      wheelSums = roll 0 wdiffs
      roll      = scanl (+)
      wheel     = wdiffs ++ wheel
      wdiffs    = 2:4:2:4:6:2:6:4:2:4:6:6:2:6:4:2:6:4:6:8:4:2:4:2:
                  4:8:6:4:6:2:4:6:2:6:6:4:2:4:6:2:6:4:2:4:2:10:2:10:wdiffs

Now _this_, when tested as interpreted code in GHCi, runs about 2.5 times
faster than the Priority Queue based code from Melissa O'Neill's ZIP
package, with about half the memory use reported, in producing 10,000 to
300,000 primes. In the tested range it is also faster than BayerPrimes.hs
from the same ZIP package, at about 35 lines of code.

_______________________________________________
Haskell-Cafe mailing list
http://www.haskell.org/mailman/listinfo/haskell-cafe
```
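The left-biased merge' from the quoted explanation can be made concrete with a short, self-contained sketch. It assumes the usual duplicate-removing merge, uses a minus function in place of the quoted text's diff, and folds with foldr and an empty seed rather than foldr1: GHC's foldr1 pattern-matches on the tail of its list before producing anything, which here can force the very primes list being defined. This is the linear foldr arrangement that the reply criticizes, not the tree-shaped version.

```haskell
-- Sketch: primes via a linear fold of a left-biased merge over the
-- infinite list of prime multiples. `minus` stands in for `diff`.

-- Union of two sorted lists, keeping duplicates once.
merge :: Ord a => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : merge xt ys
  EQ -> x : merge xt yt
  GT -> y : merge xs yt
merge xs [] = xs
merge [] ys = ys

-- Difference of two sorted lists.
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys
  EQ -> minus xt yt
  GT -> minus xs yt
minus xs _ = xs

-- Left-biased merge: emits the head of the left list without looking at
-- the right one, relying on head xs < head ys.
merge' :: Ord a => [a] -> [a] -> [a]
merge' (x:xt) ys = x : merge xt ys
merge' []     ys = ys

-- foldr with an empty seed stays productive on an infinite list of lists.
venturi :: Ord a => [[a]] -> [a]
venturi = foldr merge' []

-- Multiples of p, starting from p*p.
multiple :: Integer -> [Integer]
multiple p = [p*p, p*p + p ..]

primes :: [Integer]
primes = 2 : minus [3 ..] (venturi (map multiple primes))
```

Each prime's multiples only become relevant once the candidates reach its square, so the self-reference in primes never asks for a prime that has not been produced yet.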
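The People (VIP/Crowd) scaffolding from the quoted post can drive a tree-shaped merge of shape 1+(2+(4+(8+...))). The sketch below is a reconstruction for illustration, not a copy of the haskellwiki code; the names mergeP, foldTree, serve and vip are hypothetical. Following the quoted reasoning, a VIP is known to be smaller than everything to its right, so mergeP emits it without inspecting the right argument, and a crowd element that beats a VIP is itself promoted to VIP.

```haskell
-- Sketch of a VIP/Crowd tree merge for primes (hypothetical names).
data People a = VIP a (People a) | Crowd [a]

-- Usual union of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : merge xt ys
  EQ -> x : merge xt yt
  GT -> y : merge xs yt
merge xs [] = xs
merge [] ys = ys

-- Usual difference of two sorted lists.
minus :: Ord a => [a] -> [a] -> [a]
minus xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : minus xt ys
  EQ -> minus xt yt
  GT -> minus xs yt
minus xs _ = xs

-- A left VIP is emitted without forcing the right argument; a crowd
-- element smaller than (or equal to) a right VIP is promoted.
mergeP :: Ord a => People a -> People a -> People a
mergeP (VIP x xt) ys = VIP x (mergeP xt ys)
mergeP (Crowd xs) (Crowd ys) = Crowd (merge xs ys)
mergeP (Crowd []) ys = ys
mergeP xs@(Crowd (x:xt)) ys@(VIP y yt) = case compare x y of
  LT -> VIP x (mergeP (Crowd xt) ys)
  EQ -> VIP x (mergeP (Crowd xt) yt)
  GT -> VIP y (mergeP xs yt)

-- Folds x1 : x2 : x3 : ... into x1 + ((x2+x3) + ((x4+..+x7) + ...)).
-- The lazy patterns keep the fold productive on infinite input.
foldTree :: (a -> a -> a) -> [a] -> a
foldTree f ~(x:xs) = f x (foldTree f (pairs xs))
  where
    pairs ~(y: ~(z:zs)) = f y z : pairs zs

-- Flattens the structure: VIPs first, then the crowd.
serve :: People a -> [a]
serve (VIP x xs) = x : serve xs
serve (Crowd xs) = xs

primes :: [Integer]
primes = 2 : 3 : minus [5, 7 ..] composites
  where
    composites  = serve (foldTree mergeP (map multiples (tail primes)))
    -- odd multiples of p from p*p, with p*p as the only initial VIP
    multiples p = vip [p*p, p*p + 2*p ..]
    vip (x:xs)  = VIP x (Crowd xs)
```

Without the VIP marking, the tree-shaped mergeP would have to force its right subtree before emitting anything, and the infinite fold would never produce its first element.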
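The monoid remark about split pairs can be spot-checked on small finite inputs. The sketch below restates the post's mergeSP (with let instead of where) together with a usual duplicate-removing merge. The sample pairs s1, s2, s3 are made up and satisfy the invariant assumed here: every prefix element is below every element of its rest, and the heads increase from one pair to the next. For these inputs both groupings give ([1,2,3],[4,5,6,7,8,9]); this is an illustration, not a proof of associativity.

```haskell
-- A split pair (prefix, rest) stands for the sorted list prefix ++ rest,
-- with every prefix element assumed smaller than every rest element.

-- Usual union of two sorted lists.
merge :: Ord a => [a] -> [a] -> [a]
merge xs@(x:xt) ys@(y:yt) = case compare x y of
  LT -> x : merge xt ys
  EQ -> x : merge xt yt
  GT -> y : merge xs yt
merge xs [] = xs
merge [] ys = ys

-- The post's mergeSP, restated with let instead of where.
mergeSP :: Ord a => ([a], [a]) -> ([a], [a]) -> ([a], [a])
mergeSP (a, b) ~(c, d) =
  let (bc, b') = spMerge b c
  in (a ++ bc, merge b' d)
  where
    -- Merge b against all of c; return the merged part and the leftover.
    spMerge u@(x:xs) w@(y:ys) = case compare x y of
      LT -> let (p, q) = spMerge xs w  in (x : p, q)
      EQ -> let (p, q) = spMerge xs ys in (x : p, q)
      GT -> let (p, q) = spMerge u ys  in (y : p, q)
    spMerge u [] = ([], u)
    spMerge [] w = ([], w)

-- Flattens a split pair back into the list it represents.
flatten :: ([a], [a]) -> [a]
flatten (p, r) = p ++ r

-- Made-up sample pairs with increasing heads 1 < 2 < 3.
s1, s2, s3 :: ([Int], [Int])
s1 = ([1], [4, 6])
s2 = ([2], [5, 9])
s3 = ([3], [7, 8])

leftAssoc, rightAssoc :: ([Int], [Int])
leftAssoc  = (s1 `mergeSP` s2) `mergeSP` s3
rightAssoc = s1 `mergeSP` (s2 `mergeSP` s3)
```

Associativity is what allows tfold and pairwise to regroup the merges into whatever tree shape is cheapest.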
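The hard-coded wdiffs in the wheel version are the gaps between successive numbers coprime to 210 = 2*3*5*7, starting from 11. A small sketch that derives them instead of listing them by hand (wheelResidues and wheelGaps are hypothetical names):

```haskell
-- Numbers from 11 to 11 + 210 (both ends included) coprime to 210.
wheelResidues :: [Integer]
wheelResidues = [n | n <- [11 .. 11 + 210], gcd n 210 == 1]

-- Gaps between successive residues: one full turn of the wheel.
wheelGaps :: [Integer]
wheelGaps = zipWith (-) (tail wheelResidues) wheelResidues
```

One turn has 48 gaps summing to 210, so cycle wheelGaps yields the same infinite list as wdiffs, and scanl (+) 11 over it enumerates the wheel positions that rollFrom 11 starts from.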