[email protected]
[Top] [All Lists]

Re: [Haskell-cafe] Boxed Mutable Arrays

Subject: Re: [Haskell-cafe] Boxed Mutable Arrays
From: Edward Kmett
Date: Wed, 16 Dec 2009 14:58:02 -0500
On Wed, Dec 16, 2009 at 2:42 PM, Richard Kelsall <[email protected]> wrote:
Brad Larsen wrote:
I have considered using Data.IntMap to implement a sort of faux hash
table, e.g., introduce a Hashable class, and then use an IntMap to
lists of Hashable. ÂThe result would be a pure, persistent ``hash
table''. ÂIn such a case, however, lookup/insert/delete operations
would have average complexity logarithmic in the number of buckets.


I'll throw in an idea that has been nagging me about hash tables.
You could have a list of hash tables of increasing size. On an insert
collision in a smaller table all the entries get put into the next one
up. For a lookup you would have to check all the tables until you find
the hash.

e.g.

With three hash table in the list of these example sizes.
Table size 2^2 = 2 probable entries before collision.
     2^4 = 4
     2^6 = 8

h..h
...h.....h.h...h
h......h.......h....h...........h......h.h..........h...........


I expect if it's any good it has already been done.


Your asymptotics will take a hit if the height of the tower is logarithmic in the array size, and you'll take a constant hit for bounded-height towers.

Consider that if you have density such that the biggest array has an expected population, then your smaller arrays will be relatively drastically over populated, so you'll bump up to the next largest array in more cases, paying extra seeks against disproportionally heavily loaded buckets, whereas the time could have been more gainfully employed seeking against the more uniformly distributed larger bucket.
Â
If you have a decent hash function, you won't have lumps in your distribution, so having lumps in your bucket densities is purely a pessimization. Especially when you've ensured that those buckets that are going to be overpopulated are exactly the ones you are checking first.

You might consider looking at Witold Litwin's Sorted Linear Hash Table. I have a hackish implementation on hackage somewhere, but bear in mind it was the first piece of Haskell I'd ever written and the STM version I have bottlenecks on the root of the tree, so would be better served by multiple root TVars.

I can't find it on Hackage, but here it is:

http://comonad.com/haskell/thash/

-Edward Kmett

_______________________________________________
Haskell-Cafe mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/haskell-cafe
<Prev in Thread] Current Thread [Next in Thread>