sorry to bring up an old chestnut, but I’m trying to improve my Haskell-fu by
writing a small program, and chose Peter Norvig’s spelling checker as my
exercise material (http://norvig.com/spell-correct.html).
While I’ve gotten it working, and found it quite illuminating, I also found it
to to be very slow. And then I discovered that, of course, others have been
here well before me ( and ). Those discussions were very interesting, but
the code they refer to is mostly not available, so the most I’ve gotten out of
it so far is that:
(a) I should be using strict left folds and strict Map insertions for the
word frequency map (this shaved off about a second: ~5s -> ~4s for a single
word on my 2.4GHz MacBook Pro, GHC 6.10.4)
(b) I should probably be using ByteString’s
(c) Using Set’s for the edit permutations probably isn’t worth it (although I
found using plain lists made it about a second slower)
(b) is difficult because I’ve used matching patterns plus list comprehensions
to generate the potential edits, and I really like how elegantly it pans out
that way. Because ByteString’s are not lists, I can’t see a way to keep the
current structure and use them.
The code is at  (link to version at time of post). Profiling  shows:
$ ./spelling becuase +RTS -p
becuase -> because
$ cat spelling.prof
total time = 4.02 secs (201 ticks @ 20 ms)
total alloc = 1,544,257,792 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
train Main 52.7 19.7
readFile Main 28.9 8.6
wordsBy Main 10.9 49.5
toLower Main 7.0 21.8
So it appears that “train" (building the freq map) and “readFile” in “nwords"
are the places to hone. I will look at using Bloom Filters or Trie’s instead of
Data.Map, but I wonder if readFile should be taking nearly %30 of the run time,
even for a 6MB file?
Sorry to dump such a long post on the list — I’ll understand if no one can be
bothered rehashing this. But, in summary I’d like to know:
(a) how could I use ByteString’s for this to speed up I/O and reduce memory
usage without losing the nice readability?
(b) should readFile be so slow?
(c) any other tips
Possibly all my questions could be answered if someone has the code from the
Haskell-Cafe mailing list