
Performance issues with ConjunctionScorer

Subject: Performance issues with ConjunctionScorer
From: Andrzej Bialecki
Date: Tue, 22 Nov 2005 12:49:45 +0100

I've been profiling a Nutch installation, and to my surprise the largest share of throwaway allocations and the most time spent was not in Nutch-specific code or in IPC, but in Lucene's ConjunctionScorer.doNext() method. This method operates on a LinkedList, which seems to be a huge bottleneck. Perhaps it would be possible to replace the LinkedList with an array?
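
To make the suggestion concrete, here is a minimal sketch of a conjunction that keeps its sub-iterators in a fixed array and visits them round-robin, so no list nodes are allocated or relinked while matching. This is not the Lucene code: DocCursor, ArrayConjunction, nextMatch and allMatches are hypothetical names invented for illustration, and DocCursor is only a stand-in for a scorer that can skip forward over sorted document ids.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a scorer: a cursor over strictly increasing doc ids. */
final class DocCursor {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int pos = 0;

    DocCursor(int... docs) { this.docs = docs; }

    /** Current doc id, or NO_MORE_DOCS when exhausted. */
    int doc() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

    /** Advances to the first doc id >= target and returns it. */
    int skipTo(int target) {
        while (pos < docs.length && docs[pos] < target) pos++;
        return doc();
    }
}

/** Hypothetical array-backed conjunction (assumption: at least one cursor). */
final class ArrayConjunction {
    private final DocCursor[] cursors; // fixed array, never reordered or reallocated

    ArrayConjunction(DocCursor... cursors) { this.cursors = cursors; }

    /** Returns the first doc id >= target present in every cursor, or NO_MORE_DOCS. */
    int nextMatch(int target) {
        int candidate = cursors[0].skipTo(target);
        int agreed = 1;                 // consecutive cursors sitting on candidate
        int i = 1 % cursors.length;     // next cursor to check, round-robin
        while (candidate != DocCursor.NO_MORE_DOCS && agreed < cursors.length) {
            int d = cursors[i].skipTo(candidate);
            if (d == candidate) {
                agreed++;
            } else {                    // overshoot: d becomes the new candidate
                candidate = d;
                agreed = 1;
            }
            i = (i + 1) % cursors.length;
        }
        return candidate;
    }

    /** Collects every doc id that all cursors share. */
    List<Integer> allMatches() {
        List<Integer> out = new ArrayList<>();
        int doc = nextMatch(0);
        while (doc != DocCursor.NO_MORE_DOCS) {
            out.add(doc);
            doc = nextMatch(doc + 1);
        }
        return out;
    }
}
```

The only per-match state here is two ints and an index into the array. Whether that would remove the allocations seen in the profile is something only a new profiling run could confirm.
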
The Nutch Summarizer also needlessly re-tokenizes the text over and over
again - perhaps it would be better to save already tokenized text in
parse_text, instead of the raw plain text? After all, the only use for
that text is to index it and then build the summaries.
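
As a rough illustration of the idea, the sketch below tokenizes a text once and then serves any number of summary requests from the stored tokens. It does not reflect how Nutch stores parse_text or how its Summarizer works: PreTokenizedText and snippet are hypothetical names, and the naive split on non-alphanumeric characters is only a placeholder for a real analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical holder for text that is tokenized once and then reused. */
final class PreTokenizedText {
    private final List<String> tokens = new ArrayList<>();

    /** Tokenizes once, at parse time (placeholder tokenizer, not a real analyzer). */
    PreTokenizedText(String rawText) {
        for (String t : rawText.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
    }

    /**
     * Builds a snippet of up to `window` tokens on each side of the first
     * occurrence of `term`, reading only the stored tokens.
     * Returns an empty string when the term does not occur.
     */
    String snippet(String term, int window) {
        int hit = tokens.indexOf(term.toLowerCase(Locale.ROOT));
        if (hit < 0) return "";
        int from = Math.max(0, hit - window);
        int to = Math.min(tokens.size(), hit + window + 1);
        return String.join(" ", tokens.subList(from, to));
    }
}
```

With something along these lines, the tokenization cost would be paid once per document instead of once per summary request, at the price of a larger stored record.
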
Please see the profiles here:


Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

