[email protected]
[Top] [All Lists]

Re: cache persistent Hits

Subject: Re: cache persistent Hits
From: "Erick Erickson"
Date: Tue, 26 Sep 2006 14:46:22 -0400
Well, my index is over 1.4G, and others are reporting very large indexes in
the 10s of gigabytes. So I suspect your index size isn't the issue. I'd be
very, very, very surprised if it was.

Three things spring immediately to mind.

First, opening an IndexSearcher is a slow operation. Are you opening a new
IndexSearcher for each query? If so, don't <G>. You can re-use the same
searcher across threads without fear and you should *definitely* keep it
open between queries.
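A minimal sketch of that pattern, written against the Lucene 1.9/2.0-era API (the index path and field name here are placeholders, not from your setup):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

import java.io.IOException;

public class SearcherHolder {
    // One IndexSearcher, opened once and shared by every query thread.
    private static IndexSearcher searcher;

    public static synchronized IndexSearcher get() throws IOException {
        if (searcher == null) {
            // The slow part -- do it once, not per query.
            searcher = new IndexSearcher("/path/to/index");
        }
        return searcher;
    }

    public static Hits search(String queryText) throws Exception {
        Query q = new QueryParser("contents", new StandardAnalyzer())
                .parse(queryText);
        return get().search(q); // reuses the already-open searcher
    }
}
```

The searcher stays open for the life of the application; you only replace it (and close the old one) after the index has been updated and you want the changes visible.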

Second, your query could just be very, very interesting. It would be more
helpful if you posted an example of the code where you take your timings
(including opening the IndexSearcher).

Third, if you're using a Hits object to iterate over many documents, be
aware that it re-executes the query every hundred results or so. You want to
use one of the HitCollector/TopDocs/TopDocsCollector classes if you are
iterating over all the returned documents. And you really *don't* want to do
an IndexReader.doc(doc#) or Searcher.doc(doc#) on every document.
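For illustration, here is a hedged sketch of iterating every match with a HitCollector so the query runs exactly once (again Lucene 1.9/2.0-era API; the index path, field, and term are placeholders):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class CollectAll {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query q = new TermQuery(new Term("contents", "lucene"));

        final int[] count = new int[1];
        searcher.search(q, new HitCollector() {
            public void collect(int doc, float score) {
                // Record just the doc id/score here. Calling
                // searcher.doc(doc) inside collect() is exactly the
                // per-document fetch you want to avoid.
                count[0]++;
            }
        });
        System.out.println(count[0] + " matching documents");
        searcher.close();
    }
}
```

If you only need the top N results rather than all of them, `searcher.search(query, null, n)` returning a TopDocs is the cheaper option.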

If none of this helps, please post some code fragments and I'm sure others
will chime in.


On 9/26/06, Gaston <[email protected]> wrote:


Lucene itself has a volatile caching mechanism provided by a WeakHashMap.
Is there a possibility to serialize the Hits object? I am thinking of a
HashMap that caches the first 100 results for each query. Is it possible
to implement such a feature, or does such an extension already exist?
My problem is that searching in my application, with an index of 212MB,
takes too much time, even though I changed the default BooleanOperator
from OR to AND.

I am happy about every suggestion.


