Re: Pre-filtering for expensive query

Subject: Re: Pre-filtering for expensive query
From: Matt Ronge
Date: Sat, 30 Aug 2008 11:22:50 -0500

On Aug 30, 2008, at 6:13 AM, Paul Elschot wrote:

On Saturday 30 August 2008 03:34:01, Matt Ronge wrote:
Hi all,

I am working on implementing a new Query, Weight and Scorer that is
expensive to run. I'd like to limit the number of documents I run
this query on by first building a candidate set of documents with a
boolean query. Once I have that candidate set, I was hoping I could
build a filter off of it, and issue that along with my expensive
query. However, after reading the code I see that filtering is done
during the search, and not beforehand.

Correct. I suppose you mean the filtering code in IndexSearcher?

Yes, that's exactly what I mean.

So my initial boolean query
won't help in limiting the number of documents scored by my expensive
query.

The trick of filtering is the use of skipTo() on both the filter and
the scorer to skip superfluous work as much as possible.
So when you make your scorer implement skipTo() efficiently,
filtering it should reduce the amount of scoring done.
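
As an illustration of the skipTo() cooperation described above, here is a
minimal, self-contained sketch. It does not use Lucene classes: SortedDocs
and intersect() are hypothetical stand-ins for a filter's document iterator
and a scorer, and the binary search is only one possible way to skip. The
point is that each side jumps straight to the other side's current
document, so the expensive scorer is consulted only for candidates that the
filter can still accept.

```java
import java.util.ArrayList;
import java.util.List;

class SkipToSketch {
    /** Hypothetical iterator over ascending doc ids; not a Lucene class. */
    static class SortedDocs {
        private final int[] docs; // ascending doc ids
        private int pos = -1;     // index of the current doc, -1 before start

        SortedDocs(int... docs) { this.docs = docs; }

        int doc() { return docs[pos]; }

        boolean next() { pos++; return pos < docs.length; }

        /** Advance to the first doc after the current one that is >= target. */
        boolean skipTo(int target) {
            int lo = pos + 1, hi = docs.length;
            while (lo < hi) { // binary search over the remaining entries
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            return pos < docs.length;
        }
    }

    /** Leapfrog: whichever side is behind skips to the other side's doc. */
    static List<Integer> intersect(SortedDocs filter, SortedDocs scorer) {
        List<Integer> hits = new ArrayList<>();
        if (!filter.next() || !scorer.next()) return hits;
        while (true) {
            int f = filter.doc(), s = scorer.doc();
            if (f == s) {
                hits.add(s); // only here would an expensive score() be computed
                if (!filter.next() || !scorer.next()) return hits;
            } else if (f < s) {
                if (!filter.skipTo(s)) return hits;
            } else {
                if (!scorer.skipTo(f)) return hits;
            }
        }
    }

    public static void main(String[] args) {
        SortedDocs filter = new SortedDocs(5, 40, 41, 90);
        SortedDocs scorer = new SortedDocs(1, 5, 7, 9, 12, 40, 60, 90, 95);
        System.out.println(intersect(filter, scorer)); // prints [5, 40, 90]
    }
}
```

In this toy run the scorer never stops on documents 7, 9, 12 or 60,
because the filter's current document lets it skip past them.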

Implementing skipTo() efficiently is normally done by using
TermScorer.skipTo() on the leaves of a scorer structure. So,
in case you implement your own TermScorer, take a serious
look at TermScorer.skipTo().
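
To give a rough idea of why a leaf-level skipTo() can be cheap, the
following sketch keeps a small table of skip entries, one per block of
postings, and uses it to jump over whole blocks before scanning. This is an
assumption-laden toy, not the actual TermScorer code: the class name, the
in-memory arrays, the single skip level and the SKIP_INTERVAL value of 4
are all invented for illustration.

```java
class SkipIntervalSketch {
    static final int SKIP_INTERVAL = 4; // hypothetical block size

    final int[] postings; // ascending doc ids
    final int[] skipDocs; // skipDocs[k] == postings[k * SKIP_INTERVAL]
    int pos = -1;         // index of the current posting
    int postingsRead = 0; // postings touched by the linear scan

    SkipIntervalSketch(int[] postings) {
        this.postings = postings;
        skipDocs = new int[(postings.length + SKIP_INTERVAL - 1) / SKIP_INTERVAL];
        for (int k = 0; k < skipDocs.length; k++) {
            skipDocs[k] = postings[k * SKIP_INTERVAL];
        }
    }

    int doc() { return postings[pos]; }

    /** Advance to the first posting after the current one that is >= target. */
    boolean skipTo(int target) {
        int next = pos + 1;
        int block = next / SKIP_INTERVAL;
        // If the following block starts at or below target, everything
        // before that block start is smaller than target and can be skipped.
        while (block + 1 < skipDocs.length && skipDocs[block + 1] <= target) {
            block++;
            next = block * SKIP_INTERVAL;
        }
        // Finish with a short linear scan inside the block.
        while (next < postings.length) {
            postingsRead++;
            if (postings[next] >= target) break;
            next++;
        }
        pos = next;
        return pos < postings.length;
    }

    public static void main(String[] args) {
        int[] p = new int[64];
        for (int i = 0; i < p.length; i++) p[i] = 2 * i; // docs 0, 2, ..., 126
        SkipIntervalSketch it = new SkipIntervalSketch(p);
        it.skipTo(100);
        // prints doc=100 postingsRead=3
        System.out.println("doc=" + it.doc() + " postingsRead=" + it.postingsRead);
    }
}
```

Calling next() fifty-one times would reach the same document; here the
scan touches three postings after consulting the skip entries.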

Normally, score value computations are not the bottleneck,
but accessing the index is, and this is where skipTo() does
the real work. At the moment avoiding score value computations
is a nice extra.

I was not aware of this. Where can I find the code that uses the filter to determine what values to feed to skipTo()? (I'm trying to get a better understanding of the Lucene source.)

Or should I just implement something myself in a custom scorer?

In case you have a better way than skipTo(), or something
to improve on the issue of allowing a Filter as a clause of BooleanQuery,
let us know.

Thanks, if the skipTo approach doesn't work, I'll take a look at this.


