java-user@lucene.apache.org

Re: Term numbering and range filtering

Subject: Re: Term numbering and range filtering
From: Michael McCandless
Date: Tue, 11 Nov 2008 15:55:45 -0500

Also, one nice optimization we could do with the "term number column-stride array" is to do bit packing (borrowing from the PFOR code) dynamically.

I.e., since we know there are X unique terms in this segment, when populating the array that maps docID to term number we could use exactly the right number of bits, ceil(log2(X)) per document. Enumerated fields with few unique values (e.g., country, state) would take relatively little RAM. With LUCENE-1231, where the fields are stored column-stride on disk, we could do this packing during indexing so that loading at search time is very fast.
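To make the packing idea concrete, here is a minimal sketch in Java of a fixed-width packed array that maps docID to term number. It is only an illustration under my own assumptions: the class name PackedOrdinals and its methods are hypothetical, not part of Lucene or of the PFOR code, and it uses plain fixed-width packing into long[] blocks rather than PFOR's exception handling.

```java
/** Hypothetical fixed-width packed array (docID -> term number); not a Lucene class. */
final class PackedOrdinals {
    private final long[] blocks;    // packed storage, 64 bits per block
    private final int bitsPerValue; // width chosen from the unique term count
    private final long mask;        // low bitsPerValue bits set
    private final int size;         // number of documents

    PackedOrdinals(int size, int uniqueTerms) {
        this.size = size;
        this.bitsPerValue = bitsRequired(uniqueTerms);
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) size * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    /** Bits needed to store term numbers 0..uniqueTerms-1 (at least 1). */
    static int bitsRequired(int uniqueTerms) {
        if (uniqueTerms < 1) {
            throw new IllegalArgumentException("uniqueTerms must be >= 1");
        }
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(uniqueTerms - 1));
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    void set(int docID, int termNumber) {
        if (docID < 0 || docID >= size) {
            throw new IndexOutOfBoundsException("docID " + docID);
        }
        if (termNumber < 0 || termNumber > mask) {
            throw new IllegalArgumentException("termNumber " + termNumber);
        }
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        // Clear the slot, then write the low part of the value.
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((long) termNumber << shift);
        int spill = shift + bitsPerValue - 64; // bits that overflow into the next block
        if (spill > 0) {
            int kept = bitsPerValue - spill;   // bits that fit in the first block
            long highMask = mask >>> kept;
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | ((long) termNumber >>> kept);
        }
    }

    int get(int docID) {
        if (docID < 0 || docID >= size) {
            throw new IndexOutOfBoundsException("docID " + docID);
        }
        long bitPos = (long) docID * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            // Pull the remaining high bits from the next block.
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return (int) (value & mask);
    }
}
```

With, say, 200 unique country codes, bitsRequired(200) is 8, so each document costs one byte instead of the four bytes of a plain int[] entry.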

Mike

Paul Elschot wrote:

On Tuesday 11 November 2008 11:29:27, Michael McCandless wrote:

The other part of your proposal was to somehow "number" term text
such that term range comparisons can be implemented as fast int
comparisons.
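As a rough illustration of what such numbering could look like, the Java sketch below assigns each unique term its position in sorted order, so that a term range check on a document becomes two int comparisons. The class TermOrdinalRange and its methods are hypothetical and not part of Lucene; the sketch assumes one term per document and uses String.compareTo ordering, which may differ from the index's term order.

```java
import java.util.Arrays;

/** Hypothetical per-segment term numbering; not a Lucene class. */
final class TermOrdinalRange {
    private final String[] sortedTerms; // unique terms in sorted order
    private final int[] docToOrd;       // docID -> term number (index into sortedTerms)

    /** docTerms[docID] is the single term of that document (an assumption of this sketch). */
    TermOrdinalRange(String[] docTerms) {
        sortedTerms = Arrays.stream(docTerms).distinct().sorted().toArray(String[]::new);
        docToOrd = new int[docTerms.length];
        for (int doc = 0; doc < docTerms.length; doc++) {
            docToOrd[doc] = Arrays.binarySearch(sortedTerms, docTerms[doc]);
        }
    }

    /** Marks documents whose term lies in [lower, upper], both bounds inclusive. */
    boolean[] matches(String lower, String upper) {
        // Translate the text bounds into term numbers once per query.
        int lo = Arrays.binarySearch(sortedTerms, lower);
        if (lo < 0) {
            lo = -lo - 1; // first term >= lower
        }
        int hi = Arrays.binarySearch(sortedTerms, upper);
        if (hi < 0) {
            hi = -hi - 2; // last term <= upper
        }
        boolean[] result = new boolean[docToOrd.length];
        for (int doc = 0; doc < docToOrd.length; doc++) {
            int ord = docToOrd[doc];
            result[doc] = ord >= lo && ord <= hi; // int comparisons only
        }
        return result;
    }
}
```

The two binary searches are the only string comparisons per query; the per-document loop touches nothing but ints.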
...

  http://fontoura.org/papers/paramsearch.pdf

However that'd be quite a bit deeper change to Lucene.

A cheap version is hierarchical prefixing, described here:

http://wiki.apache.org/jakarta-lucene/DateRangeQueries

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx



