
Re: bytecount as String and prefix length

Subject: Re: bytecount as String and prefix length
From: Marvin Humphrey
Date: Tue, 1 Nov 2005 20:52:28 -0800
On Nov 1, 2005, at 9:51 AM, Doug Cutting wrote:

> Another approach might be to, instead of converting UTF-8 to strings right away, change things to convert lazily, if at all.
> During index merging such conversion should never be needed.

There ought to be some gains possible there, then. No predictions as to how much, though.

> You needn't do this systematically throughout Lucene, but only where it makes a big difference. For example, if you could avoid strings in SegmentMerger.mergeTermInfos() it might make a huge difference. This might be as simple as changing SegmentMergeInfo to use a TermBuffer instead of a Term. Does that make sense?

Abundant sense. I'm not as familiar with SegmentMerger as I am with
other parts of the org.apache.lucene.index package, because I haven't
ported it yet. But conceptually I understand exactly why this should
require fewer resources.
I'll take a swing at SegmentMerger and submit a comprehensive diff.
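
To make the lazy-conversion idea concrete, here is a minimal Java sketch. It is not Lucene code and does not reproduce TermBuffer or SegmentMergeInfo: LazyTerm, merge() and isDecoded() are hypothetical names made up for illustration, and the sketch assumes terms are held as standard UTF-8 bytes. It shows two sorted term lists being merged by comparing raw bytes, with a String built only if someone asks for one.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical term holder (not Lucene's TermBuffer): keeps UTF-8 bytes, decodes on demand. */
final class LazyTerm implements Comparable<LazyTerm> {
    private final byte[] utf8;  // raw bytes, assumed to be standard UTF-8
    private String decoded;     // stays null until toString() is first called

    LazyTerm(byte[] utf8) {
        this.utf8 = utf8.clone();
    }

    static LazyTerm of(String text) {
        return new LazyTerm(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Compares unsigned bytes; no String is created. */
    @Override
    public int compareTo(LazyTerm other) {
        int n = Math.min(utf8.length, other.utf8.length);
        for (int i = 0; i < n; i++) {
            int a = utf8[i] & 0xFF;
            int b = other.utf8[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return utf8.length - other.utf8.length;
    }

    /** Decodes at most once, and only when a String is actually needed. */
    @Override
    public String toString() {
        if (decoded == null) {
            decoded = new String(utf8, StandardCharsets.UTF_8);
        }
        return decoded;
    }

    boolean isDecoded() {
        return decoded != null;
    }

    /** Two-way merge of sorted term lists; equal terms are collapsed into one entry. */
    static List<LazyTerm> merge(List<LazyTerm> left, List<LazyTerm> right) {
        List<LazyTerm> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            int cmp = left.get(i).compareTo(right.get(j));
            if (cmp < 0) {
                out.add(left.get(i++));
            } else if (cmp > 0) {
                out.add(right.get(j++));
            } else {
                out.add(left.get(i++));
                j++;
            }
        }
        while (i < left.size()) {
            out.add(left.get(i++));
        }
        while (j < right.size()) {
            out.add(right.get(j++));
        }
        return out;
    }
}
```

One caveat with this approach: comparing unsigned UTF-8 bytes sorts terms by Unicode code point, while String.compareTo sorts by UTF-16 code unit. The two orders can differ for characters outside the Basic Multilingual Plane, so the byte-level comparison has to match whatever order the terms were written in.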

Thanks for the suggestions,

Marvin Humphrey
Rectangular Research
