Mark Miller wrote:
Glen Newton wrote:
Hey Mr McCandless, what's up with that? Can IndexWriter be made to be
as efficient as using Multiple Writers? Where do you suppose the
hold up is? Number of threads doing merges? Sync contention? I hate
the idea of multiple IndexWriter/Readers being more efficient than a
single instance. In an ideal Lucene world, a single instance would
hide the complexity and use the number of threads needed to match
multiple instance performance.
2008/10/23 Mark Miller <[email protected]>:
It sounds like you might have some thread synchronization issues
Lucene. To simplify things a bit, you might try just using one
IndexWriter. If I remember right, the IndexWriter is now pretty
efficient, and there isn't much need to index to smaller indexes and
then merge. There is a lot of juggling to get wrong with that approach.
While I agree it is easier to have a single IndexWriter, if you have
multiple cores you will get significant speed-ups with multiple
IndexWriters, even with the impact of merging at the end.
# IndexWriters = # physical cores is a reasonable rule of thumb.
General speed-up estimate: (# cores) * (0.6 to 0.8) over a single
IndexWriter.
When I get around to it, I'll re-run my tests varying the # of
IndexWriters & post.
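
To make the rule of thumb above concrete, here is a minimal sketch in
plain Java. The class and method names (WriterEstimate, writers,
speedUpRange) are made up for illustration and are not part of Lucene.
The core count is passed in explicitly because
Runtime.availableProcessors() reports logical processors, which may be
more than the number of physical cores.

```java
/** Back-of-the-envelope numbers for the rule of thumb quoted above. */
final class WriterEstimate {
    // Per-core efficiency factors quoted in the thread (0.6 to 0.8).
    static final double LOW = 0.6;
    static final double HIGH = 0.8;

    /** Rule of thumb: one IndexWriter per physical core. */
    static int writers(int physicalCores) {
        if (physicalCores < 1) {
            throw new IllegalArgumentException("physicalCores must be >= 1");
        }
        return physicalCores;
    }

    /** Returns {low, high}: estimated speed-up over a single IndexWriter. */
    static double[] speedUpRange(int physicalCores) {
        int n = writers(physicalCores);
        return new double[] { n * LOW, n * HIGH };
    }
}
```

For an 8-core machine, speedUpRange(8) gives roughly 4.8x to 6.4x. This
is only the arithmetic of the estimate, not a measurement.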
Honestly this surprises me: I would expect a single IndexWriter with
multiple threads to be as fast as (or faster than, considering the
extra merge time at the end) multiple IndexWriters.
IndexWriter's concurrency has improved a lot lately, with
ConcurrentMergeScheduler. The only serious operation that is not
concurrent is flushing the RAM buffer as a new segment; but in a
well-tuned indexing process (large RAM buffer) the time spent there
should be quite small, especially with a fast IO system.
Actually, addIndexes is also not concurrent in that if multiple
threads call it, only one can run at once. But normally you would
call it with all the indices you want to add, and then the merging is
Glen, in your single IndexWriter test, is it possible there was
accidental thread contention during document preparation or analysis?
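
As a rough illustration of the setup in question (several threads
feeding one shared writer, with document preparation kept free of
shared locks), here is a minimal sketch in plain Java. DocumentSink is
a hypothetical stand-in for the single shared writer, and
SharedWriterSketch and indexAll are made-up names; none of these are
Lucene types. The trim/lowercase step is only a placeholder for real
document preparation and analysis.

```java
import java.util.List;
import java.util.Locale;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the one shared writer; not a Lucene type. */
interface DocumentSink {
    void add(String preparedDoc);
}

final class SharedWriterSketch {
    /**
     * Feeds every raw input to one shared sink from a fixed pool of
     * worker threads and returns the number of documents handed over.
     */
    static int indexAll(List<String> rawInputs, DocumentSink sink, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger submitted = new AtomicInteger();
        for (String raw : rawInputs) {
            pool.execute(() -> {
                // Preparation touches only local state, so worker threads
                // do not contend with each other before reaching the sink.
                String prepared = raw.trim().toLowerCase(Locale.ROOT);
                sink.add(prepared);
                submitted.incrementAndGet();
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("indexing did not finish in time");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop the workers.
            Thread.currentThread().interrupt();
            pool.shutdownNow();
            throw new IllegalStateException("interrupted while indexing", e);
        }
        return submitted.get();
    }
}
```

If the preparation step instead went through a shared synchronized
object (a shared parser or a synchronized cache, say), the threads
would serialize there before ever reaching the writer, which is the
kind of accidental contention being asked about.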
I do agree that we should strive to have enough concurrency in
IndexWriter and IndexReader so that you don't get any real benefit by
using separate instances. E.g., in 2.4.0 you can now open read-only
IndexReaders, and on Unix you can use NIOFSDirectory, both of which
should go a long way towards fixing IndexReader's concurrency issue.