
Re: Performance of never optimizing

Subject: Re: Performance of never optimizing
From: Justus Pendleton
Date: Mon, 3 Nov 2008 16:49:32 +1100
On 03/11/2008, at 4:27 PM, Otis Gospodnetic wrote:
Why are you optimizing? Trying to make the search faster? I would try to avoid optimizing during high usage periods.

I assume that the original, long-ago decision to optimize was made to improve search performance.

One thing that you might not have tried is the constant re-opening of the IndexReader, which you'll need to do if you want to see index changes instantly.

We do keep track of when the index has been updated and re-open IndexReaders so that they see the updates instantly.
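As a rough illustration of the "track updates, then re-open" pattern described above, here is a minimal sketch in Python. It is a toy model, not Lucene's API: the `Index`, `Reader`, and `ReaderHolder` classes and the `version` counter are all hypothetical names invented for this example.

```python
class Index:
    """Hypothetical stand-in for an index that counts its updates."""

    def __init__(self):
        self.version = 0
        self.docs = []

    def add(self, doc):
        self.docs.append(doc)
        self.version += 1  # every update bumps the version


class Reader:
    """Hypothetical point-in-time snapshot of the index."""

    def __init__(self, index):
        self.version = index.version
        self.docs = tuple(index.docs)


class ReaderHolder:
    """Hands out a reader, re-opening it only when the index has changed."""

    def __init__(self, index):
        self.index = index
        self.reader = Reader(index)
        self.reopens = 0

    def current(self):
        # Re-open only if the snapshot is stale; otherwise reuse it.
        if self.reader.version != self.index.version:
            self.reader = Reader(self.index)
            self.reopens += 1
        return self.reader
```

In this model, a search that calls `current()` right after an update pays the re-open cost, while searches between updates reuse the same snapshot.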

So you indexed once and then measured search performance? Or did you measure indexing performance? I can't quite tell from your email. And in one case you optimized before searching and in the other you did not optimize?

Yes, I indexed once and then measured search performance. (The actual algorithm used can be seen at http://confluence.atlassian.com/display/JIRACOM/Lucene+graphs) For my current purposes I don't care about indexing performance.

1. Why does the merge factor of 4 appear to be faster than the merge factor of

Faster for indexing or searching? If indexing, then it's because 4 means fewer segment merges than 2. If searching, then I don't know, unless you had indexing and searching happening in parallel, which then means less IO for 4.

For searching. The index and search should not have been happening in parallel. However, multiple searches are occurring in parallel.
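To make the merge-factor trade-off concrete, here is a toy simulation of a logarithmic merge scheme. It is a simplification, not Lucene's actual merge policy: it assumes every document is flushed as its own level-0 segment and that a merge fires as soon as `merge_factor` segments share a level. The function name and parameters are made up for this sketch.

```python
def simulate_merges(num_docs, merge_factor):
    """Return (number of merges performed, segments left at the end)."""
    levels = []  # levels[i] = number of segments currently at level i
    merges = 0
    for _ in range(num_docs):
        if not levels:
            levels.append(0)
        levels[0] += 1  # each new document starts as a level-0 segment
        level = 0
        # Cascade: merge_factor segments at one level merge into one segment above.
        while levels[level] == merge_factor:
            levels[level] = 0
            if level + 1 == len(levels):
                levels.append(0)
            levels[level + 1] += 1
            merges += 1
            level += 1
    return merges, sum(levels)


for factor in (2, 4):
    print(factor, simulate_merges(1000, factor))
```

Under these assumptions, 1,000 documents need 994 merges with a merge factor of 2 and 330 with a merge factor of 4, but the unoptimized index ends up with 6 segments in the first case and 10 in the second. So the model agrees that a factor of 4 means fewer merges while indexing, but it offers no obvious reason why searching would be faster with 4.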

Did your index fit in RAM, by the way?

The machine has, I believe, 4 GB of RAM and the benchmark suite reports that 700 MB were used, so it does appear to have fit into RAM.

2. Why does non-optimized searching appear to be faster than optimized searching
once the index hits ~500,000 documents?

Not sure without seeing the index/machine.

The machine is an 8-core Mac Pro. If you'd like, I can provide the indexes online somewhere. Or if you can provide pointers on what to look for, I'm more than happy to investigate this myself.

It sounds like you were measuring search performance while at the same time increasing the index size by incrementally adding more docs?

No documents were being added to the index while the searching was being performed. I was trying to measure only the search performance.

20 reqs/sec sounds very low. How large is your index, how much RAM, and how about heap size?
What were your queries like? Random? From a log?

The queries were generated by the ReutersQueryMaker. I am not sure what heap sizes were used at the various stages. (I ran the benchmarks over the weekend; they took several days.)
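For readers who want to reproduce a "requests per second with parallel searches" figure, the measurement can be sketched roughly as follows. This is not the benchmark suite used in the thread: `measure_throughput`, the `search` callable, and `fake_search` are hypothetical names, and a real run would replace `fake_search` with a call into the actual searcher.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(search, queries, workers=8):
    """Run every query through `search` on a thread pool.

    Returns (results in query order, queries completed per second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(search, queries))
    elapsed = time.perf_counter() - start
    qps = len(queries) / elapsed if elapsed > 0 else float("inf")
    return results, qps


def fake_search(query):
    # Placeholder for a real search call: returns the number of query terms.
    return len(query.split())


if __name__ == "__main__":
    hits, qps = measure_throughput(fake_search, ["oil prices", "grain", "interest rate cut"])
    print(hits, round(qps, 1))
```

Keeping the index unchanged during the timed region, as described above, ensures the figure reflects search cost only.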

I'm confused by what exactly you did and measured, but it could just be that I'm tired.

My apologies for not being clearer in my initial email. I appreciate the help,

