On 03/11/2008, at 4:27 PM, Otis Gospodnetic wrote:
Why are you optimizing? Trying to make the search faster? I would
try to avoid optimizing during high usage periods.
I assume that the original, long-ago, decision to optimize was made to
improve searching performance.
One thing that you might not have tried is the constant re-opening
of the IndexReader, which you'll need to do if you want to see index
We do keep track of when the index has been updated and re-open
IndexReaders so that they see the updates instantly.
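
A minimal sketch of the "track updates, then re-open" pattern described above, written without any Lucene dependency. The names ReaderHolder, indexVersion and opener are hypothetical, and the type R only stands in for an IndexReader-like object. It assumes the indexing side bumps a shared version counter after each commit.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongFunction;

// Hypothetical holder: returns a cached reader and opens a new one only
// when the index version has changed since the last open.
final class ReaderHolder<R> {
    private final AtomicLong indexVersion;  // assumed to be bumped by the indexer after each commit
    private final LongFunction<R> opener;   // opens a reader for the given version
    private long openedVersion = -1;
    private R current;

    ReaderHolder(AtomicLong indexVersion, LongFunction<R> opener) {
        this.indexVersion = indexVersion;
        this.opener = opener;
    }

    // Synchronized so that concurrent searchers never trigger two opens
    // for the same version.
    synchronized R get() {
        long v = indexVersion.get();
        if (current == null || v != openedVersion) {
            // A real implementation would also close the old reader once
            // in-flight searches have finished with it; omitted here.
            current = opener.apply(v);
            openedVersion = v;
        }
        return current;
    }
}
```

With this shape, searches that arrive between updates share one reader, and the cost of opening is paid once per index change rather than once per search.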
So you indexed once and then measured search performance? Or did
you measure indexing performance? I can't quite tell from your email.
And in one case you optimized before searching and in the other you
did not optimize?
Yes, I indexed once and then measured search performance. (The actual
algorithm used can be seen at http://confluence.atlassian.com/display/JIRACOM/Lucene+graphs)
For my current purposes I don't care about indexing performance.
1. Why does the merge factor of 4 appear to be faster than the
merge factor of
Faster for indexing or searching? If indexing, then it's because 4
means fewer segment merges than 2. If searching, then I don't know,
unless you had indexing and searching happening in parallel, which
then means less IO for 4.
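
The point about a merge factor of 4 meaning fewer segment merges than 2 can be illustrated with a deliberately simplified model. This is not Lucene's actual merge policy: it assumes equal-sized flushed segments and that every mergeFactor segments on one level are merged into a single segment on the next level. MergeModel and countMerges are hypothetical names.

```java
final class MergeModel {
    /**
     * Counts the merges performed after flushedSegments equal-sized segments
     * have been written, in a simplified cascading model: every mergeFactor
     * segments on a level are merged into one segment on the next level.
     */
    static long countMerges(long flushedSegments, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        long merges = 0;
        long segmentsAtLevel = flushedSegments;
        while (segmentsAtLevel >= mergeFactor) {
            // Each full group of mergeFactor segments yields one merge and
            // one segment on the next level up.
            segmentsAtLevel /= mergeFactor;
            merges += segmentsAtLevel;
        }
        return merges;
    }
}
```

Under this model, 16 flushed segments cost 15 merges with a merge factor of 2 and 5 merges with a merge factor of 4. The model counts merge operations only; it says nothing about bytes copied or about search speed.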
For searching. The index and search should not have been happening in
parallel. However, multiple searches are occurring in parallel.
Did your index fit in RAM, by the way?
The machine has, I believe, 4 GB of RAM and the benchmark suite
reports that 700 MB were used, so it does appear to have fit into RAM.
2. Why does non-optimized searching appear to be faster than
once the index hits ~500,000 documents?
Not sure without seeing the index/machine.
The machine is an 8-core Mac Pro. If you'd like, I can provide the
indexes online somewhere. Or if you can provide pointers on what to
look for, I'm more than happy to investigate this myself.
It sounds like you were measuring search performance while at the
same time increasing the index size by incrementally adding more docs?
No documents were being added to the index while the searching was
being performed. I was trying to measure only the search performance.
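
For reference, a search-only measurement with several searches running in parallel could be sketched roughly as follows. This is not the benchmark suite that was actually used. SearchThroughput and measure are hypothetical names, and the search callback stands in for whatever runs a single query against an index that is not being modified.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class SearchThroughput {
    /** Runs every query once across a fixed pool of workers and returns queries per second. */
    static double measure(List<String> queries, int threads, Consumer<String> search) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (String q : queries) {
                tasks.add(() -> { search.accept(q); return null; });
            }
            long start = System.nanoTime();
            // invokeAll blocks until every task has completed.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces any exception thrown by a search
            }
            long elapsedNanos = Math.max(1, System.nanoTime() - start);
            return queries.size() / (elapsedNanos / 1e9);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A real run would also need warm-up passes and a record of the heap size in use, since both affect the requests-per-second figure.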
20 reqs/sec sounds very low. How large is your index, how much RAM,
and how about heap size?
What were your queries like? random? from log?
The queries were generated by the ReutersQueryMaker. I am not sure
what the heap sizes used at various stages were. (I ran the benchmarks
over the weekend; they took several days.)
I'm confused by what exactly you did and measured, but it could just
be that I'm tired.
My apologies for not being clearer in my initial email. I appreciate