java-user@lucene.apache.org
[Top] [All Lists]

Re: setTermInfosIndexDivisor

Subject: Re: setTermInfosIndexDivisor
From: "Ganesh"
Date: Thu, 25 Jun 2009 15:39:11 +0530
What about setTermInfosIndexDivisor?? 

Directory dir = FSDirectory.getDirectory(indexPath);
IndexReader reader = IndexReader.open(dir, true);
reader.setTermInfosIndexDivisor(5);

It supposed to load only one fifth of the terms available?? But there is no 
difference in memory consumption with / without settings this parameter.

I reopen the IndexReader whenever there is any document added to Index.  Do i 
need to set setTermInfosIndexDivisor(5); during re-opening of the index also. I 
tried this, first time it accepted and second time onwards it throws "terms 
already loaded" expection

Regards
Ganesh

----- Original Message ----- 
From: "Michael McCandless" <lucene@xxxxxxxxxxxxxxxxxx>
To: <java-user@xxxxxxxxxxxxxxxxx>; <simon.willnauer@xxxxxxxxx>
Sent: Thursday, June 25, 2009 2:56 PM
Subject: Re: setTermInfosIndexDivisor


setTermIndexInterval only helps appreciably when an index has a truly
immense number of terms (often, "by accident" eg your document
filtering/analysis process accidentally allowed binary terms into the
index); it's meant primarily as a "safety" for such situations.

If you run CheckIndex, it prints the number of unique terms per segment.

The other big things that use RAM while searching are 1) deleted docs
(do you have any deletions?), 2) norms (have you disabled norms for
fields that don't actually require it), and 3) FieldCache (used when
you sort by field instead of relevance).

Mike

On Thu, Jun 25, 2009 at 4:40 AM, Simon
Willnauer<simon.willnauer@xxxxxxxxxxxxxx> wrote:
> Hey there,
>
> On Thu, Jun 25, 2009 at 9:10 AM, Ganesh<emailgane@xxxxxxxxxxx> wrote:
>> Hello all,
>>
>> I am using Lucene v2.4.1
>>
>> 1)
>> I have build multiple indexes of total 30 million documents. My memory limit 
>> is 512 MB. In this case i am getting frequently OOME. If i increased the 
>> memory limit to 1 GB / 1.5 GB then it is working fine. My point is it will 
>> also will get exhausted when it reaches 60 / 90 million documents.
>>
>> I thought to use setTermInfosIndexDivisor, but even then the memory 
>> consumption is same. This parameter has no effect. Whether this parameter 
>> should be set while building index? I build the index using default value. 
>> After hitting OOME i am setting this.
> I would be curious what you do to your index. do you have a lot of
> pending deletes? do you call optimize frequently? In which situations
> do you hit the OOM?
>>
>> Directory dir = FSDirectory.getDirectory(indexPath);
>> IndexReader reader = IndexReader.open(dir, true);
>> reader.setTermInfosIndexDivisor(5);
> Loaded terms might not dominate your memory consumption in side
> lucene. Again, you should provide more information of indexing, the
> environment and the situation where the error occurs.
>
> simon
>>
>> 2)
>> IndexWriter.setTermIndexInterval should be set while creating the index? If 
>> i build the index with default value, After some time if i use this 
>> parameter, Whether there will be some effect?
>>
>> Regards
>> Ganesh
>>
>> Send instant messages to your online friends http://in.messenger.yahoo.com
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
>> For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
> For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx

Send instant messages to your online friends http://in.messenger.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>