[email protected]
[Top] [All Lists]

Re: How can we know if 2 lucene indexes are same?

Subject: Re: How can we know if 2 lucene indexes are same?
From: "Jason Rutherglen"
Date: Fri, 5 Sep 2008 08:50:02 -0400
In Ocean I had to use a transaction log and execute everything that
way like SQL database replication.  Then let each node handle it's own
merging process.  Syncing the indexes is used to get a new node up to
speed, otherwise it's avoided for the reasons mentioned in the
previous email.

On Fri, Sep 5, 2008 at 8:33 AM, Michael McCandless
<[email protected]> wrote:
> Shalin Shekhar Mangar wrote:
>> Let me try to explain.
>> I have a master where indexing is done. I have multiple slaves for
>> querying.
>> If I commit+optimize on the master and then rsync the index, the data
>> transferred on the network is huge. An alternate way is to commit on
>> master,
>> transfer the delta to the slave and issue an optimize on the slave. This
>> is
>> very fast because less data is transferred on the network.
> Large segment merges will also send huge traffic.  You may just want to send
> all updates (document adds/deletes) to all slaves directly?  It'd be nice if
> you could somehow NOT sync the effects of segment merging, but do sync doc
> add/deletes... not sure how to do that.
>> However, we need to ensure that the index on the slave is actually in sync
>> with the master. So that on another commit, we can blindly transfer the
>> delta to the slave.
> I assume your app ensures that no deltas arrive to the slave while it's
> running optimize?  So then the question becomes (I think) "if two indices
> are identical to begin with, and I separately run optimize on each, will the
> resulting two optimized indices be identical?".
> By "in sync" you also require the final segment name (after optimize) is
> identical right?
> I think the answer is yes, but I'm not certain unless I think more about it.
>  Also this behavior is not "promised" in Lucene's API.
> Merges for optimize are now allowed to run concurrently (by default, with
> ConcurrentMergeScheduler), except for the final (< mergeFactor segments)
> merge, which waits until others have finished.  So if there are 7 obvious
> merges necessary to optimize, 3 will run concurrently, while 4 wait.  Those
> 4 then run as the merges finish over time, which may happen in different
> orders for each index and so different merges may run.  Then the final merge
> will run and I *think* the net number of merges that ran should always be
> the same and so the final segment name should be the same.
> Mike
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

<Prev in Thread] Current Thread [Next in Thread>