
Re: Faking index merge by modifying segments file?

Subject: Re: Faking index merge by modifying segments file?
From: Otis Gospodnetic
Date: Wed, 2 Nov 2005 03:47:40 -0800 PST

--- Paul Elschot <[email protected]> wrote:

> On Tuesday 01 November 2005 08:51, Otis Gospodnetic wrote:
> > Hello,
> > 
> > I spent most of today talking to some people about Lucene, and one of
> > them said how they would really like to have an "instantaneous index
> > merge", and how he is thinking he could achieve that by simply opening
> > segments file of one index, and adding segment names of the other
> > index/indices, plus adjusting the segment size (SegSize in
> > fileformats.html), thus creating a single (but unoptimized) index.
> > 
> > Any reactions to that?
> > 
> > I imagine this isn't quite that simple to implement, as one would have
> > to renumber all documents, in order to avoid having multiple documents
> > with the same document id.
> > 
> > Can anyone think of any other problems with this approach, or perhaps
> > offer ideas for possible document renumbering?
>
> Document numbers within segments are determined dynamically in the
> index reader, so these should not be a problem. Each segment simply
> numbers its documents from zero.

Uh, and I always thought they were stored in the index.  Aren't they
stored in the .fdx and .fdt files?  And shouldn't they also be
referenced from somewhere?  I see document numbers mentioned in the
description of the .frq file.
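
For illustration, here is a minimal Python sketch of what "each segment
numbers its documents from zero" could mean for a reader that spans
several segments. It is not Lucene code: the function names
segment_bases and locate are hypothetical, and the only input is an
assumed list of per-segment document counts.

```python
from bisect import bisect_right
from itertools import accumulate


def segment_bases(doc_counts):
    """Return the first global document number of each segment."""
    # Segment i starts where the preceding segments end: 0, n0, n0+n1, ...
    return [0] + list(accumulate(doc_counts))[:-1]


def locate(global_doc, doc_counts):
    """Map a global document number to (segment index, local number)."""
    total = sum(doc_counts)
    if not 0 <= global_doc < total:
        raise IndexError(f"document {global_doc} is outside 0..{total - 1}")
    bases = segment_bases(doc_counts)
    # The owning segment is the last one whose base is <= global_doc.
    seg = bisect_right(bases, global_doc) - 1
    return seg, global_doc - bases[seg]
```

Under this scheme, appending another index's segments only extends the
list of counts. Nothing stored inside a segment would need rewriting,
which is why renumbering may not be an issue.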

> Iirc the segment names determine the order of the segments for an
> index reader. I think creating a new index by adding segments from an
> existing one should be fairly straightforward. Some care will be
> needed to avoid clashes in the segment names.

You mean ensuring that segment _x from index A doesn't clash with _x
from index B?  Segment names are written only in the segments file, I
believe, so if I detect that _x is already taken, I could simply rename
the incoming segment to a name that hasn't been taken yet (e.g. _foo),
and remember to use that name when writing the segments file.
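
A rough Python sketch of that renaming step, under some assumptions:
the segment names of both indices have already been read out of their
segments files, each segment's files are named <segment>.<extension>
(e.g. _x.fdt, _x.fdx, _x.frq), and fresh names are generated as "_"
plus a base-36 counter, although any unused name would do. The helpers
to_base36, rename_clashes and renamed_file are hypothetical names.

```python
import string

DIGITS = string.digits + string.ascii_lowercase  # base-36 alphabet


def to_base36(n):
    """Render a non-negative integer in base 36."""
    if n == 0:
        return "0"
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = DIGITS[r] + out
    return out


def rename_clashes(names_a, names_b):
    """Return {old: new} for B's segments so that none collides with A's."""
    in_a = set(names_a)
    # Reserve every existing name so a fresh name never hits either index.
    taken = in_a | set(names_b)
    mapping, counter = {}, 0
    for name in names_b:
        if name not in in_a:
            mapping[name] = name  # no clash, keep the name as it is
            continue
        while True:
            candidate = "_" + to_base36(counter)
            counter += 1
            if candidate not in taken:
                break
        taken.add(candidate)
        mapping[name] = candidate
    return mapping


def renamed_file(filename, mapping):
    """Apply the segment mapping to one file name, e.g. _1.fdt -> _3.fdt."""
    segment, dot, ext = filename.partition(".")
    return mapping.get(segment, segment) + dot + ext
```

The mapping is the thing to remember when writing the new segments
file: it has to list the new names, not the original ones.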

> Also what should happen with
> the index from which the segments are taken? Should the shared
> segments be copied between indexes?

I can simply destroy the original indices once I've created the
fake-merged one.  I'm not sure what you mean by shared segments.  If I
have two indices, A and B, then each of them will have its own set of
segments, with no segments in common.

> It's possible to share segments between indexes when the file system
> allows files to be present in multiple directories.

Oh, are you saying that I could just leave segments where they are and
use something like symlinks to point to them from a new index?

A: <index files for A>
B: <index files for B>
C: <symlinks to index files for A>
   <symlinks to index files for B>
   <segments file with segment names for A and B>
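
The symlink part of that layout could be sketched in Python roughly as
follows. This is only an illustration: link_index_files is a
hypothetical helper, the SKIP set assumes that "segments" and
"deletable" are the per-index bookkeeping files that should not be
linked, and writing the combined segments file for C is left out
entirely.

```python
import os

# Assumption: these per-index bookkeeping files are not shared with C.
SKIP = {"segments", "deletable"}


def link_index_files(sources, target):
    """Symlink every segment file from the source indices into target."""
    os.makedirs(target, exist_ok=True)
    linked = []
    for src in sources:
        for name in sorted(os.listdir(src)):
            if name in SKIP:
                continue
            dest = os.path.join(target, name)
            if os.path.lexists(dest):
                # Two indices use the same segment name; rename one first.
                raise FileExistsError(f"{name} already present in {target}")
            os.symlink(os.path.abspath(os.path.join(src, name)), dest)
            linked.append(name)
    return linked
```

Calling link_index_files(["A", "B"], "C") would fill C with links only.
A and B would have to stay in place and unmodified for as long as C is
in use.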


