RE: Faking index merge by modifying segments file?

Subject: RE: Faking index merge by modifying segments file?
From: Otis Gospodnetic
Date: Wed, 2 Nov 2005 03:14:03 -0800 PST

Hello,

--- Robert Engels <[email protected]> wrote:

> Problem is the terms need to be sorted in a single segment.

Are you referring to the Term Dictionary (the .tis and .tii files
described at http://lucene.apache.org/java/docs/fileformats.html )?  If
so, is that really true?

I don't have an optimized Lucene multi-file index handy to look at, but
.tis and .tii files are "per segment" files, so wouldn't a set of .tis
and .tii files from multiple indices be equivalent to a set of .tis and
.tii files from multiple segments of a single index?
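
To make the question concrete, here is a small Python sketch (not Lucene
code) that models each segment's term dictionary as its own sorted list.
The names `segments`, `lookup` and `merged_terms` are hypothetical, and
the lists only stand in for the per-segment .tis/.tii files.  The point
is just that a reader could search each per-segment dictionary on its
own, or merge them lazily, without the terms ever being stored in one
globally sorted file.

```python
import bisect
import heapq

# Hypothetical model: one sorted term list per segment, standing in for
# the per-segment .tis/.tii files. This is not Lucene's on-disk format.
segments = {
    "segA": ["bar", "foo"],
    "segB": ["bank", "piggy"],
}

def lookup(term):
    """Return the names of the segments whose dictionary contains term."""
    hits = []
    for name, terms in segments.items():
        i = bisect.bisect_left(terms, term)  # binary search within one segment
        if i < len(terms) and terms[i] == term:
            hits.append(name)
    return hits

def merged_terms():
    """Enumerate all terms in global sorted order by merging lazily."""
    return list(heapq.merge(*segments.values()))
```

Each list is sorted only within its own segment, which is the property
the question above is about.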

For example, if we have two indices, A and B, both optimized, we have:

A: segA.tis   (this may contain terms bar and foo)
   segA.tii
   ...
   segments   (this would list segA)

B: segB.tis   (this may contain terms piggy and bank)
   segB.tii
   ...
   segments   (this would list segB)

Wouldn't that be the same as a single index, say index C:

C: segA.tis   (this may contain terms bar and foo)
   segA.tii
   segB.tis   (this may contain terms piggy and bank)
   segB.tii
   ...
   segments   (this would list segments segA and segB)


That is really what I am talking about: take all index files of index A
and all index files of index B and put them in a new index dir for a
new index C.  Then open the segments files of index A and index B, pull
out the segment names and other information from there, and write a new
segments file with that information in the index dir for the new index C.
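
The bookkeeping side of this can be sketched in Python.  This is only a
model under stated assumptions: `SegmentsFile` and `fake_merge` are
hypothetical names, the segments file is reduced to a name counter plus
(segment name, document count) pairs, and no real file is parsed or
written.  The name-clash check is my own addition: two independently
built indices could use the same segment name, and then their files
could not share one directory.

```python
from dataclasses import dataclass, field

@dataclass
class SegmentsFile:
    # Simplified, hypothetical model of a segments file: a name counter
    # plus (segment name, document count) pairs. Real files carry more.
    name_counter: int
    segments: list = field(default_factory=list)  # [(name, doc_count), ...]

def fake_merge(a, b):
    """Build the segments listing for index C from those of A and B."""
    names_a = {name for name, _ in a.segments}
    clashes = [name for name, _ in b.segments if name in names_a]
    if clashes:
        # Files from both indices would collide in one directory.
        raise ValueError(f"segment name clash: {clashes}")
    return SegmentsFile(
        # Assumption: keeping the larger counter avoids reusing a name later.
        name_counter=max(a.name_counter, b.name_counter),
        segments=a.segments + b.segments,
    )
```

For example, `fake_merge(SegmentsFile(3, [("segA", 100)]),
SegmentsFile(5, [("segB", 250)]))` lists segA followed by segB.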

This sounds like it should be possible, except for docId clashes: if
index A has a document with Id 100 and index B also has a document with
Id 100, then after my index file copying, index C will end up with two
documents with Id 100, and that won't work.  So, documents in C would
have to be renumbered (re-assigned Ids), as they are during
optimization, but without rewriting all index files in index C.
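
One way to picture renumbering without rewriting files is to leave the
Ids inside each segment alone and add a per-segment base offset when
reading.  The Python sketch below shows only that arithmetic; the
function names are hypothetical, the segment sizes are made up, and it
is not a claim about what Lucene's readers actually do.

```python
import bisect
from itertools import accumulate

def doc_bases(seg_sizes):
    """Starting global id for each segment, given per-segment doc counts."""
    return [0] + list(accumulate(seg_sizes))[:-1]

def to_global(bases, seg_index, local_id):
    """Map an id that is local to one segment onto the combined id space."""
    return bases[seg_index] + local_id

def to_local(bases, global_id):
    """Map a combined id back to (segment index, id within that segment)."""
    seg_index = bisect.bisect_right(bases, global_id) - 1
    return seg_index, global_id - bases[seg_index]
```

With assumed sizes of 150 documents for segA and 200 for segB, the bases
are 0 and 150, so Id 100 in segA stays 100 while Id 100 in segB becomes
250, and the clash goes away without touching either segment's files.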

Does this sound right?

Also, I may not need to actually copy/move files around, if I just make
use of sym/hard links.
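
A minimal sketch of the linking idea in Python, using only the standard
library.  `link_index_files` is a hypothetical helper, and skipping the
file named "segments" is my assumption, since index C would get a newly
written one.  Hard links only work within one filesystem; symlinks do
not have that limit but break if the source index is removed.

```python
import os

def link_index_files(src_dir, dst_dir, symbolic=False, skip=("segments",)):
    """Link every file of src_dir into dst_dir instead of copying it."""
    os.makedirs(dst_dir, exist_ok=True)
    linked = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if name in skip or not os.path.isfile(src):
            continue
        dst = os.path.join(dst_dir, name)
        if symbolic:
            os.symlink(os.path.abspath(src), dst)
        else:
            os.link(src, dst)  # hard link: same filesystem required
        linked.append(name)
    return linked
```

Calling it once for index A and once for index B, with the same target
directory, would populate the directory for index C without copying data.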

Thanks,
Otis


> -----Original Message-----
> From: Otis Gospodnetic [mailto:[email protected]]
> Sent: Tuesday, November 01, 2005 1:52 AM
> To: [email protected]
> Subject: Faking index merge by modifying segments file?
> 
> 
> Hello,
> 
> I spent most of today talking to some people about Lucene, and one of
> them said he would really like to have an "instantaneous index
> merge", and that he thinks he could achieve it by simply opening the
> segments file of one index and adding the segment names of the other
> index/indices, plus adjusting the segment size (SegSize in
> fileformats.html), thus creating a single (but unoptimized) index.
> 
> Any reactions to that?
> 
> I imagine this isn't quite that simple to implement, as one would have
> to renumber all documents, in order to avoid having multiple documents
> with the same document id.
> 
> Can anyone think of any other problems with this approach, or perhaps
> offer ideas for possible document renumbering?
> 
> Thanks,
> Otis
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

