[ http://issues.apache.org/jira/browse/LUCENE-140?page=comments#action_12376780 ]
Jason Lambert commented on LUCENE-140:
--------------------------------------
I was having this problem intermittently while indexing from multiple threads,
and I have found that the following steps can cause this error (with Lucene 1.3
and 1.4.x):
- Open an IndexReader (#1) over an existing index (this reader is used for
searching while updating the index)
- Using this reader (#1) do a search for the document(s) that you would like to
update; obtain their document ID numbers
- Create an IndexWriter (#1) and add several new documents to the index (for
me, this writing is done in other threads) (*)
- Close IndexWriter #1 (*)
- Open another IndexReader (#2) over the index
- Delete the previously found documents by their document ID numbers using
reader #2
- Close the #2 reader
- Create another IndexWriter (#2) and re-add the updated documents
- Close the IndexWriter #2
- Close the original IndexReader (#1) and open a new reader for general
searching
If I ensure that the steps marked with an asterisk (*) do not happen during
this sequence (i.e. by using thread synchronization), I never get this error.
Otherwise, it happens intermittently while closing the second IndexWriter (#2);
sometimes I get an ArrayIndexOutOfBoundsException during the deletion instead.
These 'extra' writes update the initial 'segments' file, and the updated file
is then re-read when the second IndexReader (#2) is opened.
As a result, the document ID numbers obtained from reader #1 may no longer
match what reader #2 sees, so the deletion can end up using possibly
non-existent IDs. This is most likely what causes the index corruption that
this error signals.
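The thread synchronization mentioned above could look roughly like the sketch
below. It is only an illustration using plain java.util.concurrent, not Lucene
code: IndexUpdateGuard, runExclusively and the "add"/"update" cycle names are
hypothetical, and the actual IndexReader/IndexWriter calls are left as comments.
The idea is that every index-modifying cycle takes the same lock, so no other
writer can run between the ID lookup and the delete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper (not a Lucene class): one lock for every index-modifying cycle. */
final class IndexUpdateGuard {
    private final ReentrantLock lock = new ReentrantLock();
    private final List<String> events = new ArrayList<>(); // guarded by lock

    /** Runs a whole modification cycle while holding the lock. */
    void runExclusively(String name, Runnable cycle) {
        lock.lock();
        try {
            events.add(name + ":begin");
            cycle.run();
            events.add(name + ":end");
        } finally {
            lock.unlock();
        }
    }

    /** Returns a snapshot of the recorded begin/end events. */
    List<String> events() {
        lock.lock();
        try {
            return new ArrayList<>(events);
        } finally {
            lock.unlock();
        }
    }

    /** True if every "begin" is immediately followed by its own "end". */
    static boolean neverInterleaved(List<String> events) {
        if (events.size() % 2 != 0) return false;
        for (int i = 0; i < events.size(); i += 2) {
            String begin = events.get(i);
            if (!begin.endsWith(":begin")) return false;
            String name = begin.substring(0, begin.length() - ":begin".length());
            if (!events.get(i + 1).equals(name + ":end")) return false;
        }
        return true;
    }

    /** Two threads, five cycles each; returns the recorded events. */
    static List<String> demo() {
        IndexUpdateGuard guard = new IndexUpdateGuard();
        Runnable addCycle = () -> {
            // Assumption: open an IndexWriter, add new documents, close it.
        };
        Runnable updateCycle = () -> {
            // Assumption: look up the document IDs, delete them with a reader,
            // close it, then re-add the updated documents with a writer.
        };
        Thread adder = new Thread(() -> {
            for (int i = 0; i < 5; i++) guard.runExclusively("add", addCycle);
        });
        Thread updater = new Thread(() -> {
            for (int i = 0; i < 5; i++) guard.runExclusively("update", updateCycle);
        });
        adder.start();
        updater.start();
        try {
            adder.join();
            updater.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return guard.events();
    }

    public static void main(String[] args) {
        System.out.println(neverInterleaved(demo()));
    }
}
```

Note that in this sketch the ID lookup sits inside the locked update cycle, so
the IDs are found and used without any other write in between. Whether that is
enough for a given application is an assumption I have not verified against
Lucene itself.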
> docs out of order
> -----------------
>
> Key: LUCENE-140
> URL: http://issues.apache.org/jira/browse/LUCENE-140
> Project: Lucene - Java
> Type: Bug
> Components: Index
> Versions: unspecified
> Environment: Operating System: Linux
> Platform: PC
> Reporter: legez
> Assignee: Lucene Developers
> Attachments: bug23650.txt, corrupted.part1.rar, corrupted.part2.rar
>
> Hello,
> I cannot figure out what is happening or why, and it happens all the time.
> I get an exception:
> java.lang.IllegalStateException: docs out of order
> at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:219)
> at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:191)
> at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:172)
> at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:135)
> at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:88)
> at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:341)
> at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:250)
> at Optimize.main(Optimize.java:29)
> It happens in both 1.2 and 1.3rc1 (by the way, what happened to 1.3rc1? I
> cannot find it in this form in either the downloads or the version list).
> Everything else seems OK: I can search the index, but I cannot optimize it.
> Even worse, after this exception a new segment is created every time I add
> new documents and close the IndexWriter! Judging by its size, I think it
> contains all the documents added before.
> My index is quite big: 500,000 docs, about 5 GB of index directory.
> It is _repeatable_: I drop the index and reindex everything. Afterwards I add
> a few docs, try to optimize, and receive the above exception.
> My documents' structure is:
> static Document indexIt(String id_strony, Reader reader, String data_wydania,
>         String id_wydania, String id_gazety, String data_wstawienia)
> {
>     Document doc = new Document();
>     doc.add(Field.Keyword("id", id_strony));
>     doc.add(Field.Keyword("data_wydania", data_wydania));
>     doc.add(Field.Keyword("id_wydania", id_wydania));
>     doc.add(Field.Text("id_gazety", id_gazety));
>     doc.add(Field.Keyword("data_wstawienia", data_wstawienia));
>     doc.add(Field.Text("tresc", reader));
>     return doc;
> }
> Sincerely,
> legez