That's what I thought.
So, that leads me to . . . is it necessarily all that much faster to index
in an incremental update fashion, rather than just clobbering the old index?
On Mon, Nov 10, 2008 at 12:52 PM, Erick Erickson <erickerickson@xxxxxxxxx>wrote:
> You have to have indexed something that uniquely identifies the
> document in order to know what the old one is. Really, this is
> the same question as updating, isn't it? If you could update
> a document in place, you'd have to know what document
> that was. If you know that information, you know which
> document to delete.
> Note that lucene has no built-in document recognition. If I
> add the same document to the index twice, Lucene will
> happily consider them two *separate* documents. You have
> to code your own notion of document meta-id (as distinct
> from the Lucene doc id). It could be the URL, the file path
> on disk, a document ID from your organization... the
> possibilities are endless. Which is why Lucene can't do that
> for you.
> On Mon, Nov 10, 2008 at 2:22 PM, ChadDavis <chadmichaeldavis@xxxxxxxxx
> > In the FAQ's it says that you have to do a manual incremental update:
> > How do I update a document or a set of documents that are already
> > >
> > > There is no direct update procedure in Lucene. To update an index
> > > incrementally you must first *delete* the documents that were updated,
> > and
> > > *then re-add* them to the index.
> > >
> > How do I determine the existing document that matches the document I am
> > updating?