|Subject:||Re: Proposal about Version API "relaxation"|
|Date:||Thu, 15 Apr 2010 14:52:57 +0300|
Well ... I must say that I completely disagree w/ dropping index structure back-support. Our customers will simply not hear of reindexing 10s of TBs of content because of version upgrades. Keeping that support is key to Lucene adoption in large-scale projects. It's not at all about whether Lucene is a content store or not - content is stored on other systems, I agree. But that doesn't mean reindexing it is tolerable.
Up until now, Lucene migrated my segments gradually, and before I upgraded from X+1 to X+2 I could run optimize() to ensure my index would be readable by X+2. I don't think I can agree to it myself, let alone convince all the stakeholders in my company, who use Lucene today in numerous projects, to let go of that capability. We've been there before (requiring reindexing on version upgrades) w/ some offerings, and customers simply didn't like it and were forced to use an enterprise-class search engine which offered less (and didn't use Lucene, up until recently!). Until we moved to Lucene ...
What's Solr's take on it?
I differentiate between structural changes and runtime changes. I, myself, don't mind if we let go of back-compat support for runtime changes, such as those generated by analyzers, for a couple of reasons. The most important ones are (1) these are not so frequent (but neither are index structural changes) and (2) that's a decision I, as the application developer, make - whether or not to use a newer version of an Analyzer. I don't mind working hard to make a 2.x Analyzer version work in the 3.x world, but I cannot make a 2.x index readable by a 3.x Lucene jar if the latter doesn't support it. That's the key difference, in my mind, between the two. I can choose not to upgrade to a newer analyzer version at all ... but I don't want to be forced to stay w/ older Lucene versions and features because of that. Well, people might say that it's not Lucene's problem, but I beg to differ. Lucene benefits from wider and faster adoption, and we rely on new features being adopted quickly. That might be jeopardized if we let go of that strong capability, IMO.
What we can do is provide an index migration tool ... but personally I don't know what the difference is between that and gradually migrating segments as they are merged, code-wise. I mean - it has to be the same code. Except that an index migration tool may take days to complete on a very large index, while the ongoing migration takes ~0 time when you come to upgrade to a newer Lucene release.
And the note about Terrier requiring reindexing ... well, I wouldn't call that a strength but a damn big weakness, IMO.
About the release pace, I don't think we can suddenly release every 2 years ... that makes people think the project is stuck. And some out there are not so fond of using a 'trunk' version and releasing it w/ their products, because trunk is perceived as ongoing development (which it is) and thus less stable, likely to change and, most importantly, harder to maintain (as the consumer). So I still think we should release more often than not.
That's why I wanted to differentiate X and Y, but I don't mind if we release just X ... if that's so important to people. BTW Mike, Eclipse's releases are like Lucene's, and in fact I don't know of that many projects that just release X ... many of them seem to release X.Y.
I don't understand why we're treating this as an "all or nothing" thing. We can let go of API back-compat, which clearly has no effect on index structure and content. We can even let go of back-compat for index runtime changes, for all I care. But I simply don't think we can let go of index structure back-support.
On Thu, Apr 15, 2010 at 1:12 PM, Michael McCandless <lucene@xxxxxxxxxxxxxxxxxx> wrote:
2010/4/15 Shai Erera <serera@xxxxxxxxx>: