Hi. Don't use JDBM if you want a BTree; it is too slow... Been there.
Use Xindice's Filer interface (org.apache.xindice.core.filer.Filer) and
instantiate an org.apache.xindice.core.filer.BTreeFiler, which is a whole lot
faster. JDBM and Xindice are comparable in reading speed, but surely not in writing.
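(To illustrate the pattern both stores provide — a keyed BTree lookup from a
persistent document ID to the stored document bytes — here is a minimal
in-memory sketch. It uses java.util.TreeMap as a stand-in for a real
disk-backed BTree such as JDBM's jdbm.btree.BTree or Xindice's BTreeFiler;
the class and method names are illustrative, not either library's API:)

```java
import java.util.TreeMap;

// Illustrative stand-in for a persistent BTree document store:
// a sorted map from a primary-key document ID to the document bytes.
// TreeMap gives the same O(log n) keyed lookup; a real store would
// back this with pages on disk.
public class BTreeDocStore {
    private final TreeMap<Long, byte[]> tree = new TreeMap<>();

    // Insert or replace a document under its persistent ID.
    public void put(long id, byte[] doc) { tree.put(id, doc); }

    // Look up document data by ID; null if absent.
    public byte[] get(long id) { return tree.get(id); }

    // Delete by ID. Note: deleting here leaves no garbage behind,
    // unlike deleted documents in Lucene segment files, which linger
    // until a merge reclaims them.
    public void delete(long id) { tree.remove(id); }

    public int size() { return tree.size(); }
}
```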
Check out my implementation of the BTree:
On Wed, Jul 30, 2008 at 10:44 PM, Jason Rutherglen <
> A possible open source solution using a page based database would be to
> store the documents in http://jdbm.sourceforge.net/ which offers BTree,
> Hash, and raw page based access. One would use a primary key type of
> persistent ID to lookup the document data from JDBM.
> Would be a good Lucene project to implement and I think a good solution for
> Ocean LUCENE-1313. Storing documents in Lucene is fine but for a realtime
> search index with many documents being deleted a lot of garbage builds up.
> Frequent merging of documents files becomes IO intensive.
> Of course, one issue with JDBM (and I am not sure whether other SQL
> page-based systems differ here) is whether it can load individual fields
> directly from disk rather than loading the entire page into RAM first.
> Maybe it does not matter.
> On Mon, Jul 28, 2008 at 9:53 PM, John Evans <john@xxxxxxxxxxx> wrote:
> > Hi All,
> > I have successfully used Lucene in the "traditional" way to provide
> > full-text search for various websites. Now I am tasked with developing a
> > data-store to back a web crawler. The crawler can be configured to
> > retrieve
> > arbitrary fields from arbitrary pages, so the result is that each document
> > may have a random assortment of fields. It seems like Lucene may be a
> > natural fit for this scenario since you can obviously add arbitrary fields
> > to each document and you can store the actual data in the database. I've
> > done some research to make sure that it would meet all of our individual
> > requirements (that we can iterate over documents, update (delete/replace)
> > documents, etc.) and everything looks good. I've also seen a couple of
> > references around the net to other people trying similar things...
> > I know it's not meant to be used this way, so I thought I would post here
> > and ask for guidance. Has anyone done something similar? Is there any
> > specific reason to think this is a bad idea?
> > The one thing that I am least certain about is how well it will scale. We
> > may reach the point where we have tens of millions of documents and a
> > percentage of those documents may be relatively large (10k-50k each). We
> > actually would NOT be expecting/needing Lucene's normal extremely fast text
> > search times for this, but we would need reasonable times for adding new
> > documents to the index, retrieving documents by ID (for iterating over
> > documents), optimizing the index after a series of changes, etc.
> > Any advice/input/theories anyone can contribute would be greatly
> > appreciated.
> > Thanks,
> > -
> > John
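As a back-of-envelope check on the scale John describes: assuming "tens of
millions" means on the order of 50 million documents (an assumed figure,
not stated in the thread) at 10k–50k each, the raw stored data alone runs
to hundreds of gigabytes:

```java
public class SizeEstimate {
    // Raw stored bytes for n documents of avgBytes each.
    static long totalBytes(long n, long avgBytes) {
        return n * avgBytes;
    }

    public static void main(String[] args) {
        long docs = 50_000_000L; // assumed reading of "tens of millions"
        long low  = totalBytes(docs, 10_000L); // 10k per document
        long high = totalBytes(docs, 50_000L); // 50k per document
        // Roughly 500 GB at the low end, 2.5 TB at the high end.
        System.out.println(low / 1_000_000_000L + " GB to "
                + high / 1_000_000_000L + " GB");
    }
}
```

At that size, whatever store is chosen, index merges and full rewrites are
going to be IO-bound, which matches Jason's concern above.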
Marcus Herou, CTO and co-founder, Tailsweep AB