Hi Hossman,
Thanks for your reply. When I index the search fields in my
Lucene document, the index occupies 20% of the original size. How can I
reduce the index size?
hossman_lucene wrote:
>
>
> : I need to store all the attributes of the document i index as part of
> : the index. And I need to get the size of the files as close to 20% of
> : the original size as possible. If anyone can help with this I can pay a
> : nominal fee. Please contact me if anyone can help.
>
> Let's be clear about something here: Lucene is an "indexing" technology,
> not a compression technology. Lucene indexes *can* be smaller than the
> source corpus they index for a variety of reasons (most notably because
> the inverted index structure allows a more succinct representation of the
> same words appearing in multiple documents, but things like stop word
> removal play a big part as well) but this reduced size only applies to the
> *indexed* fields. If you "store" data in Lucene there is nothing
> intrinsic in Lucene that makes it smaller. If you have a chunk of
> text, and you tell Lucene to store that text, and index that text, it
> doesn't matter how space efficient the "index" is, Lucene still
> needs to store all of that text.
>
> Here are a few simple steps to achieve what you want, and I won't charge
> you anything for it...
>
> 1) take your existing code for building an index, and change it so that
> all of the options to "store" field values are commented out, and only the
> "indexed" fields exist.
> 2) index your corpus of size X, and note the size of the index Y. compute
> the value Z such that: Z = (0.20 * X) - Y
> 3) go find a compression library A that can compress your corpus of size
> X down to size Z
> 4) go edit your original code to use algorithm A to compress all of
> your stored fields before adding them to Lucene.
>
> Voilà!
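
The four steps above can be sketched roughly as follows. This is only an illustration, not Hoss's code: the class name StoredFieldCompressor and its methods are hypothetical, and java.util.zip's Deflater stands in for "compression library A" purely because it ships with the JDK. Whether it actually reaches size Z depends on your data. The Lucene call that stores the resulting bytes is deliberately left out, since it differs between Lucene versions.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: compresses a field value before it is stored.
final class StoredFieldCompressor {

    // Step 2: Z = (0.20 * X) - Y, using integer arithmetic (X / 5 is 20% of X).
    static long storedFieldBudget(long corpusBytes, long indexOnlyBytes) {
        return corpusBytes / 5 - indexOnlyBytes;
    }

    // Step 4: compress the text of one stored field with DEFLATE (assumed choice).
    static byte[] compress(String text) {
        byte[] input = text.getBytes(StandardCharsets.UTF_8);
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native zlib resources
        }
    }

    // Reverse of compress(), applied when the stored value is read back.
    static String decompress(byte[] compressed) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        // Made-up numbers: a 1000-byte corpus whose index-only size is 150 bytes.
        long budget = storedFieldBudget(1000, 150);
        System.out.println("bytes available for stored fields: " + budget);

        String fieldValue = "some document text that would otherwise be stored as-is";
        byte[] packed = compress(fieldValue);
        System.out.println("raw=" + fieldValue.getBytes(StandardCharsets.UTF_8).length
                + " compressed=" + packed.length);
        System.out.println("round trip ok: " + decompress(packed).equals(fieldValue));
    }
}
```

Very short values can come out larger after compression because of the format's fixed overhead, so it is worth measuring on your real fields before committing to one algorithm.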
>
>
>
> -Hoss
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
> For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx
>
>
>
--
View this message in context:
http://www.nabble.com/Need-Lucene-Compression-help----can-pay-nominal-fee-tf3881801.html#a11188737
Sent from the Lucene - Java Users mailing list archive at Nabble.com.