java-user@lucene.apache.org
[Top] [All Lists]

Re: Problem using Lucene on Ubuntu

Subject: Re: Problem using Lucene on Ubuntu
From: Grant Ingersoll
Date: Mon, 18 Feb 2008 09:22:12 -0500
Good point Jan!

On Feb 18, 2008, at 9:00 AM, Jan Peter Stotz wrote:

Grant Ingersoll wrote:

Note: ENCODING is whatever encoding the file is in, as in "UTF-8", if that is what your files are in.

I think there is a misunderstanding, the WordExtractor extracts text from MS Word (.doc) files. Those files are binary and therefore does not have an encoding. I would print out the extracted text into a plain text files and compare if there are differences between the file generated on Windows and Linux/Ubuntu. This allows to determine if this is a WordExtractor or a Lucene problem.

Jan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>