java-user@lucene.apache.org
[Top] [All Lists]

Re: How to serach in sentence and dispaly the whole sentence

Subject: Re: How to serach in sentence and dispaly the whole sentence
From: anton feldmann
Date: Thu, 27 Apr 2006 00:20:00 +0200
Are the names of a field in a document unique or can i make a field with the name "sentence" for each sentence in an text document?

Grant Ingersoll schrieb:
Anton,

I think there are at least a couple of ways of doing this. I assume you have a program that does sentence detection already, as Lucene does not provide this. If not, I am sure a search of the web will find one that has high accuracy.
You can:
1. Index each sentence as a separate Document. You will need a field on the Document relating it back to the overall file so you can reconstruct it. 2. As you index, insert sentence/paragraph boundary markers into your index and then use the SpanQuery functionality. Search this mail archive for sentence boundary detection and Span Query (try the dev list too). I think there was a discussion between me, Doug and Hoss on how to do this. 3. Do search as you do now and then post process to figure out what sentence it came from. This will be inefficient, but I don't know what your requirements are that way, so it may work for you.

There are probably other ways too.

anton feldmann wrote:
I intend, to make a search, to find a word or a word pair
in  a sentence or a paragraph. But then the sentence should be indicated
as a whole. The question relates to the fact, that I need to extend Lucene in such a way that this is possible. But where to I make a start, because I have no idea, how I have to change the IndexFile, whether that conforms with the Lucene Team.

cheers

anton feldmann


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>