|
|
Are the names of a field in a document unique or can i make a field with
the name "sentence" for each sentence in an text document?
Grant Ingersoll schrieb:
Anton,
I think there are at least a couple of ways of doing this. I assume
you have a program that does sentence detection already, as Lucene
does not provide this. If not, I am sure a search of the web will
find one that has high accuracy.
You can:
1. Index each sentence as a separate Document. You will need a field
on the Document relating it back to the overall file so you can
reconstruct it.
2. As you index, insert sentence/paragraph boundary markers into your
index and then use the SpanQuery functionality. Search this mail
archive for sentence boundary detection and Span Query (try the dev
list too). I think there was a discussion between me, Doug and Hoss
on how to do this.
3. Do search as you do now and then post process to figure out what
sentence it came from. This will be inefficient, but I don't know
what your requirements are that way, so it may work for you.
There are probably other ways too.
anton feldmann wrote:
I intend, to make a search, to find a word or a word pair
in a sentence or a paragraph. But then the sentence should be indicated
as a whole. The question relates to the fact, that I need to extend
Lucene
in such a way that this is possible. But where to I make a start,
because
I have no idea, how I have to change the IndexFile, whether that
conforms with the Lucene Team.
cheers
anton feldmann
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx
|
|