java-user@lucene.apache.org

Re: Indexing synonyms for multiple words

Subject: Re: Indexing synonyms for multiple words
From: Michael McCandless
Date: Mon, 2 Mar 2009 11:41:22 -0500

Since Lucene doesn't represent or store an end position for a token, I don't think the index can properly represent SYN spanning two positions.

I suppose you could encode this into payloads, and create a custom query that would look at the payload to enforce the constraint.
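To make the payload idea concrete, here is a minimal plain-Java sketch. It is a toy model and not Lucene's payload or query API: the class `SpanLengthSketch` and its methods are hypothetical, and the stored `length` merely stands in for what a payload would carry. Each occurrence records how many positions it spans, and the phrase check requires the next term to start where the previous one ends.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; names are hypothetical and unrelated to Lucene's classes.
class SpanLengthSketch {
    // term -> occurrences, each stored as {startPosition, lengthInPositions}.
    // The length is the extra information a payload would have to carry.
    private final Map<String, List<int[]>> postings = new HashMap<>();

    void add(String term, int start, int length) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(new int[] {start, length});
    }

    // True if the terms occur back to back, taking each token's length into account.
    boolean phraseMatches(String... phrase) {
        for (int[] first : postings.getOrDefault(phrase[0], new ArrayList<>())) {
            if (matchesFrom(phrase, 1, first[0] + first[1])) {
                return true;
            }
        }
        return false;
    }

    private boolean matchesFrom(String[] phrase, int i, int expectedStart) {
        if (i == phrase.length) {
            return true;
        }
        for (int[] occ : postings.getOrDefault(phrase[i], new ArrayList<>())) {
            if (occ[0] == expectedStart && matchesFrom(phrase, i + 1, occ[0] + occ[1])) {
                return true;
            }
        }
        return false;
    }

    // The sentence from the original question, with SYN covering WORD1 WORD2.
    static SpanLengthSketch example() {
        SpanLengthSketch idx = new SpanLengthSketch();
        String[] words = {"AAA", "BBB", "WORD1", "WORD2", "EEE", "FFF", "GGG"};
        for (int p = 0; p < words.length; p++) {
            idx.add(words[p], p, 1);
        }
        idx.add("SYN", 2, 2); // starts at WORD1's position, spans two positions
        return idx;
    }
}
```

With this model, `example().phraseMatches("SYN", "EEE")` and `example().phraseMatches("BBB", "SYN")` succeed, while `example().phraseMatches("SYN", "WORD2")` does not. A real implementation would need a custom query that reads the payload during scoring.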

Or, if you switch to doing SYN expansion only at runtime (not adding it to the index), that might work.
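A rough sketch of the runtime approach, again in plain Java rather than against Lucene's API (the class `QueryExpansionSketch`, the `expand` method, and the synonym map are all made up for illustration): the index keeps only the original words, and the query terms are rewritten before the phrase is searched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Lucene.
class QueryExpansionSketch {
    // Replaces each query term that has a multi-word expansion with that expansion,
    // leaving all other terms untouched.
    static List<String> expand(List<String> queryTerms, Map<String, List<String>> synonyms) {
        List<String> expanded = new ArrayList<>();
        for (String term : queryTerms) {
            List<String> replacement = synonyms.get(term);
            if (replacement != null) {
                expanded.addAll(replacement);
            } else {
                expanded.add(term);
            }
        }
        return expanded;
    }
}
```

Given a map from `SYN` to `[WORD1, WORD2]`, the query terms `[BBB, SYN, EEE]` become `[BBB, WORD1, WORD2, EEE]`, which can then be run as an ordinary phrase query. If documents may also contain SYN literally, the query would presumably need to OR the original and expanded forms; the sketch leaves that out.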

Mike

Uwe Schindler wrote:

I think his problem is that "SYN" is a synonym for the phrase "WORD1
WORD2". With these positions, a phrase query like "SYN WORD2" would also
match (and other queries that depend on word order would have similar problems).
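The effect can be reproduced with a small plain-Java model. This is only a sketch of start-position-only postings, not Lucene code; `PositionSketch` and its methods are hypothetical names. Tokens are indexed by start position using position increments, and a phrase matches when its terms sit at consecutive positions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of an index that stores only each token's start position.
class PositionSketch {
    // Builds term -> start positions from terms and their position increments.
    static Map<String, Set<Integer>> index(String[] terms, int[] increments) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        int position = -1;
        for (int i = 0; i < terms.length; i++) {
            position += increments[i]; // an increment of 0 stacks a token on the previous one
            postings.computeIfAbsent(terms[i], t -> new HashSet<>()).add(position);
        }
        return postings;
    }

    // Exact phrase match: term i must occur at (start of term 0) + i.
    static boolean phraseMatches(Map<String, Set<Integer>> postings, String... phrase) {
        Set<Integer> starts = postings.getOrDefault(phrase[0], Collections.emptySet());
        for (int start : starts) {
            boolean all = true;
            for (int i = 1; i < phrase.length && all; i++) {
                Set<Integer> positions = postings.getOrDefault(phrase[i], Collections.emptySet());
                all = positions.contains(start + i);
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    // The sentence from the original question, with SYN stacked on WORD1 (increment 0).
    static Map<String, Set<Integer>> example() {
        String[] terms = {"AAA", "BBB", "WORD1", "SYN", "WORD2", "EEE", "FFF", "GGG"};
        int[] increments = {1, 1, 1, 0, 1, 1, 1, 1};
        return index(terms, increments);
    }
}
```

In this model `phraseMatches(example(), "SYN", "WORD2")` is true even though it should not be, and `phraseMatches(example(), "SYN", "EEE")` is false even though SYN stands for both words and is followed by EEE.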

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@xxxxxxxxxxx

-----Original Message-----
From: Michael McCandless [mailto:lucene@xxxxxxxxxxxxxxxxxx]
Sent: Monday, March 02, 2009 4:07 PM
To: java-user@xxxxxxxxxxxxxxxxx
Subject: Re: Indexing synonyms for multiple words


Shouldn't WORD2's position be 1 more than your SYN?

I.e., don't you want these positions?

   WORD1  2
   WORD2  3
   SYN 2

The position is the starting position of the token; Lucene doesn't
store an ending position.

Mike

Sumukh wrote:

Hi,

I'm fairly new to Lucene. I'd like to know how we can index synonyms
for multiple words.

This is the scenario:

Consider a sentence: AAA BBB WORD1 WORD2 EEE FFF GGG.

Now assume the two words combined, WORD1 WORD2, can be replaced by
another word, SYN.

If I place SYN after WORD1 with positionIncrement set to 0, WORD2 will
follow SYN, which is incorrect; the reverse problem occurs if I place it
after WORD2.

If any of you have solved a similar problem, I'd be thankful if you
could shed some light on the solution.

Regards,
Sumukh


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx



