java-user@lucene.apache.org
[Top] [All Lists]

Re: Auto Slop

Subject: Re: Auto Slop
From: Grant Ingersoll
Date: Fri, 6 Jul 2007 10:16:11 -0400
FYI: Solr has a nice Analysis debugging tool that lets you see the results of running an analysis as it passes through each phase of the Analyzer. Some enterprising soul might want to make a contribution along these lines that could be added to the contrib. :-)

Cheers,
Grant


On Jul 3, 2007, at 5:02 PM, Walt Stoneburner wrote:

I've solved the problem, thanks to tips from Mark Miller and Ard
Schrijvers, and am simply recording it so that someone else walking
through the archives might get some benefit.

A while ago I had been working on a case-sensitive version of Lucene,
where with a prefix symbol, it was possible to indicate whether you
were interested in the token as-is, or the case insensitive version.
(For instance 'LET' is a real acronym, whereas 'let' is a stop word.)

I did this by writing a filter that injected two tokens for every one.
The problem was, that token.setPositionIncrement(0) wasn't being
called on the duplicate token.  As such, it did appear that there were
intermediate tokens between the real ones, when they should have
occupied the same logical position.

Adding that simple call made phrases work perfectly, even in the light
of all the modified functionality I've added.  Gotta love Lucene!

What would have been a helpful debugging tool, would have been if Luke
was capable of dumping the token positions.  Clearly it can
reconstruct a document in this manner, but it would have been extra
nice to see numerical positions they resided at, not just the sequence
of tokens.

Hope this helps someone else in the future,
-Walt Stoneburner,
http://www.wwco.com/~wls/blog/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>