
Subject: Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/
From: Doug Cutting
Date: Thu, 17 Nov 2005 14:03:14 -0800
Yonik Seeley wrote:
> I'm not sure I understand why this is.  epsilon is based on 1,
> (smallest number such that 1-epsilon != 1, right?).  What's special
> about 1?

1 is special for multiplication but, you're right, not so special for
addition, the operation in question. What makes addition accurate is
more mantissa bits. Epsilon shrinks as the number of mantissa bits
grows, so smaller epsilons will give us more accuracy but, you're right,
a particular epsilon value won't guarantee us accuracy.

> I'm worried about the impact of things like this:
>  smallfloat(10) + smallfloat(1) + smallfloat(1) + smallfloat(1) -> 10
>
> And it makes things very order dependent:
>  smallfloat(1) + smallfloat(1) + smallfloat(1) + smallfloat(10) -> 12

10 and 12 are pretty close scores, so while this is clearly not a good
thing, relevant and irrelevant documents are hopefully separated by more
than this. In any case, it would be a whole lot more accurate than
ignoring tfs altogether. And we can do better in this particular case,
using 4 or 5 bit mantissas.
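
To make the order dependence concrete, here is a minimal sketch in Java. It assumes a truncating, shift-based byte encoding in which the byte is shifted into the top bits of an IEEE float; the class and method names (OrderDependenceSketch, floatToByte, byteToFloat, quantize, sum) are illustrative assumptions, not taken from the commit under discussion. Under those assumptions, the 3-bit/zero-point-15 split reproduces the 10 versus 12 results above.

```java
public class OrderDependenceSketch {
    // Hypothetical truncating encoder: keeps only the top bits of the float.
    static byte floatToByte(float f, int numMantissaBits, int zeroExp) {
        int fzero = (63 - zeroExp) << numMantissaBits;   // offset of our zero exponent
        int bits = Float.floatToRawIntBits(f);
        int small = bits >> (24 - numMantissaBits);      // drop the low bits (truncate)
        if (small <= fzero) {
            return (bits <= 0) ? (byte) 0 : (byte) 1;    // zero/negative -> 0, underflow -> smallest
        }
        if (small >= fzero + 0x100) {
            return (byte) -1;                            // overflow -> largest value
        }
        return (byte) (small - fzero);
    }

    // Hypothetical decoder: inverse of the shift above.
    static float byteToFloat(byte b, int numMantissaBits, int zeroExp) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - numMantissaBits);
        bits += (63 - zeroExp) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Round-trip a float through the byte encoding.
    static float quantize(float f, int m, int z) {
        return byteToFloat(floatToByte(f, m, z), m, z);
    }

    // Add terms left to right, squeezing every partial sum back into a byte.
    static float sum(int m, int z, float... terms) {
        float acc = 0f;
        for (float t : terms) {
            acc = quantize(acc + quantize(t, m, z), m, z);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sum(3, 15, 10f, 1f, 1f, 1f));  // 10.0
        System.out.println(sum(3, 15, 1f, 1f, 1f, 10f));  // 12.0
        System.out.println(sum(5, 2, 10f, 1f, 1f));       // 12.0
        System.out.println(sum(5, 2, 10f, 0.5f, 0.5f));   // 11.0
    }
}
```

The small terms vanish in the first sum because each partial result of 11 is truncated back down to 10, while summing the small terms first lets them accumulate to 3 before the large term is added.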

> Also, epsilon related to the mantissa, not the exponent?
> That would make it 1/8, not 1/32.

I'm not sure what you're saying. The current epsilon, with 3-bit
mantissa, is 1/8, right? With a five bit mantissa it would go to 1/32, no?

> Also, if we don't need to represent very small numbers, we could lower
> the zero point of the exponent (currently it's 15 for the 5/3 split),

Right. Arguably we don't need numbers smaller than 1/100. A 4-bit
mantissa with a zero exponent point of 5 gives a minimum value of .0005
and a max of 2M, plenty of range. A 5-bit mantissa with zero-exponent
point of 2 gives us a minimum of .03 and a max of around 2k, nearly the
desired range, but with greater precision. In your case above, 10+1+1
would give 12; moreover, 10+.5+.5 would give 11. I think this is
probably the best choice. What do you think?
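
The range figures above can be checked with a short sketch. It assumes a shift-based decoder in which byte 0 is reserved for zero, so byte 1 is the smallest and byte 255 the largest non-zero value; RangeSketch, byteToFloat, min and max are hypothetical names used only for illustration.

```java
public class RangeSketch {
    // Hypothetical decoder: shift the byte into the top bits of an IEEE float
    // and add the offset for the chosen zero exponent point.
    static float byteToFloat(byte b, int numMantissaBits, int zeroExp) {
        if (b == 0) return 0.0f;
        int bits = (b & 0xff) << (24 - numMantissaBits);
        bits += (63 - zeroExp) << 24;
        return Float.intBitsToFloat(bits);
    }

    // Smallest non-zero value for a given split (byte value 1).
    static float min(int numMantissaBits, int zeroExp) {
        return byteToFloat((byte) 1, numMantissaBits, zeroExp);
    }

    // Largest value for a given split (byte value 255).
    static float max(int numMantissaBits, int zeroExp) {
        return byteToFloat((byte) 0xff, numMantissaBits, zeroExp);
    }

    public static void main(String[] args) {
        // 4-bit mantissa, zero exponent point 5: roughly .0005 up to about 2M
        System.out.println(min(4, 5) + " .. " + max(4, 5));
        // 5-bit mantissa, zero exponent point 2: roughly .03 up to about 2k
        System.out.println(min(5, 2) + " .. " + max(5, 2));
    }
}
```

Under these assumptions the 4-bit split spans about 0.00055 to 1966080, and the 5-bit split spans about 0.033 to 1984, in line with the estimates above.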