
## Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/

Subject: Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/
From: Doug Cutting
Date: Thu, 17 Nov 2005 14:03:14 -0800
Yonik Seeley wrote:
> I'm not sure I understand why this is. epsilon is based on 1
> (smallest number such that 1-epsilon != 1, right?). What's special
> about 1?

1 is special for multiplication, but, you're right, not so special for addition, the operation in question. The thing that makes addition accurate is more mantissa bits. Epsilon is determined by the number of mantissa bits: each extra bit halves it. So smaller epsilons will give us more accuracy, but, you're right, a particular epsilon value won't guarantee us accuracy.

> I'm worried about the impact of things like this:
>   smallfloat(10) + smallfloat(1) + smallfloat(1) + smallfloat(1) -> 10
> And it makes things very order dependent:
>   smallfloat(1) + smallfloat(1) + smallfloat(1) + smallfloat(10) -> 12

10 and 12 are pretty close scores, so while this is clearly not a good thing, relevant and irrelevant documents are hopefully separated by more than this. In any case, it would be a whole lot more accurate than ignoring tfs altogether. And we can do better in this particular case, using 4- or 5-bit mantissas.

> Also, epsilon is related to the mantissa, not the exponent? That
> would make it 1/8, not 1/32.

I'm not sure what you're saying. The current epsilon, with a 3-bit mantissa, is 1/8, right? With a 5-bit mantissa it would go to 1/32, no?

> Also, if we don't need to represent very small numbers, we could
> lower the zero point of the exponent (currently it's 15 for the 5/3
> split), right?

Right. Arguably we don't need numbers smaller than 1/100. A 4-bit mantissa with a zero exponent point of 5 gives a minimum value of .0005 and a max of 2M, plenty of range. A 5-bit mantissa with a zero exponent point of 2 gives us a minimum of .03 and a max of around 2k, nearly the desired range, but with greater precision. In your case above, 10+1+1 would give 12; moreover, 10+.5+.5 would give 11. I think this is probably the best choice. What do you think?

Doug
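
To make the mantissa-width argument above concrete, here is a minimal sketch of a toy small-float model. It is not Lucene's actual byte encoding: the class `SmallFloatSketch` and its methods `epsilon`, `quantize`, and `sum` are hypothetical names introduced only for illustration. The model assumes an implied leading 1, truncation rather than rounding, and an unlimited exponent range, so it says nothing about the minimum and maximum values quoted in the message, and it need not reproduce the exact `smallfloat(10) + smallfloat(1)` figures from the quoted text.

```java
public class SmallFloatSketch {

    /** Gap between 1.0 and the next representable value: 2^-mantissaBits. */
    static double epsilon(int mantissaBits) {
        return Math.scalb(1.0, -mantissaBits);
    }

    /**
     * Truncates a positive value to the given number of explicit mantissa
     * bits (assumption: implied leading 1, exponent range not limited).
     */
    static double quantize(double x, int mantissaBits) {
        if (x <= 0.0) {
            return 0.0;
        }
        int exponent = Math.getExponent(x);
        // Spacing of representable values in the binade [2^e, 2^(e+1)).
        double step = Math.scalb(1.0, exponent - mantissaBits);
        return Math.floor(x / step) * step;
    }

    /** Adds the terms left to right, re-quantizing after every addition. */
    static double sum(int mantissaBits, double... terms) {
        double total = 0.0;
        for (double t : terms) {
            total = quantize(total + quantize(t, mantissaBits), mantissaBits);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(epsilon(3));            // 0.125   (1/8)
        System.out.println(epsilon(5));            // 0.03125 (1/32)

        // 3-bit mantissa: small terms are lost when added to a large total,
        // and the result depends on the order of the terms.
        System.out.println(sum(3, 10, 0.5, 0.5));  // 10.0
        System.out.println(sum(3, 0.5, 0.5, 10));  // 11.0

        // 5-bit mantissa: the same terms survive in either order.
        System.out.println(sum(5, 10, 0.5, 0.5));  // 11.0
        System.out.println(sum(5, 10, 1, 1));      // 12.0
    }
}
```

Under these assumptions the 5-bit results match the figures in the message (10+1+1 gives 12, 10+.5+.5 gives 11), while the 3-bit case shows the order dependence that motivated the concern.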