java-dev@lucene.apache.org

Subject: [jira] Commented: (LUCENE-469) (Parallel-)MultiSearcher: using Sort object changes the scores
From: "Yonik Seeley (JIRA)"
Date: Mon, 21 Nov 2005 20:17:43 +0100 CET
    [ http://issues.apache.org/jira/browse/LUCENE-469?page=comments#action_12358183 ]

Yonik Seeley commented on LUCENE-469:
-------------------------------------

Thanks Luc!
A couple of things:

- Changing the expert-level search functions to not normalize scores: +1 from
me, but I'd like to hear from some others on a change like this.

- TopDocs and TopFieldDocs are public, and this patch changes their
constructors. Although that is a great way to test whether we caught all the
cases within Lucene, it would break backward compatibility for anyone who
created their own instances outside Lucene. This is beyond expert level
though, so perhaps it shouldn't worry us.

- I'm not sure the MultiSearcher implementation is correct for other Sorts.
FieldDocSortedHitQueue.getMaxScore() is only the max score of the docs that
were inserted into the queue. I think you need to reference docs.maxScore()
rather than relying on the FieldDocSortedHitQueue in this case, right?
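
To illustrate the last point, here is a minimal sketch. It assumes a sort on
some field other than relevance, and every name in it (Hit, Result, search,
maxOfInserted, maxOfReported) is a hypothetical stand-in, not a Lucene class
or method. It only shows why a maximum tracked over the inserted docs can be
lower than the maximum each sub-searcher saw over all of its matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Lucene classes.
class MaxScoreSketch {

    static final class Hit {
        final int sortKey;   // value of the field the Sort orders by
        final float score;   // raw relevance score
        Hit(int sortKey, float score) { this.sortKey = sortKey; this.score = score; }
    }

    /** What one sub-searcher returns: its top n hits plus the max score over ALL its matches. */
    static final class Result {
        final List<Hit> top;
        final float maxScore;
        Result(List<Hit> top, float maxScore) { this.top = top; this.maxScore = maxScore; }
    }

    /** Returns the n matches with the smallest sortKey and the max score of every match. */
    static Result search(List<Hit> matches, int n) {
        List<Hit> sorted = new ArrayList<>(matches);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.sortKey));
        float max = Float.NEGATIVE_INFINITY;
        for (Hit h : matches) max = Math.max(max, h.score);
        int end = Math.min(n, sorted.size());
        return new Result(new ArrayList<>(sorted.subList(0, end)), max);
    }

    /** Max score seen by a merge queue that only receives each sub-searcher's top hits. */
    static float maxOfInserted(List<Result> results) {
        float max = Float.NEGATIVE_INFINITY;
        for (Result r : results)
            for (Hit h : r.top) max = Math.max(max, h.score);
        return max;
    }

    /** Max score taken from what each sub-searcher reported for all of its matches. */
    static float maxOfReported(List<Result> results) {
        float max = Float.NEGATIVE_INFINITY;
        for (Result r : results) max = Math.max(max, r.maxScore);
        return max;
    }

    public static void main(String[] args) {
        // The best-scoring match (2.0) sorts last, so it is not in the top 1 by sortKey.
        Result a = search(Arrays.asList(new Hit(1, 0.4f), new Hit(5, 2.0f)), 1);
        Result b = search(Arrays.asList(new Hit(2, 0.9f), new Hit(7, 0.3f)), 1);
        List<Result> results = Arrays.asList(a, b);
        System.out.println("max over inserted docs: " + maxOfInserted(results));
        System.out.println("max over reported maxima: " + maxOfReported(results));
    }
}
```

With this made-up data the two maxima differ (0.9 versus 2.0), so
normalizing by the first one would leave scores above 1.0.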


> (Parallel-)MultiSearcher: using Sort object changes the scores
> --------------------------------------------------------------
>
>          Key: LUCENE-469
>          URL: http://issues.apache.org/jira/browse/LUCENE-469
>      Project: Lucene - Java
>         Type: Bug
>   Components: Search
>     Versions: CVS Nightly - Specify date in submission
>  Environment: 21 November 2005, revision 345901
>     Reporter: Luc Vanlerberghe
>  Attachments: MultiSearcherSort.patch, TestMultiSearcher.patch
>
> Example: 
> Hits hits=multiSearcher.search(query);
> returns different scores for some documents than
> Hits hits=multiSearcher.search(query, Sort.RELEVANCE);
> (both for MultiSearcher and ParallelMultiSearcher)
> The documents returned will be the same and in the same order, but the scores 
> in the second case will seem out of order.
> Inspecting the Explanation objects shows that the scores themselves are ok, 
> but there's a bug in the normalization of the scores.
> The document with the highest score should have score 1.0, so all document 
> scores are divided by the highest score (assuming the highest score was > 1.0).
> However, for MultiSearcher and ParallelMultiSearcher, this normalization 
> factor is applied *per index*, before merging the results together (the merge 
> itself is ok though).
> An example: if you use
> Hits hits=multiSearcher.search(query, Sort.RELEVANCE);
> for a MultiSearcher with two subsearchers, the first document will have score 
> 1.0.
> The next documents from the same subsearcher will have decreasing scores.
> The first document from the other subsearcher will, however, have score 1.0 
> again!
> The same applies for other Sort objects, but it is less visible.
> I will post a TestCase demonstrating the problem and suggested patches to 
> solve it in a moment...
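
The behaviour described in the report can be sketched in isolation. This is
only a model under stated assumptions, not the actual MultiSearcher code: the
Hit class and the index, normalize and merge helpers are hypothetical, each
hit carries a raw score used for ordering plus a separately reported score,
and the raw scores are made up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the reported behaviour; not Lucene code.
class NormalizationSketch {

    static final class Hit {
        final float raw;   // raw score, used as the sort key when merging
        float shown;       // score reported to the caller
        Hit(float raw) { this.raw = raw; this.shown = raw; }
    }

    /** Builds the hit list of one sub-index from raw scores. */
    static List<Hit> index(float... raws) {
        List<Hit> hits = new ArrayList<>();
        for (float r : raws) hits.add(new Hit(r));
        return hits;
    }

    /** Divides reported scores by the highest one, but only if it exceeds 1.0. */
    static void normalize(List<Hit> hits) {
        float max = 0f;
        for (Hit h : hits) max = Math.max(max, h.shown);
        if (max > 1.0f) {
            float norm = 1.0f / max;
            for (Hit h : hits) h.shown *= norm;
        }
    }

    /** Merges two hit lists, ordered by descending raw score. */
    static List<Hit> merge(List<Hit> a, List<Hit> b) {
        List<Hit> merged = new ArrayList<>(a);
        merged.addAll(b);
        merged.sort(Comparator.comparingDouble((Hit h) -> h.raw).reversed());
        return merged;
    }

    static float[] shown(List<Hit> hits) {
        float[] out = new float[hits.size()];
        for (int i = 0; i < out.length; i++) out[i] = hits.get(i).shown;
        return out;
    }

    /** Reported behaviour: normalize per index, then merge. */
    static float[] perIndexNormalization() {
        List<Hit> a = index(4.0f, 3.0f);
        List<Hit> b = index(2.0f, 1.0f);
        normalize(a);
        normalize(b);
        return shown(merge(a, b));
    }

    /** Expected behaviour: merge first, then normalize once over all hits. */
    static float[] normalizationAfterMerge() {
        List<Hit> merged = merge(index(4.0f, 3.0f), index(2.0f, 1.0f));
        normalize(merged);
        return shown(merged);
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(perIndexNormalization()));
        System.out.println(java.util.Arrays.toString(normalizationAfterMerge()));
    }
}
```

In this model the document order is identical in both cases. Only the
reported scores differ: with per-index normalization the first hit from the
second index gets 1.0 again, after a 0.75 from the first index.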

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

