Hello Steven,
I looked up the paper and read the relevant part. The passage you quoted
is from the introduction. I believe it refers to the basic purpose of an
information retrieval system in general, or at least to the purpose of a
vector space model IR system.
If this is the theoretical justification for the coord_q_d normalisation, then
it actually replicates the other part of the scoring formula to some
degree. The entire formula is concerned with exactly this: comparing the term
frequencies of query and document.
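
To make the factor concrete, here is a minimal Java sketch of the ratio the
Similarity documentation describes: the number of query terms found in the
document, divided by the number of terms in the query. It does not use Lucene
itself. The class name CoordSketch and the helper countOverlap are made up for
illustration, and returning 0 for an empty query is an assumption, not
something taken from the Lucene source.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a stand-alone sketch, not Lucene's own class.
final class CoordSketch {

    // Fraction of the query terms that were found in the document.
    // Assumption: an empty query (maxOverlap == 0) yields 0 rather than NaN.
    static float coord(int overlap, int maxOverlap) {
        if (maxOverlap == 0) {
            return 0.0f;
        }
        return overlap / (float) maxOverlap;
    }

    // Counts the distinct query terms that also occur in the document.
    // Term frequencies are deliberately ignored here; that is the part
    // the rest of the scoring formula deals with.
    static int countOverlap(List<String> queryTerms, List<String> docTerms) {
        Set<String> docSet = new HashSet<>(docTerms);
        Set<String> querySet = new HashSet<>(queryTerms);
        int overlap = 0;
        for (String term : querySet) {
            if (docSet.contains(term)) {
                overlap++;
            }
        }
        return overlap;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("lucene", "scoring", "coord");
        List<String> doc = Arrays.asList("lucene", "coord", "lucene", "index");
        int overlap = countOverlap(query, doc);
        System.out.println("overlap = " + overlap);
        System.out.println("coord   = " + coord(overlap, query.size()));
    }
}
```

Note that the sketch only looks at whether a query term occurs at all, which
is why the factor overlaps only partly with the tf-based part of the score.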
Is there any other paper that actually shows the benefit of applying this
particular normalisation with coord_q_d? I am not suggesting that it is
not useful; I am just looking for evidence of how the idea developed.
Karl
-------- Original Message --------
Date: Tue, 12 Dec 2006 10:01:05 -0500
From: Steven Rowe <sarowe@xxxxxxx>
To: java-user@xxxxxxxxxxxxxxxxx
Subject: Re: Lucene scoring: coord_q_d factor
> Karl Koch wrote:
> > The coord(q,d) normalisation is "a score factor based on how many of
> > the query terms are found in the specified document." and described
> > here:
> >
> >
> http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html#formula_coord
> >
> > Does this have a theoretical basis? On what grounds was the decision
> > made to have it? Does anybody know a paper (in Information Retrieval,
> > Information Seeking, etc.) or other more general information about
> > this?
>
> Following is quoted from: Krovetz, R. & Croft, W. B. (1992) Lexical
> Ambiguity and Information Retrieval. ACM Transactions on Information
> Systems, 10(2): 115-141.
>
> Many retrieval systems represent documents and queries
> by the words they contain, and base the comparison on
> the number of words they have in common. The more
> words the query and document have in common, the
> higher the document is ranked; this is referred to as
> a "coordination match." Performance is improved by
> weighting query and document words using frequency
> information from the collection and individual
> document texts [27].
>
> 27. Salton, G. & McGill, M. Introduction to Modern Information
> Retrieval. McGraw-Hill, New York, 1983.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
> For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx