OK, that makes sense. So I just need to add all of the sub-compounds that are
real words at posIncr=0, even if they are combinations of other sub-compounds.
Thanks!
-----Original Message-----
From: Robert Muir [mailto:rcmuir@xxxxxxxxx]
Sent: Wednesday, October 21, 2009 11:49 AM
To: java-user@xxxxxxxxxxxxxxxxx
Subject: Re: Using org.apache.lucene.analysis.compound
yes, your dictionary :)
if Überwachungsgesetz is a real word, add it to your dictionary.
for example, if your dictionary is { "Rind", "Fleisch", "Draht", "Schere",
"Gesetz", "Aufgabe", "Überwachung" }, and you index
RindfleischÜberwachungsgesetz, then all 3 queries will have the same score.
but if you expand the dictionary to { "Rind", "Fleisch", "Draht", "Schere",
"Gesetz", "Aufgabe", "Überwachung", "Überwachungsgesetz" }, then this makes
a big difference.
all 3 queries will still match, but Überwachungsgesetz will have a higher
score. this is because now things are analyzed differently:
RindfleischÜberwachungsgesetz will be decompounded as before, but with an
additional token: Überwachungsgesetz.
so, back to your original question about these 'concatenations' of multiple
components: yes, the compound filters will emit them, but only if they are
real words in your dictionary. they won't just make them up.
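To make the dictionary point concrete, here is a minimal Java sketch of dictionary-based decompounding. It is a hypothetical, simplified stand-in, not the code of Lucene's DictionaryCompoundWordTokenFilter (the real filter has extra size limits and options this sketch omits); the class name DecompoundSketch, the Token record, and the minimum subword size of 2 are assumptions for illustration. It keeps the original token and stacks every dictionary word found inside it at posIncr=0.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, NOT Lucene's DictionaryCompoundWordTokenFilter: it only
// mimics the idea of keeping the original token and stacking every dictionary
// word found inside it at the same position (position increment 0).
public class DecompoundSketch {

    /** A token and its position increment relative to the previous token. */
    public record Token(String text, int posIncr) {}

    public static List<Token> decompound(String word, Set<String> dictionary, int minSubwordSize) {
        // Match case-insensitively: the dictionary holds capitalized nouns,
        // while the parts inside a compound are usually lowercase.
        Set<String> lowerDict = new HashSet<>();
        for (String entry : dictionary) {
            lowerDict.add(entry.toLowerCase(Locale.ROOT));
        }

        List<Token> tokens = new ArrayList<>();
        tokens.add(new Token(word, 1)); // the original token advances the position

        // Brute force: try every substring of at least minSubwordSize chars.
        for (int start = 0; start + minSubwordSize <= word.length(); start++) {
            for (int end = start + minSubwordSize; end <= word.length(); end++) {
                String part = word.substring(start, end);
                // Only dictionary words are emitted; nothing is made up.
                if (lowerDict.contains(part.toLowerCase(Locale.ROOT))) {
                    tokens.add(new Token(part, 0)); // stacked on the original token
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "\u00DC" is the capital U with umlaut, escaped so the source is encoding-neutral.
        Set<String> dictionary = new HashSet<>(Set.of("Rind", "Fleisch", "Draht", "Schere",
                "Gesetz", "Aufgabe", "\u00DCberwachung"));
        String doc = "Rindfleisch\u00DCberwachungsgesetz";
        System.out.println(decompound(doc, dictionary, 2));

        dictionary.add("\u00DCberwachungsgesetz");
        System.out.println(decompound(doc, dictionary, 2));
    }
}
```

With the first dictionary, RindfleischÜberwachungsgesetz yields the original token plus Rind, fleisch, Überwachung and gesetz. After adding Überwachungsgesetz to the dictionary, exactly one more token, Überwachungsgesetz, appears at posIncr=0 between Überwachung and gesetz.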
"Überwachungsgesetz"
0.23013961 = (MATCH) sum of:
  0.057534903 = (MATCH) weight(field:Überwachungsgesetz in 0), product of:
    0.5 = queryWeight(field:Überwachungsgesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.11506981 = (MATCH) fieldWeight(field:Überwachungsgesetz in 0), product of:
      1.0 = tf(termFreq(field:Überwachungsgesetz)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.375 = fieldNorm(field=field, doc=0)
  0.057534903 = (MATCH) weight(field:Überwachung in 0), product of:
    0.5 = queryWeight(field:Überwachung), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.11506981 = (MATCH) fieldWeight(field:Überwachung in 0), product of:
      1.0 = tf(termFreq(field:Überwachung)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.375 = fieldNorm(field=field, doc=0)
  0.057534903 = (MATCH) weight(field:Überwachungsgesetz in 0), product of:
    0.5 = queryWeight(field:Überwachungsgesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.11506981 = (MATCH) fieldWeight(field:Überwachungsgesetz in 0), product of:
      1.0 = tf(termFreq(field:Überwachungsgesetz)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.375 = fieldNorm(field=field, doc=0)
  0.057534903 = (MATCH) weight(field:gesetz in 0), product of:
    0.5 = queryWeight(field:gesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      1.6294457 = queryNorm
    0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of:
      1.0 = tf(termFreq(field:gesetz)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.375 = fieldNorm(field=field, doc=0)
"gesetzÜberwachung"
0.064782135 = (MATCH) sum of:
  0.032391068 = (MATCH) weight(field:gesetz in 0), product of:
    0.2814906 = queryWeight(field:gesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.9173473 = queryNorm
    0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of:
      1.0 = tf(termFreq(field:gesetz)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.375 = fieldNorm(field=field, doc=0)
  0.032391068 = (MATCH) weight(field:Überwachung in 0), product of:
    0.2814906 = queryWeight(field:Überwachung), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.9173473 = queryNorm
    0.11506981 = (MATCH) fieldWeight(field:Überwachung in 0), product of:
      1.0 = tf(termFreq(field:Überwachung)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.375 = fieldNorm(field=field, doc=0)
"fleischgesetz"
0.064782135 = (MATCH) sum of:
  0.032391068 = (MATCH) weight(field:fleisch in 0), product of:
    0.2814906 = queryWeight(field:fleisch), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.9173473 = queryNorm
    0.11506981 = (MATCH) fieldWeight(field:fleisch in 0), product of:
      1.0 = tf(termFreq(field:fleisch)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.375 = fieldNorm(field=field, doc=0)
  0.032391068 = (MATCH) weight(field:gesetz in 0), product of:
    0.2814906 = queryWeight(field:gesetz), product of:
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.9173473 = queryNorm
    0.11506981 = (MATCH) fieldWeight(field:gesetz in 0), product of:
      1.0 = tf(termFreq(field:gesetz)=1)
      0.30685282 = idf(docFreq=1, maxDocs=1)
      0.375 = fieldNorm(field=field, doc=0)
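To see why the first query scores higher, the following Java sketch recomputes the three totals from the explain output. It assumes the classic Lucene 2.x DefaultSimilarity formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryNorm = 1 / sqrt(sum of squared weights), boosts of 1, no coord factor). It also assumes, as an inference from the queryNorm values rather than something stated in the thread, that each query has one clause per analyzed token: the "Überwachungsgesetz" query has four matching clauses, while the other two each have two matching clauses plus their unsplit original word, which is not in the index and only lowers queryNorm. ExplainSketch and its methods are hypothetical names.

```java
// Hypothetical sketch that recomputes the explain() totals above, assuming the
// classic Lucene 2.x formulas: idf = 1 + ln(maxDocs / (docFreq + 1)),
// tf = sqrt(freq), queryNorm = 1 / sqrt(sum of squared idf values), all boosts 1,
// and no coord() factor (the explain output shows none).
public class ExplainSketch {

    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // Scores the single document of a one-document index against a query made
    // of one clause per token. docFreqs[i] is the docFreq of clause i's term;
    // 0 means the term is not indexed, so the clause does not match but still
    // contributes to queryNorm. Each matching term occurs once (freq 1).
    static double score(int[] docFreqs, int maxDocs, double fieldNorm) {
        double sumOfSquaredWeights = 0.0;
        for (int df : docFreqs) {
            double w = idf(df, maxDocs);
            sumOfSquaredWeights += w * w;
        }
        double queryNorm = 1.0 / Math.sqrt(sumOfSquaredWeights);

        double total = 0.0;
        for (int df : docFreqs) {
            if (df == 0) {
                continue; // unmatched clause: adds nothing to the sum
            }
            double idf = idf(df, maxDocs);
            double queryWeight = idf * queryNorm;
            double fieldWeight = Math.sqrt(1.0) * idf * fieldNorm; // tf * idf * fieldNorm
            total += queryWeight * fieldWeight;
        }
        return total;
    }

    public static void main(String[] args) {
        double fieldNorm = 0.375; // copied from the explain output, not recomputed
        // first query: four clauses, all matching
        System.out.println(score(new int[] {1, 1, 1, 1}, 1, fieldNorm)); // about 0.2301396
        // second and third queries: unsplit word not indexed, two parts match
        System.out.println(score(new int[] {0, 1, 1}, 1, fieldNorm));    // about 0.0647821
    }
}
```

Both values agree with the explain totals (0.23013961 and 0.064782135) up to float rounding. The first query wins twice over: it sums four matching clauses instead of two, and its queryNorm is larger because the other two queries also carry an unmatched clause with idf 1.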
On Wed, Oct 21, 2009 at 1:40 PM, Benjamin Douglas
<bbdouglas@xxxxxxxxxxxxx> wrote:
> Thanks for all of the answers so far!
>
> Paul's question is similar to another aspect I am curious about:
>
> Given the way the sample word is analyzed, is there anything in the scoring
> mechanism that would rank "Überwachungsgesetz" higher than
> "gesetzÜberwachung" or "fleischgesetz"?
>
>
--
Robert Muir
rcmuir@xxxxxxxxx