java-user@lucene.apache.org

Re: Lucene and Chinese language

Subject: Re: Lucene and Chinese language
From: kafka0102
Date: Fri, 02 Jul 2010 15:09:15 +0800
You can choose IK Analyzer (http://code.google.com/p/ik-analyzer/) for this problem. It produces better tokens than StandardAnalyzer and CJKAnalyzer. You can also use its IKQueryParser instead of QueryParser; it generates AND queries instead of phrase queries.
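A toy sketch of the difference, in Java: it renders the same token list as an AND query and as a phrase query in Lucene's query syntax. It does not use IK Analyzer or IKQueryParser; the class QueryShapes and the two-word segmentation in main are hypothetical, and a real parser builds Query objects rather than strings.

```java
import java.util.StringJoiner;

/**
 * Toy illustration (not IK Analyzer's API): two ways to turn the same
 * token list into Lucene query syntax.
 */
public class QueryShapes {

    /** Every token must occur somewhere in the field, in any order: +f:t1 +f:t2 ... */
    static String andQuery(String field, String... tokens) {
        StringJoiner sj = new StringJoiner(" ");
        for (String t : tokens) {
            sj.add("+" + field + ":" + t);
        }
        return sj.toString();
    }

    /** Tokens must occur adjacently and in order: f:"t1 t2 ..." */
    static String phraseQuery(String field, String... tokens) {
        return field + ":\"" + String.join(" ", tokens) + "\"";
    }

    public static void main(String[] args) {
        // Hypothetical segmentation of one Chinese query into two two-character words
        String[] words = {"\u4e2d\u6587", "\u5206\u8bcd"};
        System.out.println(andQuery("myfieldname", words));
        System.out.println(phraseQuery("myfieldname", words));
    }
}
```

The AND form matches documents that contain both words anywhere in the field, which is more forgiving than requiring them to be adjacent and in order.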



On 2010-07-01 20:47, Robert Muir wrote:
It's really just a bad situation for Chinese :(

With QueryParser, you either get no phrase query support (by using the
hack), or all queries automatically become phrase queries.

If you want to do the equivalent of *åçåè*, then you need to use QueryParser
without the hack; just do a query of "åçåè" (with quotes).
If you want to do the equivalent of å ç å è, then you need to use
QueryParser with the hack; just do a query of åçåè (without quotes).
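The trade-off can be sketched as a simplified model of how a Lucene 3.0-era QueryParser chooses a query type from the tokens the analyzer returns: one token gives a term query, tokens at consecutive positions give a phrase query, and tokens all stacked at one position (roughly what a PositionFilter-based hack produces) give an OR of term queries. This is a model under those assumptions, not the real QueryParser code; the class ParserModel and its helpers are hypothetical, and the mixed case (MultiPhraseQuery) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical model of QueryParser's choice of query type. */
public class ParserModel {

    /** A token and its position increment (0 = same position as the previous token). */
    record Token(String text, int posInc) {}

    /** One token per code point, each at the next position (a unigram analyzer). */
    static List<Token> unigrams(String text) {
        List<Token> out = new ArrayList<>();
        text.codePoints().forEach(cp -> out.add(new Token(Character.toString(cp), 1)));
        return out;
    }

    /** Same tokens with every increment after the first set to 0, roughly what a PositionFilter does. */
    static List<Token> stacked(List<Token> tokens) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(new Token(tokens.get(i).text(), i == 0 ? 1 : 0));
        }
        return out;
    }

    /** Picks a query type and renders it roughly like Query.toString(). */
    static String fieldQuery(String field, List<Token> tokens) {
        if (tokens.isEmpty()) {
            return "";
        }
        if (tokens.size() == 1) {
            return field + ":" + tokens.get(0).text(); // term query
        }
        int positions = 1; // the first token opens the first position
        boolean stackedTokens = false;
        for (int i = 1; i < tokens.size(); i++) {
            if (tokens.get(i).posInc() > 0) {
                positions++;
            } else {
                stackedTokens = true;
            }
        }
        List<String> parts = new ArrayList<>();
        if (positions == 1) {
            // all tokens at one position: OR of term queries
            for (Token t : tokens) {
                parts.add(field + ":" + t.text());
            }
            return String.join(" ", parts);
        }
        if (stackedTokens) {
            throw new UnsupportedOperationException("MultiPhraseQuery case not modeled");
        }
        // tokens at successive positions: phrase query
        for (Token t : tokens) {
            parts.add(t.text());
        }
        return field + ":\"" + String.join(" ", parts) + "\"";
    }

    public static void main(String[] args) {
        String text = "\u4e2d\u6587\u5206\u8bcd"; // four CJK characters, hypothetical example
        System.out.println(fieldQuery("myfieldname", unigrams(text)));          // phrase query
        System.out.println(fieldQuery("myfieldname", stacked(unigrams(text)))); // OR of single chars
    }
}
```

In this model, quoted and unquoted text go through the same analysis step, which is why the hack removes phrase support even when the user types quotes.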


2010/7/1 Kolhoff, Jacqueline - ENCOWAY <Kolhoff@xxxxxxxxxx>

OK, I understand!

So is it better to use another analyzer for Chinese at index time,
or do you suggest using a different QueryParser at query time?

-----Original Message-----
From: Robert Muir [mailto:rcmuir@xxxxxxxxx]
Sent: Thursday, July 1, 2010 14:35
To: java-user@xxxxxxxxxxxxxxxxx
Subject: Re: Lucene and Chinese language

2010/7/1 Kolhoff, Jacqueline - ENCOWAY <Kolhoff@xxxxxxxxxx>

As you can see, the query parser automatically added double quotes and
spaces. But this does not work for our English or German queries.

If I use the PositionHackAnalyzerWrapper and the query with *, I get no
results. The query is:
+anotherfieldname:description +myfieldname:*åçåè*

If I remove the *, the query is:
+anotherfieldname:description
+(myfieldname:å myfieldname:ç myfieldname:å myfieldname:è)

and I get results, but not for German or English queries.

Weird?

It's working correctly. Your Chinese wildcard query doesn't make sense,
as you haven't indexed the text in a way that supports queries like that
(you have indexed individual characters).
In practice, this is where you would do a Chinese phrase query of "åçåè"
(with quotes) instead of *..., but if you use the PositionFilter hack, you
can't do phrase queries.
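Why the wildcard finds nothing can be shown with a toy single-document index (not Lucene code) that stores each character as its own term along with its position. A wildcard such as *word* is matched against individual terms, none of which is longer than one character, while a phrase query only checks that the characters sit at consecutive positions. The class UnigramIndex and the sample text are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy single-document index (not Lucene): each character is indexed as its own
 * term together with its position, mirroring a unigram-analyzed field.
 */
public class UnigramIndex {
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    UnigramIndex(String text) {
        int pos = 0;
        for (int cp : text.codePoints().toArray()) {
            postings.computeIfAbsent(Character.toString(cp), k -> new ArrayList<>()).add(pos++);
        }
    }

    /** *needle* wildcard: matches only if some single indexed term contains the needle. */
    boolean wildcardContains(String needle) {
        for (String term : postings.keySet()) {
            if (term.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    /** Phrase query over unigrams: the needle's characters at consecutive positions. */
    boolean phrase(String needle) {
        int[] cps = needle.codePoints().toArray();
        if (cps.length == 0) {
            return false;
        }
        List<Integer> starts = postings.getOrDefault(Character.toString(cps[0]), List.of());
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < cps.length && ok; i++) {
                List<Integer> p = postings.getOrDefault(Character.toString(cps[i]), List.of());
                ok = p.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical document text that contains the four-character query word
        String doc = "\u6d4b\u8bd5\u4e2d\u6587\u5206\u8bcd\u6548\u679c";
        String word = "\u4e2d\u6587\u5206\u8bcd";
        UnigramIndex index = new UnigramIndex(doc);
        System.out.println("wildcard: " + index.wildcardContains(word)); // false
        System.out.println("phrase:   " + index.phrase(word));           // true
    }
}
```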

--
Robert Muir
rcmuir@xxxxxxxxx