java-user@lucene.apache.org

Re: Lucene and Chinese language

Subject: Re: Lucene and Chinese language
From: Danil ŢORIN
Date: Thu, 1 Jul 2010 12:30:21 +0300
Try using the CJK analyzer for both indexing and searching Chinese text.
Then you won't need the "text" -> "*text*" transformation.

There might be some false positives in the results though.
You may also want to try the smartcn analyzer, which is dictionary-based, but
I don't have the expertise to evaluate its results (we still use CJK for Asian
languages, as there have been no complaints so far).
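
To illustrate why a bigram-based analyzer such as CJK removes the need for
wildcards, and where the false positives can come from, here is a minimal
standalone sketch. It does not use Lucene at all: the class BigramSketch and
its methods are hypothetical names made up for this example, and the matching
is a simplified "all query tokens must be present" check rather than Lucene's
actual query logic.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration only: mimics the overlapping-bigram idea, not Lucene's code.
class BigramSketch {

    // Split text into overlapping two-character tokens, e.g. ABC -> AB, BC.
    static List<String> bigrams(String text) {
        int[] cps = text.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            // Choice made for this sketch: a lone character is its own token.
            tokens.add(text);
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    // Simplified match: every bigram of the query must occur in the document.
    static boolean matches(String document, String query) {
        Set<String> indexed = new HashSet<>(bigrams(document));
        return indexed.containsAll(bigrams(query));
    }

    public static void main(String[] args) {
        String doc = "中华人民共和国";
        System.out.println(bigrams(doc));        // [中华, 华人, 人民, 民共, 共和, 和国]
        System.out.println(matches(doc, "人民")); // true: a real word in the text
        System.out.println(matches(doc, "华人")); // true: spans a word boundary
        System.out.println(matches(doc, "民国")); // false: bigram never occurs
    }
}
```

Because the same tokenization is applied to the document and to the query, a
plain query for "人民" finds the document without any "*...*" wrapping. The
hit for "华人" is presumably the kind of false positive meant above: the two
characters are adjacent in the text but belong to different words.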


2010/7/1 Kolhoff, Jacqueline - ENCOWAY <Kolhoff@xxxxxxxxxx>

>
> Hi!
>
> We are using lucene in our project to search through information objects
> which works fine. For indexing we use the StandardAnalyzer.
> Now, we have to support the Chinese language. I found out that the Chinese
> words and letters are correctly saved in the index but the query to search
> for them does not work. Example: in English language the query is "text"
> which we parse to "*text*". If we search for Chinese words / phrases like
> "äåäæäå" the query is "*äåäæäå*" but there are no search results. If the
> query places blanks between the single letters / symbols like this "*ä å ä æ
> ä å*" we are getting results. Does the StandardAnalyzer interpret each
> Chinese letter as one word? What are best practices for this case? Shall we
> use another analyzer (Chinese analyzer)? Or is it better to replace the
> query parser in this case?
>
> Regards,
> Jacqueline.
>