java-user@lucene.apache.org
[Top] [All Lists]

Lucene and Chinese language

Subject: Lucene and Chinese language
From: "Kolhoff, Jacqueline - ENCOWAY"
Date: Thu, 1 Jul 2010 11:19:29 +0200
Hi!

We are using lucene in our project to search through information objects which 
works fine. For indexing we use the StandardAnalyzer.
Now, we have to support the Chinese language. I found out that the Chinese 
words and letters are correctly saved in the index but the query to search for 
them does not work. Example: in English language the query is “text” which we 
parse to “*text*”. If we search for Chinese words / phrases like “佛山东方书城”the 
query is “*佛山东方书城*“ but there are no search results. If the query places blanks 
between the single letters / symbols like this “*佛 山 东 方 书 城*“ we are getting 
results. Does the StandardAnalyzer interpret each Chinese letter as one word? 
What are best practices for this case? Shall we use another analyzer (Chinese 
analyzer)? Or is it better to replace the query parser in this case?

Regards,
Jacqueline.
<Prev in Thread] Current Thread [Next in Thread>