java-user@lucene.apache.org

SnowballAnalyzer question

Subject: SnowballAnalyzer question
From: Chris Bamford
Date: Fri, 8 Aug 2008 13:07:25 +0100
Hi.

I am using the SnowballAnalyzer because of its multi-language stemming capabilities, and am very happy with that. There is one small glitch which I'm hoping to overcome: can I get it to split up internet domain names in the same way that StopAnalyzer does? For example, for the sentence "This is a URL: www.google.de / this is a company name: XY&Z Corporation", here is the default output from the two analysers:

StopAnalyzer:
[url] [www] [google] [de] [company] [name] [xy] [z] [corporation]

SnowballAnalyzer:
[this] [is] [a] [url] [www.google.d] [this] [is] [a] [compani] [name] [xy&z] [corpor]

Ideally I would like "www.google.de" to be split into [www] [google] [de] (rather than [www.google.d]), but retain the rest of the SnowballAnalyzer's capabilities.
Can I perhaps extend SnowballAnalyzer to achieve this?
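
To make the desired behaviour concrete, here is a minimal plain-Java sketch of the pipeline being asked for: split at every non-letter character and lowercase (which appears to be what StopAnalyzer's tokenizer does), drop stop words, then stem each surviving token. It does not use Lucene at all. The class name LetterSplitSketch, the three-word stop list, and the pluggable stemmer parameter are all hypothetical stand-ins for illustration. In Lucene itself the corresponding pieces would presumably be LowerCaseTokenizer, StopFilter and SnowballFilter chained inside a custom Analyzer, but that wiring is an assumption and is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical illustration only: models "letter tokenizer -> stop filter -> stemmer".
final class LetterSplitSketch {

    // Deliberately tiny stop list (an assumption); Lucene's default English list is longer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("this", "is", "a"));

    // Splits text at every non-letter, lowercases, drops stop words,
    // then passes each remaining token through the supplied stemmer.
    static List<String> analyze(String text, Set<String> stopWords, UnaryOperator<String> stemmer) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        // Iterate one position past the end so the final token is flushed.
        for (int i = 0; i <= text.length(); i++) {
            boolean letter = i < text.length() && Character.isLetter(text.charAt(i));
            if (letter) {
                current.append(Character.toLowerCase(text.charAt(i)));
            } else if (current.length() > 0) {
                String token = current.toString();
                current.setLength(0);
                if (!stopWords.contains(token)) {
                    tokens.add(stemmer.apply(token));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "This is a URL: www.google.de / this is a company name: XY&Z Corporation";
        // The identity function stands in for a real Snowball stemmer here.
        System.out.println(analyze(text, STOP_WORDS, UnaryOperator.identity()));
    }
}
```

With the identity stemmer, the tokens for the example sentence match the StopAnalyzer output listed above; swapping in a real stemmer for that last argument would be the step that turns "company" into "compani" while keeping "www", "google" and "de" as separate tokens.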

Thanks for any tips / pointers,

- Chris


