java-user@lucene.apache.org

Keep URLs intact and not tokenized by the StandardTokenizer

From: Sudha Verma
Date: Wed, 18 Nov 2009 22:58:11 -0700
Hi,

I am using Lucene 2.9.1.

I am reading in free-text documents, which I currently index using Lucene
and the StandardAnalyzer.

The StandardAnalyzer keeps email addresses intact as single tokens instead
of splitting them apart. Is there something similar for URLs? This seems
like a common need, so I thought I'd check whether anything out there
already does it.
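For illustration only, here is a minimal, Lucene-independent Java sketch of the behaviour being asked for: URLs come out as single tokens, and the remaining text is split on non-alphanumeric characters and lowercased. The class name `UrlPreservingTokenizerSketch`, the simple URL regex, and the trailing-punctuation rule are all assumptions made for this example. It is not part of the Lucene 2.9.1 API, does not plug into StandardAnalyzer, and (unlike StandardTokenizer) does not special-case email addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Lucene API): emits URLs as whole tokens and
 * splits everything else on runs of non-alphanumeric characters.
 */
final class UrlPreservingTokenizerSketch {

    // Assumed, deliberately simple URL pattern: a scheme or "www." prefix
    // followed by any run of non-whitespace characters.
    private static final Pattern URL = Pattern.compile(
            "(?:https?|ftp)://\\S+|www\\.\\S+", Pattern.CASE_INSENSITIVE);

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = URL.matcher(text);
        int last = 0;
        while (m.find()) {
            // Ordinary words between the previous URL and this one.
            splitWords(text.substring(last, m.start()), tokens);
            // Drop sentence punctuation that \S+ swallowed at the end of the URL.
            String url = m.group().replaceAll("[.,;:!?)]+$", "");
            tokens.add(url); // URL kept intact, original case preserved
            last = m.end();
        }
        // Ordinary words after the last URL.
        splitWords(text.substring(last), tokens);
        return tokens;
    }

    // Splits a non-URL segment into lowercased alphanumeric words.
    private static void splitWords(String segment, List<String> out) {
        for (String w : segment.split("[^\\p{Alnum}]+")) {
            if (!w.isEmpty()) {
                out.add(w.toLowerCase());
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("See http://lucene.apache.org/java/docs/ for details."));
    }
}
```

Running `main` prints `[see, http://lucene.apache.org/java/docs/, for, details]`. Inside Lucene the same idea would have to be packaged as a custom Tokenizer and Analyzer; releases after 2.9 (3.1 onward) also ship `UAX29URLEmailTokenizer`, which recognizes URLs as well as email addresses.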

I'd appreciate any help.

Thanks,
sudha