I fixed it by raising the highlighter's analysis limit:
highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);
With that set, the highlighting works.
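
For anyone finding this thread later, here is a rough sketch of why the limit can matter. As far as I know, the Highlighter stops analyzing after a configurable number of characters (50*1024 by default), so a term that first occurs beyond that point would produce no fragment even though the search matched the document. The code below does not use Lucene at all: the class AnalysisLimitSketch and the method bestFragment are made-up names, and the matching is a plain substring search, so treat it as an illustration of the cut-off only, not of how the Highlighter is implemented.

```java
import java.util.Locale;

// Hypothetical stand-in for a highlighter with an analysis limit.
// Nothing here is Lucene API; it only mimics the cut-off behaviour.
class AnalysisLimitSketch {

    /**
     * Returns a short fragment around the first occurrence of term,
     * looking only at the first maxCharsToAnalyze characters of text.
     * Returns "" when the term is not found inside that window.
     * Assumes ASCII text, so lower-casing does not change offsets.
     */
    public static String bestFragment(String text, String term, int maxCharsToAnalyze) {
        int limit = Math.min(text.length(), maxCharsToAnalyze);
        String window = text.substring(0, limit).toLowerCase(Locale.ROOT);
        int hit = window.indexOf(term.toLowerCase(Locale.ROOT));
        if (hit < 0) {
            return ""; // term lies beyond the analyzed window, or is absent
        }
        int end = hit + term.length();
        int from = Math.max(0, hit - 10);
        int to = Math.min(text.length(), end + 10);
        return text.substring(from, hit)
                + "<b>" + text.substring(hit, end) + "</b>"
                + text.substring(end, to);
    }

    public static void main(String[] args) {
        // "aspectj" appears early, "document" only near the end.
        StringBuilder sb = new StringBuilder("aspectj intro. ");
        while (sb.length() < 200) {
            sb.append("filler ");
        }
        sb.append("the document ends here");
        String text = sb.toString();

        int smallLimit = 100; // illustrative value, far below the text length
        System.out.println("aspectj,  small limit: [" + bestFragment(text, "aspectj", smallLimit) + "]");
        System.out.println("document, small limit: [" + bestFragment(text, "document", smallLimit) + "]");
        System.out.println("document, no limit:    [" + bestFragment(text, "document", Integer.MAX_VALUE) + "]");
    }
}
```

With the small limit the late term yields an empty result while the early term is highlighted, which matches the "document" versus "aspectj" symptom described below. Note that analyzing the whole text of every hit may be slow for very large documents.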
On Thu, Mar 12, 2009 at 6:41 PM, Amin Mohammed-Coleman <aminmc@xxxxxxxxx> wrote:
> JIRA updated. It includes a new test case which shows the highlighter not
> working as expected.
>
>
> On Thu, Mar 12, 2009 at 5:56 PM, Amin Mohammed-Coleman
> <aminmc@xxxxxxxxx> wrote:
>
>> Hi
>>
>> I have found that it is not an issue with POI. I extracted the text using
>> POI, but differently, and the term is extracted properly. When I store the
>> text and retrieve it, the term exists. However, running the text through the
>> highlighter doesn't work.
>>
>> I will post a test case with a plain text file on JIRA. Currently on a cramped
>> train!
>>
>> Cheers
>>
>>
>>
>> On 11 Mar 2009, at 18:11, markharw00d <markharw00d@xxxxxxxxxxx> wrote:
>>
>>> If you can supply a JUnit test that recreates the problem, I think we can
>>> start to make progress on this.
>>>
>>>
>>>
>>> Amin Mohammed-Coleman wrote:
>>>
>>>> Hi
>>>>
>>>> Apologies for resending this mail. Just wondering if anyone has
>>>> experienced the below. I'm not sure if this could happen due to the
>>>> nature of the document. It does seem strange that one search term returns
>>>> a summary while another does not, even though the same document is being
>>>> returned.
>>>>
>>>> I'm asking this so I can code around it if this is normal.
>>>>
>>>>
>>>> Apologies again for resending this mail.
>>>>
>>>> Cheers
>>>>
>>>> Amin
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On 9 Mar 2009, at 07:50, Amin Mohammed-Coleman <aminmc@xxxxxxxxx>
>>>> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I am seeing some strange behaviour with the highlighter and I'm
>>>>> wondering if anyone else is experiencing this. In certain instances no
>>>>> summary is generated. I perform the search and it returns the correct
>>>>> document. I can see that the Lucene document contains the text in the
>>>>> field. However, after doing:
>>>>>
>>>>> SimpleHTMLFormatter simpleHTMLFormatter =
>>>>>     new SimpleHTMLFormatter("<span class=\"highlight\"><b>", "</b></span>");
>>>>> // required for highlighting
>>>>> Query query2 = multiSearcher.rewrite(query);
>>>>> Highlighter highlighter =
>>>>>     new Highlighter(simpleHTMLFormatter, new QueryScorer(query2));
>>>>> ...
>>>>>
>>>>> String text = doc.get(FieldNameEnum.BODY.getDescription());
>>>>> TokenStream tokenStream = analyzer.tokenStream(
>>>>>     FieldNameEnum.BODY.getDescription(), new StringReader(text));
>>>>> String result = highlighter.getBestFragments(tokenStream, text, 3, "...");
>>>>>
>>>>>
>>>>> the string result is empty. This is very strange: if I try a different
>>>>> term that exists in the document, then I get a summary. For example, I
>>>>> have a Word document that contains the terms "document" and "aspectj".
>>>>> If I search for "document" I get the correct document but no highlighted
>>>>> summary. However, if I search using "aspectj" I get the same document
>>>>> with a highlighted summary.
>>>>>
>>>>> Just to mention, I do rewrite the original query before performing the
>>>>> highlighting.
>>>>>
>>>>> I'm not sure what I'm missing here. Any help would be appreciated.
>>>>>
>>>>> Cheers
>>>>> Amin
>>>>>
>>>>> On Sat, Mar 7, 2009 at 4:32 PM, Amin Mohammed-Coleman
>>>>> <aminmc@xxxxxxxxx> wrote:
>>>>> Hi
>>>>>
>>>>> Got it working! Thanks again for your help!
>>>>>
>>>>>
>>>>> Amin
>>>>>
>>>>>
>>>>> On Sat, Mar 7, 2009 at 12:25 PM, Amin Mohammed-Coleman
>>>>> <aminmc@xxxxxxxxx> wrote:
>>>>> Thanks! The final piece that I needed to do for the project!
>>>>>
>>>>> Cheers
>>>>>
>>>>> Amin
>>>>>
>>>>> On Sat, Mar 7, 2009 at 12:21 PM, Uwe Schindler <uwe@xxxxxxxxxxx>
>>>>> wrote:
>>>>> > Cool. I will use compression and store it in the index. Is there
>>>>> > anything special I need to do for decompressing the text? I presume I
>>>>> > can just do doc.get("content")?
>>>>> > Thanks for your advice, all!
>>>>>
>>>>> No, just use Field.Store.COMPRESS when adding to the index and
>>>>> Document.get() when fetching. The decompression is done automatically.
>>>>>
>>>>> You may wonder, why not enable compression for all fields? The reason is
>>>>> that it adds overhead for very small and short fields, so you should
>>>>> only use it for large contents. It's the same as compressing very small
>>>>> files with ZIP/GZIP: the compressed files mostly end up larger than the
>>>>> originals.
>>>>>
>>>>> Uwe
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: java-user-unsubscribe@xxxxxxxxxxxxxxxxx
>>>>> For additional commands, e-mail: java-user-help@xxxxxxxxxxxxxxxxx
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>