Re: bytecount as String and prefix length

Subject: Re: bytecount as String and prefix length
From: Marvin Humphrey
Date: Mon, 31 Oct 2005 13:28:14 -0800
I wrote...

> Unfortunately, once the changes to TermBuffer, TermInfosWriter, and
> StringHelper are applied, execution speed at index-time suffers a
> slowdown of about 20%. Perhaps this can be blamed on all the calls to
> getBytes("UTF-8") in TermInfosWriter? Maybe alternative implementations
> using ByteBuffer, CharsetDecoder, and CharsetEncoder are possible that
> can mitigate the problem?

The version of writeTerm below is about the same speed as the one with the calls to getBytes("UTF-8").
I think I'll take a stab at a custom charsToUTF8 converter algo.
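
For illustration only, a hand-rolled converter along those lines might look like the sketch below. It is not code from this thread: the class name Utf8Sketch, the method signature, and the choice to write '?' for an unpaired surrogate (which mirrors what String.getBytes does) are all assumptions. The idea is to encode straight from chars into a caller-owned byte array, so no String, CharBuffer, or CoderResult objects are created per term.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Utf8Sketch {

    /**
     * Encodes text as UTF-8 into buf, starting at index 0, and returns the
     * number of bytes written. buf must hold at least 3 * text.length()
     * bytes (the worst case: every char is a 3-byte BMP character).
     * Hypothetical helper, not part of any Lucene API.
     */
    static int charsToUTF8(CharSequence text, byte[] buf) {
        int upto = 0;
        final int len = text.length();
        for (int i = 0; i < len; i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                // 1 byte: plain ASCII
                buf[upto++] = (byte) c;
            } else if (c < 0x800) {
                // 2 bytes
                buf[upto++] = (byte) (0xC0 | (c >> 6));
                buf[upto++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isSurrogate(c)) {
                if (Character.isHighSurrogate(c) && i + 1 < len
                        && Character.isLowSurrogate(text.charAt(i + 1))) {
                    // 4 bytes: a valid surrogate pair forms one code point
                    int cp = Character.toCodePoint(c, text.charAt(++i));
                    buf[upto++] = (byte) (0xF0 | (cp >> 18));
                    buf[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                    buf[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                    buf[upto++] = (byte) (0x80 | (cp & 0x3F));
                } else {
                    // assumption: unpaired surrogate becomes '?'
                    buf[upto++] = (byte) '?';
                }
            } else {
                // 3 bytes: rest of the Basic Multilingual Plane
                buf[upto++] = (byte) (0xE0 | (c >> 12));
                buf[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                buf[upto++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return upto;
    }

    public static void main(String[] args) {
        String text = "na\u00efve \u20ac \ud83d\ude00";
        byte[] buf = new byte[3 * text.length()];
        int n = charsToUTF8(text, buf);
        byte[] expected = text.getBytes(StandardCharsets.UTF_8);
        // compares the sketch against the JDK encoder for this sample
        System.out.println(Arrays.equals(Arrays.copyOf(buf, n), expected));
    }
}
```

Whether something like this actually beats getBytes("UTF-8") or CharsetEncoder would have to be measured; the sketch only shows the shape of the approach.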

Marvin Humphrey
Rectangular Research

//-------------------------------------------------------------------------
  private final void writeTerm(Term term)
       throws IOException {
    // encode the term text as UTF-8, growing the buffer until it fits
    while (true) {
      CoderResult status = utf8Encoder.encode(CharBuffer.wrap(term.text()),
        byteBuf, false);
      if (status.isOverflow()) {
        bufSize += 32;
        byteBuf = ByteBuffer.allocate(bufSize);
      }
      else {
        break;
      }
    }
    int totalLength = byteBuf.position();
    int start = StringHelper.bytesDifference(lastByteBuf, byteBuf);
    int length = totalLength - start;

    output.writeVInt(start);                   // write shared prefix length
    output.writeVInt(length);                  // write delta length

    byte[] bytes = byteBuf.array();
    for (int i = start ; i < totalLength; i++) {
      output.writeByte(bytes[i]);              // write delta UTF-8 bytes
    }
    output.writeVInt(fieldInfos.fieldNumber(term.field)); // write field num
    lastTerm = term;

    // swap byteBuf and lastByteBuf
    scratchByteBuf = lastByteBuf;
    lastByteBuf = byteBuf;
    byteBuf = scratchByteBuf;
  }
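
The ByteBuffer version of StringHelper.bytesDifference called in the listing is not shown in this message. As a rough sketch of the prefix/delta arithmetic only, here is a byte-array variant; the class name PrefixDeltaSketch and the four-argument signature are assumptions, not the patched Lucene method. It also shows why counting in bytes matters: the shared prefix can end in the middle of a multi-byte character.

```java
import java.nio.charset.StandardCharsets;

final class PrefixDeltaSketch {

    /**
     * Returns the number of leading bytes that last[0..lastLen) and
     * cur[0..curLen) have in common. Hypothetical stand-in for the
     * bytesDifference helper used in the message.
     */
    static int bytesDifference(byte[] last, int lastLen, byte[] cur, int curLen) {
        int limit = Math.min(lastLen, curLen);
        for (int i = 0; i < limit; i++) {
            if (last[i] != cur[i]) {
                return i;
            }
        }
        return limit;
    }

    public static void main(String[] args) {
        // U+00E9 encodes as C3 A9 and U+00E8 as C3 A8, so the two terms
        // share the lead byte C3 of their final character.
        byte[] last = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] cur = "caf\u00e8".getBytes(StandardCharsets.UTF_8);

        int start = bytesDifference(last, last.length, cur, cur.length);
        int length = cur.length - start;

        // prints: start=4 length=1
        System.out.println("start=" + start + " length=" + length);
    }
}
```

Counted in chars the shared prefix of these two terms would be 3; counted in UTF-8 bytes it is 4, and only a single delta byte needs to be written.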
