[email protected]
[Top] [All Lists]

Re: Realtime & distributed

Subject: Re: Realtime & distributed
From: Jake Mannix
Date: Thu, 8 Oct 2009 20:24:43 -0700
On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric <[email protected]> wrote:

> Does anyone have any recommendations?  I've looked at Katta, but it doesn't
> seem to support realtime searching.  It also uses hdfs, which I've heard can
> be slow.  I'm looking to serve 40gb of indexes and support about 1 million
> updates per day.
Hi Eric,

  As I mentioned in my response to Jason, we at LinkedIn serve our roughly
50million document profile index on a real-time distributed setup (we're
serving facets in real-time also), serving tens of millions of queries a day
in the 1-10ms latency per node, based on the open source zoie project (built
here at LinkedIn) : http://zoie.googlecode.com

  Zoie doesn't handle the distributed part of the setup, it's just the
real-time side.  Distribution is done pretty straitgtforwardly in our case
though: N shards each getting a different contiguous slice of the user base,
each replicated K times, and all N*K nodes get indexing events distributed
by a message queue independently.

  If you have any questions about zoie, let me know.  The documentation
could get filled in a little further, and it doesn't touch on distributed
side of things, so feel free to ping me.

<Prev in Thread] Current Thread [Next in Thread>