On Thu, Oct 8, 2009 at 7:00 PM, Angel, Eric <eangel@xxxxxxxxxxxx> wrote:
> Does anyone have any recommendations? I've looked at Katta, but it doesn't
> seem to support realtime searching. It also uses hdfs, which I've heard can
> be slow. I'm looking to serve 40gb of indexes and support about 1 million
> updates per day.
As I mentioned in my response to Jason, we at LinkedIn serve our roughly
50 million document profile index on a real-time distributed setup (we're
serving facets in real time also), handling tens of millions of queries a day
at 1-10ms latency per node, based on the open source zoie project (built
here at LinkedIn): http://zoie.googlecode.com
Zoie doesn't handle the distributed part of the setup; it's just the
real-time side. Distribution is done pretty straightforwardly in our case
though: N shards, each getting a different contiguous slice of the user base
and each replicated K times, and all N*K nodes independently receive
indexing events distributed by a message queue.
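
A minimal sketch of that layout, to make the N-shards-by-K-replicas idea
concrete. It is not zoie code and not the actual LinkedIn setup: the names
(Node, Cluster, shard_for), the in-memory dict standing in for the index, and
the choice to have every node read the full event stream and keep only the
events in its own slice are all assumptions made for illustration. Routing
per shard inside the queue would fit the description equally well.

```python
import random
from dataclasses import dataclass, field


def shard_for(uid, max_uid, n_shards):
    """Map a user id in [0, max_uid) to one of n_shards contiguous slices."""
    if not 0 <= uid < max_uid:
        raise ValueError("uid out of range")
    return uid * n_shards // max_uid


@dataclass
class Node:
    shard: int
    replica: int
    n_shards: int
    max_uid: int
    # Hypothetical stand-in for a real-time index (e.g. one zoie instance).
    index: dict = field(default_factory=dict)

    def consume(self, uid, doc):
        # Each node reads the stream on its own and keeps only its slice.
        if shard_for(uid, self.max_uid, self.n_shards) == self.shard:
            self.index[uid] = doc


class Cluster:
    def __init__(self, n_shards, k_replicas, max_uid):
        self.nodes = [
            [Node(s, r, n_shards, max_uid) for r in range(k_replicas)]
            for s in range(n_shards)
        ]

    def publish(self, uid, doc):
        # Stand-in for the message queue: every one of the N*K nodes
        # sees every indexing event.
        for replicas in self.nodes:
            for node in replicas:
                node.consume(uid, doc)

    def search(self, predicate):
        # Ask one replica per shard and merge the per-shard hits.
        hits = []
        for replicas in self.nodes:
            node = random.choice(replicas)
            hits.extend(u for u, d in node.index.items() if predicate(d))
        return sorted(hits)


cluster = Cluster(n_shards=4, k_replicas=2, max_uid=100)
cluster.publish(10, "lucene engineer")
cluster.publish(80, "search engineer")
cluster.publish(55, "designer")
engineers = cluster.search(lambda doc: "engineer" in doc)
```

Because replicas of a shard apply the same events, a query can go to any one
replica per shard, which is what lets K scale query throughput while N scales
index size.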
If you have any questions about zoie, let me know. The documentation
could be filled in a little further, and it doesn't touch on the distributed
side of things, so feel free to ping me.