Thanks for all the inputs guys.
As Erick said let me elaborate the problem a bit.
We are trying to develop a local search application. The user will be able
to locate businesses, localities and roads. We have data for all the 3 with
us. We do not want to provide separate boxes for the user to enter data i.e
a common one for all entry box (a la google :)) where the user enters an
address (or road name or area name) or all the 3 etc etc. From the user
query we have to find the best possible match in our data. The data has lots
of numbers as well as names with initials and stuff like that. The user may
enter the names with a space between the initals or they might club the
initials together etc etc. From the user query we do not have a way to
figure out what is what apart from the obvious ones as to if something ends
with a road then it is a road name or if there is a layout in the query then
it is an area etc. Right now we have our own custom framework. I am trying
to figure out as to whether Lucene is suited for this kind of application.
Once again thanks for all the inputs.
On Fri, Mar 6, 2009 at 7:15 PM, Erick Erickson <erickerickson@xxxxxxxxx>wrote:
> Whatever you do will be wrong <G>. What you're saying is
> that you have structured data that the user wants to search
> in an unstructured way, and you want to try to create a
> system that intuits what the user meant. Good luck <G>.
> Can you back up a bit and talk about the problem you're
> trying to solve? If, for instance, you're trying to find the
> best match for a particular business, one approach would
> be to create one index where each business had
> where the field bagowords contained a copy of the data
> from the other three fields, then search bagowords
> for your query terms. It sounds simplistic, but it might be
> surprisingly good.
> And if this is out in left field, a higher level statement
> of the problem would help get better answers.
> On Fri, Mar 6, 2009 at 1:25 AM, Srinivas Bharghav
> > I am trying to evaluate as to whether Lucene is the right candidate for
> > problem at hand.
> > Say I have 3 indexes:
> > Index 1 has street names.
> > Index 2 has business names.
> > Index 3 has area names.
> > All these names can be single words or a combination of words like
> > street or marks and spencers street etc etc.
> > Now the use enters a query saying "mc donalds woodward street kingston
> > precinct".
> > I have to parse this query and come up with the best match possible. The
> > problem is, in the query I do not know which part is the business name or
> > area name or street name. Also the user may give the query in any order
> > example he may give it as "kingston precinct mc donalds woodward street".
> > There might be spelling mistkaes in the query enterd by the user. Also he
> > might use road for street or lane for street and such things. I know that
> > Lucene is the right candidate for the synonym and spelling mistakes part
> > but
> > am a bit hazy regarding the user query parsing part as to in which index
> > search what. Any help is greatly appreciated.
> > Thanks,
> > Srini.