|
|
| Thanks. I added my comment on the issue. I think we should revert and then someone can put up a patch to make this pluggable. As it stands, this Best Fit calculation has nothing to do with the CartesianTierPlotter anyway, so we could refactor it pretty easily.
-Grant On Apr 14, 2010, at 12:42 PM, Helleringer, Nicolas wrote: Tables are well on JIRA : https://issues.apache.org/jira/browse/LUCENE-2359
Nicolas
2010/4/14 Helleringer, Nicolas <nicolas.helleringer@xxxxxxxxxxxxx>
Here are the summary tables :
First a table to remind metrics on the Tiers :
Tile Level
TierLegnth
TierBoxes
TileLength (miles)
0
1
1
24902
1
2
4
12451
2
4
16
6225,5
3
8
64
3112,75
4
16
256
1556,375
5
32
1024
778,1875
6
64
4096
389,09375
7
128
16384
194,546875
8
256
65536
97,2734375
9
512
262144
48,63671875
10
1024
1048576
24,31835938
11
2048
4194304
12,15917969
12
4096
16777216
6,079589844
13
8192
67108864
3,039794922
14
16384
268435456
1,519897461
15
32768
1073741824
0,75994873 Then the comparaison table between legacy and new bestFit :
Radius (miles)
legacy bestFit
legacy bestFit TileLength
legacy bestFit max number of Box to fetch
new bestFit
new bestFit TileLength
new bestFit number of Box to fetch
1
18
0,75994873
9
14
1,519897461
4
5
16
0,75994873
64
12
6,079589844
4
10
15
0,75994873
225
11
12,15917969
4
25
13
3,039794922
100
9
24,31835938
9
50
12
6,079589844
100
8
97,2734375
4
100
11
12,15917969
100
7
194,546875
4
250
10
24,31835938
144
6
389,09375
4
500
9
48,63671875
144
5
778,1875
4
1000
8
97,2734375
144
4
1556,375
4
2500
7
194,546875
196
3
3112,75
4
5000
6
389,09375
196
2
6225,5
4
10000
5
778,1875
196
1
12451
4
I hope mailers will keep the formating ...
If not I shall post on JIRA.
Formulas :
TileLength is 24902 (earth circumference) / TierLength bestFit formulas as summarized by Grant in his email. number of box to fetch : pow(ceil(TileLength/Radius)+1,2) => TileLength/Radius is for how many tiles are needed to cover the radius, +1 is because you are not always well aligned, the pow(X,2) because there is two directions/axis
Best regards,
Nicolas
2010/4/14 Chris Male <gento0nz@xxxxxxxxx>
Hi,
On Wed, Apr 14, 2010 at 6:07 PM, Grant Ingersoll <gsingers@xxxxxxxxxx> wrote:
On Apr 14, 2010, at 11:06 AM, Chris Male wrote:
> Hi,
>
> My understanding of the benefits of the new algorithm is that it means a lower tier level resulting in fewer boxes, but more documents inside those boxes that are outside of the search radius.
>
> While having fewer boxes means fewer term queries to make against the index, more documents means more costly calculations to filter out those extraneous documents.
>
> For those doing just Cartesian Tier filtering it seems like the new approach is a win, but for those doing distance calculations on those documents passing the filter, it seems to come at a cost.
Currently, this is only used for filtering. AIUI, Tiers aren't really that useful for distance calculations, are they? After all, all you have is a box id and you'd have to reverse out the calc of that to be able to calc a distance, no? Perhaps I'm missing something.
How Spatial Lucene currently works (or at least one of the ways it was designed to work), is using a 2 step filtering process. Step 1 is the Cartesian Tier filtering. The resulting set of Documents is then passed on through to Step 2 which then calculates the distance from each Document to the search centre. If the distance is greater than the radius, the Document is filtered out. This means that after both filtering steps you have only those Documents that are in the search radius.
How this impacts this algorithm choice is that the more Documents the pass through Step 1, the more calculations that have to be done in Step 2.
I'm not sure, however, that it is a win for filtering. It seems like you end up including docs in the result set that should be in there.
I'll wait for Nicolas' summary table, but I'm inclined to revert and then someone can refactor if they want to offer alternate implementations.
-- Chris Male | Software Developer | JTeam BV.| www.jteam.nl
-------------------------- Grant Ingersoll
|
|
|