Science Forums


Posted

Hey, changing formats for a little bit. It's not often that I ask questions, but I could really use your help... you know who you are... yeah you do :lol:

 

So this started as an argument at work where I know I am right, and I know I can do it better; I just haven't had the time, nor the math, to figure it out. So I figured I'd ask.

 

Basically the problem is this: you have a set of coordinates (lat, long), and you have a database that maps coordinates to locations, say big cities. Now how do you find the nearest city?

 

 

The thoughtless approach of the other side: take a simplified formula for the distance between two coordinates, shove it into a database query, and match it against a static threshold.

Solves: well, sometimes it gives you results back.

Issues: our light database has 45,000 rows, and the query performs the calculation on every row; it takes over 0.25 sec to run, which is extremely inefficient to say the least. The model fails in sparsely populated areas: if there is nothing around for 100 miles and your threshold is set to 100 miles, you found nothing and failed. And in a densely populated area, say the Northeast, within those 100 miles you may find a couple of hundred records, which you then have to sort through.

 

So I utterly disagree with the static, inefficient and slow model, in exchange for something speedier. A database, for example, can easily match on a range, and that will be many times faster than running the calculation per row. So from that came an algorithm for a function:

function get_nearby(lat, long, offset)
{
 lat_offset = calculate the latitude offset in degrees, from the current lat and offset (distance)
 long_offset = calculate the longitude offset in degrees, from the current longitude
 select from the database
   where latitude between (lat - lat_offset) and (lat + lat_offset)
     and longitude between (long - long_offset) and (long + long_offset)
 if (!results) get_nearby(lat, long, offset*2)
}

This will essentially give me less data, more quickly, so I can post-process it to find the nearest location.

And from some testing, the comparison queries run more than 25 times faster, under 0.01 sec, making this a much more attractive option, and it will just continue searching (as long as the first value in the BETWEEN clause is less than the second value, apparently).
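The frame idea can be sketched in plain PHP against an in-memory array (the city list and key names here are made up for illustration; a real query would use BETWEEN on indexed lat/long columns):

```php
<?php
// Sketch of the bounding-box pre-filter from the pseudocode above, run
// against an in-memory array instead of a database. Assumes the list is
// non-empty, so the doubling retry eventually terminates.
function get_nearby(array $rows, float $lat, float $long,
                    float $latOff, float $longOff): array {
    $hits = array_filter($rows, function ($r) use ($lat, $long, $latOff, $longOff) {
        return $r['lat']  >= $lat  - $latOff  && $r['lat']  <= $lat  + $latOff
            && $r['long'] >= $long - $longOff && $r['long'] <= $long + $longOff;
    });
    // Nothing in the frame: double it and try again.
    if (!$hits) {
        return get_nearby($rows, $lat, $long, $latOff * 2, $longOff * 2);
    }
    return array_values($hits);
}

$cities = [
    ['name' => 'NYC',         'lat' => 40.714269, 'long' => -74.005973],
    ['name' => 'Springfield', 'lat' => 42.101483, 'long' => -72.589811],
];
$near = get_nearby($cities, 40.75, -74.0, 0.07, 0.09);
```

The returned set is small, so the per-row distance math only runs on the handful of candidates inside the frame.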

 

So, I started looking at the math for the distance between two locations. Without using Vincenty's formulae, which are complicated, the haversine formula is much simpler and precise enough for this use. For example:

<?php

// springfield 42.101483,-72.589811
// nyc 40.714269,-74.005973
$lat1 = (42.101483 * pi()) / 180;   // degrees to radians
$long1 = (-72.589811 * pi()) / 180;
$lat2 = (40.714269 * pi()) / 180;
$long2 = (-74.005973 * pi()) / 180;
$dlat = $lat2 - $lat1;
$dlong = $long2 - $long1;
// haversine formula
$a = pow(sin($dlat/2), 2) + cos($lat1) * cos($lat2) * pow(sin($dlong/2), 2);
$x = sqrt(1 - $a);
$y = sqrt($a);
$c = 2 * atan2($y, $x); // atan2 - approximately 2*atan(y/(sqrt(y^2+x^2)+x))
$d = (6378137 * $c) / 1609.344; // equatorial radius in metres, converted to miles
echo number_format($d, 2);

 

which gives us: 120.85 mi

Vincenty's formula gives us: 120.74 mi

 

And that's close enough for the purpose.

 

Anyway, so I have all that, which makes matching the results straightforward. But now I need to figure out how to calculate a coordinate offset based on distance. Say I have the Springfield coordinates from above, 42.101483, -72.589811: how do I find the offset in degrees for 5 miles north/south and east/west of that location (maybe somewhere around 0.02 degrees or something like that, I'm guessing)?
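For a rough first guess, a flat-earth approximation can be sketched: one degree of latitude is about 69 statute miles everywhere, and a degree of longitude shrinks by cos(latitude). The 69 and 69.17 figures below are round-number approximations, not exact values:

```php
<?php
// Quick approximation: degree offsets for a given distance in miles.
// ~69.0 mi per degree of latitude; ~69.17 mi per degree of longitude
// at the equator, scaled by cos(latitude) elsewhere.
function degree_offsets(float $lat, float $miles): array {
    $latOff  = $miles / 69.0;
    $longOff = $miles / (69.17 * cos(($lat * pi()) / 180));
    return ['lat' => $latOff, 'long' => $longOff];
}

$off = degree_offsets(40.714269, 5);
```

At latitude 40.7 this lands within a fraction of a percent of the ellipsoidal answer, which is plenty for choosing a search frame.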

 

Any help would be appreciated :lol:

Posted

I don't understand why you need the offset. The offset in the original code was used to select potential closest sites before calculating which was closest. As you say, this risks failure if the offset is too small, and is slow if the offset is too large.

 

Instead, you seem to be proposing to do the distance calculation first, so what use is the offset? Also, how does this help speed up the process? It seems to me you will have to calculate the distance for every location before you can find the closest anyway.

 

I think you have to:

 

1. Calculate the distance of all cities from a given long/lat position.

2. Index by distance.

3. Return the first record.

 

That should be quicker than:

 

1. Calculate the distance of all cities from a given long/lat position.

2. Sort by distance.

3. Return the first record.

 

The only advantage is indexing instead of sorting the records. Or perhaps I've misunderstood the problem?

Posted

I think you misunderstood the solution. I am not proposing to do a distance calculation first. The first thing you need is the angle offset based on the distance offset; that allows you to create a frame, two lat offsets by two long offsets, with the current position in the middle. That gives you a range of lat/long values you can match on in the database, negating the need to calculate the distance to every city every time. If nothing is found within the distance, you increase it and try again until you get back a data set. I can then calculate the distance from the object to every returned location, sort by distance, and return the closest match.

Your method would work, but it would take over a quarter second per lookup, it increases the resources used by the database server whenever you run the query, and as the database grows, the time and resources needed grow with it. That's not in any way optimal. You can of course counter that by creating indexes that contain smaller data sets, say an index per 5 degrees of longitude, but that puts more data in the database, with indexes and everything, and if you are on the edge of an index, you need to write the query smart enough to use multiple indexes. IMHO that's a much less elegant solution than what I am suggesting, and it would still run slower...

Posted

The problem with using just the Lat and Long offsets initially is that they give a warped sense of distance.

 

I am curious, why is .25 seconds too slow? Are you planning on calculating this hundreds or thousands of times per minute?

 

Bill

Posted

It's not the amount, tbh, it's really the principle of the problem. Say our database doubles tomorrow: each lookup will now take twice as long to run. That's half a second, so 8 queries/sec would kill the database server, and that's just bad design if you ask me.

 

So I have a part of this, which I can definitely work with. I used the direct Vincenty approach, which, given a position, direction and distance, gives you a set of final coordinates, and that allows me to calculate the offset pretty well. Consider this quick code here:

 

<?php

// nyc 40.714269,-74.005973

function destVincenti($lat1, $lon1, $brng, $dist) {
 $a = 6378137; $b = 6356752.3142;  $f = 1/298.257223563;  // WGS-84 ellipsoid

 $alpha1 = ($brng*pi())/180;
 $sinAlpha1 = sin($alpha1);
 $cosAlpha1 = cos($alpha1);
 
 $tanU1 = (1-$f) * tan(($lat1*pi())/180);
 $cosU1 = 1 / sqrt(1 + pow($tanU1,2));
 $sinU1 = $tanU1*$cosU1;
 $sigma1 = atan2($tanU1, $cosAlpha1);
 $sinAlpha = $cosU1 * $sinAlpha1;
 $cosSqAlpha = 1 - pow($sinAlpha,2);
 $uSq = $cosSqAlpha * (pow($a,2) - pow($b,2)) / pow($b,2);
 $A = 1 + $uSq/16384*(4096+$uSq*(-768+$uSq*(320-175*$uSq)));
 $B = $uSq/1024 * (256+$uSq*(-128+$uSq*(74-47*$uSq)));
 
 // iterate until the change in sigma is negligible
 $sigma = $dist / ($b*$A);
 $sigmaP = 2*pi();
 while (abs($sigma-$sigmaP) > 0.000000000001) 
 {
   $cos2SigmaM = cos(2*$sigma1 + $sigma);
   $sinSigma = sin($sigma);
   $cosSigma = cos($sigma);
   $deltaSigma = $B*$sinSigma*($cos2SigmaM+$B/4*($cosSigma*(-1+2*pow($cos2SigmaM,2))-$B/6*$cos2SigmaM*(-3+4*pow($sinSigma,2))*(-3+4*pow($cos2SigmaM,2))));
   $sigmaP = $sigma;
   $sigma = $dist / ($b*$A) + $deltaSigma;
 }

 $lat2 = atan2($sinU1*$cosSigma + $cosU1*$sinSigma*$cosAlpha1, (1-$f)*sqrt(pow($sinAlpha,2) + pow($sinU1*$sinSigma - $cosU1*$cosSigma*$cosAlpha1,2)));
 $lambda = atan2($sinSigma*$sinAlpha1, $cosU1*$cosSigma - $sinU1*$sinSigma*$cosAlpha1);
 $C = $f/16*$cosSqAlpha*(4+$f*(4-3*$cosSqAlpha));
 $L = $lambda - (1-$C) * $f * $sinAlpha * ($sigma + $C*$sinSigma*($cos2SigmaM+$C*$cosSigma*(-1+2*pow($cos2SigmaM,2))));

 return array("lat"=>($lat2*180)/pi(), "long"=>$lon1+($L*180)/pi());
}

$lat = 40.714269;
$long = -74.005973;
$offset = 5; // in miles
echo "orig $lat $long\n";

// move sqrt(2)*offset at a 45-degree bearing, so each axis moves by ~offset
$latlong = destVincenti($lat, $long, 45, sqrt(2*(pow($offset*1609.344,2))));
$lat_offset = abs($latlong["lat"]-$lat);
$long_offset = abs(abs($latlong["long"])-abs($long));

echo "latitude offset = $lat_offset\n";
echo "longitude offset = $long_offset\n";

// crosscheck: haversine distance for the latitude offset alone (same longitude)
$lat1=($lat*pi())/180;
$long1=($long*pi())/180;
$lat2=($latlong["lat"]*pi())/180;
$long2=($long*pi())/180;
$dlat = ($lat2-$lat1);
$dlong=($long2-$long1);
$a = pow(sin($dlat/2),2)+cos($lat1)*cos($lat2)*pow((sin($dlong/2)),2);
$x=sqrt(1-$a);
$y=sqrt($a);
$c = 2*atan2($y, $x);
$d = (6378137*$c)/1609.344;
echo "Distance crosscheck with haversine " . number_format($d,2);

Now with the offset I can query the database for a range and only have to do math on the limited results. The question is: can I do even less work and get close to the same offset? Can one figure out the offset using the haversine formula, with a spherical-earth approach, given distance and angle?

 

sample output

time php lat_long.php

orig 40.714269 -74.005973

latitude offset = 0.072421586930929

longitude offset = 0.095333675473043

Distance crosscheck with haversine 5.01

real 0m0.068s

user 0m0.042s

sys 0m0.022s

 

Oh, the original offset is 5 miles.
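For what it's worth, the spherical-earth question above does have a closed-form answer. Inverting the haversine relation for a due-north and a due-east offset gives a sketch like this (assuming the same 6378137 m radius and miles conversion as the code above):

```php
<?php
// Invert the spherical (haversine) relation for the two axis offsets.
// Due north/south the central angle is simply d/R. Due east/west at a
// fixed latitude, a = cos^2(lat)*sin^2(dlong/2), which solves to
// dlong = 2*asin(sin(d/(2R)) / cos(lat)).
function spherical_offsets(float $lat, float $miles): array {
    $R = 6378137;             // sphere radius in metres
    $d = $miles * 1609.344;   // miles to metres
    $latRad = ($lat * pi()) / 180;
    $dlat  = $d / $R;
    $dlong = 2 * asin(sin($d / (2 * $R)) / cos($latRad));
    return ['lat' => ($dlat * 180) / pi(), 'long' => ($dlong * 180) / pi()];
}

$off = spherical_offsets(40.714269, 5);
```

This lands within about 0.0002 degrees of the Vincenty-derived offsets in the sample output, with no iteration at all.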

Posted
I think you misunderstood the solution. I am not proposing to do a distance calculation first. The first thing you need is the angle offset based on the distance offset; that allows you to create a frame, two lat offsets by two long offsets, with the current position in the middle. That gives you a range of lat/long values you can match on in the database, negating the need to calculate the distance to every city every time. If nothing is found within the distance, you increase it and try again until you get back a data set.

Yes, I misunderstood your proposal. I think I've got it now, except I don't see why you need to convert long and lat into distances in miles. Why not just use arbitrary offsets to choose the data set of local towns? After all, long and lat can be regarded as just numbers.

 

Arguably you might even be able to find the closest city without translating into actual distances. I.e. the ratio between the "size" of a degree of longitude and a degree of latitude is a function of the latitude. The latitude of the location is a constant, so that ratio only needs calculating once, then the "size" of the latitude offset of each city can be pro rata'd to make the units of longitude and latitude offset the same. The distances can then be compared using Pythagoras' theorem.
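That pro-rata comparison might look like this sketch (the candidate list is made up; ranking by squared distance avoids the square root, since only the ordering matters):

```php
<?php
// Rank candidate cities with a flat-earth metric: scale longitude
// differences by cos(latitude) once, then compare via Pythagoras.
// Squared "distance" is enough for finding the minimum.
function nearest(array $cities, float $lat, float $long): array {
    $scale = cos(($lat * pi()) / 180);  // computed once for the location
    $best = null;
    $bestD2 = INF;
    foreach ($cities as $c) {
        $dlat  = $c['lat'] - $lat;
        $dlong = ($c['long'] - $long) * $scale;
        $d2 = $dlat * $dlat + $dlong * $dlong;
        if ($d2 < $bestD2) { $bestD2 = $d2; $best = $c; }
    }
    return $best;
}

$cities = [
    ['name' => 'NYC',         'lat' => 40.714269, 'long' => -74.005973],
    ['name' => 'Springfield', 'lat' => 42.101483, 'long' => -72.589811],
];
$b = nearest($cities, 40.8, -73.9);
```

No trigonometry per candidate beyond the one cos() for the query point, which is the whole appeal over haversine for short ranges.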

 

Perhaps that was what you were suggesting anyway? Either that or I still do not understand what your problem is...

Posted
I think I've got it now, except I don't see why you need to convert long and lat into distances in miles?

Well, I don't; actually I convert the distance in miles to an angle offset for latitude and longitude at the current location. I could just use an arbitrary angle offset, but the goal was also to have a generic function, so I can say: tell me blank within a 3 mile radius, or within a 100 mile radius, and use the same exact code with at least some level of precision (not just increasing the offset until the distances grow to over 100 miles). This also lets me play with different scales of distance to achieve the fewest iterations with the least post-processing.

 

Hmm, I like what you are proposing though. I should be able to get the length of a degree at the current latitude and longitude, and scale it down for the arbitrary offset... hmmm *ponders*

 

I gotta run and pick up some parts for my bike, but I like what you are proposing. I'm still kinda failing to see the Pythagoras piece of it, but I definitely like the lat/long offset based on the arc-degree length...

Posted

The only problem I can see with this linear approach is if the distances to the cities are so great that the linear approximations become too inaccurate. E.g. I guesstimate this should work for hundreds of miles, but not thousands?

 

If so, I'd suggest that could be compensated for by making the latitude compensation vary linearly with the latitude of each city. So both the latitude of the location and of the city are compensated for, albeit only by linear approximations. That turns the square approximation into a regular trapezium, which should be good enough for thousands of miles?

Posted

I'm not a coder and didn't quite follow the issues of the thread, but...

 

A coder once told me that the easiest (efficient and computationally friendly) way to find a word in the dictionary is to split it in half and see if the word is before or after the halved word. Whether it's before or after, split that section and check if it's before or after and repeat until you've found the word or not. If I recall this should find a word in about 10 steps.
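That dictionary trick is binary search: halve the remaining range on every probe, so a sorted list of n entries takes about log2(n) steps (roughly 10 for 1,000 entries). A minimal sketch:

```php
<?php
// Binary search over a sorted list: compare against the middle element,
// discard the half that cannot contain the needle, repeat.
function binary_search(array $sorted, $needle): int {
    $lo = 0;
    $hi = count($sorted) - 1;
    while ($lo <= $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid] === $needle) return $mid;   // found it
        if ($sorted[$mid] < $needle)   $lo = $mid + 1; // look in upper half
        else                           $hi = $mid - 1; // look in lower half
    }
    return -1; // not present
}

$words = ['ant', 'bee', 'cat', 'dog', 'eel', 'fox'];
```

On an indexed database column the engine does the equivalent work for you, which is why the range queries discussed above are fast.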

 

I would have two lists, one sorted by lat and the other by long. To find the nearest city, use the method above to find the 5 cities with the nearest latitude and the 5 cities with the nearest longitude (whether these are all within 5 miles or 1,000 miles won't matter). Use the haversine or the great-circle Vincenty formula to solve all the potentials for distance and find which is smallest.

 

I'm not sure how that fits into the discussion thus far, but if the list is made up of [lat, long, city] then I can't think of any potential downfalls with this approach.

 

~modest

Posted

Ok, there is another problem. Using squares to select the "nearest" cities gives a ratio of up to 1:1.414 in the distances to the cities found, depending on the angle. So there could be a nearer city due north, south, east or west excluded from the search area.

 

I'd suggest that the quickest solution could be to increment the offset until one or more cities are found, then extend the offset by a ratio of 1.415 (or 1.5 to be sure) and include all the cities found in the selection of the nearest. That's just a single additional iteration of the initial selection of likely cities.
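A sketch of that two-phase selection (helper names and the city list are illustrative; 1.5 is the safety ratio suggested above, and the list is assumed non-empty so the loop terminates):

```php
<?php
// Two-phase frame search: grow the box until something is found, then
// widen once more by ~sqrt(2) (1.5 to be safe) so a nearer city just
// beyond an edge of the first box cannot be missed.
function box(array $rows, float $lat, float $long, float $off): array {
    return array_values(array_filter($rows, function ($r) use ($lat, $long, $off) {
        return abs($r['lat'] - $lat) <= $off && abs($r['long'] - $long) <= $off;
    }));
}

function candidates(array $rows, float $lat, float $long, float $off): array {
    while (!box($rows, $lat, $long, $off)) {
        $off *= 2;  // empty frame: double it and retry
    }
    return box($rows, $lat, $long, $off * 1.5);  // single widened final pass
}

$cities = [
    ['name' => 'NYC',         'lat' => 40.714269, 'long' => -74.005973],
    ['name' => 'Springfield', 'lat' => 42.101483, 'long' => -72.589811],
];
$c = candidates($cities, 40.8, -73.9, 0.05);
```

The final pass is one extra range query, after which the exact distance formula only needs to rank the survivors.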

Posted

I was thinking a ratio of 2 or 4 would be just as good, with fewer queries to the database. I mean, if I'm starting at, say, 2.5 miles, every time I double the offset I quadruple the search area... I think that works well. And jedai, I think you got my idea above, as you just described it :lol:

 

And Clay, that was the thought: I could use the haversine (which will honestly be close enough) to find the nearest city, and even if I'm off by 1/100 of a mile, it won't be a big deal... I mean, I have the direct Vincenty approach working above; I could just as easily make the inverse work and get a result accurate to around 1/1000000000000000 of a meter, but I don't think I really need to be that precise... and if I do, I can always just swap the functions used...

Posted
I would have two lists, one sorted by lat and the other by long. To find the nearest city, use the method above to find the 5 cities with the nearest latitude and the 5 cities with the nearest longitude (whether these are all within 5 miles or 1,000 miles won't matter). Use the haversine or the great-circle Vincenty formula to solve all the potentials for distance and find which is smallest.

 

Sorry, I'm in a mid-Saturday haze. This won't work at all.

 

~modest :lol:

Posted
I would have two lists, one sorted by lat and the other by long. To find the nearest city, use the method above to find the 5 cities with the nearest latitude and the 5 cities with the nearest longitude (whether these are all within 5 miles or 1,000 miles won't matter). Use the haversine or the great-circle Vincenty formula to solve all the potentials for distance and find which is smallest.

 

I'm not sure how that fits into the discussion thus far, but if the list is made up of [lat, long, city] then I can't think of any potential downfalls with this approach.

This mechanism might find the nearest city, but not always. Why 5 cities? Why not 3 or 4, or 6 or 7? It seems arbitrary, and hence fallible.

 

Also, the binary split is irrelevant. That is used for sorting items. Nowadays you index them; it's much quicker.

 

Then there is the question of whether the linear approximation I suggest is accurate enough, and sufficiently quicker, to make haversine or Vincenty unnecessary.

 

But the idea of separately choosing the nearest by longitude and latitude then processing only those seems worth considering in principle. I suggest that it needs some work on how many cities you select, and why.

 

On second thoughts, you are right, it won't work.

Posted
But the idea of separately choosing the nearest by longitude and latitude then processing only those seems good in principle. I suggest that it needs some work on how many cities you select, and why.

 

No, I think it was entirely flawed. The nearest cities to any given longitude, for example, would be spread across the globe. The same with lat. You'd end up with 10 cities that essentially make up two great circles.

 

My bad.

 

~modest
