Determining "place" Location by Averaging User Data

By on 12/28/2010

Location based services like foursquare, gowalla, and the recently minted 'facebook places' let users check in from wherever they are at to inform their friends of their whereabouts. By this point, most places have already been defined in those services, so all you usually have to do is pick from a list of nearby locations. And if it's not on the list, you can just define a new place.

Despite the social weight of these already existing services ... there is still a lush breeding grounds for innovation in location services. If you were building such a service, let's talk about the process of defining a "new place" as a user. What happens if the first user of your app to create a place is actually just around the corner, or if the place is large enough, on the east side rather than the west side. In a naively designed system, your place would be incorrectly defined from that point forward and would offer incorrect position data for the life of your service.

This post proposes a simple system that lets the place's location in the world adjust itself over time. The more checkins you receive, the more accurate the location.

The concept is quite simple actually, by computing the centroid from all known checkins, you find the true 'center' of you place. Doing this calculation could not be any simpler, centroid is just a fancy term for the average vector from a set (ie. Lat/Lon)



I made a useful extension method that you can apply to a collection of vectors which can be invoked as such.
Vector2 centroid = locations.Centroid();

public static class VectorExtensions { public static Vector2 Centroid(this List<Vector2> points) { int pointCount = points.Count; Vector2 total = Vector2.Zero; Vector2 centroid = Vector2.Zero;

if (pointCount > 0) { for (int i = 0; i < pointCount; i++) { var point = points[i]; total.X += point.X; total.Y += point.Y; } centroid = new Vector2( total.X / (float)pointCount, total.Y / (float)pointCount); } return centroid; } }
So given a set of points, this would easily give you the logical center of them. Even if there's a random checkin from outside the area, the average will not let the centroid vary by much.



But there is still an issue with this approach. How can this techique be applied to a real world system, you can't expect me to keep a copy of every checkin ever done for a given location ... I mean you can, but for most purposes it's overkill.

Luckily, there is a way to reduce the problem domain to limit the amount of data needed to recalculate the centroid every time there is a new point to consider. By doing a weighted average, your data structure is greatly simplified because all you have to keep in the database for any given location is:
  • latitude
  • longitude
  • numberOfCheckins
With that, you can find your list of closest places to let the user see if their location is already in the db.

If the user selects an existing place to check in to, you can calc the centroid as follows:
public static Vector2 WeightedCentroid(
                                                this Vector2 newPoint,
                                                Vector2 oldCentroid,
                                                int numOfCheckins)
{
    return new Vector2(
        (oldCentroid.X * (float)numOfCheckins + newPoint.X) / ((float)numOfCheckins + 1),
        (oldCentroid.Y * (float)numOfCheckins + newPoint.Y) / ((float)numOfCheckins + 1)
        );
};
With that, you can save the value of the new average, along with the latest number of checkins. And that's it!

See more in the archives