Pablo Jairala

GeoJson intersection query

@ 20 Aug 2014

elasticsearch ruby


Update (2015-01-20): as commenter SS points out, there is sadly no support yet in Elasticsearch for relevancy/scoring based on the overlap area of GeoJSON shapes. There is an open issue detailing the progress on this front. Please disregard the second part of the article.

We have a feature for Taskers that lets them draw a GeoJSON polygon of the area of the world they want to work in. After they draw the area, we have to correlate it to one of the metropolitan areas we serve. At first, this latter part of the feature was built by computing the centroid of the shape, looking up its latitude and longitude, and storing the first metro that contained that lat/lng.

Sadly, this solution had some issues. For example, if a Tasker were to draw an area where a large portion of the shape was located over the ocean (be it for legitimate reasons or because they had trouble using the tool), the centroid could very well land in the ocean, and we would be unable to relate that to a metro.
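To make the failure mode concrete, here is a minimal sketch (not the production code) showing that the centroid of a concave shape can fall outside the shape itself. The method names `polygon_centroid` and `point_in_polygon?` and the U-shaped test polygon are made up for illustration, and coordinates are treated as plain planar x/y values.

```ruby
# Hypothetical helpers for illustration; coordinates are planar [x, y] pairs.

# Area-weighted centroid of a simple polygon (shoelace formula).
# The ring may be open or closed (first point repeated at the end).
def polygon_centroid(ring)
  area2 = 0.0 # twice the signed area
  cx = 0.0
  cy = 0.0
  ring.each_with_index do |(x0, y0), i|
    x1, y1 = ring[(i + 1) % ring.size]
    cross = x0 * y1 - x1 * y0
    area2 += cross
    cx += (x0 + x1) * cross
    cy += (y0 + y1) * cross
  end
  [cx / (3.0 * area2), cy / (3.0 * area2)]
end

# Ray-casting test: count how many edges a ray going right from the point crosses.
def point_in_polygon?(point, ring)
  px, py = point
  inside = false
  ring.each_with_index do |(x0, y0), i|
    x1, y1 = ring[(i + 1) % ring.size]
    next unless (y0 > py) != (y1 > py) # edge does not span the ray's height
    x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0).to_f
    inside = !inside if px < x_cross
  end
  inside
end

# A U-shaped area, e.g. a shape drawn around a bay.
u_shape = [[0, 0], [6, 0], [6, 6], [4, 6], [4, 2], [2, 2], [2, 6], [0, 6]]
centroid = polygon_centroid(u_shape)
puts point_in_polygon?(centroid, u_shape) # prints false: the centroid is in the "bay"
```

Any lookup keyed on that single point inherits this problem, which is why matching on the whole shape is more robust.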

The solution was relatively simple: all I had to build was a query in Elasticsearch (which is where we keep the GeoJSON mappings for all our metros) that would grab the best metro for the drawn shape.

There are some considerations, of course, such as what happens if the shape overlaps two metros, or if it overlaps none. The consensus was that the metro covering the most area would win. In the latter scenario, we would simply not relate the shape to any metro and be done with it.
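As an illustration of that rule only, and not the approach taken in this post, here is a rough client-side sketch that estimates overlap by sampling a grid of points inside the drawn shape and counting how many land in each metro. The names `inside_ring?` and `best_metro`, the `steps` parameter, and the metro ids are all hypothetical, and treating longitude/latitude as planar coordinates is a simplifying assumption.

```ruby
# Hypothetical sketch: rings are arrays of planar [x, y] pairs.

# Ray-casting point-in-polygon test.
def inside_ring?(px, py, ring)
  inside = false
  ring.each_with_index do |(x0, y0), i|
    x1, y1 = ring[(i + 1) % ring.size]
    next unless (y0 > py) != (y1 > py)
    x_cross = x0 + (py - y0) * (x1 - x0) / (y1 - y0).to_f
    inside = !inside if px < x_cross
  end
  inside
end

# Returns the id of the metro containing the most sample points of the drawn
# ring, or nil when no sample point falls inside any metro.
def best_metro(drawn, metros, steps: 50)
  xs, ys = drawn.transpose
  min_x, max_x = xs.minmax
  min_y, max_y = ys.minmax
  dx = (max_x - min_x) / steps.to_f
  dy = (max_y - min_y) / steps.to_f

  counts = Hash.new(0)
  steps.times do |i|
    steps.times do |j|
      # Sample the centre of each cell in the drawn shape's bounding box.
      x = min_x + (i + 0.5) * dx
      y = min_y + (j + 0.5) * dy
      next unless inside_ring?(x, y, drawn)
      metros.each { |id, ring| counts[id] += 1 if inside_ring?(x, y, ring) }
    end
  end
  best = counts.max_by { |_id, count| count }
  best && best.first
end

drawn = [[0, 0], [4, 0], [4, 4], [0, 4]]
metros = {
  "west" => [[-10, -10], [1, -10], [1, 10], [-10, 10]], # covers about a quarter
  "east" => [[1, -10], [10, -10], [10, 10], [1, 10]]    # covers about three quarters
}
puts best_metro(drawn, metros) # prints east
```

A shape that touches no metro yields `nil`, matching the "do not relate to any metro" case.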

I started with a version of the query that looked something like this:

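def query_hash
  {
    # Return no _source fields; only each hit's _id is needed.
    _source: {include: %w()},
    query: {
      filtered: {
        filter: {
          and: [
            {
              # Match documents whose `geometry` field intersects the drawn polygon.
              geo_shape: {
                geometry: {
                  shape: {
                    type:        :polygon,
                    coordinates: @coordinates
                  }
                }
              }
            },
            # Restrict results to metro-level documents.
            {term: {level: :metro}}
          ]
        }
      }
    },
    page: 1,
    page_size: 1
  }
end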

Here @coordinates is an instance variable holding the array of points in the drawn GeoJSON. This query found whatever metros were touched by the drawn shape and returned the first one. I decided not to include anything from the source, since we only needed the internal _id field, which maps directly to our DB of metros. Grabbing extra fields, in particular the GeoJSON shapes, slowed the query down immensely.
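For reference, a GeoJSON Polygon's `coordinates` is an array of rings, each ring an array of `[longitude, latitude]` positions whose first and last positions are identical. The sketch below shows one way such a value could be pulled out of a drawn shape; the helper name `polygon_coordinates` and the sample coordinates are hypothetical, and how the real application populates `@coordinates` is not shown in this post.

```ruby
require "json"

# Hypothetical helper: extract the coordinates of a GeoJSON Polygon (or of a
# Feature wrapping one), closing any ring whose first and last positions differ.
def polygon_coordinates(geojson)
  geometry = JSON.parse(geojson)
  geometry = geometry["geometry"] if geometry["type"] == "Feature"
  raise ArgumentError, "expected a Polygon" unless geometry["type"] == "Polygon"

  geometry["coordinates"].map do |ring|
    ring.first == ring.last ? ring : ring + [ring.first]
  end
end

# Illustrative shape only; positions are [longitude, latitude].
drawn = '{"type":"Polygon","coordinates":[[[-122.5,37.7],[-122.3,37.7],[-122.3,37.8],[-122.5,37.8]]]}'
coordinates = polygon_coordinates(drawn)
puts coordinates.first.size # prints 5: the ring was closed by repeating the first position
```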

So the query got the job done. However, those familiar with Elasticsearch will see an issue right away: filters are not scored, so every hit of a filtered query like this one gets the same constant score. It doesn't matter whether a metro's shape was barely intersected by the drawn shape or entirely covered; every metro that was touched comes back with a score of 1. Not great.

The second and final pass so far looks like:

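def query_hash
  {
    # Return no _source fields; only each hit's _id is needed.
    _source: {include: %w()},
    query: {
      bool: {
        must: [
          {
            # Match documents whose `geometry` field intersects the drawn polygon.
            geo_shape: {
              geometry: {
                shape: {
                  type:        :polygon,
                  coordinates: @coordinates
                }
              }
            }
          },
          # Restrict results to metro-level documents.
          {term: {level: :metro}}
        ]
      }
    },
    page: 1,
    page_size: 1
  }
end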

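This handles both restrictions: limiting results to the metro level and matching the drawn polygon. Since it's a bool query, I expected the scoring to reflect the amount of overlap, which is exactly what we needed; as the update at the top explains, Elasticsearch does not actually score shapes by overlap area.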

Note: crossposted from my personal blog
