Pablo Jairala

Fix duplicate consecutive coordinates in Elasticsearch geo json shapes in Ruby

@ 02 Dec 2015

elasticsearch ruby


Recently at TaskRabbit we upgraded Elasticsearch from version 1.5 to 1.7. One consequence of the update was that the newer version seems to be stricter about consecutive duplicate points in the GeoJSON shapes we store.

One feature we offer lets Taskers draw maps of the geographical areas they want to work in. This can easily produce duplicate consecutive points, for example if the Tasker happens to double-click on the same spot in the map drawing tool. It was particularly problematic because some maps already stored in Elasticsearch under version 1.5 could no longer be updated: the shape was now considered invalid because of these duplicate points.
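To get a feel for what such a shape looks like, here is a minimal sketch of a check for consecutive duplicates. The method name `consecutive_duplicates?` is hypothetical and not part of our codebase; it assumes the coordinates are an array of rings, each ring being an array of [longitude, latitude] pairs.

```ruby
# Hypothetical helper (not from the original code): returns true if any ring
# contains two identical points directly next to each other.
def consecutive_duplicates?(coordinates)
  coordinates.any? do |ring|
    # each_cons(2) yields every pair of neighbouring points in the ring.
    ring.each_cons(2).any? { |a, b| a == b }
  end
end

double_clicked = [[[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]
clean          = [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]

p consecutive_duplicates?(double_clicked)
p consecutive_duplicates?(clean)
```

The first and last points of a closed ring are equal but not adjacent, so a clean ring is not flagged.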

So we had to write a little script to go through the shapes and remove the duplicate points. This was very straightforward to do, but I figured someone might run into a similar issue and find the actual code helpful:

class RabbitProfileGeoJsonFixup

  attr_reader :fixed_coordinates

  def initialize(coordinates)
    @coordinates = coordinates
  end

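  # True only once #fix has run and actually removed at least one point.
  # Note: present? comes from ActiveSupport, so this assumes a Rails app.
  def changed?
    @fixed_coordinates.present? && @fixed_coordinates != @coordinates
  end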

  def fix
    return if @coordinates.empty?

    @fixed_coordinates = []

    @coordinates.each do |outer_coord|
      fixed_outer_coords = []
      size = outer_coord.size

      outer_coord.each_with_index do |coord, idx|
        # * never drop the first element
        # * never drop the last element
        # * otherwise, if the previous coord is the same as the current one,
        #   we don't add it
        unless idx > 0 && idx < (size - 1) && coord == outer_coord[idx - 1]
          fixed_outer_coords << coord
        end
      end

      @fixed_coordinates << fixed_outer_coords
    end

    @fixed_coordinates
  end

end

As you can see, the class receives the coordinates as a param, for example:

coordinates = [[[-87.72480010986328,41.984504674276074],[-87.72823333740233,41.98144219285842],[-87.71879196166992,41.90048830606728],[-87.71707534790039,41.899466149343546],[-87.71707534790039,41.89869952106346],[-87.71638870239256,41.895377358833876],[-87.6020622253418,41.76670255261859],[-87.59965896606444,41.769519280779775],[-87.60000228881836,41.76849503030295],[-87.60000228881836,41.76849503030295],[-87.51056671142578,41.754409927954946],[-87.50988006591795,41.75543440328294],[-87.50850677490234,41.7559466348148],[-87.50782012939453,41.756971085614175],[-87.5064468383789,41.75748330488168],[-87.5064468383789,41.75799552006108],[-87.6492691040039,41.96153247330561],[-87.64789581298828,41.963064211132306],[-87.6485824584961,41.96561702568286],[-87.72480010986328,41.984504674276074]]]

It then goes through the outer coordinates array and through each of the coordinate points inside it. If the previous point is the same as the current one (and the current one is neither the first nor the last point), we just don’t add it to our @fixed_coordinates array. Finally, we expose the result via an attr_reader.

fixup = RabbitProfileGeoJsonFixup.new(coordinates)
fixup.fix
# => [[[-87.72480010986328, 41.984504674276074], [-87.72823333740233, 41.98144219285842], [-87.71879196166992, 41.90048830606728], [-87.71707534790039, 41.899466149343546], [-87.71707534790039, 41.89869952106346], [-87.71638870239256, 41.895377358833876], [-87.6020622253418, 41.76670255261859], [-87.59965896606444, 41.769519280779775], [-87.60000228881836, 41.76849503030295], [-87.51056671142578, 41.754409927954946], [-87.50988006591795, 41.75543440328294], [-87.50850677490234, 41.7559466348148], [-87.50782012939453, 41.756971085614175], [-87.5064468383789, 41.75748330488168], [-87.5064468383789, 41.75799552006108], [-87.6492691040039, 41.96153247330561], [-87.64789581298828, 41.963064211132306], [-87.6485824584961, 41.96561702568286], [-87.72480010986328, 41.984504674276074]]]
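One thing to note is that the class deliberately leaves the last point of each ring alone, so a duplicate sitting right before the closing point would survive. If that matters for your data, a more compact variant is possible. The following is only a sketch, assuming Ruby 2.3 or newer (for `chunk_while`); the method name `dedupe_consecutive` is made up for illustration and is not what we shipped.

```ruby
# Hypothetical alternative (not the code we run): collapse every run of
# identical consecutive points in each ring down to a single point.
def dedupe_consecutive(coordinates)
  coordinates.map do |ring|
    # chunk_while groups neighbours for which the block is true, so each
    # chunk is a run of equal points; we keep the first point of each run.
    ring.chunk_while { |a, b| a == b }.map(&:first)
  end
end

rings = [[[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]]
p dedupe_consecutive(rings)
# => [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]
```

The ring stays closed because the first and last points are equal but never adjacent, so only true double-clicks get collapsed.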

We put the fixup in a before_save callback that only runs if the coordinates themselves change, and we also wrote a Rake task that goes through all the existing maps and fixes them up if needed.
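The one-off pass over existing maps can be sketched roughly like this. It is plain Ruby rather than our actual Rake task: `MapRecord`, `fix_rings` and `fix_all` are hypothetical names, the Struct stands in for the real model, and a real task would load the records from the database and save the ones that changed.

```ruby
# Hypothetical stand-in for a stored map; the real model is not shown here.
MapRecord = Struct.new(:id, :coordinates)

# Same rule as the fixup class: drop a point that repeats its predecessor,
# but never drop the first or last point of a ring.
def fix_rings(coordinates)
  coordinates.map do |ring|
    last = ring.size - 1
    ring.each_with_index
        .reject { |point, i| i > 0 && i < last && point == ring[i - 1] }
        .map { |point, _i| point }
  end
end

# Walk every record, rewrite only those whose coordinates actually change,
# and return the ids that were touched.
def fix_all(records)
  records.each_with_object([]) do |record, changed_ids|
    fixed = fix_rings(record.coordinates)
    next if fixed == record.coordinates

    record.coordinates = fixed
    changed_ids << record.id
  end
end

records = [
  MapRecord.new(1, [[[0.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]),
  MapRecord.new(2, [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]])
]
p fix_all(records)
# => [1]
```

Skipping unchanged records keeps the pass cheap and avoids needless writes back to Elasticsearch.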
