TaskRabbit is Hiring!

We’re a tight-knit team that’s passionate about building a solution that helps people by maximizing their time, talent and skills. We are actively hiring for our Engineering and Design teams. Click To Learn more

Evan Tahler

active_record, MySQL, and emoji

@ 24 Apr 2014

ruby rails database


More and more, people are adopting emoji in their online communications. At TaskRabbit, we noticed that our users are starting to use emoji all over the place, from task descriptions to reviews.

There are some problems when supporting the emoji character set with our stack, which includes Rails 4.0 and MySQL. The main problem is that MySQL’s utf8 encoding does not actually support multi-byte strings, which emoji relies on. In MySQL 5.5, the utf8mb4 encoding was introduced which allows for Multi-Byte (mb) strings… and therefore emoji would work! The mysql2 gem introduced support for utf8mb4 about a year ago, but only recently did active_record (and rails) add support for this in rails 4.1.

Initially, we decided to ignore all emoji characters, literally stripping them out of strings with our demogi gem (Thanks Pablo!). However, with our new product launch in the UK, we thought it was time to actually address the problem. Here is what we learned:

Migrating MySQL from utf8 to utf8mb4

The good news is that the upgrade path from utf8 to utf8mb4 is easy. As we are adding bytes, the migration is really just a definition change at the table-level. Nothing has to change with your existing data. This is a non-blocking and no-downtime migration. If you are using normal rails migrations, all of your column types for VARCHAR columns will be based on the table’s encoding. Changing the table will change the column type. The bad news is that any text-type (or blob-type) columns will need to be explicitly changed.

Check out the migration steps:

  1. change the DB’s encoding entirely, so new tables will be created in utf8mb4
  2. alter all existing tables
  3. explicitly update text-type columns
class Utf8mb4 < ActiveRecord::Migration

  UTF8_PAIRS = {
    'users'     => 'notes',
    'comments'  => 'message'
    # ...
  }

  def self.up
    execute "ALTER DATABASE `#{ActiveRecord::Base.connection.current_database}` CHARACTER SET utf8mb4;"

    ActiveRecord::Base.connection.tables.each do |table|
      execute "ALTER TABLE `#{table}` CHARACTER SET = utf8mb4;"
    end

    UTF8_PAIRS.each do |table, col|
      execute "ALTER TABLE `#{table}` CHANGE `#{col}` `#{col}` TEXT  CHARACTER SET utf8mb4  NULL;"
    end

  end

  def self.down
    execute "ALTER DATABASE `#{ActiveRecord::Base.connection.current_database}` CHARACTER SET utf8;"

    ActiveRecord::Base.connection.tables.each do |table|
      execute "ALTER TABLE `#{table}` CHARACTER SET = utf8;"
    end

    UTF8_PAIRS.each do |table, col|
      execute "ALTER TABLE `#{table}` CHANGE `#{col}` `#{col}` TEXT  CHARACTER SET utf8  NULL;"
    end
  end
end

database.yml

The only change here is to change the encoding:

development:
  adapter: mysql2
  encoding: utf8mb4 # <---
  database: my_db_name
  username: root
  password: my_password
  host: 127.0.0.1
  port: 3306

Index Lengths

The last step here is to worry about index lengths, as mentioned above. If you are on rails 4.1, you have nothing to worry about! The rest of us have a few options:

  1. monkeypatch activerecord
  2. change the index length within MySQL
  3. set the length to 191 within all index migrations

We chose #2 due to the simplicity of the solution. Check the links above for a detailed discussion of the problem.

module ActiveRecord
  module ConnectionAdapters
    class AbstractMysqlAdapter
      NATIVE_DATABASE_TYPES[:string] = { :name => "varchar", :limit => 191 }
    end
  end
end

And now you can emoji to your ❤’s content!

Comments

Coments Loading...