Today at work, we had a couple of big migrations we had to run in two of our biggest and most important tables. Both migrations included adding new columns to the tables, which would incur in table locks, which meant incurring in around an hour of downtime (including displaying our maintainance page) while running these migrations after 10 PM.
However, a while back, we had found pt-online-schema-change, a nifty little tool that’s part of the Percona Toolkit that allows you to run these types of migrations without incurring in downtime. Basically, exactly what we needed.
As a cherry-on-top-of-your-pie surprise, it turns out that it’s completely compatible with MySQL, which was perfect for us since our staging boxes run vanilla MySQL (as oposed to our production servers, which use Percona). Another super cool thing about it is that it works out of the box with Percona Cluster, which we’re currently using in production.
Long story short, we figured this would be a great time to test out the tool. So I set out to run both migrations on one of our staging boxes. The command looked something like:
pt-online-schema-change -uguy -ppwd --alter "ADD COLUMN new_field INT" D=staging_db1,t=users --execute
After chugging along quite nicely for a while, it was done. The new field was added to the
users table without locking it whatsoever and we were all happy enough to try it on production at night.
As these things usually go, from staging to production, things didn’t work exactly the same. Basically, the box was setup a bit different and when trying to run the same command, it would error out with a somewhat cryptic error message:
pt-online-schema-change -uguy -ppwd --alter "ADD COLUMN new_field INT" D=staging_db1,t=users --execute Can't use an undefined value as an ARRAY reference at /usr/bin/pt-online-schema-change line 7085.
I tried looking for the error online, and most, if not all, of the mentions were related to a changelog entry about Percona Toolkit a while back. After making sure we were on the latest version (we were), we continued debugging.
Debugging output to the rescue
What ended up providing much needed clarity to the issue, was running the command the same as before but with the
PTDEBUG flag turned on to provide verbose debugging output to STDERR, as recommended in the docs.
PTDEBUG=1 pt-online-schema-change -uguy -ppwd --alter "ADD COLUMN new_field INT" D=staging_db1,t=users --execute # ... # VersionCheck:7008 15008 Version check file percona-version-check in /tmp # VersionCheck:7082 15008 Version check failed: Cannot open /tmp/percona-version-check: Permission denied at /usr/bin/pt-online-schema-change line 7120. #
So there we have it. Turns out the problem all along was a simple permissions issue with our maintenance user. After fixing this, we ran the migrations without problem.