Performant Large-Scale Postgres Migrations

November 30, 2016

[rails] [postgres] [scaling]

Recently I tried to add a column to one of our very big tables, and as you might expect from the title of this blog, it took so long that I aborted it after an hour. Can you spot the problem?

class AddInfoColumnToCaseDocketReport < ActiveRecord::Migration
  def change
    add_column :case_docket_reports, :info, :jsonb, default: '{}'
  end
end

Not all migrations are created equal. Adding a column to Postgres without a default value is very fast, because only the catalog tables get written. With a default value, however, it can get very, very s-l-o-w, because Postgres has to rewrite every row and reindex the table.

Add columns without default values

Adding a default value to an existing column is also remarkably fast, because it does not change any existing rows either. So, if we break this migration into two steps, it will be very, very fast:

class AddInfoColumnToCaseDocketReport < ActiveRecord::Migration
  def change
    add_column :case_docket_reports, :info, :jsonb
    change_column :case_docket_reports, :info, :jsonb, default: '{}'
  end
end

Add default values to existing columns

Of course, now your table has a column full of NULL values, which may or may not be a bad thing in your application. Let’s assume for the moment that it is a bad thing. How do we update all of those rows? The naive approach in Rails is to iterate over each object, assigning the new value:

CaseDocketReport.where(info: nil).find_each(batch_size: 100) do |docket_report|
  docket_report.update(info: {})
end

This is also very slow, because every row has to be dragged out of Postgres, over the “wire” and into Rails, instantiated as an ActiveRecord object, and then written back out in its own transaction. There is a much faster alternative: Rails exposes the SQL UPDATE statement via the #update_all method:

loop do
  results = CaseDocketReport.where(info: nil).limit(10_000).update_all(info: {})
  break if results == 0
end

This little snippet updates up to 10,000 rows with each SQL command! The generated statement looks roughly like this:

UPDATE "case_docket_reports"
SET "info" = '{}'
WHERE "id" IN (
  SELECT "id" FROM "case_docket_reports"
  WHERE "info" IS NULL
  LIMIT 10000
);
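
If you would rather keep the whole backfill inside the database, the same batched UPDATE can be issued directly from a migration with execute. This is a sketch, not code from the original post, and it assumes the two-step column addition above has already run:

class BackfillInfoOnCaseDocketReports < ActiveRecord::Migration
  def up
    loop do
      # execute returns a PG::Result; cmd_tuples is the number of rows updated
      result = execute(<<-SQL)
        UPDATE case_docket_reports
        SET info = '{}'
        WHERE id IN (
          SELECT id FROM case_docket_reports
          WHERE info IS NULL
          LIMIT 10000
        )
      SQL
      break if result.cmd_tuples.zero?
    end
  end

  def down
    # Nothing to undo; the backfilled defaults are indistinguishable from real data
  end
end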

Do as much work as possible inside Postgres, not in Rails

As I’ve written before, plan and deploy your migrations so that they can run independently of your production code. That way, you can deploy just your migrations, and run them in production without affecting your application, or incurring downtime.

Can Rails do this? As of Rails 4, yes! Another problem area is indexes, because creating one would normally block writes. By adding the index CONCURRENTLY, we can avoid this. Because Rails runs migrations inside a transaction and concurrent indexes must be added outside one, we add disable_ddl_transaction! to the migration.

class AddIndexToAsksActive < ActiveRecord::Migration
  disable_ddl_transaction!

  def change
    add_index :asks, :active, algorithm: :concurrently
  end
end

Add indexes CONCURRENTLY

Another trouble spot is changing the type of a column. Again, as I mentioned in Zero Downtime Migrations, the solution is: “Don’t!” Instead, add a new column with a new name. When you’ve completed your upgrades, you can DROP the old column at your leisure. Here’s an interesting side tip: you can “hide” the old column from your code so you don’t accidentally use it:

class User < ActiveRecord::Base
  def self.columns # TODO: Remove after completing migration
    super.reject { |c| c.name == 'ecf_username' }
  end

  #...
end
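
When every deployed version of the code has stopped referencing the old column, the DROP can be its own tiny migration. Here is a sketch, reusing the ecf_username example above (the users table and the string type are assumptions for illustration):

class RemoveEcfUsernameFromUsers < ActiveRecord::Migration
  def change
    # Passing the type keeps the migration reversible
    remove_column :users, :ecf_username, :string
  end
end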

Don’t change columns, add new ones, then drop the old ones

You can also run into trouble in the most delightfully insidious way by running an innocuous-looking query that quietly starts locking big chunks of your database. A good example that has bitten everyone at least once is a back-office analytics query that does an enormous n-way join. The solution: keep a read-only replica and send those queries to it!
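
How you route queries to the replica depends on your setup. Here is one hedged sketch for a Rails 4 era app; it assumes a replica entry in config/database.yml, and the class names are invented for illustration:

class ReplicaRecord < ActiveRecord::Base
  self.abstract_class = true

  # Reads its connection settings from the `replica` entry in database.yml
  establish_connection :replica
end

# A reporting-only stand-in that reads case_docket_reports from the replica
class ReportingCaseDocketReport < ReplicaRecord
  self.table_name = 'case_docket_reports'
end

# Heavy reporting queries go through this class, so they hit the replica:
#   ReportingCaseDocketReport.count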

Run big queries against a read-only replica

My last recommendation is this: hire a really excellent DBA, or outsource the whole database platform. There are endless knobs, settings, and configuration values that you might or might not get right. As your system grows to the point of being “interesting,” you’re going to need that expertise. Or you can take advantage of PaaS companies that make it their core competency; you get a “slice” of an expert DBA. They also probably know how to scale your database up (and down) better than you ever will, since they do it all the time and you only need to do it once in a while. Of course, if you’re lucky, you’ll have those problems that are good to have, and you can move things in-house.

Performant Large-Scale Postgres Migrations - November 30, 2016 - Ken Mayer