Batching when using ActiveRecord::Base.connection.execute

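I am writing a migration that will let us move our yamler from Syck to Psych and, eventually, upgrade our project to Ruby 2. The migration will be very resource-intensive, so I need to process the rows in chunks.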

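I wrote the following method to confirm that the migration I plan to use produces the expected results and can run without downtime. To keep Active Record from deserializing the column automatically, I need to use ActiveRecord::Base.connection.execute.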

The method that summarizes the conversion is as follows:

  def show_summary(table, column_name)
    # Read raw rows so Active Record does not deserialize the column.
    a = ActiveRecord::Base.connection.execute <<-SQL
      SELECT id, #{column_name} FROM #{table}
    SQL
    all_rows = a.to_a

    # Keep the rows whose YAML changes after a Syck -> Psych -> Syck round trip.
    problem_rows = all_rows.select do |row|
      original_string = Syck.dump(Syck.load(row[1]))
      original_object = Syck.load(original_string)

      new_string = Psych.dump(original_object)
      new_object = Syck.load(new_string)

      Syck.dump(new_object) != original_string rescue true
    end

    # Report the id and both encodings for each problem row.
    problem_rows.map do |row|
      old_string = Syck.dump(Syck.load(row[1]))
      new_string = Psych.dump(Syck.load(old_string)) rescue "Parse failure"
      roundtrip_string = begin
        Syck.dump(Syck.load(new_string))
      rescue => e
        e.message
      end

      new_row = {}
      new_row[:id] = row[0]
      new_row[:original_encoding] = old_string
      new_row[:new_encoding] = roundtrip_string
      new_row
    end
  end
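
The round-trip comparison used in show_summary can be tried in isolation. Below is a minimal sketch that uses only Psych (via the standard `yaml` library), since Syck does not ship with current Rubies. It therefore only shows the shape of the check, not real Syck/Psych differences. The helper name `roundtrip_stable?` is hypothetical, and `YAML.safe_load` is used here as an assumption that the data holds only plain types.

```ruby
require "yaml"

# Hypothetical helper: true when dumping, reloading, and dumping again
# produces the same YAML text; false when it differs or parsing fails.
def roundtrip_stable?(yaml_text)
  first_dump = YAML.dump(YAML.safe_load(yaml_text))
  second_dump = YAML.dump(YAML.safe_load(first_dump))
  first_dump == second_dump
rescue StandardError
  # Any parse or load error counts as an unstable (problem) row.
  false
end

puts roundtrip_stable?("---\nfoo: bar\n")
puts roundtrip_stable?("foo: [1, 2")
```

In the real check, the two libraries would take the place of the repeated `YAML` calls, as in show_summary.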

How can I batch the work while still using ActiveRecord::Base.connection.execute?

For completeness, my update function is as follows:

  # Migrate the given serialized YAML column from Syck to Psych
  # (if any).
  def migrate_to_psych(table, column)
    table_name = ActiveRecord::Base.connection.quote_table_name(table)

    column_name = ActiveRecord::Base.connection.quote_column_name(column)

    fetch_data(table_name, column_name).each do |row|
      transformed = ::Psych.dump(convert(Syck.load(row[column])))

      ActiveRecord::Base.connection.execute <<-SQL
         UPDATE #{table_name}
         SET #{column_name} = #{ActiveRecord::Base.connection.quote(transformed)}
         WHERE id = #{row['id']};
      SQL
    end
  end

  def fetch_data(table_name, column_name)
    ActiveRecord::Base.connection.select_all <<-SQL
       SELECT id, #{column_name}
       FROM #{table_name}
       WHERE #{column_name} LIKE '---%'
    SQL
  end

I got it from http://fossies.org/linux/openproject/db/migrate/migration_utils/legacy_yamler.rb

You can easily build something with SQL's LIMIT and OFFSET clauses:

def fetch_data(table_name, column_name)
  batch_size, offset = 1000, 0
  begin
    # ORDER BY keeps the pages stable; without it, LIMIT/OFFSET may
    # return rows in a different order on each query.
    batch = ActiveRecord::Base.connection.select_all <<-SQL
      SELECT id, #{column_name}
      FROM #{table_name}
      WHERE #{column_name} LIKE '---%'
      ORDER BY id
      LIMIT #{batch_size}
      OFFSET #{offset}
    SQL
    # Hand each row to the caller's block.
    batch.each do |row|
      yield row
    end
    offset += batch_size
  end until batch.empty?
end

You can use it as before, just without the .each:

fetch_data(table_name, column_name) do |row| ... end
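
On large tables OFFSET tends to get slower as it grows, because the database still has to walk past the skipped rows. One possible alternative is keyset batching, where each query starts after the last id already seen. The sketch below assumes a positive integer `id` primary key. The names `each_row_by_id` and `fetch_batch` are hypothetical, and an in-memory array stands in for the database so the example runs on its own.

```ruby
# Hypothetical helper: walks rows in id order, one batch per query.
# `fetch_batch` is any callable taking (last_id, limit) and returning an
# array of row hashes with an "id" key, sorted by id.
def each_row_by_id(fetch_batch, batch_size = 1000)
  # Without a block, return an Enumerator so callers can still chain .each/.map.
  return enum_for(:each_row_by_id, fetch_batch, batch_size) unless block_given?

  last_id = 0
  loop do
    batch = fetch_batch.call(last_id, batch_size)
    break if batch.empty?

    batch.each { |row| yield row }
    last_id = batch.last["id"]
  end
end

# In-memory stand-in for the table (an assumption for this sketch).
# With Active Record, the callable would wrap select_all with something like
#   WHERE id > #{last_id} ORDER BY id LIMIT #{limit}
rows = (1..2500).map { |i| { "id" => i, "data" => "--- row #{i}\n" } }
fetch_from_memory = lambda do |last_id, limit|
  rows.select { |row| row["id"] > last_id }.first(limit)
end

ids = each_row_by_id(fetch_from_memory, 1000).map { |row| row["id"] }
puts ids.length
```

Because each batch is keyed on the id rather than a position, rows updated during the run do not shift the later pages.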

HTH!