Batching when using ActiveRecord::Base.connection.execute
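I'm busy writing a migration that will let us move our yamler from Syck to Psych and eventually upgrade our project to Ruby 2. The migration will be very resource-intensive, so I need to process the data in chunks.
I wrote the following method to confirm that the migration I plan to use produces the expected results and can be run without downtime. To keep Active Record from deserializing the column automatically, I need to use ActiveRecord::Base.connection.execute.
The method that describes the transformation is as follows: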
def show_summary(table, column_name)
  a = ActiveRecord::Base.connection.execute <<-SQL
    SELECT id, #{column_name} FROM #{table}
  SQL
  all_rows = a.to_a
  # Keep only the rows whose YAML does not survive a Syck -> Psych -> Syck round trip.
  problem_rows = all_rows.select do |row|
    original_string = Syck.dump(Syck.load(row[1]))
    original_object = Syck.load(original_string)
    new_string = Psych.dump(original_object)
    new_object = Syck.load(new_string)
    Syck.dump(new_object) != original_string rescue true
  end
  # Build a summary hash for each problem row.
  problem_rows.map do |row|
    old_string = Syck.dump(Syck.load(row[1]))
    new_string = Psych.dump(Syck.load(old_string)) rescue "Parse failure"
    roundtrip_string = begin
      Syck.dump(Syck.load(new_string))
    rescue => e
      e.message
    end
    new_row = {}
    new_row[:id] = row[0]
    new_row[:original_encoding] = old_string
    new_row[:new_encoding] = roundtrip_string
    new_row
  end
end
How can I use batching while still using ActiveRecord::Base.connection.execute?
For completeness, my update function is as follows:
# Migrate the given serialized YAML column from Syck to Psych
# (if any).
def migrate_to_psych(table, column)
  table_name = ActiveRecord::Base.connection.quote_table_name(table)
  column_name = ActiveRecord::Base.connection.quote_column_name(column)
  fetch_data(table_name, column_name).each do |row|
    transformed = ::Psych.dump(convert(Syck.load(row[column])))
    ActiveRecord::Base.connection.execute <<-SQL
      UPDATE #{table_name}
      SET #{column_name} = #{ActiveRecord::Base.connection.quote(transformed)}
      WHERE id = #{row['id']};
    SQL
  end
end

def fetch_data(table_name, column_name)
  ActiveRecord::Base.connection.select_all <<-SQL
    SELECT id, #{column_name}
    FROM #{table_name}
    WHERE #{column_name} LIKE '---%'
  SQL
end
I got it from http://fossies.org/linux/openproject/db/migrate/migration_utils/legacy_yamler.rb
You can easily build something with SQL's LIMIT and OFFSET clauses:
def fetch_data(table_name, column_name)
  batch_size, offset = 1000, 0
  begin
    # ORDER BY keeps the pages stable; without it, LIMIT/OFFSET
    # may skip or repeat rows between queries.
    batch = ActiveRecord::Base.connection.select_all <<-SQL
      SELECT id, #{column_name}
      FROM #{table_name}
      WHERE #{column_name} LIKE '---%'
      ORDER BY id
      LIMIT #{batch_size}
      OFFSET #{offset}
    SQL
    batch.each do |row|
      yield row
    end
    offset += batch_size
  end until batch.empty?
end
You can use it as before, only without the .each:
fetch_data(table_name, column_name) do |row| ... end
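If the table is large, OFFSET can get slower with each batch, because the database still has to walk past all the skipped rows. A possible alternative is to page on the primary key instead. The following is only a minimal sketch, not the original author's code: the method name fetch_data_by_id and the batch_size and connection keywords are made up for illustration. It assumes id is a positive integer primary key and that you are on Ruby 2.0 or later (keyword arguments).

```ruby
# Sketch: keyset ("seek") batching on the primary key instead of OFFSET.
# fetch_data_by_id, batch_size: and connection: are hypothetical names.
# Assumes `id` is a positive integer primary key.
def fetch_data_by_id(table_name, column_name, batch_size: 1000,
                     connection: ActiveRecord::Base.connection)
  # Without a block, return an Enumerator so existing `.each` call sites keep working.
  unless block_given?
    return enum_for(__method__, table_name, column_name,
                    batch_size: batch_size, connection: connection)
  end

  last_id = 0
  loop do
    batch = connection.select_all(<<-SQL).to_a
      SELECT id, #{column_name}
      FROM #{table_name}
      WHERE #{column_name} LIKE '---%'
        AND id > #{Integer(last_id)}
      ORDER BY id
      LIMIT #{Integer(batch_size)}
    SQL
    break if batch.empty?

    batch.each { |row| yield row }
    # Remember where this batch ended; the next query starts after it.
    last_id = batch.last['id']
  end
end
```

It is called the same way, e.g. fetch_data_by_id(table_name, column_name) do |row| ... end, and because each query starts after the last id it saw, rows that are updated while the loop runs do not shift the following pages.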
HTH!