Batch MYSQL inserts for performance in DB structure migration
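I need to restructure my MySQL InnoDB database.
Currently I have a customer table that holds 3 product names.
I need to extract these names into a new product table. The product table should contain every name currently stored in the customer table, and it should be linked to the customer table through a new customer_product table. Product names may not be unique, but they are unrelated to each other. This means each customer needs 3 new entries in the product table and 3 new entries in the customer_product table.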
So instead of this:
customer
| id | product_name_a | product_name_b | product_name_c |
I need this:
customer
| id |
customer_product (3 rows per customer)
| customer_id | product_id |
product (3 rows per customer)
| id | name |
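The target layout above could be declared roughly as follows. This is a minimal sketch, and the column types are assumptions rather than the asker's real DDL. The BIGINT ids and VARCHAR(500) names mirror the variables declared in the procedure below. Product names are deliberately left non-unique.

```sql
-- Sketch of the target schema; types are assumptions based on the
-- procedure's DECLAREs, not on the original table definitions.
CREATE TABLE product (
    id   BIGINT NOT NULL AUTO_INCREMENT,  -- explicit ids can still be inserted
    name VARCHAR(500),                    -- not UNIQUE: equal names are separate products
    PRIMARY KEY (id)
) ENGINE = InnoDB;

-- Link table: three rows per customer, one per former product_name_X column.
CREATE TABLE customer_product (
    customer_id BIGINT NOT NULL,
    product_id  BIGINT NOT NULL,
    PRIMARY KEY (customer_id, product_id)
) ENGINE = InnoDB;
```

Foreign keys to customer(id) and product(id) are left out of this sketch. If you want them, one option is to add them with ALTER TABLE after the bulk load has finished.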
I wrote the following MySQL procedure, which works:
BEGIN
DECLARE nbr_of_customers BIGINT(20);
DECLARE customer_count BIGINT(20);
DECLARE product_id BIGINT(20);
DECLARE customer_id BIGINT(20);
DECLARE product_name_a VARCHAR(500);
DECLARE product_name_b VARCHAR(500);
DECLARE product_name_c VARCHAR(500);
SELECT COUNT(*) FROM customer INTO nbr_of_customers;
SET customer_count = 0;
SET product_id = 1;
WHILE customer_count < nbr_of_customers DO
SELECT
customer.id,
customer.product_name_a,
customer.product_name_b,
customer.product_name_c
INTO
customer_id,
product_name_a,
product_name_b,
product_name_c
FROM customer
ORDER BY customer.id -- LIMIT without ORDER BY has no guaranteed row order
LIMIT customer_count,1;
INSERT INTO product(id, name)
VALUES(product_id, product_name_a);
INSERT INTO customer_product(customer_id, product_id)
VALUES(customer_id, product_id);
SET product_id = product_id + 1;
INSERT INTO product(id, name)
VALUES(product_id, product_name_b);
INSERT INTO customer_product(customer_id, product_id)
VALUES(customer_id, product_id);
SET product_id = product_id + 1;
INSERT INTO product(id, name)
VALUES(product_id, product_name_c);
INSERT INTO customer_product(customer_id, product_id)
VALUES(customer_id, product_id);
SET product_id = product_id + 1;
SET customer_count = customer_count + 1;
END WHILE;
END;
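This is far too slow.
Running it locally, I estimate that my ~15k customers would take ~1 hour to finish. My VPS server is much slower than that, so it could take up to 10 hours.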
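The problem seems to be that the inserts take a long time. So I would like to collect all the inserts during the procedure and execute them in a batch after the loop has finished and I know what to insert.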
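The batching idea can be sketched in plain SQL. It uses one multi-row INSERT per table inside a single transaction, instead of one autocommitted statement per row.

With InnoDB's default autocommit, every single-row INSERT is its own transaction. Under the default innodb_flush_log_at_trx_commit = 1, each of those commits flushes the redo log to disk, which plausibly explains much of the slowness on a slow VPS disk.

The ids and names below are made-up illustration values.

```sql
-- Illustrative values only: the ids and names are hypothetical.
START TRANSACTION;

-- One statement inserts three rows instead of three separate statements.
INSERT INTO product (id, name)
VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');

INSERT INTO customer_product (customer_id, product_id)
VALUES (1, 1), (1, 2), (1, 3);

-- One commit (and one log flush) for the whole batch.
COMMIT;
```

The same START TRANSACTION ... COMMIT pair can also wrap the existing WHILE loop inside the procedure, since stored procedures may contain transaction statements. That keeps the per-row INSERTs but pays for one commit instead of roughly 100k.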
Is there a way to execute all ~100k inserts as a batch to improve performance, or is there a better approach altogether?
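One alternative worth trying skips the loop entirely. Note that this is a different technique from either answer below. It derives the product ids arithmetically from customer.id, so each customer's three names get ids 3*id-2, 3*id-1 and 3*id.

This has two consequences:
- No SQL string is ever built, so escaping is not a concern.
- Duplicate names stay separate product rows, as the question requires.

The sketch assumes that customer.id is a unique positive integer and that product and customer_product are empty beforehand.

```sql
-- Assumes: customer.id is a unique positive integer (e.g. the primary key),
-- and product / customer_product are empty before this runs.

-- Three product rows per customer. The ids cannot collide: 3*id-2, 3*id-1
-- and 3*id leave remainders 1, 2 and 0 modulo 3, respectively.
INSERT INTO product (id, name)
SELECT id * 3 - 2, product_name_a FROM customer
UNION ALL
SELECT id * 3 - 1, product_name_b FROM customer
UNION ALL
SELECT id * 3,     product_name_c FROM customer;

-- Link each customer to the three ids computed the same way.
INSERT INTO customer_product (customer_id, product_id)
SELECT id, id * 3 - 2 FROM customer
UNION ALL
SELECT id, id * 3 - 1 FROM customer
UNION ALL
SELECT id, id * 3     FROM customer;
```

Unlike the procedure's running counter, these product ids have gaps wherever customer ids have gaps. Each statement runs as a single set-based operation, not one statement per row.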
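Final edit:
I marked the accepted solution because it did an excellent job of speeding up the process dramatically, which was the main focus of the question. In the end, I performed the migration with modified production code (in Java) instead, because that solution has a limitation: it does not escape the strings it inserts.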
Perhaps you could do this with three separate inserts (instead of ~100K), like this:
INSERT INTO customer_product (customer_id, product_id)
SELECT customer.id as customer_id, product.id as product_id
FROM customer
JOIN product on customer.product_name_a = product.name;
INSERT INTO customer_product (customer_id, product_id)
SELECT customer.id as customer_id, product.id as product_id
FROM customer
JOIN product on customer.product_name_b = product.name;
INSERT INTO customer_product (customer_id, product_id)
SELECT customer.id as customer_id, product.id as product_id
FROM customer
JOIN product on customer.product_name_c = product.name;
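Of course, you'll have to set up the product table ahead of time, and you'll want to remove the product name columns from the customer table afterward.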
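For the name-based joins above, the product table has to be filled first. Here is a sketch, assuming product.id is AUTO_INCREMENT; the answer does not state this.

Using UNION rather than UNION ALL removes duplicate names. Each distinct name then gets exactly one product row, so each join matches at most one product per column. Note that this merges equal names into a single product, which differs from the question's requirement of three independent rows per customer.

```sql
-- Assumes product.id is AUTO_INCREMENT, so only the name is supplied.
-- UNION (not UNION ALL) de-duplicates, so each distinct name becomes one
-- product row and the JOIN ... ON customer.product_name_X = product.name
-- statements cannot match several products for the same name.
INSERT INTO product (name)
SELECT product_name_a FROM customer
UNION
SELECT product_name_b FROM customer
UNION
SELECT product_name_c FROM customer;
```

This must run before the three INSERT ... SELECT statements. Otherwise the joins find no products and insert nothing.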
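Creating indexes on the customer.product_name_X columns (and possibly on the product.name column) may speed things up a bit further, though probably not by much, if that matters. EXPLAIN can help you check.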
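The indexing suggestion might look like the following; the index names are hypothetical.

With VARCHAR(500) columns, whether a full-column index is allowed depends on the character set and row format. InnoDB's index key prefix limit is 3072 bytes for DYNAMIC/COMPRESSED rows and 767 bytes for REDUNDANT/COMPACT rows. If the full column is too long, a prefix index such as name(191) is a possible fallback.

```sql
-- Hypothetical index names; one index per join column.
CREATE INDEX idx_customer_product_name_a ON customer (product_name_a);
CREATE INDEX idx_customer_product_name_b ON customer (product_name_b);
CREATE INDEX idx_customer_product_name_c ON customer (product_name_c);
CREATE INDEX idx_product_name ON product (name);

-- The "key" column of the EXPLAIN output shows which index, if any,
-- MySQL chooses for each table in the join.
EXPLAIN
SELECT customer.id AS customer_id, product.id AS product_id
FROM customer
JOIN product ON customer.product_name_a = product.name;
```

Building the indexes also takes time, so they only pay off if the joins are slow without them.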
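First, use a cursor to process the results of a single query, instead of running a separate query for each row.
Then concatenate the VALUES lists into strings that you run with PREPARE and EXECUTE.
My code inserts in batches of 100 customers, because I expect there is a limit on the size of a query.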
BEGIN
DECLARE product_id BIGINT(20);
DECLARE customer_id BIGINT(20);
DECLARE product_name_a VARCHAR(500);
DECLARE product_name_b VARCHAR(500);
DECLARE product_name_c VARCHAR(500);
DECLARE done INT DEFAULT FALSE;
-- One query for all customers instead of one LIMIT query per row
DECLARE cur CURSOR FOR SELECT c.id, c.product_name_a, c.product_name_b, c.product_name_c FROM customer AS c;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
SET product_id = 1;
OPEN cur;
SET @product_values = '';
SET @cp_values = '';
read_loop: LOOP
FETCH cur INTO customer_id, product_name_a, product_name_b, product_name_c;
IF done THEN
LEAVE read_loop;
END IF;
-- Append three tuples; the comma is added only between tuples, so the list
-- never has a trailing comma (which would be a syntax error in the INSERT)
SET @product_values = CONCAT(@product_values, IF(@product_values != '', ',', ''), "(", product_id, ",'", product_name_a, "'), (", product_id + 1, ",'", product_name_b, "'), (", product_id + 2, ",'", product_name_c, "')");
SET @cp_values = CONCAT(@cp_values, IF(@cp_values != '', ',', ''), "(", customer_id, ",", product_id, "), (", customer_id, ",", product_id + 1, "), (", customer_id, ",", product_id + 2, ")");
SET product_id = product_id + 3;
IF product_id % 300 = 1 THEN -- insert every 100 customers
SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
PREPARE stmt1 FROM @insert_product;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
PREPARE stmt2 FROM @insert_cp;
EXECUTE stmt2;
DEALLOCATE PREPARE stmt2;
SET @product_values = '';
SET @cp_values = '';
END IF;
END LOOP;
CLOSE cur;
IF @product_values != '' THEN -- process any remaining rows
SET @insert_product = CONCAT("INSERT INTO product(id, name) VALUES ", @product_values);
PREPARE stmt1 FROM @insert_product;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;
SET @insert_cp = CONCAT("INSERT INTO customer_product(customer_id, product_id) VALUES ", @cp_values);
PREPARE stmt2 FROM @insert_cp;
EXECUTE stmt2;
DEALLOCATE PREPARE stmt2;
SET @product_values = '';
SET @cp_values = '';
END IF;
END;
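Note that with this solution, product names are not escaped properly before they are inserted. So if any product name contains special characters, such as a single quote ', this solution will not work.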
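MySQL's built-in QUOTE() function could likely remove this limitation. QUOTE() does three things:
- It wraps its argument in single quotes.
- It escapes backslash, single quote, ASCII NUL and Control+Z with a backslash.
- For a NULL argument, it returns the unquoted word NULL. This also stops a NULL name from turning the whole CONCAT result into NULL.

Below is a standalone sketch of the same tuple-building step. The user variables and sample names are hypothetical stand-ins for the procedure's FETCHed locals.

```sql
-- Hypothetical stand-ins for product_id and product_name_a/b/c in the procedure.
SET @product_id = 1;
SET @name_a = 'O''Brien''s widget';  -- contains single quotes
SET @name_b = NULL;                   -- NULL no longer nulls the whole string
SET @name_c = 'Back\\slash';          -- contains a backslash

-- Same shape as the original CONCAT, but QUOTE() supplies the quotes and the
-- escaping, so the literal "'" pieces around each name are dropped.
SET @product_values = CONCAT(
    '(', @product_id,     ',', QUOTE(@name_a), '),',
    '(', @product_id + 1, ',', QUOTE(@name_b), '),',
    '(', @product_id + 2, ',', QUOTE(@name_c), ')'
);

SELECT @product_values;
```

In the procedure, the same change means replacing each `",'", product_name_x, "'"` fragment with `",", QUOTE(product_name_x)`. QUOTE() escapes with backslashes, so this assumes the NO_BACKSLASH_ESCAPES SQL mode is not enabled.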