How to efficiently delete expired rows from a large MySQL table
I have a very large table from which I want to delete old rows. Example table:
| customer_id | first_purchase_date | last_purchase_date |
|<primary key>| | <index> |
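** For the sake of argument I am using this example table. The table in question is not a customer table. The real table has grown to 28 GB over the past 2 months and feeds a calculation that only needs 2 weeks of history.
What I want to do is delete from this table the customers who have not purchased anything in the past year, i.e. delete from table where last_purchase_date < now() - interval 1 year;
A plain delete like this is too expensive for the database. I know partitioning can be used to truncate old rows, but I am not sure how to implement it effectively.
Also, if a customer does buy something, updating last_purchase_date could move the row to a different partition. Wouldn't that be expensive too?
Thanks in advance for your guidance!
You are right that partitioning is the way forward, because: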
Data that loses its usefulness can often be easily removed from a
partitioned table by dropping the partition (or partitions) containing
only that data. Conversely, the process of adding new data can in some
cases be greatly facilitated by adding one or more new partitions for
storing specifically that data.
If dropping whole partitions does not work for you, you can still restrict a statement to specific partitions:
In addition, MySQL 5.7 supports explicit partition selection for
queries. For example, SELECT * FROM t PARTITION (p0,p1) WHERE c < 5
selects only those rows in partitions p0 and p1 that match the WHERE
condition. In this case, MySQL does not check any other partitions of
table t; this can greatly speed up queries when you already know which
partition or partitions you wish to examine. Partition selection is
also supported for the data modification statements DELETE, INSERT,
REPLACE, UPDATE, and LOAD DATA, LOAD XML.
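As a rough sketch of what the quoted passage means for this question, a DELETE can be limited to a single partition. The table name sales and the partition name p0 are borrowed from the ALTER TABLE statement in this answer, and the batch size is an arbitrary assumption:

```sql
-- Sketch only: assumes the table is already partitioned and that p0
-- is the partition holding the oldest rows.
-- Only p0 is examined; the other partitions are not touched.
DELETE FROM sales PARTITION (p0)
WHERE last_purchase_date < NOW() - INTERVAL 1 YEAR
LIMIT 10000;  -- hypothetical batch size; repeat until 0 rows are affected
```

Deleting in batches like this keeps each transaction small, although it is still slower than dropping a partition outright.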
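Since you want to delete by date rather than by primary key, what you need is a RANGE partitioning scheme.
First find the earliest date and create the partitions based on it: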
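-- Partitions must be listed in ascending order of their upper bound.
-- MySQL also requires last_purchase_date to be part of every unique key
-- (including the primary key) before the table can be partitioned on it.
ALTER TABLE sales
PARTITION BY RANGE (TO_DAYS(last_purchase_date)) (
    PARTITION p0 VALUES LESS THAN (TO_DAYS('2015-12-31')),
    PARTITION p1 VALUES LESS THAN (TO_DAYS('2016-12-31')),
    PARTITION p2 VALUES LESS THAN (TO_DAYS('2017-12-31')),
    PARTITION p3 VALUES LESS THAN (TO_DAYS('2018-12-31')),
    -- ..
    PARTITION p10 VALUES LESS THAN MAXVALUE);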
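Two things that may help around that statement, sketched under assumptions. MySQL rejects RANGE partitioning on a column that is not part of every unique key, so with customer_id as the sole primary key the key most likely has to be widened first. Afterwards, INFORMATION_SCHEMA.PARTITIONS shows how the rows were distributed. The composite key below is my guess at a workable layout, not something stated in the question:

```sql
-- Assumption: last_purchase_date contains no NULLs (primary key columns
-- cannot be NULL). Run this before the ALTER TABLE ... PARTITION BY above.
ALTER TABLE sales
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (customer_id, last_purchase_date);

-- After partitioning: list the partitions with their upper bounds and
-- row counts (TABLE_ROWS is only an estimate for InnoDB tables).
SELECT PARTITION_NAME, PARTITION_DESCRIPTION, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'sales'
ORDER BY PARTITION_ORDINAL_POSITION;
```

Note that widening the primary key means customer_id alone is no longer enforced as unique by the database.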
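Pick a reasonable number of partitions, but don't worry about it too much, because you can always change the partitioning later. Once the table is partitioned, you may even find that you don't need the delete step at all.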
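For illustration, changing the partitions later and skipping the delete step could look roughly like this. It assumes the partitions are defined in ascending date order, so that p0 holds the oldest rows and p10 is the MAXVALUE catch-all; the names p2019 and pmax and the 2019 boundary are hypothetical:

```sql
-- Split the catch-all partition so that newer rows get their own range.
-- The new boundary must be greater than the bound of the partition
-- that precedes p10.
ALTER TABLE sales REORGANIZE PARTITION p10 INTO (
    PARTITION p2019 VALUES LESS THAN (TO_DAYS('2019-12-31')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Instead of a row-by-row DELETE, drop the partition that contains only
-- expired rows. This discards every row stored in p0.
ALTER TABLE sales DROP PARTITION p0;
```

Dropping a partition removes its rows without scanning them individually, which is why it is usually much cheaper than the original DELETE.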