How to Optimize/Refactor MySQL Pivot Table Performance when using Where Clause
I have two simple MySQL tables: an index table t_id, which has a unique primary id, and a pivot table t_data, which spreads those ids across various data fields:
CREATE TABLE `t_id` (
`id` bigint(12) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
CREATE TABLE `t_data` (
`id` int(11) NOT NULL,
`field` varchar(50) CHARACTER SET cp1251 NOT NULL,
`value` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci
DEFAULT NULL,
UNIQUE KEY `idxfield` (`id`,`field`),
KEY `value` (`value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here is some sample data:
+----+--------------+-------------------+
| id | field | value |
+----+--------------+-------------------+
| 1 | organization | Apple Inc. |
| 1 | state | CA |
| 2 | organization | Adobe Inc. |
| 2 | state | CA |
| 3 | organization | Alphabet Inc. |
| 3 | state | CA |
| 4 | organization | Rockwell Collins |
| 4 | state | IA |
| 5 | organization | GEICO |
| 5 | state | MD |
| 6 | organization | Anheuser-Busch |
| 6 | state | MO |
| 7 | organization | Bank of America |
| 7 | state | NC |
+----+--------------+-------------------+
Reporting can be done with a standard pivot table select query:
select
i.id,
ifnull (max(case when d.field = 'organization' then d.value end),'') 'organization',
ifnull (max(case when d.field = 'state' then d.value end),'') 'state'
from `t_id` i
left join `t_data` d
on i.id = d.id
group by i.id
limit 0,10
This simple example shows just two "virtual" fields (organization and state) with 7 unique ids:
+----+------------------+-------+
| id | organization | state |
+----+------------------+-------+
| 1 | Apple Inc. | CA |
| 2 | Adobe Inc. | CA |
| 3 | Alphabet Inc. | CA |
| 4 | Rockwell Collins | IA |
| 5 | GEICO | MD |
| 6 | Anheuser-Busch | MO |
| 7 | Bank of America | NC |
+----+------------------+-------+
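In our real production situation we have dozens of "virtual" fields (not just 2) and millions of unique ids (not just 7). The database performs very well for CRUD-type queries on individual ids (less than a second), and even when listing a limited group at a time (again, less than a second). The problem arises when trying to constrain the select with a where clause (queries take tens of seconds). For example, to find all organizations in California: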
select
x.id,
x.organization,
x.state
from
(
select
i.id,
ifnull (max(case when d.field = 'organization' then d.value end),'') 'organization',
ifnull (max(case when d.field = 'state' then d.value end),'') 'state'
from `t_id` i
left join `t_data` d
on i.id = d.id
group by i.id
) as x
where x.state='CA'
limit 0,10
+----+---------------+-------+
| id | organization | state |
+----+---------------+-------+
| 1 | Apple Inc. | CA |
| 2 | Adobe Inc. | CA |
| 3 | Alphabet Inc. | CA |
+----+---------------+-------+
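This works, but it takes a long time (again, tens of seconds)! What is the best practice here: is there a better way to write these types of queries? How can these pivot table queries be optimized for where clauses?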
If you want to find organizations operating in California, you don't actually need a subquery:
SELECT
i.id,
COALESCE(MAX(CASE WHEN field = 'organization' THEN value END), '') AS organization,
COALESCE(MAX(CASE WHEN field = 'state' THEN value END), '') AS state
FROM t_id i
LEFT JOIN t_data d
ON i.id = d.id
GROUP BY
i.id
HAVING
COUNT(CASE WHEN field = 'state' AND value = 'CA' THEN 1 END) > 0;
The trick here is to assert in the HAVING clause that a matching id group must contain a state record with the value CA.
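As a rough sketch of how the same HAVING pattern might extend to more than one condition, the query below adds a second filter on the organization name. The `LIKE 'A%'` criterion is an illustrative assumption and does not come from the question.

```sql
-- Sketch: ids whose state is 'CA' AND whose organization starts with 'A'.
-- The organization filter is a made-up example criterion.
SELECT
    i.id,
    COALESCE(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
    COALESCE(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
FROM t_id i
LEFT JOIN t_data d
    ON i.id = d.id
GROUP BY
    i.id
HAVING
    -- one COUNT(...) > 0 test per required field/value pair
    COUNT(CASE WHEN d.field = 'state' AND d.value = 'CA' THEN 1 END) > 0
    AND COUNT(CASE WHEN d.field = 'organization' AND d.value LIKE 'A%' THEN 1 END) > 0;
```

HAVING is applied after grouping, so every id is still joined and aggregated before the filter runs. It is worth comparing the EXPLAIN output and timings against the original subquery version on real data.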
This should be much faster for large datasets. It also scales easily to any number of "virtual" fields. You can put any search criteria between the %% wildcards.
select
i.id,
coalesce(max(case when field = 'organization' then value end), '') as organization,
coalesce(max(case when field = 'state' then value end), '') as state
from t_id i
join t_data d -- inner join: with a LEFT JOIN, the ON filters below would not remove any ids
on i.id = d.id
and i.id like '%%'
and i.id in (
select id
from `t_data`
where `field` = 'organization'
and `value` like '%%'
and id in (
select id
from `t_data`
where `field` = 'state'
and `value` like '%%'
)
)
group by i.id
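To make the placeholders concrete, here is a hedged sketch with the search strings filled in. The values 'Inc' and 'CA' are illustrative assumptions. The id filters sit in a WHERE clause here, which removes non-matching ids whichever join type is used.

```sql
-- Sketch: organizations containing 'Inc' whose state value contains 'CA'.
-- Both search strings are example values, not taken from the question.
SELECT
    i.id,
    COALESCE(MAX(CASE WHEN d.field = 'organization' THEN d.value END), '') AS organization,
    COALESCE(MAX(CASE WHEN d.field = 'state' THEN d.value END), '') AS state
FROM t_id i
LEFT JOIN t_data d
    ON i.id = d.id
WHERE i.id IN (
        SELECT id
        FROM t_data
        WHERE field = 'organization'
          AND value LIKE '%Inc%'
    )
  AND i.id IN (
        SELECT id
        FROM t_data
        WHERE field = 'state'
          AND value LIKE '%CA%'
    )
GROUP BY i.id;
```

A pattern with a leading wildcard such as '%CA%' cannot use a B-tree index to narrow the search. Where the criterion allows it, an exact match (value = 'CA') or a prefix match (value LIKE 'CA%') gives the optimizer more to work with.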
This is EAV, not pivot, so the solution lies in a "self join".
SELECT a.id,
a.value AS organization,
b.value AS state
FROM t_data AS a
JOIN t_data AS b ON a.id = b.id
WHERE a.field = 'organization'
AND b.field = 'state';
If you need t_id to control which ids are included, add
JOIN t_id AS i ON i.id = a.id
If you want to restrict the result to CA, add
AND b.value = 'CA'
and add the index
INDEX(field, value)
so that far fewer rows need to be scanned to find the CA entries.
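Putting those pieces together, a minimal sketch might look like the following. The index name idx_field_value is a hypothetical choice, and the LIMIT simply mirrors the one used in the question.

```sql
-- Composite index so rows for a given (field, value) pair can be located directly.
-- The name idx_field_value is an assumption; any unused index name works.
ALTER TABLE t_data ADD INDEX idx_field_value (field, value);

-- Self-join restricted to California, joined to t_id to control the id set.
SELECT a.id,
       a.value AS organization,
       b.value AS state
FROM t_data AS a
JOIN t_data AS b ON a.id = b.id
JOIN t_id AS i ON i.id = a.id
WHERE a.field = 'organization'
  AND b.field = 'state'
  AND b.value = 'CA'
LIMIT 0, 10;
```

Prefixing the SELECT with EXPLAIN shows whether the optimizer actually picks the new index for alias b. Note that the inner joins drop any id that lacks either an organization or a state row, whereas the LEFT JOIN pivot in the question returns such ids with empty strings.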