MariaDB Pivot Table Performance

Question

I have a table that contains data with dynamic categories:

+----------+--------------+---------------+---------+
| category | string_value | integer_value | user_id |
+----------+--------------+---------------+---------+
| cat_1    | NULL         | 1             |       1 |
| cat_1    | NULL         | 3             |       2 |
| cat_2    | foo          | NULL          |       1 |
| cat_2    | bar          | NULL          |       2 |
+----------+--------------+---------------+---------+

I need a pivoted version of this table, which I produce with the following statement:

select
  user_id,
  max(case when category = 'cat_1' then integer_value end) as cat_1,
  max(case when category = 'cat_2' then string_value end) as cat_2
from my_table
group by user_id

This produces a result in the following format:

+---------+-------+-------+
| user_id | cat_1 | cat_2 |
+---------+-------+-------+
|       1 |     1 | foo   |
|       2 |     3 | bar   |
+---------+-------+-------+

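On its own, this query performs well even with many categories and table entries (for example, about 20 ms for 8 categories and 240k entries), but if I wrap this exact query in select * from (<query>), performance drops significantly (to 650 ms).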
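For reference, a sketch of what the wrapped form described above looks like, assuming the same my_table and categories. MariaDB, MySQL and SQL Server all require an alias on a derived table; the alias name pivoted is hypothetical.

```sql
-- The unchanged pivot query, wrapped as a derived table.
-- "pivoted" is a hypothetical alias; all three engines require one here.
select *
from (
  select
    user_id,
    max(case when category = 'cat_1' then integer_value end) as cat_1,
    max(case when category = 'cat_2' then string_value end) as cat_2
  from my_table
  group by user_id
) as pivoted;
```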

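In addition, sorting by user_id does not noticeably affect performance, whereas sorting by any other field also degrades performance, even though indexes exist on that field and on user_id. I suspect this approach is simply not feasible for larger tables? Still, I am curious what causes the extra execution time when I add the select * from (<query>) part.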
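A minimal sketch of the two orderings, assuming user_index is an index on user_id (the plans suggest this, but its definition is not shown). Ordering by user_id matches the order in which the index-driven GROUP BY already produces its groups. A pivoted column such as cat_1, by contrast, only exists after each group has been aggregated, so no index on the base columns can supply that order: every group has to be computed and then sorted.

```sql
-- Cheap (assuming user_index covers user_id): groups come out of the
-- index scan already in user_id order, so no separate sort is needed.
select
  user_id,
  max(case when category = 'cat_1' then integer_value end) as cat_1,
  max(case when category = 'cat_2' then string_value end) as cat_2
from my_table
group by user_id
order by user_id;

-- Expensive: cat_1 is a per-group aggregate, so all groups must be
-- built before they can be sorted by it; base-column indexes cannot help.
select
  user_id,
  max(case when category = 'cat_1' then integer_value end) as cat_1,
  max(case when category = 'cat_2' then string_value end) as cat_2
from my_table
group by user_id
order by cat_1;
```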

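Background: I am trying to use this query to store dynamic user data, and I want to avoid changing the table structure (i.e. adding columns) at runtime. Any alternatives are welcome. I am using MariaDB 10.5.5, and the solution also needs to work with MySQL 5.7 and SQL Server 2019.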

Execution plans:

Without the surrounding select * from:

+----+-------------+-----------+-------+---------------+------------+---------+-----+--------+---------+----------+------------+-------+    
| id | select_type | table     | type  | possible_keys | key        | key_len | ref | rows   | r_rows  | filtered | r_filtered | Extra |
|----|-------------|-----------|-------|---------------|------------|---------|-----|--------|---------|----------|------------|-------|
|  1 | SIMPLE      | user_data | index |               | user_index |       9 |     | 226067 | 1619.00 |    100.0 |      99.88 |       |
+----+-------------+-----------+-------+---------------+------------+---------+-----+--------+---------+----------+------------+-------+

With the surrounding select * from:

+----+-------------+------------+-------+---------------+------------+---------+-----+--------+-----------+----------+------------+-------+ 
| id | select_type | table      | type  | possible_keys | key        | key_len | ref | rows   | r_rows    | filtered | r_filtered | Extra |
|----|-------------|------------|-------|---------------|------------|---------|-----|--------|-----------|----------|------------|-------|
|  1 | PRIMARY     | <derived2> | ALL   |               |            |         |     | 226067 |    200.00 |    100.0 |      100.0 |       |
|  2 | DERIVED     | user_data  | index |               | user_index |       9 |     | 226067 | 242418.00 |    100.0 |      100.0 |       |
+----+-------------+------------+-------+---------------+------------+---------+-----+--------+-----------+----------+------------+-------+

Answer 1

Here is my speculation about what is going on.

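You have an index on the underlying table, and MariaDB uses it for the aggregation. That means no sorting is done: it can simply read the index and start returning rows.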

This is a very nice feature. But when you run the query on its own, the time you are seeing is the time to the first row.

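When you use a derived table, MariaDB has to generate all of its rows before it returns any of them, so the select * with the subquery does much more work. The r_rows column in the two plans is consistent with this: the standalone query read 1,619 index rows, while the derived table read 242,418.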

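That is why the second version is slower than the first. I would expect a query that returns tens of thousands of rows to take more than 20 ms on most machines.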
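One way to test this explanation, sketched under assumptions: MariaDB's ANALYZE statement (which reports the r_rows column seen in the plans above) executes a query and shows how many rows each step actually read. The page size of 200 is an assumed value, LIMIT is MariaDB/MySQL syntax (SQL Server would use TOP), and the row counts you get are not predicted here. If the answer is right, the first statement should still read the whole table inside the derived table, while the second should let the index-ordered GROUP BY stop after 200 groups.

```sql
-- MariaDB only: ANALYZE runs the statement and reports actual r_rows.
-- LIMIT 200 is an assumed page size; "pivoted" is a hypothetical alias.

-- LIMIT outside: the derived table is expected to be fully built first.
analyze select *
from (
  select user_id,
         max(case when category = 'cat_1' then integer_value end) as cat_1,
         max(case when category = 'cat_2' then string_value end) as cat_2
  from my_table
  group by user_id
) as pivoted
limit 200;

-- LIMIT inside: the scan over the user_id index can stop early.
-- The explicit ORDER BY makes the page deterministic.
analyze select *
from (
  select user_id,
         max(case when category = 'cat_1' then integer_value end) as cat_1,
         max(case when category = 'cat_2' then string_value end) as cat_2
  from my_table
  group by user_id
  order by user_id
  limit 200
) as pivoted;
```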
