Slow Informix COUNT/GROUP BY query, even with appropriate index

Question

I have a very simple query that runs slowly in Informix 11, even though an appropriate index exists and is being used:

select COUNTRY, COUNT(*) from EVENTS group by COUNTRY

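What could be the reason for it running so slowly? I have experience with similar queries on SQL Server, and they execute instantly when an appropriate index exists.

More information:

The query takes about 15 seconds with 500,000 records in the EVENTS table (this worries me because the table will hold millions of records, and I can already see the execution time growing quickly).
The EVENTS table has an index on COUNTRY. Using the EXPLAIN directive, I verified that the index is being used.
The EVENTS table has many columns (about 70).
The "country" column is a varchar(32).
The "country" column has 25 distinct values.
The table scan as reported by Informix: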

  1) informix.EVENTS: INDEX PATH

    (1) Index Name: informix.country_ix
        Index Keys: COUNTRY   (Serial, fragments: ALL)


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                EVENTS

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     501906     39285     501906     00:14.88   29390

  type     rows_prod  est_rows  rows_cons  time       est_cost
  ------------------------------------------------------------
  group    25         4         501906     00:15.58   79761
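
For anyone wanting to reproduce a plan like this one, a minimal sketch of capturing it with SET EXPLAIN is given here. It assumes a default setup, in which the plan is appended to a file named sqexplain.out; where that file ends up depends on the platform and on how the session connects.

```sql
-- Start recording query plans for this session.
SET EXPLAIN ON;

-- The query from the question; its plan is appended to the explain file.
SELECT country, COUNT(*)
FROM events
GROUP BY country;

-- Stop recording plans.
SET EXPLAIN OFF;
```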

Answer 1

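Things I would try:

COUNT(1) instead of COUNT(*), in case the DBMS is being stupid
Test the query without the index and check the execution plan, since the index may be confusing the optimizer
Test which queries the index actually speeds up, and try different index types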
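
A rough sketch of the first two suggestions, using the table and column names from the question. The FULL directive is one way to test "without the index" without actually dropping it; whether either variant changes the timing is not something this sketch claims.

```sql
-- Variant 1: COUNT(1) instead of COUNT(*).
SELECT country, COUNT(1)
FROM events
GROUP BY country;

-- Variant 2: force a sequential scan so the index is ignored,
-- then compare the plan and timing with the indexed run.
SELECT --+ FULL (events)
    country, COUNT(*)
FROM events
GROUP BY country;
```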

Answer 2

So, I ran some tests.

TL;DR

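Change the country column type to CHAR(32), rebuild the index, and you should get better performance.

Long version:

I used Informix 12.10FC6DE on Linux CentOS 7 (a VM created in VirtualBox). The dbspace has a page size of 2048 bytes and the buffer pool has 50000 pages.

Created a table (tst) with a row size of about 425 bytes (4 rows per page on average) and several columns. Among them, one column is country VARCHAR(32) and another is static_country CHAR(32). Populated the table with 499999 rows, with the country and static_country columns evenly distributed over 25 country names.

Created 2 indexes, one on column country (idx1_tst) and another on column static_country (idx2_tst).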
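
The exact DDL is not given, so the following is only a hypothetical reconstruction of the setup. The country and static_country columns and the two index names come from the description above; the id and filler columns, and the UPDATE STATISTICS step, are assumptions added to pad the row to roughly 425 bytes and to give the optimizer current statistics.

```sql
-- Hypothetical test table: only country and static_country are from the
-- description; id and filler are assumed padding columns.
CREATE TABLE tst (
    id             SERIAL,
    country        VARCHAR(32),
    static_country CHAR(32),
    filler         CHAR(350)   -- assumed padding, about 425 bytes per row in total
);

-- One index per column, named as in the description.
CREATE INDEX idx1_tst ON tst (country);
CREATE INDEX idx2_tst ON tst (static_country);

-- Assumed step: refresh optimizer statistics after loading the rows.
UPDATE STATISTICS MEDIUM FOR TABLE tst;
```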

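The table partition uses 125000 data pages (according to oncheck -pT). The indexes use about 1500 pages (according to oncheck -pT).

A. Ran the query several times, forcing a SEQUENTIAL SCAN (run times between 10 and 15 seconds):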

SELECT --+ FULL (tst)
    country, COUNT(*)
FROM
    tst
GROUP BY
    country

DIRECTIVES FOLLOWED:
FULL ( tst )
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 1415645
Estimated # of Rows Returned: 25
Temporary Files Required For: Group By

  1) mydb.tst: SEQUENTIAL SCAN


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                tst

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     499999     499999    499999     00:12.17   140001

  type     rows_prod  est_rows  rows_cons  time       est_cost
  ------------------------------------------------------------
  group    25         25        499999     00:13.01   1275644

B. Ran the query several times, forcing an INDEX SCAN on the index of column country, which is of type VARCHAR(32) (run times between 4m30s and 5m):

SELECT --+ INDEX (tst idx1_tst)
    country, COUNT(*)
FROM
    tst
GROUP BY
    country

DIRECTIVES FOLLOWED:
INDEX ( tst idx1_tst )
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 3462411
Estimated # of Rows Returned: 25

  1) mydb.tst: INDEX PATH

    (1) Index Name: mydb.idx1_tst
        Index Keys: country   (Serial, fragments: ALL)


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                tst

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     499999     499999    499999     04:49.71   3462411

  type     rows_prod  est_rows  rows_cons  time       est_cost
  ------------------------------------------------------------
  group    25         25        499999     04:50.51   1275644

C. Ran the query several times, forcing an INDEX SCAN on the index of column static_country, which is of type CHAR(32) (run times between 2 and 3 seconds):

SELECT --+ INDEX (tst idx2_tst)
    static_country, COUNT(*)
FROM
    tst
GROUP BY
    static_country

DIRECTIVES FOLLOWED:
INDEX ( tst idx2_tst )
DIRECTIVES NOT FOLLOWED:

Estimated Cost: 16428
Estimated # of Rows Returned: 25

  1) mydb.tst: INDEX PATH

    (1) Index Name: mydb.idx2_tst
        Index Keys: static_country   (Key-Only)  (Serial, fragments: ALL)


Query statistics:
-----------------

  Table map :
  ----------------------------
  Internal name     Table name
  ----------------------------
  t1                tst

  type     table  rows_prod  est_rows  rows_scan  time       est_cost
  -------------------------------------------------------------------
  scan     t1     499999     499999    499999     00:02.02   16429

  type     rows_prod  est_rows  rows_cons  time       est_cost
  ------------------------------------------------------------
  group    25         25        499999     00:02.72   1277132

Using the SMI table sysptprof in the sysmaster database, I can see the following counters (reset with onstat -z between runs):

Case A (SEQUENTIAL SCAN):
- table tst partition:
  - lockreqs 499999
  - isreads 125001
  - bufreads 500060
  - pagreads 117532
Case B (INDEX SCAN on the VARCHAR column):
- table tst partition:
  - lockreqs 499999
  - isreads 499990
  - bufreads 999997
  - pagreads 348585
- index idx1_tst partition:
  - lockreqs 499999
  - isreads 500009
  - bufreads 506961
  - pagreads 2545
Case C (INDEX SCAN on the CHAR column):
- index idx2_tst partition:
  - lockreqs 499999
  - isreads 500000
  - bufreads 502879
  - pagreads 1440
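
A query along these lines could be used to read those counters. It is a sketch rather than the exact statement that was run: the database name testdb is a placeholder, and it assumes each index lives in its own partition named after the index.

```sql
-- Per-partition activity counters for the test table and its two indexes.
SELECT tabname, lockreqs, isreads, bufreads, pagreads
FROM sysmaster:sysptprof
WHERE dbsname = 'testdb'   -- placeholder: replace with the actual database name
  AND tabname IN ('tst', 'idx1_tst', 'idx2_tst');
```

Running onstat -z before each test resets the counters, so the values returned reflect a single run.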

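So, for the SEQUENTIAL SCAN there is activity only on the table partition, as I expected.

For the INDEX SCAN on the CHAR column there is activity only on the index partition, as I expected (the explain output contains the Key-Only indication).

For the INDEX SCAN on the VARCHAR column there is activity on both the table and the index partitions, which is not what I expected (although, as Fernando pointed out, the explain output does not contain the Key-Only indication).

I could not explain this behaviour from Informix. But a colleague pointed me to this entry in the Informix performance guide (version 12.10FC6, chapter 10, Query plans, Access plans):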

Important: The optimizer does not choose a key-only scan for a VARCHAR column. If you want to take advantage of key-only scans, use the ALTER TABLE with the MODIFY clause to change the column to a CHAR data type.
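
Applied to the table from the question, the change the manual describes might look like the sketch below. It assumes the country column carries no NOT NULL or DEFAULT attributes; as far as I know, MODIFY drops column attributes that are not restated, so they would need to be repeated in the statement. The UPDATE STATISTICS step is also an assumption, not something the manual entry requires.

```sql
-- Convert the VARCHAR column to a fixed-length CHAR so a key-only scan
-- becomes possible. Restate NOT NULL or DEFAULT here if the column had them.
ALTER TABLE events MODIFY (country CHAR(32));

-- Assumed follow-up: refresh statistics so the optimizer costs the new plan.
UPDATE STATISTICS MEDIUM FOR TABLE events;
```

The trade-off is storage: a CHAR(32) column always occupies 32 bytes per row, in both the table and the index, regardless of how short the country name is.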

Tags: sql, database, optimization, performance, informix