Get some details about external sort in PostgreSQL

While reading this article, I found the following example:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM foo ORDER BY c1;

The query plan looks like this:

Sort  (cost=172682.84..175182.84 rows=1000000 width=37) (actual time=584.215..681.531 rows=1000000 loops=1)
  Sort Key: c1
  Sort Method: external sort  Disk: 45928kB
  Buffers: shared hit=3197 read=5137, temp read=5741 written=5741
  ->  Seq Scan on foo  (cost=0.00..18334.00 rows=1000000 width=37) (actual time=0.036..91.914 rows=1000000 loops=1)
        Buffers: shared hit=3197 read=5137
Total runtime: 711.195 ms

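As is well known, external sort is a family of algorithms. Does "external sort" here mean that PostgreSQL uses an external merge sort? If so, how can I get some details, such as the number of batches and their sizes? Is that possible?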

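You can get this information in the log file by setting the parameter trace_sorts=on (http://www.postgresql.org/docs/9.4/static/runtime-config-developer.html). It is a developer option, and the sort trace messages are emitted at LOG level.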

Also, you may want to look at src/backend/utils/sort/tuplesort.c, at least at this comment:

This module handles sorting of heap tuples, index tuples, or single Datums (and could easily support other kinds of sortable objects, if necessary). It works efficiently for both small and large amounts of data. Small amounts are sorted in-memory using qsort(). Large amounts are sorted using temporary files and a standard external sort algorithm.

See Knuth, volume 3, for more than you want to know about the external sorting algorithm. We divide the input into sorted runs using replacement selection, in the form of a priority tree implemented as a heap (essentially his Algorithm 5.2.3H), then merge the runs using polyphase merge, Knuth's Algorithm 5.4.2D. The logical "tapes" used by Algorithm D are implemented by logtape.c, which avoids space wastage by recycling disk space as soon as each block is read from its "tape".

...
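
To make the two phases in that comment more concrete, here is a minimal Python sketch of the general idea: form sorted runs with replacement selection using a heap, then merge the runs. It is not PostgreSQL's code. The function names and the memory_slots parameter are hypothetical and stand in for work_mem, and the merge step uses a plain k-way merge (heapq.merge) instead of the polyphase merge over logical tapes that the comment describes.

```python
import heapq
from typing import Iterable, List

_EXHAUSTED = object()  # sentinel marking the end of the input


def replacement_selection_runs(items: Iterable[int], memory_slots: int) -> List[List[int]]:
    """Split the input into sorted runs, holding at most memory_slots values in memory.

    Heap entries are (run_number, value), so every value of the current run
    is popped before any value that had to be deferred to the next run.
    """
    it = iter(items)
    heap = []
    # Fill the in-memory heap with the first memory_slots values.
    for value in it:
        heap.append((0, value))
        if len(heap) == memory_slots:
            break
    heapq.heapify(heap)

    runs: List[List[int]] = []
    while heap:
        run_no, value = heapq.heappop(heap)
        if run_no == len(runs):
            runs.append([])  # first value of a new run
        runs[run_no].append(value)

        nxt = next(it, _EXHAUSTED)
        if nxt is _EXHAUSTED:
            continue
        if nxt >= value:
            # Still fits in the current run, which stays sorted.
            heapq.heappush(heap, (run_no, nxt))
        else:
            # Too small for the current run: defer it to the next one.
            heapq.heappush(heap, (run_no + 1, nxt))
    return runs


def merge_runs(runs: List[List[int]]) -> List[int]:
    """Merge sorted runs with a simple k-way merge (not a polyphase merge)."""
    return list(heapq.merge(*runs))


data = [5, 1, 9, 3, 7, 2, 8, 6, 4, 0]
runs = replacement_selection_runs(data, memory_slots=3)
merged = merge_runs(runs)
print(len(runs), [len(r) for r in runs])
print(merged)
```

In this sketch the runs can be longer than the number of memory slots, which is the usual motivation for replacement selection. The run count and run lengths printed at the end are the same kind of detail the question asks about.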