Estimating the number of rows with EXPLAIN in PostgreSQL for a daterange column

Question

Does anyone know how PostgreSQL estimates the number of rows in EXPLAIN for a column of the daterange data type? For example, consider this query:

SELECT * FROM "Table1" WHERE period && '[2015-01-01,2015-12-30)';

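In this case (a column of the daterange data type), the histogram_bounds field in pg_stats is null for the period column of Table1, but pg_statistic holds two statistics kinds (stakind1=7 and stakind2=6), and I could not find any documentation for these codes. Two stavalues arrays are stored as well. One of them looks like a histogram of the period column, but what is the other one? Here is an example: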

stavalues1: "{""[2015-01-02,2015-01-03)"",""[2015-01-29,2015-02-01)"",""[2015-02-09,2015-02-13)""}"
stavalues2: "{1,1,1}"

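I have three questions:

What is stavalues2, or in other words, what does stakind2=6 mean?
How should we interpret a histogram whose entries are ranges with lower and upper bounds?
How does EXPLAIN estimate the number of rows for the query above?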

Thanks in advance.

Answer 1

See the definitions in src/include/catalog/pg_statistic.h:

/*
 * A "length histogram" slot describes the distribution of range lengths in
 * rows of a range-type column. stanumbers contains a single entry, the
 * fraction of empty ranges. stavalues is a histogram of non-empty lengths, in
 * a format similar to STATISTIC_KIND_HISTOGRAM: it contains M (>=2) range
 * values that divide the column data values into M-1 bins of approximately
 * equal population. The lengths are stored as float8s, as measured by the
 * range type's subdiff function. Only non-null rows are considered.
 */
#define STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM  6

/*
 * A "bounds histogram" slot is similar to STATISTIC_KIND_HISTOGRAM, but for
 * a range-type column.  stavalues contains M (>=2) range values that divide
 * the column data values into M-1 bins of approximately equal population.
 * Unlike a regular scalar histogram, this is actually two histograms combined
 * into a single array, with the lower bounds of each value forming a
 * histogram of lower bounds, and the upper bounds a histogram of upper
 * bounds.  Only non-NULL, non-empty ranges are included.
 */
#define STATISTIC_KIND_BOUNDS_HISTOGRAM  7

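This answers the first and second questions: stakind2=6 is the length histogram, and stakind1=7 is the bounds histogram.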
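
To make the "two histograms combined into a single array" idea concrete, here is a minimal sketch that splits the example stavalues1 array from the question into its two parts. It is not PostgreSQL code: SimpleRange and split_bounds_histogram are hypothetical names, and dates are replaced by day-of-year numbers in 2015 (so 2015-02-01 becomes 32) to keep the example self-contained.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for one range value in stavalues.
 * Bounds are day-of-year numbers in 2015 rather than real dates. */
typedef struct {
    double lower;
    double upper;
} SimpleRange;

/* Copy the lower and upper bounds of a bounds histogram into two separate
 * arrays. Each output array is then read like an ordinary equi-depth
 * histogram: nhist values that delimit nhist - 1 bins. */
void split_bounds_histogram(const SimpleRange *hist, int nhist,
                            double *hist_lower, double *hist_upper)
{
    assert(nhist >= 2);
    for (int i = 0; i < nhist; i++) {
        hist_lower[i] = hist[i].lower;
        hist_upper[i] = hist[i].upper;
    }
}
```

For the question's data this gives lower bounds {2, 29, 40} and upper bounds {3, 32, 44}. As far as I can tell, ANALYZE sorts the lower and upper bounds independently before building the array, so an entry in stavalues1 does not have to correspond to a range that actually occurs in a row.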

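The selectivity of && is computed by calc_hist_selectivity in src/backend/utils/adt/rangetypes_selfuncs.c. The bounds histogram is split into two histograms: hist_upper holds the upper bounds and hist_lower holds the lower bounds. This code explains what happens: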

        case OID_RANGE_OVERLAP_OP:
        case OID_RANGE_CONTAINS_ELEM_OP:

            /*
             * A && B <=> NOT (A << B OR A >> B).
             *
             * Since A << B and A >> B are mutually exclusive events we can
             * sum their probabilities to find probability of (A << B OR A >>
             * B).
             *
             * "range @> elem" is equivalent to "range && [elem,elem]". The
             * caller already constructed the singular range from the element
             * constant, so just treat it the same as &&.
             */
            hist_selec =
                calc_hist_selectivity_scalar(typcache, &const_lower, hist_upper,
                                             nhist, false);
            hist_selec +=
                (1.0 - calc_hist_selectivity_scalar(typcache, &const_upper, hist_lower,
                                                    nhist, true));
            hist_selec = 1.0 - hist_selec;
            break;
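
The arithmetic in that snippet can be sketched in a few lines of standalone C. This is a simplified illustration under stated assumptions, not the PostgreSQL implementation: bounds are plain doubles (day-of-year numbers), inclusive versus exclusive bounds are ignored, and hist_fraction_below and overlap_selectivity are hypothetical names standing in for calc_hist_selectivity_scalar and the && branch above.

```c
#include <assert.h>

/* Estimate the fraction of histogram entries that are < value
 * (or <= value when equal is nonzero). hist must be sorted ascending
 * and contain nhist >= 2 entries delimiting nhist - 1 equal-population
 * bins. Linear interpolation is used inside the bin holding value. */
static double hist_fraction_below(double value, const double *hist,
                                  int nhist, int equal)
{
    int index = -1;   /* greatest i with hist[i] < (or <=) value */

    assert(nhist >= 2);
    for (int i = 0; i < nhist; i++) {
        if (hist[i] < value || (equal && hist[i] == value))
            index = i;
        else
            break;
    }
    if (index < 0)
        return 0.0;   /* value lies below the whole histogram */

    double selec = (double) index / (nhist - 1);
    if (index < nhist - 1) {
        double width = hist[index + 1] - hist[index];
        if (width > 0.0)
            selec += (value - hist[index]) / width / (nhist - 1);
    }
    return selec;
}

/* A && B <=> NOT (A << B OR A >> B), with A the column and B the constant. */
double overlap_selectivity(double const_lower, double const_upper,
                           const double *hist_lower,
                           const double *hist_upper, int nhist)
{
    /* P(A << B): the column's upper bound is below the constant's lower bound */
    double left = hist_fraction_below(const_lower, hist_upper, nhist, 0);
    /* P(A >> B): the column's lower bound is above the constant's upper bound */
    double right = 1.0 - hist_fraction_below(const_upper, hist_lower, nhist, 1);

    return 1.0 - (left + right);
}
```

With the question's histograms (lower bounds {2, 29, 40}, upper bounds {3, 32, 44}) and the constant '[2015-01-01,2015-12-30)' mapped to [1, 364), no upper bound lies below 1 and no lower bound lies above 364, so the sketch returns a selectivity of 1.0. As far as I understand, the planner then scales the histogram selectivity by the fractions of non-empty and non-null rows and multiplies it by the table's estimated row count to get the rows figure that EXPLAIN prints.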
