Estimating the number of rows with EXPLAIN in PostgreSQL for a daterange
Does anyone know how PostgreSQL uses EXPLAIN to estimate the number of rows for the daterange data type? For example, consider this query:
SELECT * FROM Table1 WHERE period && '[2015-01-01,2015-12-30)';
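In this case (a daterange column), the histogram_bounds field in pg_stats is null for the period column of Table1. However, pg_statistic stores two slots (stakind1=7 and stakind2=6), and I could not find any documentation for this encoding. Two stavalues arrays are stored as well: one of them looks like a histogram of the period column, but what is the other one?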
Here is an example:
stavalues1: {"[2015-01-02,2015-01-03)","[2015-01-29,2015-02-01)","[2015-02-09,2015-02-13)"}
stavalues2: {1,1,1}
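I have three questions:
- What is stavalues2? In other words, what does stakind2=6 mean?
- How should the histogram of lower and upper bounds of period be interpreted?
- How does EXPLAIN estimate the number of rows for the query above?
Thanks in advance.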
See the definitions in src/include/catalog/pg_statistic.h:
/*
 * A "length histogram" slot describes the distribution of range lengths in
 * rows of a range-type column. stanumbers contains a single entry, the
 * fraction of empty ranges. stavalues is a histogram of non-empty lengths, in
 * a format similar to STATISTIC_KIND_HISTOGRAM: it contains M (>=2) range
 * values that divide the column data values into M-1 bins of approximately
 * equal population. The lengths are stored as float8s, as measured by the
 * range type's subdiff function. Only non-null rows are considered.
 */
#define STATISTIC_KIND_RANGE_LENGTH_HISTOGRAM 6
/*
 * A "bounds histogram" slot is similar to STATISTIC_KIND_HISTOGRAM, but for
 * a range-type column. stavalues contains M (>=2) range values that divide
 * the column data values into M-1 bins of approximately equal population.
 * Unlike a regular scalar histogram, this is actually two histograms combined
 * into a single array, with the lower bounds of each value forming a
 * histogram of lower bounds, and the upper bounds a histogram of upper
 * bounds. Only non-NULL, non-empty ranges are included.
 */
#define STATISTIC_KIND_BOUNDS_HISTOGRAM 7
This answers the first and second questions.
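The selectivity of && is computed in calc_hist_selectivity in src/backend/utils/adt/rangetypes_selfuncs.c. The bounds histogram is split into two histograms: hist_upper holds the upper bounds and hist_lower holds the lower bounds. This code shows what happens: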
case OID_RANGE_OVERLAP_OP:
case OID_RANGE_CONTAINS_ELEM_OP:
    /*
     * A && B <=> NOT (A << B OR A >> B).
     *
     * Since A << B and A >> B are mutually exclusive events we can
     * sum their probabilities to find probability of (A << B OR A >>
     * B).
     *
     * "range @> elem" is equivalent to "range && [elem,elem]". The
     * caller already constructed the singular range from the element
     * constant, so just treat it the same as &&.
     */
    hist_selec =
        calc_hist_selectivity_scalar(typcache, &const_lower, hist_upper,
                                     nhist, false);
    hist_selec +=
        (1.0 - calc_hist_selectivity_scalar(typcache, &const_upper, hist_lower,
                                            nhist, true));
    hist_selec = 1.0 - hist_selec;
    break;