Optimizing ST_Intersects in PostgreSQL(PostGIS)

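The query below takes nearly 15 minutes to return results. I would like to know why. Is it the data, or the number of vertices in the geometries? When I run the same query against different tables (small shapefiles), it runs fast.

Here is the query (thanks to Patrick):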

WITH hi AS (
  SELECT ps.id, ps.brgy_locat, ps.municipali
  FROM evidensapp_polystructures ps
  JOIN evidensapp_seniangcbr fh ON fh.hazard = 'High'
                                 AND ST_Intersects(fh.geom, ps.geom)
), med AS (
  SELECT ps.id, ps.brgy_locat, ps.municipali
  FROM evidensapp_polystructures ps
  JOIN evidensapp_seniangcbr fh ON fh.hazard = 'Medium'
                                 AND ST_Intersects(fh.geom, ps.geom)
  EXCEPT SELECT * FROM hi
), low AS (
  SELECT ps.id, ps.brgy_locat, ps.municipali
  FROM evidensapp_polystructures ps
  JOIN evidensapp_seniangcbr fh ON fh.hazard = 'Low'
                                 AND ST_Intersects(fh.geom, ps.geom)
  EXCEPT SELECT * FROM hi
  EXCEPT SELECT * FROM med
)
SELECT brgy_locat AS barangay, municipali AS municipality, high, medium, low
FROM (SELECT brgy_locat, municipali, count(*) AS high
      FROM hi
      GROUP BY 1, 2) cnt_hi
FULL JOIN (SELECT brgy_locat, municipali, count(*) AS medium
      FROM med
      GROUP BY 1, 2) cnt_med USING (brgy_locat, municipali)
FULL JOIN (SELECT brgy_locat, municipali, count(*) AS low
      FROM low
      GROUP BY 1, 2) cnt_low USING (brgy_locat, municipali);

PostgreSQL 9.3, PostGIS 2.1.5

Table Polystructures, containing 9847 rows:

CREATE TABLE evidensapp_polystructures (
  id serial NOT NULL PRIMARY KEY,
  bldg_name character varying(100) NOT NULL,
  bldg_type character varying(50) NOT NULL,
  brgy_locat character varying(50) NOT NULL,
  municipali character varying(50) NOT NULL,
  province character varying(50) NOT NULL,
  geom geometry(MultiPolygon,32651)
);

CREATE INDEX evidensapp_polystructures_geom_id
  ON evidensapp_polystructures USING gist (geom);
ALTER TABLE evidensapp_polystructures CLUSTER ON evidensapp_polystructures_geom_id;

Table SeniangCBR, with only 6 rows; shapefile size (if it matters): 52,060 KB

CREATE TABLE evidensapp_seniangcbr (
  id serial NOT NULL PRIMARY KEY,
  hazard character varying(16) NOT NULL,
  geom geometry(MultiPolygon,32651)
);

CREATE INDEX evidensapp_seniangcbr_geom_id ON evidensapp_seniangcbr USING gist (geom);
ALTER TABLE evidensapp_seniangcbr CLUSTER ON evidensapp_seniangcbr_geom_id;

All data was loaded into the database automatically with the LayerMapping utility, as I am using Django (GeoDjango).

EXPLAIN ANALYZE LINK HERE.

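I don't have a server right now; I am running the query on my own PC.

The EXPLAIN ANALYZE output is hard to read because all field and function names have been scrambled into the radio alphabet. That said, two things stand out:

  1. Most of the time is spent in the ST_Intersects() function, which is no surprise.
  2. The EXCEPT clauses also appear to be rather inefficient.

So please try this less verbose version: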

SELECT brgy_locat AS barangay, municipali AS municipality,
       sum(CASE max_hz_id WHEN 3 THEN 1 ELSE 0 END) AS high,
       sum(CASE max_hz_id WHEN 2 THEN 1 ELSE 0 END) AS medium,
       sum(CASE max_hz_id WHEN 1 THEN 1 ELSE 0 END) AS low
FROM (
  SELECT ps.id, ps.brgy_locat, ps.municipali,
         max(CASE fh.hazard WHEN 'Low' THEN 1 WHEN 'Medium' THEN 2 WHEN 'High' THEN 3 END) AS max_hz_id
  FROM evidensapp_polystructures ps
  JOIN evidensapp_seniangcbr fh ON ST_Intersects(fh.geom, ps.geom)
  GROUP BY 1, 2, 3
) AS ps_fh
GROUP BY 1, 2;

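Now there is only a single call to ST_Intersects(), which is probably (hopefully) much faster than three calls on subsets of the hazard map, owing to internal efficiencies in the PostGIS code.

Obviously, the hazard class strings are converted to an integer range to make ordering and comparison easy. In the inner query, the maximum hazard value is selected for each structure, as you require. In the main query, those per-structure maxima are summed into their respective columns. If at all possible, change your table structure to use these three integer codes and link to a helper table for the class labels: your table becomes smaller and therefore faster, and the CASE statement in the inner query can be dropped. Alternatively, add a column holding the integer code and update its values based on the "hazard" column.

Note that these CASE statements are not very efficient (the reason I used EXCEPT clauses in my previous answer). PG 9.4 introduced a new FILTER clause for aggregate functions, which would make the query faster and easier to read: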
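The last alternative could look roughly like the sketch below. The column name hazard_id is hypothetical, and the mapping (1 = Low, 2 = Medium, 3 = High) simply mirrors the CASE expression in the query above.

```sql
-- Hypothetical column hazard_id: 1 = Low, 2 = Medium, 3 = High
ALTER TABLE evidensapp_seniangcbr ADD COLUMN hazard_id smallint;

-- Populate it once from the existing text column
UPDATE evidensapp_seniangcbr
SET    hazard_id = CASE hazard
                      WHEN 'Low'    THEN 1
                      WHEN 'Medium' THEN 2
                      WHEN 'High'   THEN 3
                   END;

-- The inner query can then aggregate the integer directly, without CASE
SELECT ps.id, ps.brgy_locat, ps.municipali, max(fh.hazard_id) AS max_hz_id
FROM   evidensapp_polystructures ps
JOIN   evidensapp_seniangcbr fh ON ST_Intersects(fh.geom, ps.geom)
GROUP  BY 1, 2, 3;
```

With only 6 rows in the hazard table, the saving from dropping the CASE expression is likely small next to the cost of ST_Intersects(); the main gain is a simpler query.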

count(id) FILTER (WHERE max_hz_id = 3) AS high

You may want to consider upgrading.
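For reference, here is a sketch of the complete query rewritten with FILTER. It assumes PostgreSQL 9.4 or later and is otherwise the same query as above; count(*) is used instead of count(id), which is equivalent here because id is NOT NULL.

```sql
-- Requires PostgreSQL 9.4+ (aggregate FILTER clause)
SELECT brgy_locat AS barangay, municipali AS municipality,
       count(*) FILTER (WHERE max_hz_id = 3) AS high,
       count(*) FILTER (WHERE max_hz_id = 2) AS medium,
       count(*) FILTER (WHERE max_hz_id = 1) AS low
FROM (
  -- One row per structure, carrying its highest intersecting hazard level
  SELECT ps.id, ps.brgy_locat, ps.municipali,
         max(CASE fh.hazard WHEN 'Low' THEN 1 WHEN 'Medium' THEN 2 WHEN 'High' THEN 3 END) AS max_hz_id
  FROM evidensapp_polystructures ps
  JOIN evidensapp_seniangcbr fh ON ST_Intersects(fh.geom, ps.geom)
  GROUP BY 1, 2, 3
) AS ps_fh
GROUP BY 1, 2;
```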

Selamat mula Maynila

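Add a bounding_box geometry(Polygon,4326) column to your tables. The value of this column would be a bounding box that fully encloses the multipolygon (the maximum x, y and minimum x, y of the multipolygon).

Your query would then look like this: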

AND ST_Intersects(fh.bounding_box, ps.bounding_box)
AND ST_Intersects(fh.geom, ps.geom)

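The benefit is that the first ST_Intersects call is very fast. If it returns false, the second, more complex ST_Intersects call is never invoked, which saves some time in that case.

Similar to my earlier suggestion, I would use UNION ALL instead of FULL JOIN in the outer SELECT:
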
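A minimal sketch of how such a column could be populated follows. It uses SRID 32651 rather than 4326, on the assumption that the box should match the SRID of the existing geom columns, since PostGIS raises an error when comparing geometries with different SRIDs. ST_Envelope() is one way to compute the box; for a degenerate geometry it can return a point or a line, which the Polygon type constraint would reject.

```sql
-- Assumption: same SRID as geom (32651), not 4326
ALTER TABLE evidensapp_polystructures ADD COLUMN bounding_box geometry(Polygon,32651);
ALTER TABLE evidensapp_seniangcbr     ADD COLUMN bounding_box geometry(Polygon,32651);

-- ST_Envelope returns the minimum bounding rectangle and keeps the SRID
UPDATE evidensapp_polystructures SET bounding_box = ST_Envelope(geom);
UPDATE evidensapp_seniangcbr     SET bounding_box = ST_Envelope(geom);
```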
WITH hi AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr     fh
   JOIN   evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom)
   WHERE  fh.hazard = 'High'
   GROUP  BY 1, 2, 3
   )
, med AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr     fh
   JOIN   evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom)
   LEFT   JOIN hi USING (brgy_locat, municipali)
   WHERE  fh.hazard = 'Medium'
   AND    hi.brgy_locat IS NULL
   GROUP  BY 1, 2, 3
   )
TABLE hi
UNION ALL
TABLE med
UNION ALL
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr     fh
   JOIN   evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom)
   LEFT   JOIN hi  USING (brgy_locat, municipali)
   LEFT   JOIN med USING (brgy_locat, municipali)
   WHERE  fh.hazard = 'Low'
   AND    hi.brgy_locat IS NULL
   AND    med.brgy_locat IS NULL
   GROUP BY 1, 2, 3;

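This only considers the highest hazard level for each group of rows sharing the same (brgy_locat, municipali). Only rows that actually intersect with some relevant hazard level in evidensapp_seniangcbr appear in the result. Likewise, the counts only include rows that actually intersect. There may be more rows in evidensapp_polystructures with the same (brgy_locat, municipali) that do not intersect the same hazard level and are therefore ignored.

Pick one of the standard techniques to exclude, at the lower hazard levels, rows for which a match was already found at a higher level: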

  • Select rows which are not present in other table

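LEFT JOIN / IS NULL should be able to use the index on id and perform well here. It is certainly faster than EXCEPT, which compares whole rows and cannot use an index.

Indexes

There is no need to add a bounding_box geometry column to your tables, as another answer suggested. Modern versions of PostGIS perform (index-supported) bounding box comparisons automatically. The PostGIS documentation: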

This function call will automatically include a bounding box comparison that will make use of any indexes that are available on the geometries.

In fact, we can already see index scans in the explain output you posted.
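One way to check this yourself is sketched below, using only the tables from the question. If the plan for the first statement shows an index scan on the GiST index with a condition on the && operator, the bounding box prefilter is already being applied. The second statement applies the bounding box operator on its own, for comparison.

```sql
-- Look for an index scan using the GiST index in the resulting plan
EXPLAIN
SELECT ps.id
FROM   evidensapp_seniangcbr     fh
JOIN   evidensapp_polystructures ps ON ST_Intersects(fh.geom, ps.geom)
WHERE  fh.hazard = 'High';

-- Bounding box comparison only (&&), without the exact intersection test
SELECT count(*)
FROM   evidensapp_seniangcbr     fh
JOIN   evidensapp_polystructures ps ON fh.geom && ps.geom
WHERE  fh.hazard = 'High';
```

If the second count is close to the full row count of evidensapp_polystructures, the bounding boxes of the hazard polygons cover nearly everything and the index can filter out very little.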

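Your existing GiST index evidensapp_polystructures_geom_id should speed up the query.
Aside: a more conventional name for the index would be evidensapp_polystructures_geom_idx.

Additionally, create an index on (brgy_locat, municipali) if you don't have one yet: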

CREATE INDEX foo_idx ON evidensapp_polystructures (brgy_locat, municipali);

Alternative with LATERAL join

Since there are only 6 rows in evidensapp_seniangcbr, a LATERAL join might be faster:

WITH hi AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr fh
        , LATERAL (
      SELECT ps.brgy_locat, ps.municipali
      FROM   evidensapp_polystructures ps
      WHERE  ST_Intersects(fh.geom, ps.geom)
      ) ps
   WHERE  fh.hazard = 'High'
   GROUP  BY 1, 2, 3
   )
, med AS (
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr fh
        , LATERAL (
      SELECT ps.brgy_locat, ps.municipali
      FROM   evidensapp_polystructures ps
      LEFT   JOIN hi USING (brgy_locat, municipali)
      WHERE  hi.brgy_locat IS NULL
      AND    ST_Intersects(fh.geom, ps.geom)
      ) ps
   WHERE  fh.hazard = 'Medium'
   GROUP  BY 1, 2, 3
   )
TABLE hi
UNION ALL
TABLE med
UNION ALL
   SELECT ps.brgy_locat, ps.municipali, fh.hazard, count(*) AS ct
   FROM   evidensapp_seniangcbr fh
        , LATERAL (
      SELECT ps.id, ps.brgy_locat, ps.municipali
      FROM   evidensapp_polystructures ps
      LEFT   JOIN hi  USING (brgy_locat, municipali)
      LEFT   JOIN med USING (brgy_locat, municipali)
      WHERE  hi.brgy_locat IS NULL
      AND    med.brgy_locat IS NULL
      AND    ST_Intersects(fh.geom, ps.geom)
      ) ps
   WHERE  fh.hazard = 'Low'
   GROUP  BY 1, 2, 3;

About LATERAL joins: