How to filter the max value and write to row?

Postgres 9.3.5, PostGIS 2.1.4.

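I have two tables in a database (polygons and points).

I want to find out how many points are in each polygon. Each polygon will contain anywhere from 0 points to more than 200000. There is a small complication, though.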

My point table looks like the following:

x    y    lan
10  11    en
10  11    fr
10  11    en
10  11    es
10  11    en
- #just for demonstration/clarification purposes
13  14    fr
13  14    fr
13  14    es
-
15  16    ar
15  16    ar
15  16    ps

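I do not want to simply count the points per polygon. I want to know which lan occurs most frequently in each polygon. So, assuming each - indicates that the points fall into a new polygon, my result would look like the following: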

Polygon table:

polygon    Count   lan
1          3       en
2          2       fr
3          2       ar

This is what I have so far.

SELECT count(*), pts.language AS language, hexagons.gid AS hexagonsWhere
  FROM hexagonslan AS hexagons,
       points_count AS pts
 WHERE ST_Within(pts.geom, hexagons.geom)
 GROUP BY pts.language, hexagons.gid
 ORDER BY hexagonsWhere, count(*) DESC;

It gives me the following:

Polygon    Count     language
1          3         en
1          1         fr
1          1         es
2          2         fr
2          1         es
3          2         ar
3          1         ps

Two things are still unclear.

  1. How do I get only the max value?
  2. How will cases be treated where the max values happen to be identical?

Answer to 1.

To get the most common language and its count per polygon, you could use a simple DISTINCT ON query:

SELECT DISTINCT ON (h.gid)
       h.gid AS polygon, count(c.geom) AS ct, c.language
FROM   hexagonslan h
LEFT   JOIN points_count c ON ST_Within(c.geom, h.geom)
GROUP  BY h.gid, c.language
ORDER  BY h.gid, count(c.geom) DESC, c.language;  -- language name is tiebreaker
  • Select first row in each GROUP BY group?
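
As a minimal sketch of how DISTINCT ON picks one row per polygon, the same pattern can be run against the sample data from the question in plain Postgres. The CTE pts and its polygon column are assumptions made for illustration: they stand in for the result of the ST_Within join, so no PostGIS is needed to try it.

```sql
-- Hypothetical stand-in for the join result: one row per point,
-- already assigned to a polygon (the real query derives this with ST_Within).
WITH pts(polygon, language) AS (
   VALUES (1, 'en'), (1, 'fr'), (1, 'en'), (1, 'es'), (1, 'en')
        , (2, 'fr'), (2, 'fr'), (2, 'es')
        , (3, 'ar'), (3, 'ar'), (3, 'ps')
   )
SELECT DISTINCT ON (polygon)
       polygon, count(*) AS ct, language
FROM   pts
GROUP  BY polygon, language
ORDER  BY polygon, count(*) DESC, language;  -- highest count first, then name
```

This returns (1, 3, en), (2, 2, fr) and (3, 2, ar), which matches the desired result in the question. DISTINCT ON is applied after aggregation, so it keeps the first of the already counted rows for each polygon in ORDER BY order.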

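But for the data distribution you describe (up to 200,000 points per polygon), this should be substantially faster (hopefully making better use of an index on c.geom):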

SELECT h.gid AS polygon, c.ct, c.language
FROM   hexagonslan h
LEFT   JOIN LATERAL (
   SELECT c.language, count(*) AS ct
   FROM   points_count c
   WHERE  ST_Within(c.geom, h.geom) 
   GROUP  BY 1
   ORDER  BY 2 DESC, 1  -- again, language name is tiebreaker
   LIMIT  1
   ) c ON true
ORDER  BY 1;
  • Optimize GROUP BY query to retrieve latest record per user
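
The index mentioned above is not shown in the question. A sketch of what it could look like, assuming no spatial index exists yet; the index names are hypothetical, while the table and column names are taken from the queries:

```sql
-- Hypothetical index names; GiST is the usual index type for PostGIS geometry columns.
CREATE INDEX points_count_geom_idx ON points_count USING gist (geom);
CREATE INDEX hexagonslan_geom_idx  ON hexagonslan  USING gist (geom);

-- Refresh planner statistics after building the indexes.
ANALYZE points_count;
ANALYZE hexagonslan;
```

Whether the planner actually uses the index for each polygon is best checked with EXPLAIN ANALYZE on the real data.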

LEFT JOIN LATERAL .. ON true preserves polygons that do not contain any points.

  • Call a set-returning function with an array argument multiple times
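
To illustrate how polygons without points are preserved, here is a small sketch in plain Postgres (9.3 or later for LATERAL). The CTEs hex and pts and the equality test on gid are assumptions for illustration only; the equality replaces the ST_Within condition of the real query.

```sql
WITH hex(gid) AS (VALUES (1), (2), (3), (4))   -- polygon 4 contains no points
   , pts(gid, language) AS (
   VALUES (1, 'en'), (1, 'fr'), (1, 'en')
        , (2, 'fr'), (2, 'es')                  -- tie in polygon 2
        , (3, 'ar')
   )
SELECT h.gid AS polygon, c.ct, c.language
FROM   hex h
LEFT   JOIN LATERAL (
   SELECT p.language, count(*) AS ct
   FROM   pts p
   WHERE  p.gid = h.gid        -- stands in for ST_Within(c.geom, h.geom)
   GROUP  BY 1
   ORDER  BY 2 DESC, 1         -- language name breaks the tie
   LIMIT  1
   ) c ON true
ORDER  BY 1;
```

Polygon 4 still appears in the result, with NULL for both ct and language. With a plain JOIN LATERAL (or CROSS JOIN LATERAL) that row would be dropped. Polygon 2 yields es, the alphabetically first of the two tied languages.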

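Answer to 2.

Where the maximum counts happen to be identical, the example picks the alphabetically first language by way of the added ORDER BY item. If you want all languages that happen to share the maximum count, you have to do more: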

SELECT h.gid AS polygon, c.ct, c.language
FROM   hexagonslan h
LEFT   JOIN LATERAL (
   SELECT c.language, count(*) AS ct
        , rank() OVER (ORDER BY count(*) DESC) AS rnk
   FROM   points_count c
   WHERE  ST_Within(c.geom, h.geom) 
   GROUP  BY 1
   ) c ON c.rnk = 1
ORDER  BY 1, 3;  -- language only as additional sort criterion

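This uses the window function rank() (not row_number()!). We can get the count of points and the ranking of that count in a single SELECT, because window functions are evaluated after aggregate functions. Consider the sequence of events: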

  • Best way to get result count before LIMIT was applied
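
The difference between rank() and row_number() on tied counts can be seen in a small sketch. The CTE pts is a hypothetical set of points inside a single polygon, used for illustration only.

```sql
WITH pts(language) AS (
   VALUES ('en'), ('en'), ('fr'), ('fr'), ('es')   -- en and fr tie with 2 points each
   )
SELECT language, count(*) AS ct
     , rank()       OVER (ORDER BY count(*) DESC) AS rnk  -- ties share a rank
     , row_number() OVER (ORDER BY count(*) DESC) AS rn   -- ties get distinct numbers
FROM   pts
GROUP  BY language
ORDER  BY ct DESC, language;
```

Here en and fr both get rnk = 1 and es gets rnk = 3, so filtering on rnk = 1 returns every language that shares the maximum. row_number() assigns 1 and 2 to the tied rows in an unspecified order, so filtering on rn = 1 would keep only one of them, chosen arbitrarily.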