How to filter the max value and write to row?
Postgres 9.3.5, PostGIS 2.1.4.
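I have two tables in my database (polygons and points).
I want to know how many points fall inside each polygon. Each polygon will contain either 0 points or more than 200,000. The small problem is as follows.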
My point table looks like this:
x y lan
10 11 en
10 11 fr
10 11 en
10 11 es
10 11 en
- #just for demonstration/clarification purposes
13 14 fr
13 14 fr
13 14 es
-
15 16 ar
15 16 ar
15 16 ps
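I don't simply want to count the points per polygon. I want to know which lan occurs most often in each polygon. So, assuming each - marks points that fall into a new polygon, my result would look like this: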
Polygon table:
polygon Count lan
1 3 en
2 2 fr
3 2 ar
Here is what I have so far:
SELECT hexagons.gid AS polygon, count(*) AS ct, pts.language
FROM hexagonslan AS hexagons,
points_count AS pts
WHERE ST_Within(pts.geom, hexagons.geom)
GROUP BY hexagons.gid, pts.language  -- one row per (polygon, language) pair
ORDER BY hexagons.gid, ct DESC;
It gives me the following:
Polygon Count language
1 3 en
1 1 fr
1 1 es
2 2 fr
2 1 es
3 2 ar
3 1 ps
Two things are still unclear:
- How do I get only the max value?
- How are ties handled when the max values are exactly the same?
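The attempted query groups by both polygon and language, so it returns one row per (polygon, language) pair rather than one row per polygon. The following minimal Python sketch reproduces that granularity over the sample data; it assumes (hypothetically) that the three groups separated by - correspond to polygon ids 1, 2 and 3.

```python
from collections import Counter

# Sample points from the question as (polygon_id, lan) pairs.
# Assumption: the "-" separators mark polygons 1, 2 and 3.
POINTS = [
    (1, "en"), (1, "fr"), (1, "en"), (1, "es"), (1, "en"),
    (2, "fr"), (2, "fr"), (2, "es"),
    (3, "ar"), (3, "ar"), (3, "ps"),
]


def count_per_polygon_language(points):
    """Analogue of GROUP BY polygon, language: one row per pair."""
    counts = Counter(points)
    # Order by polygon, then count descending, then language for a stable order.
    return sorted(
        ((poly, ct, lan) for (poly, lan), ct in counts.items()),
        key=lambda row: (row[0], -row[1], row[2]),
    )


for row in count_per_polygon_language(POINTS):
    print(row)
```

Every polygon still has several rows here; picking only the top one per polygon is the remaining step.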
Answer to 1.
To get the most common language and its count per polygon, you can use a simple DISTINCT ON query:
SELECT DISTINCT ON (h.gid)
h.gid AS polygon, count(c.geom) AS ct, c.language
FROM hexagonslan h
LEFT JOIN points_count c ON ST_Within(c.geom, h.geom)
GROUP BY h.gid, c.language
ORDER BY h.gid, count(c.geom) DESC, c.language; -- language name is tiebreaker
- Select first row in each GROUP BY group?
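DISTINCT ON (h.gid) keeps only the first row of each gid group, where "first" is defined by the ORDER BY. As a rough Python analogue (not how Postgres executes it), sort the grouped counts by (gid, count descending, language) and keep the first row per gid. The polygon ids 1 to 3 are the same hypothetical mapping of the sample data.

```python
from collections import Counter

# Sample points as (polygon_id, lan); polygon ids are an assumption.
POINTS = [
    (1, "en"), (1, "fr"), (1, "en"), (1, "es"), (1, "en"),
    (2, "fr"), (2, "fr"), (2, "es"),
    (3, "ar"), (3, "ar"), (3, "ps"),
]


def most_common_per_polygon(points):
    counts = Counter(points)
    # ORDER BY gid, ct DESC, language  (language name is the tiebreaker)
    rows = sorted(
        ((poly, ct, lan) for (poly, lan), ct in counts.items()),
        key=lambda row: (row[0], -row[1], row[2]),
    )
    result, seen = [], set()
    for poly, ct, lan in rows:
        if poly not in seen:  # DISTINCT ON (gid): the first row per gid wins
            seen.add(poly)
            result.append((poly, ct, lan))
    return result


print(most_common_per_polygon(POINTS))
```

With a tie, the alphabetically first language is returned, e.g. most_common_per_polygon([(1, "fr"), (1, "es")]) yields [(1, 1, "es")].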
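But for the data distribution you describe (up to 200,000 points per polygon), this should be substantially faster (hopefully making better use of an index on c.geom):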
SELECT h.gid AS polygon, c.ct, c.language
FROM hexagonslan h
LEFT JOIN LATERAL (
SELECT c.language, count(*) AS ct
FROM points_count c
WHERE ST_Within(c.geom, h.geom)
GROUP BY 1
ORDER BY 2 DESC, 1 -- again, language name is tiebreaker
LIMIT 1
) c ON true
ORDER BY 1;
- Optimize GROUP BY query to retrieve latest record per user
LEFT JOIN LATERAL .. ON true keeps polygons that do not contain any points.
- Call a set-returning function with an array argument multiple times
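The LATERAL subquery is evaluated once per polygon, and LEFT JOIN ... ON true pads polygons for which it returns no row with NULLs. Below is a minimal Python sketch of that control flow. It assumes a hypothetical polygon 4 with no points, and a precomputed mapping stands in for the per-polygon ST_Within lookup that the database would serve from the spatial index.

```python
from collections import Counter

POLYGON_IDS = [1, 2, 3, 4]  # hypothetical: polygon 4 contains no points
POINTS_BY_POLYGON = {       # stand-in for WHERE ST_Within(c.geom, h.geom)
    1: ["en", "fr", "en", "es", "en"],
    2: ["fr", "fr", "es"],
    3: ["ar", "ar", "ps"],
}


def top_language(langs):
    """Inner subquery: GROUP BY language ORDER BY count DESC, language LIMIT 1."""
    counts = Counter(langs)
    if not counts:
        return None  # the subquery returns no row
    lan, ct = min(counts.items(), key=lambda item: (-item[1], item[0]))
    return ct, lan


def lateral_left_join(polygon_ids, points_by_polygon):
    result = []
    for gid in polygon_ids:  # one subquery evaluation per outer row
        top = top_language(points_by_polygon.get(gid, []))
        ct, lan = top if top is not None else (None, None)  # LEFT JOIN: NULL padding
        result.append((gid, ct, lan))
    return result


for row in lateral_left_join(POLYGON_IDS, POINTS_BY_POLYGON):
    print(row)
```

Polygon 4 still appears in the output, as (4, None, None), which is what the LEFT JOIN is there for.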
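Answer to 2.
In cases where the max values happen to be identical, the added ORDER BY item in the examples picks the alphabetically first language. If you want all languages that happen to share the maximum count, you have to do more: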
SELECT h.gid AS polygon, c.ct, c.language
FROM hexagonslan h
LEFT JOIN LATERAL (
SELECT c.language, count(*) AS ct
, rank() OVER (ORDER BY count(*) DESC) AS rnk
FROM points_count c
WHERE ST_Within(c.geom, h.geom)
GROUP BY 1
) c ON c.rnk = 1
ORDER BY 1, 3;  -- language only as an additional sort criterion
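Use the window function rank() here (not row_number()!). Because window functions are evaluated after GROUP BY and aggregation, we can get the count and the rank of that count in a single SELECT. Consider the sequence of events: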
- Best way to get result count before LIMIT was applied
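To see why rank() rather than row_number() is needed, here is a small sketch using Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or later (for window functions), and, since SQLite has neither LATERAL nor DISTINCT ON, the per-polygon subquery is replaced by PARTITION BY polygon over a plain table. Polygon 2 is given a hypothetical tie between fr and es.

```python
import sqlite3

# Hypothetical sample: polygon 2 has a tie between "fr" and "es".
ROWS = [
    (1, "en"), (1, "en"), (1, "en"), (1, "fr"), (1, "es"),
    (2, "fr"), (2, "fr"), (2, "es"), (2, "es"),
]

QUERY = """
WITH counts AS (
    SELECT polygon, language, count(*) AS ct
    FROM points
    GROUP BY polygon, language
), ranked AS (
    SELECT polygon, language, ct,
           rank()       OVER (PARTITION BY polygon ORDER BY ct DESC) AS rnk,
           row_number() OVER (PARTITION BY polygon ORDER BY ct DESC, language) AS rn
    FROM counts
)
SELECT polygon, ct, language
FROM ranked
WHERE {column} = 1
ORDER BY polygon, language
"""


def top_languages(column):
    """Return rows ranked first by the given window column ("rnk" or "rn")."""
    assert column in ("rnk", "rn")  # only splice known column names into SQL
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE points (polygon INTEGER, language TEXT)")
        conn.executemany("INSERT INTO points VALUES (?, ?)", ROWS)
        return conn.execute(QUERY.format(column=column)).fetchall()
    finally:
        conn.close()


print(top_languages("rnk"))  # rank(): every language tied for the max
print(top_languages("rn"))   # row_number(): exactly one row per polygon
```

rank() gives both tied languages of polygon 2 the value 1, so filtering on rnk = 1 returns all of them; row_number() numbers them 1 and 2 and silently drops one.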