How to group by a regular expression in a Postgres query
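I'm cleaning up someone else's RESTful application, and in the process it looks like some routes are not being used. To start troubleshooting, I created a table with a single unique text column to store the routes.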
---------routes---------
https://test.com/user/1/info
https://test.com/test/2/info
https://test.com/banana/100
https://test.com/post/3/date
https://test.com/post/
https://test.com/grape/
http://test.com/post/3/date
https://test.com/banana/3
https://test.com/user/2/info
https://test.com/test/5/info
.
.
.
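What I'd like to do now is run a query, using some regular expression (or something else), that groups the entries above to produce the following result: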
---------routes---------
https://test.com/user/{x}/info
https://test.com/test/{x}/info
https://test.com/post/{x}/date
https://test.com/post/
https://test.com/grape/
http://test.com/post/{x}/date
https://test.com/banana/{x}
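where {x} is just a generic marker that appears as a result of the grouping. I know we can search for a specific regular expression, but I don't know how to collapse the strings into groups and then spit out the 'recommended' grouping.
PS: Because we are stuck in the stone age, any solution is limited to PostgreSQL 8.4.20.
Edit:
klin, your answer doesn't quite work for me, because it gives me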
regexp_replace | count
------------------------------+-------
https://test.com/user/1/info | 1
https://test.com/test/2/info | 1
https://test.com/banana/100 | 1
\x01{x}ate | 2
https://test.com/user/2/info | 1
https://test.com/grape/ | 1
https://test.com/test/5/info | 1
https://test.com/post/ | 1
https://test.com/banana/3 | 1
(9 rows)
But at least this gives me some ideas. I'll post back once I've played with it a bit more.
I can't test this in 8.4...
with routes(url) as (
values
('https://test.com/user/1/info'),
('https://test.com/test/2/info'),
('https://test.com/banana/100'),
('https://test.com/post/3/date'),
('https://test.com/post/'),
('https://test.com/grape/'),
('http://test.com/post/3/date'),
('https://test.com/banana/3'),
('https://test.com/user/2/info'),
('https://test.com/test/5/info')
)
select regexp_replace(url, '^(.+//.+/.+/)\d+', '\1{x}'), count(*)
from routes
group by 1
regexp_replace | count
--------------------------------+-------
https://test.com/banana/{x} | 2
https://test.com/post/{x}/date | 1
http://test.com/post/{x}/date | 1
https://test.com/user/{x}/info | 2
https://test.com/test/{x}/info | 2
https://test.com/grape/ | 1
https://test.com/post/ | 1
(7 rows)
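For 8.4 specifically, the `\x01{x}ate` row in the question's edit looks like what happens when `standard_conforming_strings` is off (the default before 9.1): inside an ordinary string literal `\d` collapses to `d` and `\1` becomes the byte 0x01, so the pattern matches up to the `d` of `date` and the back-reference is lost. Below is a minimal sketch of a variant that should sidestep this by using `E''` escape strings with doubled backslashes. It is untested on 8.4.20, and the three sample rows are only illustrative:

```sql
-- Assumes the regex literals were mangled by backslash escaping on 8.4.
-- E'' strings with doubled backslashes pass \d and \1 through to the
-- regex engine regardless of the standard_conforming_strings setting.
with routes(url) as (
    values
        ('https://test.com/user/1/info'),
        ('https://test.com/user/2/info'),
        ('https://test.com/post/')
)
select regexp_replace(url, E'^(.+//.+/.+/)\\d+', E'\\1{x}') as route,
       count(*)
from routes
group by 1;
```

Against the real table, drop the CTE and select from the table's own text column instead.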