How to group by a regular expression in a Postgres query

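I'm cleaning up someone else's RESTful application, and in doing so it looks like some routes aren't being used. To start troubleshooting, I created a table with a single unique text column that stores the routes.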

---------routes---------
https://test.com/user/1/info
https://test.com/test/2/info
https://test.com/banana/100
https://test.com/post/3/date
https://test.com/post/
https://test.com/grape/
http://test.com/post/3/date
https://test.com/banana/3
https://test.com/user/2/info
https://test.com/test/5/info
.
.
.

What I'd like to do now is run a query, using some regex (or something else), that groups the entries above to get the following result:

---------routes---------
https://test.com/user/{x}/info
https://test.com/test/{x}/info
https://test.com/post/{x}/date
https://test.com/post/
https://test.com/grape/
http://test.com/post/{x}/date
https://test.com/banana/{x}

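where {x} is just a generic marker that appears as a result of the grouping. I know we can search for a specific regex, but I don't know how to collapse the strings into groups and then spit out the 'recommended' groupings.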

PS: Because we're stuck in the stone age, any solution is limited to PostgreSQL 8.4.20.

Edit--

klin, your answer didn't quite work for me, because it gave me

     regexp_replace        | count 
------------------------------+-------
 https://test.com/user/1/info |     1
 https://test.com/test/2/info |     1
 https://test.com/banana/100  |     1
 \x01{x}ate                   |     2
 https://test.com/user/2/info |     1
 https://test.com/grape/      |     1
 https://test.com/test/5/info |     1
 https://test.com/post/       |     1
 https://test.com/banana/3    |     1
(9 rows)

But at least this gives me some ideas. I'll post back once I've played with it a bit more.

I can't test this on 8.4...

with routes(url) as (
values
    ('https://test.com/user/1/info'),
    ('https://test.com/test/2/info'),
    ('https://test.com/banana/100'),
    ('https://test.com/post/3/date'),
    ('https://test.com/post/'),
    ('https://test.com/grape/'),
    ('http://test.com/post/3/date'),
    ('https://test.com/banana/3'),
    ('https://test.com/user/2/info'),
    ('https://test.com/test/5/info')
)

select regexp_replace(url, '^(.+//.+/.+/)\d+', '\1{x}'), count(*)
from routes
group by 1

         regexp_replace         | count 
--------------------------------+-------
 https://test.com/banana/{x}    |     2
 https://test.com/post/{x}/date |     1
 http://test.com/post/{x}/date  |     1
 https://test.com/user/{x}/info |     2
 https://test.com/test/{x}/info |     2
 https://test.com/grape/        |     1
 https://test.com/post/         |     1
(7 rows)    

You can test this here (Postgres 9.5).

Check pattern here.
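
A note on the `\x01{x}ate` row reported in the question's edit: that output is consistent with the backslashes being consumed by the string literal before the regex engine sees them. On 8.4 the `standard_conforming_strings` setting defaults to off, so in an ordinary quoted string `\d` turns into a plain `d` and `\1` into the byte 0x01. The following is a sketch of a possible workaround, not something verified on 8.4.20: it assumes that default setting, uses the `E'...'` escape-string syntax with doubled backslashes, and adds a column alias `route` that is only there for readability.

```sql
-- Sketch for PostgreSQL 8.4, assuming standard_conforming_strings = off
-- (the default in that release). In an E'' string, \\ yields one literal
-- backslash, so the regex engine receives \d+ and the replacement \1{x}.
with routes(url) as (
values
    ('https://test.com/user/1/info'),
    ('https://test.com/post/3/date'),
    ('https://test.com/post/'),
    ('https://test.com/user/2/info')
)
select regexp_replace(url, E'^(.+//.+/.+/)\\d+', E'\\1{x}') as route, count(*)
from routes
group by 1;
```

The `E'...'` form is interpreted the same way whether `standard_conforming_strings` is on or off, so the same query should also behave identically on newer servers. Note that the pattern is greedy and has no 'g' flag, so in a URL with several numeric segments only the last one that follows a slash would be replaced.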