Regexp on specific URL

Question

I have a list of URLs like this:

http://www.toto.com/bags/handbags/test1/
http://www.toto.com/bags/handbags/smt1/
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
http://www.toto.com/bags/handbags/smt1/smt2/testing/
http://www.toto.com/bags/handbags/smt1/smt2/testing.html

What I want here is to accept only URLs like this:

http://www.toto.com/something/else/again/more

Only this depth, nothing deeper.

Can you help me? :)

Answer 1

A suitable regex is:

^http://www\.toto\.com/(\w+/){4}$

Filtering example (lines holds the URLs from the question):

>>> import re
>>> for line in lines:
...     if re.match(r'^http://www\.toto\.com/(\w+/){4}$', line):
...         print(line)
...
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
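The pattern above requires a trailing slash, while the example URL in the question (.../again/more) has none. A minimal sketch, assuming the goal is exactly four path segments with an optional trailing slash; the FOUR_LEVELS pattern and the is_four_levels helper are hypothetical names for illustration, not part of the answer:

```python
import re

# Hypothetical variant: exactly four \w+ path segments, trailing slash optional.
# Dots are escaped so they only match a literal ".".
FOUR_LEVELS = re.compile(r"^http://www\.toto\.com/(?:\w+/){3}\w+/?$")


def is_four_levels(url):
    """Return True if url has exactly four path segments under www.toto.com."""
    return FOUR_LEVELS.match(url) is not None


urls = [
    "http://www.toto.com/bags/handbags/test1/",
    "http://www.toto.com/bags/handbags/test1/test2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing.html",
    "http://www.toto.com/something/else/again/more",
]

# Print only the URLs that are exactly four levels deep.
for url in urls:
    if is_four_levels(url):
        print(url)
```

Note that \w covers letters, digits and underscore (including Unicode letters for str patterns in Python 3) but not "-", so a segment such as hand-bags would be rejected; [\w-] would widen it if needed.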

Answer 2

You can do it like this:

https://regex101.com/r/gK6hR3/1

but add $ at the end:

http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}

So:

http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}$
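Python's re module can show what the trailing $ changes. re.match anchors only at the start of the string, so without $ any URL whose prefix fits the pattern is accepted; with $ the rest of the string has to match as well. A small sketch applying both versions of the pattern above to a few sample URLs (the samples are illustrative, not from the answer):

```python
import re

# The pattern from this answer, without and with the trailing $ anchor.
UNANCHORED = r"http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}"
ANCHORED = UNANCHORED + "$"

samples = [
    "http://www.toto.com/bags/",
    "http://www.toto.com/bags/handbags",
    "http://www.toto.com/bags/handbags/test1/",
]

for url in samples:
    # re.match anchors at the start only; "$" forces the match to reach the end.
    print(url, bool(re.match(UNANCHORED, url)), bool(re.match(ANCHORED, url)))
```

The first two samples match with and without $; the third matches only the unanchored version, because its prefix fits but the remaining "/test1/" does not.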


python

regex