特定 URL 的正则表达式

Regexp on specific URL

我有一个这样的 URL 列表:

http://www.toto.com/bags/handbags/test1/
http://www.toto.com/bags/handbags/smt1/
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
http://www.toto.com/bags/handbags/smt1/smt2/testing/
http://www.toto.com/bags/handbags/smt1/smt2/testing.html

我在这里想要的是只接受像

这样的 URLS
http://www.toto.com/something/else/again/more

仅限于此,多了就不拍了

你能帮帮我吗? :)

合适的正则表达式是:

^http://www.toto.com/(\w+/){4}$

过滤示例:

>>> for line in lines:
...     if re.match(r'^http://www.toto.com/(\w+/){4}$', line):
...         print line
... 
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/

你可以这样做:

https://regex101.com/r/gK6hR3/1

但在最后添加 $

http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}

所以:

http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}$