Regex for a specific URL
I have a list of URLs like this:
http://www.toto.com/bags/handbags/test1/
http://www.toto.com/bags/handbags/smt1/
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
http://www.toto.com/bags/handbags/smt1/smt2/testing/
http://www.toto.com/bags/handbags/smt1/smt2/testing.html
What I want is to accept only URLs like
http://www.toto.com/something/else/again/more
Only that depth, nothing deeper.
Can you help me? :)
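A suitable regex (with the dots escaped, so that a dot only matches a literal dot) is:
^http://www\.toto\.com/(\w+/){4}$
It accepts exactly four path segments after the host, each made of word characters (\w) and ending with a slash.
Filtering example:
>>> import re
>>> # lines holds the list of URLs from the question
>>> for line in lines:
...     if re.match(r'^http://www\.toto\.com/(\w+/){4}$', line):
...         print(line)
...
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/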
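A self-contained Python 3 sketch of the same filter, assuming the URLs are held in a Python list. The names URLS, DEPTH_FOUR, DEPTH_FOUR_OPT_SLASH and filter_depth_four are hypothetical. It uses re.fullmatch, which anchors at both ends, so ^ and $ can be dropped. The second pattern is an assumption beyond the answer above: it makes the final slash optional so the question's own example (which has no trailing slash) also passes.

```python
import re

# The URLs from the question (hypothetical variable name).
URLS = [
    "http://www.toto.com/bags/handbags/test1/",
    "http://www.toto.com/bags/handbags/smt1/",
    "http://www.toto.com/bags/handbags/test1/test2/",
    "http://www.toto.com/bags/handbags/blabla1/blabla2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing/",
    "http://www.toto.com/bags/handbags/smt1/smt2/testing.html",
]

# Exactly four path segments of word characters, each followed by "/".
# fullmatch must consume the whole string, so no ^ or $ is needed.
DEPTH_FOUR = re.compile(r"http://www\.toto\.com/(\w+/){4}")

# Assumed variant: four segments where the final "/" is optional,
# so ".../again/more" (no trailing slash) is accepted too.
DEPTH_FOUR_OPT_SLASH = re.compile(r"http://www\.toto\.com/(\w+/){3}\w+/?")


def filter_depth_four(urls):
    """Return only the URLs whose path has exactly four segments."""
    return [u for u in urls if DEPTH_FOUR.fullmatch(u)]


for url in filter_depth_four(URLS):
    print(url)
```

Note that \w does not match hyphens, so a segment such as hand-bags would be rejected; widen the class (for example to [\w-]) if such URLs should pass.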
You can do it like this:
https://regex101.com/r/gK6hR3/1
but add $ at the end of the pattern:
http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}
So:
http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}$
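A quick Python 3 check of what the $-anchored pattern accepts, as a sketch. The helper accepts and the two-segment URL http://www.toto.com/bags/handbags are hypothetical test inputs, not part of the answer. Because none of the character classes include digits, and only one optional slash is allowed after the first path segment, the pattern matches at most two letter-only path segments.

```python
import re

# The pattern from this answer, with the trailing $ added.
# Python treats the escaped slashes (\/) as literal slashes.
PATTERN = re.compile(r"http:\/\/www\.[a-zA-Z.-]+\/[a-zA-Z-]+[\/]{0,1}[\.a-zA-Z-]{0,}$")


def accepts(url):
    """True if the pattern matches from the start of url to its end."""
    return PATTERN.match(url) is not None


samples = [
    "http://www.toto.com/bags/handbags",               # hypothetical two-segment URL
    "http://www.toto.com/bags/handbags/test1/test2/",  # from the question's list
    "http://www.toto.com/something/else/again/more",   # the question's target shape
]

for url in samples:
    print(accepts(url), url)
```

Running a check like this against your real URL list is a cheap way to confirm a pattern before relying on it.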