re.findall to get a list of directories from a /-separated pathname, but allowing // as a literal, single /
The title pretty much says it all.
I've tried a number of approaches, including but not limited to:
>>> re.findall(r'(/+)([^/]*)', '///a//b/c///d')
[('///', 'a'), ('//', 'b'), ('/', 'c'), ('///', 'd')]
And:
>>> re.findall('(/+[^/]*)', '///a//b/c///d')
['///a', '//b', '/c', '///d']
What I'd like is something like this:
>>> re.findall(something, '///a//b/c///d')
['/', 'a/b', 'c/', 'd']
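...or something close to it. Note that this example is a relative path, because the leading // is a single literal slash that makes up the entire first folder name.
We have some approaches that use string.split('/') and list manipulation, but we'd like to explore a regex-based solution.
Thanks!
How about: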
re.findall(r'[^/]*/+[^/]*', '///a//b/c///d')
['///a', '//b', '/c', '///d']
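That pattern returns the same list as the question's second attempt: it groups each run of slashes with the text after it, but does not decide which slashes are literal. A minimal sketch of post-processing the runs by their length, assuming escapes are read left to right (so a run of n slashes holds n // 2 literal slashes, followed by a separator when n is odd). `split_by_runs` is a hypothetical name:

```python
import re
from typing import List

def split_by_runs(path: str) -> List[str]:
    """Split on lone '/' separators, treating '//' as an escaped literal '/'.

    Assumption: escapes are consumed left to right, so a run of n slashes
    holds n // 2 literal slashes, then one separator if n is odd.
    """
    # With a capturing group, re.split puts each slash run at the odd indices.
    parts = re.split(r'(/+)', path)
    segments = ['']
    for i, piece in enumerate(parts):
        if i % 2 == 0:
            segments[-1] += piece            # ordinary text between runs
        else:
            n = len(piece)
            segments[-1] += '/' * (n // 2)   # escaped pairs become literal slashes
            if n % 2:
                segments.append('')          # the leftover lone '/' is a separator
    return segments

print(split_by_runs('///a//b/c///d'))  # ['/', 'a/b', 'c/', 'd']
```

Because it builds on re.split, a leading or trailing separator yields an empty segment, the same as str.split('/').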
One option is to treat your string as 3 different patterns, for example:
re.findall(r'(^/|[^/]+//[^/]*|[^/]+$)', '///a//b/c///d')
Output:
['/', 'a//b', 'c//', 'd']
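That pattern is tailored to the sample string: it only matches a '/' at the very start, a segment that contains '//', or the last segment, so for 'b/c/d' it returns just ['d']. A more general re.findall sketch (an assumption, not part of the answer above) treats a segment as a run of ordinary characters and escaped '//' pairs, so lone '/' separators are never matched, then unescapes each match. `SEGMENT` and `findall_segments` are hypothetical names:

```python
import re
from typing import List

# One segment = one or more of: a non-'/' character, or an escaped '//' pair.
# A lone '/' matches neither alternative, so findall skips over it.
SEGMENT = re.compile(r'(?:[^/]|//)+')

def findall_segments(path: str) -> List[str]:
    # Turn each escaped '//' back into a single literal '/'.
    return [m.replace('//', '/') for m in SEGMENT.findall(path)]

print(findall_segments('///a//b/c///d'))  # ['/', 'a/b', 'c/', 'd']
print(findall_segments('b/c/d'))          # ['b', 'c', 'd']
```

Unlike str.split('/'), this drops empty segments, so '/a' gives ['a'] rather than ['', 'a'].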
Assuming escaping takes precedence over splitting (i.e. '///' = '/' + separator), you could do this:
p = '///a//b/c///d'
import re # this is not the ideal tool for this kind of thing
# pattern splits '/' when it is preceded by '//' (escaped '/')
# or when it is not preceded by another '/'
# in both cases the '/' must not be followed by another '/'
pattern = r"((?<=\/\/)|(?<!\/))(?!.\/)\/"
# replace the separators by an end of line then split on it
# after unescaping the '//'
path = re.sub(pattern,"\n",p).replace("//","/").split("\n")
# or split and unescape (exclude empty parts generated by re.split)
path = [s.replace("//","/") for s in re.split(pattern,p) if s]
print(path) # ['/', 'a/b', 'c/', 'd']
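One caveat with that pattern: the lookbehind only sees the two characters before a '/', so it cannot tell an even run of slashes from an odd one. For 'a////b' (two escaped slashes, no separator) it splits before 'b' and returns ['a//', 'b'] instead of ['a//b']. A sketch of a parity-aware variant that is still regex-based, assuming the path contains no newline. `ODD_RUN` and `split_path` are hypothetical names:

```python
import re
from typing import List

# Matches a run of slashes of odd length: (?<!/) anchors at the start of the
# run, the group takes an even number of slashes (escaped pairs), and the
# final '/' must end the run, so it is the separator. Even runs never match.
ODD_RUN = re.compile(r'(?<!/)((?://)*)/(?!/)')

def split_path(p: str) -> List[str]:
    # Keep the escaped pairs, turn the separator into '\n' (assumes no
    # newline in the path), then split and unescape.
    marked = ODD_RUN.sub(lambda m: m.group(1) + '\n', p)
    return [s.replace('//', '/') for s in marked.split('\n')]

print(split_path('///a//b/c///d'))  # ['/', 'a/b', 'c/', 'd']
print(split_path('a////b'))         # ['a//b']
```

An odd run keeps its escaped pairs to the left of the separator, which is the same left-to-right reading the answer assumes for '///'.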
However, a non-regex solution may be easier to manage:
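# temporarily swap each escaped "//" for a NUL placeholder, split on the remaining "/", then restore the literal slashes
path = [s.replace("\0","/") for s in p.replace("//","\0").split("/")]
# or, using "\n" as the split marker (assumes the path contains no newline)
path = p.replace("//","\0").replace("/","\n").replace("\0","/").split("\n")
print(path) # ['/', 'a/b', 'c/', 'd']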
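The placeholder trick relies on '\0' never occurring in the input, which should hold for file paths (POSIX path names cannot contain NUL bytes). If that assumption is unwelcome, a single left-to-right scan gives the same result without any placeholder. `split_escaped` is a hypothetical name:

```python
from typing import List

def split_escaped(path: str) -> List[str]:
    """Scan left to right: '//' is a literal '/', a lone '/' is a separator."""
    segments: List[str] = []
    current: List[str] = []
    i = 0
    while i < len(path):
        if path.startswith('//', i):
            current.append('/')                 # escaped slash
            i += 2
        elif path[i] == '/':
            segments.append(''.join(current))   # lone slash ends the segment
            current = []
            i += 1
        else:
            current.append(path[i])             # ordinary character
            i += 1
    segments.append(''.join(current))           # last segment (may be empty)
    return segments

print(split_escaped('///a//b/c///d'))  # ['/', 'a/b', 'c/', 'd']
```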
Note: to get ["c//", "d"], you would need to encode the source as "c/////d".
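Going the other way, a hypothetical `join_escaped` helper (not part of either answer) that escapes every literal '/' as '//' and joins the segments with single '/' separators produces exactly that encoding:

```python
from typing import List

def join_escaped(segments: List[str]) -> str:
    # Hypothetical inverse of the splitters above: escape each literal '/'
    # as '//', then join the segments with single '/' separators.
    return '/'.join(s.replace('/', '//') for s in segments)

print(join_escaped(['c//', 'd']))             # c/////d
print(join_escaped(['/', 'a/b', 'c/', 'd']))  # ///a//b/c///d
```

Under the left-to-right reading, every literal slash in a run attaches to the segment on the left of the separator, so a segment other than the first that starts with '/' cannot round-trip: ['a', '/b'] encodes to 'a///b', which splits back as ['a/', 'b'].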