re.findall to get a list of directories from a /-separated pathname, but allowing // as a literal, single /

The title pretty much says it all.

I've tried a number of approaches, including but not limited to:

>>> re.findall(r'(/+)([^/]*)', '///a//b/c///d')
[('///', 'a'), ('//', 'b'), ('/', 'c'), ('///', 'd')]

And:

>>> re.findall('(/+[^/]*)', '///a//b/c///d')
['///a', '//b', '/c', '///d']

What I want is something like this:

>>> re.findall(something, '///a//b/c///d')
['/', 'a/b', 'c/', 'd']

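...or something close to it. Note that this example is a relative path, because the leading // is a single literal slash that makes up the entire name of the first folder.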

We have some approaches that use string.split('/') and list manipulation, but we'd like to explore a regex-based solution.

Thanks!

How about:

re.findall(r'[^/]*/+[^/]*', '///a//b/c///d')
['///a', '//b', '/c', '///d']

One option is to treat your string as three different patterns, for example:

re.findall(r'(^/|[^/]+//[^/]*|[^/]+$)', '///a//b/c///d')

Output:

['/', 'a//b', 'c//', 'd']
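The three-alternative pattern is shaped around this particular input and leaves the // escapes in place; for example, on '//x/a' it returns ['/', 'a'], dropping the x. A more general sketch, assuming a component is any run of non-slash characters and escaped // pairs, is to let re.findall match exactly those runs and then unescape each one. Because findall simply skips the lone separator slashes, empty components (such as the leading '/' of an absolute path) do not appear in the result. The function name is hypothetical.

```python
import re

# A component is one or more of: a non-slash character, or an escaped "//".
# A lone "/" can never start a match, so findall skips it as a separator.
COMPONENT = re.compile(r"(?:[^/]|//)+")


def split_escaped_path(path):
    """Split a '/'-separated path in which '//' stands for a literal '/'.

    Escapes are consumed left to right, so '///' reads as an escaped '/'
    followed by a separator. Empty components are dropped.
    """
    return [m.replace("//", "/") for m in COMPONENT.findall(path)]


print(split_escaped_path("///a//b/c///d"))  # ['/', 'a/b', 'c/', 'd']
print(split_escaped_path("//x/a"))          # ['/x', 'a']
```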

Assuming escaping takes precedence over splitting (i.e. '///' = '/' + separator), you could do this:

import re # this is not the ideal tool for this kind of thing

p = '///a//b/c///d'

# pattern splits '/' when it is preceded by '//' (escaped '/')
# or when it is not preceded by another '/'
# in both cases the '/' must not be followed by another '/'

pattern = r"((?<=\/\/)|(?<!\/))(?!.\/)\/"

# replace the separators by an end of line then split on it
# after unescaping the '//'

path = re.sub(pattern,"\n",p).replace("//","/").split("\n")

# or split and unescape; "if s" drops the empty strings that re.split
# returns for the pattern's (always empty) capturing group

path = [s.replace("//","/") for s in re.split(pattern,p) if s] 

print(path) # ['/', 'a/b', 'c/', 'd']
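The lookbehind (?<=\/\/) only inspects the two characters before a '/', so it cannot tell whether a run of slashes has odd or even length. With the pattern above, 'a////b' (two escaped slashes, no separator) comes out as ['a//', 'b'] rather than ['a//b']. Python's re module only supports fixed-width lookbehind, so one regex-based workaround, sketched here rather than taken from the answer, is to match each whole run of slashes with re.sub and decide in a callback: a run of n slashes becomes n // 2 literal slashes, plus a separator when n is odd. The "\0" marker and the function names are assumptions; this only works if the path cannot contain NUL characters.

```python
import re

SEP = "\0"  # assumed never to occur in a real path


def _replace_run(match):
    # A run of n slashes = n // 2 escaped '/' plus one separator if n is odd.
    n = len(match.group())
    return "/" * (n // 2) + (SEP if n % 2 else "")


def split_on_slash_runs(path):
    # Rewrite every run of slashes, then split on the separator marker.
    return re.sub(r"/+", _replace_run, path).split(SEP)


print(split_on_slash_runs("///a//b/c///d"))  # ['/', 'a/b', 'c/', 'd']
print(split_on_slash_runs("a////b"))         # ['a//b']
```

Empty components are kept here, so '/a' gives ['', 'a'] while 'a' gives ['a'].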

However, a non-regex solution may be easier to manage:

path = [s.replace("\0","/") for s in p.replace("//","\0").split("/")]

# or

path = p.replace("//","\0").replace("/","\n").replace("\0","/").split("\n")

print(path) # ['/', 'a/b', 'c/', 'd']

Note: to get ["c//","d"], you would need to encode the source as "c/////d".
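Going the other way, the encoding the note describes can be sketched as doubling every '/' inside a component and joining the components with a single '/'. The helper names encode_path and decode_path are hypothetical; decoding uses the non-regex approach from this answer with "\0" as the placeholder. This assumes no component is empty, since an empty component would produce '//', which decodes as an escaped slash.

```python
def encode_path(parts):
    # Escape each literal '/' as '//', then use a single '/' as the separator.
    return "/".join(part.replace("/", "//") for part in parts)


def decode_path(path):
    # Non-regex decoding: hide escaped '//' behind "\0", split, then restore '/'.
    return [s.replace("\0", "/") for s in path.replace("//", "\0").split("/")]


encoded = encode_path(["c//", "d"])
print(encoded)               # c/////d
print(decode_path(encoded))  # ['c//', 'd']
```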