re.findall to get a list of directories from a /-separated pathname, but allowing // as a literal, single /

Question

The title pretty much says it all.

I've tried a number of approaches, including but not limited to:

>>> re.findall(r'(/+)([^/]*)', '///a//b/c///d')
[('///', 'a'), ('//', 'b'), ('/', 'c'), ('///', 'd')]

And:

>>> re.findall('(/+[^/]*)', '///a//b/c///d')
['///a', '//b', '/c', '///d']

What I want is something like this:

>>> re.findall(something, '///a//b/c///d')
['/', 'a/b', 'c/', 'd']

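...or something close to it. Note that this example is a relative path, because the leading // is a single literal slash that makes up the entire name of the first folder.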

We have some approaches that use string.split('/') and list manipulation, but we'd like to explore a regex-based solution.

Thanks!

Answer 1

What about:

re.findall(r'[^/]*/+[^/]*', '///a//b/c///d')
['///a', '//b', '/c', '///d']

Answer 2

One option is to treat your string as 3 different patterns, for example:

re.findall(r'(^/|[^/]+//[^/]*|[^/]+$)', '///a//b/c///d')

Output:

['/', 'a//b', 'c//', 'd']
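Answer 2's alternation looks tailored to this sample string: for instance, [^/]+//[^/]* allows only one escaped slash per component, and the output still contains the // escapes. Below is a more general findall sketch, assuming (as in the question) that // is an escaped literal / and that escaping takes precedence over splitting. It matches runs made of non-slash characters or // pairs, then unescapes each run. The helper name split_path is hypothetical.

```python
import re

# Hypothetical helper. Assumes '//' is an escaped literal '/' and that
# escaping takes precedence over splitting ('///' = literal '/' + separator).
# A component is a run of non-slash characters or escaped '//' pairs; a single
# '/' left over between runs is never matched, so it acts as the separator.
COMPONENT = re.compile(r"(?:[^/]|//)+")

def split_path(path):
    # findall returns the raw (still escaped) components; unescape each one
    return [m.replace("//", "/") for m in COMPONENT.findall(path)]

print(split_path("///a//b/c///d"))  # ['/', 'a/b', 'c/', 'd']
print(split_path("c/////d"))        # ['c//', 'd']
```

Because findall only returns non-empty matches, a trailing separator (as in 'a/') is dropped rather than producing an empty last component.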

Answer 3

Assuming escaping takes precedence over splitting (i.e. '///' = escaped '/' + separator), you can do this:

p = '///a//b/c///d'

import re # this is not the ideal tool for this kind of thing

# pattern splits '/' when it is preceded by '//' (escaped '/')
# or when it is not preceded by another '/'
# in both cases the '/' must not be followed by another '/'

pattern = r"((?<=\/\/)|(?<!\/))(?!.\/)\/"

# replace the separators by an end of line then split on it
# after unescaping the '//'

path = re.sub(pattern,"\n",p).replace("//","/").split("\n")

# or split and unescape (exclude empty parts generated by re.split)

path = [s.replace("//","/") for s in re.split(pattern,p) if s] 

print(path) # ['/', 'a/b', 'c/', 'd']
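The "if s" filter is needed because the outer group ((?<=\/\/)|(?<!\/)) is a capturing group, and re.split inserts every captured group (here an empty string) into its result. Also, (?!.\/)\/ simply checks that the slash is not followed by another slash, which /(?!/) expresses more directly. The sketch below is a rewrite for illustration, not the original answer's code. It shows the raw split, the same logic with a non-capturing group, and the positions of the slashes treated as separators.

```python
import re

p = '///a//b/c///d'

# Pattern from the answer: its outer group captures '', which re.split keeps
capturing = r"((?<=\/\/)|(?<!\/))(?!.\/)\/"
print(re.split(capturing, p))  # ['//', '', 'a//b', '', 'c//', '', 'd']

# Same logic with a non-capturing group: a '/' is a separator when it is
# preceded by '//' or not preceded by '/', and not followed by another '/'
separator = r"(?:(?<=//)|(?<!/))/(?!/)"
print(re.split(separator, p))  # ['//', 'a//b', 'c//', 'd']

# Indices of the slashes treated as separators
print([m.start() for m in re.finditer(separator, p)])  # [2, 7, 11]

path = [s.replace("//", "/") for s in re.split(separator, p)]
print(path)  # ['/', 'a/b', 'c/', 'd']

# Caveat: the lookbehind only inspects the two preceding characters, so an
# even run of four slashes is still split instead of read as two escapes
print(re.split(separator, 'a////b'))  # ['a///', 'b']
```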

However, a non-regex solution may be easier to manage:

path = [s.replace("\0","/") for s in p.replace("//","\0").split("/")]

# or

path = p.replace("//","\0").replace("/","\n").replace("\0","/").split("\n")

print(path) # ['/', 'a/b', 'c/', 'd']

Note: to get ["c//","d"], you would need to encode the source as "c/////d"
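The note follows from a simple rule, assuming escaping takes precedence: a run of n slashes decodes to n // 2 literal slashes, plus a separator when n is odd. In "c/////d", n = 5 gives two literal slashes followed by one separator. Below is a sketch that applies this rule to the slash runs that re.split returns when the pattern has a capturing group. split_escaped is a hypothetical helper name. Because it counts the whole run, it also reads an even run such as 'a////b' (n = 4) as 'a//b'.

```python
import re

def split_escaped(path):
    # Hypothetical helper: assumes '//' is an escaped '/' and that escaping
    # takes precedence over splitting.
    parts = [""]
    # The capturing group makes re.split keep each run of slashes as a token
    for token in re.split(r"(/+)", path):
        if token.startswith("/"):
            n = len(token)
            parts[-1] += "/" * (n // 2)  # each '//' pair is one literal slash
            if n % 2:                    # an odd leftover slash is a separator
                parts.append("")
        else:
            parts[-1] += token           # ordinary characters
    return parts

print(split_escaped("///a//b/c///d"))  # ['/', 'a/b', 'c/', 'd']
print(split_escaped("c/////d"))        # ['c//', 'd']
```

Unlike the findall-style approaches, a trailing separator here yields an empty last component ('a/' gives ['a', '']).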

python

python-re