Python regex: parsing a string with brackets

Question

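I want to parse strings that may or may not contain brackets. For john[doe], I want two variables: the part outside the [] and the part inside it. So for this example I want to extract john and doe. The string will always have this structure. Another input could be just john, in which case the second variable should be "" or None. How can I do this with the re library? Or with plain Python, if that is more efficient than a regex?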

Here is what I have tried so far:

s = sample_string.split("[")
x, y = (sample_string, None) if len(s) == 1 else (s[0], s[1][:-1])

Answer 1

There may be a better way, but this works for me.

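import re

s = "john[doe]"
arr = []
# Split at the opening bracket: parts[0] is "john", parts[1] is "doe]".
# This assumes s contains "["; a plain "john" would raise IndexError.
parts = re.split(r"\[", s)
arr.append(parts[0])
# Keep only what comes before the closing bracket.
arr.append(re.split(r"\]", parts[1])[0])
print(arr)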
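
The snippet above fails on input without brackets. As a minimal sketch of one way around that, the string can be split on either bracket in a single call; the helper name split_brackets is made up for illustration and is not part of the original answer.

```python
import re


def split_brackets(text):
    """Return (outer, inner); inner is None when there are no brackets."""
    # The character class [\[\]] matches either '[' or ']'.
    # 'john[doe]' splits into ['john', 'doe', ''] and 'john' into ['john'].
    parts = re.split(r'[\[\]]', text)
    outer = parts[0]
    inner = parts[1] if len(parts) > 1 else None
    return outer, inner


print(split_brackets('john[doe]'))
print(split_brackets('john'))
```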

Answer 2

Is using a regex a requirement here? If not, this is probably simpler:

if '[' in string:
  x, y = string.split('[')
  y = y.strip(']')
else:
  x, y = string, ''
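
A related sketch, assuming the input has at most one bracketed part at the end: str.partition returns empty strings when the separator is missing, so the explicit `in` test is not needed. The function name split_name is hypothetical.

```python
def split_name(text):
    """Split 'john[doe]' into ('john', 'doe'); plain 'john' gives ('john', '')."""
    # partition returns (before, separator, after); separator and after are
    # empty strings when '[' does not occur in the text.
    outer, _, rest = text.partition('[')
    # Drop the closing bracket only if it is actually there.
    inner = rest[:-1] if rest.endswith(']') else rest
    return outer, inner


print(split_name('john[doe]'))
print(split_name('john'))
```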

A regex-based version might look like this:

import re

if '[' in string:
  # Raw string so that \[ reaches the regex engine as an escaped bracket.
  x, y = re.findall(r'^(.+)\[(.+)?]', string)[0]
else:
  x, y = string, ''

Answer 3

Regex solution:

r'^([^[]+)(?:\[([^\]]+)])?$'

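^ matches the start of the string.
([^[]+) capture group 1: matches one or more characters that are not '['.
(?: starts a non-capturing group.
\[ matches '['.
([^\]]+) capture group 2: matches one or more characters that are not ']'.
] matches ']'.
) ends the non-capturing group.
? makes the non-capturing group optional.
$ matches the end of the string.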
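
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows # comments inside the pattern. This is only a sketch of that option; the name BRACKET_RE is made up for illustration.

```python
import re

BRACKET_RE = re.compile(r"""
    ^              # start of the string
    ([^[]+)        # group 1: one or more characters that are not '['
    (?:            # start of the non-capturing group
        \[         #   a literal '['
        ([^\]]+)   #   group 2: one or more characters that are not ']'
        ]          #   a literal ']'
    )?             # end of the non-capturing group, which is optional
    $              # end of the string
""", re.VERBOSE)

for sample in ('john', 'john[doe]'):
    print(sample, '->', BRACKET_RE.match(sample).groups())
```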

import re

tests = ['john', 'john[doe]']

for test in tests:
    m = re.match(r'^([^[]+)(?:\[([^\]]+)])?$', test)
    if m:
        print(test, '->', m[1], m[2])

Prints:

john -> john None
john[doe] -> john doe

Explanation

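First, anything between parentheses ( ) is a capturing group, and anything between (?: ) is a non-capturing group. Either kind of group can itself contain capturing or non-capturing groups.

[] defines a set of characters. For example, [aqw] matches 'a', 'q' or 'w', and [a-e] matches 'a', 'b', 'c', 'd' or 'e'. A leading ^, as in [^aqw], negates the set, so it matches any character other than 'a', 'q' or 'w'. Thus [^\]] matches any character except ']' (you must put a \ in front of the ] to escape it, because in that context ] has a special meaning: it would otherwise close the [] construct). The + that follows means "one or more of the preceding item". So ([^[]+) matches one or more characters that are not '['.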
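
The character-set rules described above can be checked with a few small calls. This is an illustrative sketch; the sample strings and variable names are made up.

```python
import re

text = 'aqua water'

# [aqw] matches one character that is 'a', 'q' or 'w'.
in_set = re.findall(r'[aqw]', text)

# [^aqw] matches one character that is anything except 'a', 'q' or 'w'.
not_in_set = re.findall(r'[^aqw]', text)

# [a-e] matches one character in the range 'a' through 'e'.
in_range = re.findall(r'[a-e]', 'abcdefg')

# [^\]]+ matches a run of one or more characters that are not ']'.
before_close = re.match(r'[^\]]+', 'doe]').group()

print(in_set, not_in_set, in_range, before_close)
```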

I hope this explanation helps.

Answer 4

As long as john[doe] is a string, the replace method should be enough to pull out the phrase:

x = 'john[doe]'
# Turn "[" into a space and drop "]", which leaves "john doe".
new_x = x.replace("[", " ").replace("]", "")
print(new_x)

Alternatively, if you prefer, you can use the match function:

import re

x = str('john[doe]')
m = re.match(r"(?P<first_name>\w+)\[(?P<last_name>\w+)\]", x)
name = m.group('first_name') + " " + m.group('last_name')
print(name)
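
The match above returns None for a plain john, so m.group would raise AttributeError. A minimal sketch of one way to cover both inputs is to make the bracketed part optional and use fullmatch; the names NAME_RE and parse_name are hypothetical and not from the original answer.

```python
import re

# The bracketed part is wrapped in an optional non-capturing group.
NAME_RE = re.compile(r"(?P<first_name>\w+)(?:\[(?P<last_name>\w+)\])?")


def parse_name(text):
    """Return (first_name, last_name); last_name is None without brackets."""
    # fullmatch requires the whole string to fit the pattern.
    m = NAME_RE.fullmatch(text)
    if m is None:
        return None
    return m.group('first_name'), m.group('last_name')


print(parse_name('john[doe]'))
print(parse_name('john'))
```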

Without more phrases to parse, I'm not sure which of the two is faster. Good luck! :)
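
Speed is easiest to settle by measuring. The following is a rough sketch using the standard timeit module; the function names, the sample input, and the iteration count are arbitrary choices for illustration, and the timings will vary by machine and Python version.

```python
import re
import timeit

PATTERN = re.compile(r'^([^[]+)(?:\[([^\]]+)])?$')


def with_split(text):
    # Plain string methods, no regex.
    if '[' in text:
        outer, inner = text.split('[')
        return outer, inner.strip(']')
    return text, ''


def with_regex(text):
    # Precompiled regex; a missing bracket part is normalised to ''.
    m = PATTERN.match(text)
    return m[1], m[2] or ''


for func in (with_split, with_regex):
    seconds = timeit.timeit(lambda: func('john[doe]'), number=100_000)
    print(f'{func.__name__}: {seconds:.3f} s')
```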
