Regex for a plus and minus sign, along with letters?

Question

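I am trying to write a Python function using the re library that extracts the words inside square brackets. The words consist of letters, and some of them (but not all) contain a plus or minus sign. This is the function I have so far: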

def processArray(array):
    newArray = []
    for i in array:
        m = re.search(r"\[([A-Za-z0-9\-\+]+)\]", i).groups()[0]
        newArray.append(m)

    return newArray

The array argument passed in is [['Preconditions [+Here]\n'], ['Preconditions [+Is', '+The]\n'], ['Preconditions [-Example]\n']]. I want newArray to be ['+Here', '+Is', '+The', '-Example']. With my current function, this is the error that is thrown:

  File "file.py", line 71, in <module>
    preconditions = processArray(preconditions)
  File "file.py", line 29, in processArray
    m = re.search(r"\[([A-Za-z0-9\-\+]+)\]", i).groups()[0]
  File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/re.py", line 183, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

Can anyone explain why this error occurs and what I can do to fix it?

Answer 1

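The error occurs because each item of array is itself a list of strings, while re.search expects a single string. You can iterate over the outer list, join the items of each inner list into one string, and use the following fix: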

import re

def processArray(array):
    newArray = []
    for l in array:
        m = re.findall(r"[A-Za-z0-9+-]+(?=[^][]*])", " ".join(l))
        if m:
            newArray.extend(m)
    return newArray

print(processArray([['Preconditions [+Here]\n'], ['Preconditions [+Is', '+The]\n'], ['Preconditions [-Example]\n']]))
# => ['+Here', '+Is', '+The', '-Example']

See the Python demo.
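
The question's loop hands each inner list directly to re.search. The sketch below reproduces that on a shortened copy of the input and then shows the joined string that the fix scans instead. The names data, pattern, errors and joined are illustrative and do not come from the original post.

```python
import re

# Shortened copy of the question's input: a list of lists of strings.
data = [['Preconditions [+Here]\n'], ['Preconditions [+Is', '+The]\n']]

# The question's pattern, with the redundant escapes dropped.
pattern = r"\[([A-Za-z0-9+-]+)\]"

errors = []
for item in data:
    try:
        re.search(pattern, item)  # item is a list, not a str
    except TypeError as exc:
        errors.append(type(item).__name__)
        print(type(item).__name__, "->", exc)

# Joining an inner list gives re a single string to scan.
joined = " ".join(data[1])
print(repr(joined))

# The original pattern still finds nothing here, because of the space
# between the two words inside the brackets.
print(re.search(pattern, joined))
```

Even after joining, the original pattern does not match [+Is +The], since it allows no space between the brackets. That is why the fix above collects individual tokens with re.findall rather than matching a whole bracketed group.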

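The regex is [A-Za-z0-9+-]+(?=[^][]*]). It is a kind of workaround that matches one or more alphanumeric or -/+ characters, but only if they are followed by zero or more characters other than [ and ], up to a ]. It does not check for the opening [. If that is necessary, you will have to run two regex operations: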

def processArray(array):
    newArray = []
    for l in array:
        m = re.findall(r"\[(.*?)]", " ".join(l))
        for n in m:
            k = re.findall(r'[A-Za-z0-9+-]+', n)
            if k:
                newArray.extend(k)
    return newArray

See this Python demo, where the strings between brackets are extracted first and the necessary matches are then found within them.
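
The difference between the two approaches shows up when a closing bracket appears without a matching opening one. The sketch below compares them on a made-up input string that is not from the question. The helper names lookahead_only and two_step are hypothetical wrappers around the two regexes above.

```python
import re

def lookahead_only(text):
    # Single pass: tokens followed by non-bracket characters and then "]".
    return re.findall(r"[A-Za-z0-9+-]+(?=[^][]*])", text)

def two_step(text):
    # First isolate each [...] group, then pick out the tokens inside it.
    tokens = []
    for group in re.findall(r"\[(.*?)]", text):
        tokens.extend(re.findall(r"[A-Za-z0-9+-]+", group))
    return tokens

# Made-up input with a stray "]" that has no opening "[".
sample = "stray] Preconditions [+Is +The]"

print(lookahead_only(sample))  # ['stray', '+Is', '+The']
print(two_step(sample))        # ['+Is', '+The']
```

The single-pass version picks up stray because it only looks ahead for a closing bracket. The two-step version ignores it because it requires an opening [ before the group.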

Tags: python, regex, arrays, parsing, python-re