How to extract a substring in Python using a regular expression

Question

I have a string "this is title [[this is translated title]]" and I need to extract its two parts: "this is title" and "this is translated title".

I tried using a regular expression, but could not get it to work.

def translate(value):
    # Values are passed in the form of
    # "This is text [[This is translated text]]"
    import re
    regex = r"(.+)(\[\[.*\]\])"
    match = re.match(regex, value)
    # Return text
    first = match.group(1)

    # Return translated text
    second = match.group(2).lstrip("[[").rstrip("]]")

    return first, second

But this fails when the string is plain text with no bracketed part, such as "simple plain text".
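
A minimal sketch of why this fails, assuming the same pattern as in the code above: the pattern requires a "[[...]]" part, so re.match returns None for plain text, and calling .group() on None then raises AttributeError. The helper name show_failure is hypothetical and only used for illustration.

```python
import re

# Same pattern as in the question: it requires a "[[...]]" part.
REGEX = r"(.+)(\[\[.*\]\])"

def show_failure(value):
    """Return the match object, or None when the pattern does not apply."""
    return re.match(REGEX, value)

# With brackets there is a match object to call .group() on.
print(show_failure("this is title [[this is translated title]]") is not None)
# Without brackets re.match returns None, so .group(1) would raise AttributeError.
print(show_failure("simple plain text"))
```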

Answer 1

I found a simple way that does not use a regular expression:

def trns(value):
    first, second = value.rstrip("]]").split("[[")
    return first, second
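
Note that unpacking the result of split("[[") into two names raises ValueError when the input has no "[[", which is the plain-text case from the question. A possible variant, sketched here with str.partition, avoids that. The name split_title and the choice to return None for a missing translation are assumptions, not part of the answer above.

```python
def split_title(value):
    """Split 'text [[translated]]' into (text, translated).

    Hypothetical helper: returns (text, None) when there is no "[[...]]" part.
    """
    # partition() always returns three parts, so plain text does not raise.
    first, sep, rest = value.partition("[[")
    if not sep:
        return value.strip(), None
    # Drop only one trailing "]]"; rstrip("]]") would strip every trailing "]".
    rest = rest.strip()
    if rest.endswith("]]"):
        rest = rest[:-2]
    return first.strip(), rest.strip()

print(split_title("this is title [[this is translated title]]"))
print(split_title("simple plain text"))
```

Stripping whitespace also removes the trailing space that would otherwise be left on the first part.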

Answer 2

You need to use the regex r'((\w.*)\[\[(\w.*)\]\]|(\w.*))', which captures "this is title" in group(2) and "this is translated title" in group(3) (group(1) is the whole match), so your code should be:

def translate(value):
    # value = "This is text [[This is translated text]]"
    import re
    regex = r'((\w.*)\[\[(\w.*)\]\]|(\w.*))'
    match = re.match(regex, value)
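    # Keep the non-empty groups, dropping any that equal the whole input
    result = [x for x in match.groups() if x and x != value]
    # Plain text with no [[...]] part leaves nothing after filtering, so return it unchanged
    return result if result else value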

This returns what you expect.

To test your regex, you can use this.
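
As an alternative sketch (not the pattern from this answer), the bracketed part can be made optional with a non-capturing group, so one pattern covers both input shapes and always returns a pair. The names PATTERN and translate_pair are hypothetical, and the sketch assumes single-line input.

```python
import re

# Text, optionally followed by "[[translated text]]" at the end of the string.
PATTERN = re.compile(r"^(.*?)\s*(?:\[\[(.*?)\]\])?\s*$")

def translate_pair(value):
    """Return (text, translated); translated is None for plain text."""
    match = PATTERN.match(value)
    if match is None:
        # Assumption: input that does not fit (e.g. multi-line) is returned as-is.
        return value, None
    return match.group(1), match.group(2)

print(translate_pair("this is title [[this is translated title]]"))
print(translate_pair("simple plain text"))
```

Because the optional group simply does not participate for plain text, group(2) is None there instead of the match failing.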
