Remove timestamps in parentheses from text in Python

I want to remove all of the timestamps in parentheses from the sample text data below.

Input:

Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) Customer: I have a question about X. ( 8m 1s ) Agent: I can help here. Log in this website (remember to use your new password) ( 11m 31s )

Expected output:

Agent: Can I help you? Customer: Thank you Customer: I have a question about X. Agent: I can help here. Log in this website (remember to use your new password)

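I tried re.sub(r'\(.*?\)', '', data), but it doesn't work because it removes everything in parentheses. I want to keep parenthesized content that is not a timestamp; for example, I want to keep "(remember to use your new password)" in the output.

I'm still new to regex, so I hope I can get some guidance here. Thanks!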

\(\s(\d{1,2}[smh]\s)+\)

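FYI: . matches any character except a line terminator, which is why \(.*?\) also removes parenthesized text that is not a timestamp.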
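A minimal sketch of how the pattern above could be plugged into re.sub, using the sample text from the question. The leading \s* is an addition that is not part of the pattern as posted; it is assumed here so the space before each timestamp is removed too. The names TIMESTAMP and strip_timestamps are hypothetical.

```python
import re

data = ("Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) "
        "Customer: I have a question about X. ( 8m 1s ) "
        "Agent: I can help here. Log in this website "
        "(remember to use your new password) ( 11m 31s )")

# A timestamp group looks like "( 3s )" or "( 8m 1s )": an opening
# parenthesis, one whitespace character, then one or more number+unit
# tokens that are each followed by whitespace, then a closing parenthesis.
# The leading \s* (an assumption, not in the posted pattern) also consumes
# any whitespace in front of the group.
TIMESTAMP = re.compile(r"\s*\(\s(\d{1,2}[smh]\s)+\)")

def strip_timestamps(text):
    """Remove timestamp groups and leave other parenthesized text alone."""
    return TIMESTAMP.sub("", text)

cleaned = strip_timestamps(data)
print(cleaned)
```

Because the pattern requires digits followed by s, m, or h, "(remember to use your new password)" does not match and is kept.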

This isn't regex and may not be efficient, but string methods work:

spam = "Agent: Can I help you? ( 3s ) Customer: Thank you( 40s ) Customer: I have a question about X. ( 8m 1s ) Agent: I can help here. Log in this website (remember to use your new password) ( 11m 31s )"

def cleanup(text):
    for word in ('Agent', 'Customer'):
        text = text.replace(word, f'\n{word}').strip()
    clean_text = [line[:line.rindex('(')] for line in text.splitlines()]

    # or in slow-motion
    # clean_text = []
    # for line in text.splitlines():
    #     idx = line.rindex('(')
    #     line = line[:idx]
    #     clean_text.append(line)

    return ' '.join(clean_text)

print(cleanup(spam))

Output:

Agent: Can I help you?  Customer: Thank you Customer: I have a question about X.  Agent: I can help here. Log in this website (remember to use your new password)
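Note that the approach above relies on every turn ending with a timestamp: line.rindex('(') raises ValueError when a line contains no opening parenthesis. A hedged sketch of a more defensive per-line helper follows; strip_trailing_group is a hypothetical name, and it still cannot tell a timestamp from any other trailing parenthesized note.

```python
def strip_trailing_group(line):
    """Drop the last '(...)' group only if the line actually ends with one."""
    stripped = line.rstrip()
    idx = stripped.rfind('(')  # rfind returns -1 instead of raising ValueError
    if idx == -1 or not stripped.endswith(')'):
        return line  # nothing to cut, keep the line unchanged
    return stripped[:idx]

print(repr(strip_trailing_group("Customer: Thank you( 40s ) ")))
print(repr(strip_trailing_group("Customer: no timestamp here")))
```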

Edit: as @DRPK suggested, this can be optimized by turning it into a one-liner, which should make a difference on a large corpus.

clean_text = ' '.join([line[:line.rindex('(')] for line in text.replace("Agent", '\nAgent').replace("Customer", '\nCustomer').strip().splitlines()])

\( [^\)]++\)

You can use this regex in your code and replace each match with "". I generated it at http://www.amazingregex.xyz/. You can generate one yourself from text examples.
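A short sketch of how this last pattern behaves in Python. Two caveats: the possessive quantifier ++ is only accepted by Python's re module from version 3.11, so the sketch uses the greedy [^)]+ instead, which matches the same text here; and the pattern keys on the space after the opening parenthesis rather than on the timestamp itself. The sample string and the name LOOSE are made up for illustration.

```python
import re

# Greedy stand-in for \( [^\)]++\): an opening parenthesis, a space,
# then everything up to the next closing parenthesis.
LOOSE = re.compile(r"\( [^)]+\)")

sample = ("Thank you( 40s ) Please read this ( see the FAQ ) "
          "(remember your password) ( 8m 1s )")

result = LOOSE.sub("", sample)
print(result)
```

The timestamps are removed and "(remember your password)" is kept, but "( see the FAQ )" is removed as well, so this pattern is only safe when non-timestamp notes never start with a space after the parenthesis.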