Removing acronyms using regex, based on uppercase characters following a parenthesis

How do I remove the following:

But not words in parentheses that start with an uppercase letter followed by lowercase letters, e.g. '(Bobby)' or '(Bob went to the beach..)'. This is the part I'm struggling with.


import re

text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
for string in text:
  cleaned_acronyms = re.sub(r'\([A-Z]*\)?', '', string)
  print(cleaned_acronyms)

#current output:
>> 'went to the beach' #Correct
>> 'The girl -2A) is walking' #Not correct
>> 'The dog obby) is being walked' #Not correct
>> 'They are there' #Correct


#desired & correct output:
>> 'went to the beach'
>> 'The girl is walking'
>> 'The dog (Bobby) is being walked' #(Bobby) is NOT an acronym (uppercase+lowercase)
>> 'They are there'
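To see why the pattern \([A-Z]*\)? mangles the second and third strings, it helps to print what it actually matches. Because [A-Z]* accepts zero or more letters and \)? is optional, the match simply stops at the first character outside [A-Z], so it grabs '(ABC' or '(B' without ever reaching a closing parenthesis. A small diagnostic sketch using re.findall:

```python
import re

text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking',
        'The dog (Bobby) is being walked', 'They are there (ABC)']

# The pattern from the question: "(" followed by zero or more uppercase
# letters and an optional ")". Nothing forces it to reach a closing
# parenthesis, so it stops at the first non-uppercase character.
pattern = r'\([A-Z]*\)?'

matches = [re.findall(pattern, s) for s in text]
for s, found in zip(text, matches):
    print(repr(s), '->', found)
```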

Use the pattern \([A-Z0-9\-]+\)

For example:

import re

text = ['ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
ptrn = re.compile(r"\([A-Z0-9\-]+\)")
for i in text:
    print(ptrn.sub("", i))

Output:

ABC went to the beach
The girl  is walking
The dog (Bobby) is being walked
They are there
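This pattern requires a closing parenthesis and does not consume surrounding whitespace, which is why the sample above starts from 'ABC went to the beach' rather than '(ABC went to the beach', and why 'The girl  is walking' keeps a double space. A quick check against the question's original first string, plus one possible cleanup step that collapses the leftover spaces (that extra re.sub is our assumption, not part of the answer):

```python
import re

ptrn = re.compile(r"\([A-Z0-9\-]+\)")

# The question's original first string has no closing parenthesis,
# so this pattern finds nothing to remove there.
unchanged = ptrn.sub("", "(ABC went to the beach")

# Removing "(ABC-2A)" leaves two spaces behind; collapsing runs of
# whitespace is one way (an assumption, not from the answer) to tidy it.
removed = ptrn.sub("", "The girl (ABC-2A) is walking")
tidied = re.sub(r"\s{2,}", " ", removed)

print(repr(unchanged))
print(repr(removed))
print(repr(tidied))
```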

Using \([A-Z\-0-9]{2,}\)? like this:

import re

text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
for string in text:
  cleaned_acronyms = re.sub(r'\([A-Z\-0-9]{2,}\)?', '', string)
  print(cleaned_acronyms)

I get these results:

' went to the beach'
'The girl  is walking'
'The dog (Bobby) is being walked'
'They are there '
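The {2,} quantifier is what keeps (Bobby) intact: after '(B' the next character 'o' is outside the class, so the run is only one character long and the match fails. The same rule has side effects worth knowing about. A sketch with two illustrative inputs (made up for this example, not from the question):

```python
import re

pattern = r'\([A-Z\-0-9]{2,}\)?'

# A one-letter acronym is too short for {2,}, so it is kept.
single = re.sub(pattern, '', 'Grade (A) student')

# Two leading capitals satisfy {2,}, so part of the word is removed.
mixed = re.sub(pattern, '', 'The dog (BObby) barks')

print(repr(single))
print(repr(mixed))
```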

Try a negative lookahead:

\((?![A-Z][a-z])[A-Z\d-]+\)?\s*

See the online demo.

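  • \( - A literal opening parenthesis.
  • (?![A-Z][a-z]) - Negative lookahead: assert that the position is not followed by an uppercase letter and then a lowercase letter.
  • [A-Z\d-]+ - Match 1+ uppercase letters, digits or hyphens.
  • \)? - An optional literal closing parenthesis.
  • \s* - 0+ whitespace characters.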
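The same pattern can be written with re.VERBOSE so that each piece carries the comment from the breakdown above. This is only a restyling of the answer's regex (the name ACRONYM is ours), checked against the question's four strings:

```python
import re

# Identical to r'\((?![A-Z][a-z])[A-Z\d-]+\)?\s*', spread out with re.VERBOSE.
ACRONYM = re.compile(r"""
    \(              # literal opening parenthesis
    (?![A-Z][a-z])  # not followed by uppercase + lowercase, e.g. "(Bobby"
    [A-Z\d-]+       # one or more uppercase letters, digits or hyphens
    \)?             # optional literal closing parenthesis
    \s*             # zero or more trailing whitespace characters
""", re.VERBOSE)

text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking',
        'The dog (Bobby) is being walked', 'They are there (ABC)']
cleaned = [ACRONYM.sub('', s) for s in text]
for line in cleaned:
    print(line)
```

Note that the last result keeps the space that preceded "(ABC)", since \s* only consumes whitespace after the match.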

A sample Python script:

import re
text = ['(ABC went to the beach', 'The girl (ABC-2A) is walking', 'The dog (Bobby) is being walked', 'They are there (ABC)' ]
for string in text:
  cleaned_acronyms = re.sub(r'\((?![A-Z][a-z])[A-Z\d-]+\)?\s*', '', string)
  print(cleaned_acronyms)

Prints:

went to the beach
The girl is walking
The dog (Bobby) is being walked
They are there
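The question also wants longer parenthesised phrases such as '(Bob went to the beach..)' left alone. The lookahead covers that case too, because it only inspects the first two characters after '('. Unlike the {2,} variant, this pattern also removes one-letter acronyms. A quick check ('Grade (A) student' is an illustrative input, not from the question):

```python
import re

pattern = re.compile(r'\((?![A-Z][a-z])[A-Z\d-]+\)?\s*')

# "(Bo..." starts with uppercase + lowercase, so the lookahead rejects it.
kept = pattern.sub('', '(Bob went to the beach..)')

# "(A)" passes the lookahead, and [A-Z\d-]+ needs only one character.
single = pattern.sub('', 'Grade (A) student')

print(repr(kept))
print(repr(single))
```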