Find an abbreviation right after a specific word using a regular expression

Question

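My goal is to identify the abbreviation that appears right after @PROG$ and change it to @PROG$ (e.g. ALI -> @PROG$).

Input

s = "Background (UNASSIGNED): Previous study of ours showed that @PROG$ (ALI) and C-reactive protein (CRP) are independent significant prognostic factors in operable non-small cell lung cancer (NSCLC) patients."

Output

"Background (UNASSIGNED): Previous study of ours showed that @PROG$ @PROG$ and C-reactive protein (CRP) are independent significant prognostic factors in operable non-small cell lung cancer (NSCLC) patients."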

I tried something like re.findall('\(.*?\)', s), which gives me all the abbreviations, not just the one after @PROG$. Any help here? What do I need to fix?

Answer 1

You can use a re.sub solution such as

import re
s = "Background (UNASSIGNED): Previous study of ours showed that @PROG$ (ALI) and C-reactive protein (CRP) are independent significant prognostic factors in operable non-small cell lung cancer (NSCLC) patients."
# \$ matches a literal dollar sign; \1 in the replacement re-inserts Group 1 ("@PROG$" plus the whitespace)
print( re.sub(r'(@PROG\$\s+)\([A-Z]+\)', r'\1@PROG$', s) )
# => Background (UNASSIGNED): Previous study of ours showed that @PROG$ @PROG$ and C-reactive protein (CRP) are independent significant prognostic factors in operable non-small cell lung cancer (NSCLC) patients.

See the Python demo. The regex is

(@PROG\$\s+)\([A-Z]+\)

See the regex demo. Details:

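(@PROG\$\s+) - Group 1 (\1 refers to this group's value in the replacement pattern): the literal text @PROG$ followed by one or more whitespace characters
\( - a ( character
[A-Z]+ - one or more uppercase ASCII letters (replace with [^()]* to match anything between the parentheses other than ( and ))
\) - a ) character.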
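
The [^()]* variant mentioned above can be sketched as a small helper. This is only an illustration under a few assumptions: the function name replace_abbreviation_after and its marker parameter are made up for this example, re.escape is used so the $ in the marker is treated literally, and a function replacement stands in for the \1 backreference.

```python
import re

def replace_abbreviation_after(text, marker="@PROG$"):
    """Replace a parenthesized chunk that directly follows `marker` with `marker`.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # re.escape turns "$" into a literal character instead of an end-of-string anchor.
    # [^()]* accepts anything between the parentheses except ( and ).
    pattern = re.compile(r'(' + re.escape(marker) + r'\s+)\([^()]*\)')
    # m.group(1) is the marker plus the whitespace that followed it.
    return pattern.sub(lambda m: m.group(1) + marker, text)

s = "Previous study showed that @PROG$ (ALI) and C-reactive protein (CRP) are prognostic factors."
print(replace_abbreviation_after(s))
```

Only the parenthesized text right after the marker is replaced; (CRP) is left alone because it is not preceded by @PROG$ and whitespace.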
