Replace the value with the preceding acronym using the Python regular expression module

Question

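I need to prefix each bare -number in the sentence with the word (acronym) that precedes the first -number of its list. See the input and expected output strings below for clarification. I tried str.replace and the regex .sub method with hard-coded (static) patterns, but that only manipulates this particular output instead of solving the general case.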

Input string:

The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes.

Expected output string:

The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.

Code:

import re
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
# Hard-coded pattern: only matches "interleukin (IL)-1, -8,"
regex1 = re.findall(r"[a-z]+\s+\(+[A-Z]+\)+-\d+\,\s+-\d\,+", string_a)
# Hard-coded pattern: only matches "MMP-1, -2, -3, -9, and -13"
regex2 = re.findall(r"[A-Z]+-\d+\,\s+-\d\,\s+-\d\,\s+-\d\,\s+[a-z]+\s+-\d+", string_a)

Answer 1

You can use:

import re
string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?")
print( pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f', and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a) )
# => The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.

See the Python demo and the regex demo.

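Details

\b - a word boundary
([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+) - Capturing group 1: either one or more ASCII letters, then zero or more whitespace characters, a (, one or more uppercase ASCII letters and a ), OR one or more uppercase ASCII letters
(\s*-\d+(?:,\s*-\d+)*) - Capturing group 2: zero or more whitespace characters, a -, one or more digits, then zero or more sequences of a comma, zero or more whitespace characters, a - and one or more digits
(?:,\s*and\s+(-\d+))? - An optional non-capturing group: a comma, zero or more whitespace characters, and, one or more whitespace characters, then capturing group 3: a - and one or more digits.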
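To see what each capturing group actually holds, the matches can be listed with re.finditer. This is a small inspection sketch that reuses the answer's pattern and input unchanged; a group that did not participate in the match is reported as None.

```python
import re

string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?")

# Print the (group 1, group 2, group 3) tuple for every match
for m in pattern.finditer(string_a):
    print(m.groups())
# ('interleukin (IL)', '-1, -8', None)
# ('LL', '-37', None)
# ('MMP', '-1, -2, -3, -9', '-13')
```

Note that "LL-37" also matches; it is simply rebuilt unchanged because group 2 holds a single number and group 3 is empty.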

In the lambda used as the replacement argument, the group 1 value is prepended to every comma-separated number in group 2.

If group 3 matched, ", and " followed by the group 1 value concatenated with the group 3 value is appended.
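The same replacement logic can be written as a named function instead of a one-line lambda, which may be easier to read and extend. This is a sketch that should behave like the answer's lambda; expand_acronym is a hypothetical helper name, and like the original pattern it assumes "and" is the only conjunction before the last number.

```python
import re

pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+(?:,\s*-\d+)*)(?:,\s*and\s+(-\d+))?")

def expand_acronym(m):
    """Hypothetical helper: prefix every -number in the match with group 1."""
    prefix = m.group(1)  # e.g. "MMP" or "interleukin (IL)"
    # Group 2 is a comma-separated run such as "-1, -2, -3, -9"
    parts = [prefix + num.strip() for num in m.group(2).split(",")]
    result = ", ".join(parts)
    # Group 3 is the optional number after ", and" (e.g. "-13")
    if m.group(3):
        result += f", and {prefix}{m.group(3)}"
    return result

string_a = "The acnes stimulated the mRNA expression of interleukin (IL)-1, -8, LL-37, MMP-1, -2, -3, -9, and -13 in keratinocytes."
print(pattern.sub(expand_acronym, string_a))
# => The acnes stimulated the mRNA expression of interleukin (IL)-1, interleukin (IL)-8, LL-37, MMP-1, MMP-2, MMP-3, MMP-9, and MMP-13 in keratinocytes.
```

Passing a function to re.sub means it is called once per match with the match object, and its return value replaces the matched text.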
