Regex string pattern insert operation

Input string:

The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good.

Output string:

The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, interskin (IS)-8, interskin (IS)-97, interskin (IS)-2ALPHA, and -3 was specific antagonist. The LL-37, LL-15, SAC-1, SAC-7 and SAC-21 in keratinocytes was good.

Expected output:

The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, interskin (IS)-8, interskin (IS)-97, interskin (IS)-2ALPHA and interskin (IS)-3 was specific antagonist. The LL-37, LL-15, SAC-1, SAC-7 and SAC-21 in keratinocytes was good.

I am not getting the interskin (IS)-3 part in my output string. Please look at my code and suggest a solution.

import re
string_a = "The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good."
print(string_a)
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?")
print('\n')
print(pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f' and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a))

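With your pattern and code, the repeated part (?:,*\s*-\d+)* in the first alternative of the second capture group only allows digits after the hyphen. The match therefore stops after -2 in -2ALPHA, and the optional "and" part is never reached. You can fix this by matching optional uppercase characters [A-Z]* at the end of that repeated group: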

\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+[A-Z]*)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?
                                                           ^^^^^^
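
To see why the extra [A-Z]* matters, it can help to compare what each pattern actually matches on the first list. The following is a minimal sketch; the variable names (text, original, fixed, m_old, m_new) are only illustrative and the input is shortened to the first sentence fragment.

```python
import re

# Shortened input: only the first list from the question.
text = "interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist."

# Pattern from the question: repeated items may only be "-<digits>".
original = re.compile(
    r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)"
    r"(\s*-\d+[A-Z]+(?:,*\s*-\d+)*|\s*-\d+(?:,*\s*-\d+)*)"
    r"(?:,*\s*and\s+(-\d+))?"
)

# Same pattern with [A-Z]* added to the repeated group of the first alternative.
fixed = re.compile(
    r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)"
    r"(\s*-\d+[A-Z]+(?:,*\s*-\d+[A-Z]*)*|\s*-\d+(?:,*\s*-\d+)*)"
    r"(?:,*\s*and\s+(-\d+))?"
)

m_old = original.search(text)
m_new = fixed.search(text)

# The original match ends at "-2", so "ALPHA, and -3" is left outside the match.
print(repr(m_old.group(0)), m_old.group(3))
# The fixed match consumes "-2ALPHA" and then captures "-3" in group 3.
print(repr(m_new.group(0)), m_new.group(3))
```

Because group 3 is empty for the original pattern, the replacement function never adds the "and interskin (IS)-3" part.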


Example

import re
string_a = "The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good."
pattern = re.compile(r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)(\s*-\d+[A-Z]+(?:,*\s*-\d+[A-Z]*)*|\s*-\d+(?:,*\s*-\d+)*)(?:,*\s*and\s+(-\d+))?")
print(pattern.sub(lambda x: x.group(1) + f', {x.group(1)}'.join(map(str.strip, x.group(2).strip().split(','))) + (f' and {x.group(1)}{x.group(3)}' if x.group(3) else ''), string_a))

Output

The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, interskin (IS)-8, interskin (IS)-97, interskin (IS)-2ALPHA and interskin (IS)-3 was specific antagonist. The LL-37, LL-15, SAC-1, SAC-7 and SAC-21 in keratinocytes was good.
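
If the one-line lambda is hard to read, the same replacement can be written as a named function. This is only a sketch of an equivalent refactoring, not a different approach; the names PATTERN, expand_prefix and expand_all are hypothetical and not part of the original code.

```python
import re

# Corrected pattern from the answer, split across lines for readability.
PATTERN = re.compile(
    r"\b([A-Za-z]+\s*\([A-Z]+\)|[A-Z]+)"                       # group 1: prefix, e.g. "interskin (IS)" or "LL"
    r"(\s*-\d+[A-Z]+(?:,*\s*-\d+[A-Z]*)*|\s*-\d+(?:,*\s*-\d+)*)"  # group 2: comma-separated "-N" items
    r"(?:,*\s*and\s+(-\d+))?"                                   # group 3: optional final "-N" after "and"
)

def expand_prefix(match):
    """Repeat the prefix (group 1) in front of every "-N" item."""
    prefix = match.group(1)
    items = [item.strip() for item in match.group(2).split(",")]
    expanded = ", ".join(prefix + item for item in items)
    if match.group(3):
        expanded += f" and {prefix}{match.group(3)}"
    return expanded

def expand_all(text):
    """Apply the expansion to every match in the text."""
    return PATTERN.sub(expand_prefix, text)

string_a = "The Proteinase-Activated Receptor-2 interskin (IS)-1ALPHA, -8, -97, -2ALPHA, and -3 was specific antagonist. The LL-37, -15, SAC-1, -7, and -21 in keratinocytes was good."
print(expand_all(string_a))
```

Note that group 3 still only accepts digits after the hyphen, so a final item such as "and -3ALPHA" would need the same [A-Z]* addition there.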