Retrieve definition for parenthesized abbreviation, based on letter count

Question

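I need to retrieve the definition of an acronym based on the number of letters between the parentheses. For the data I'm dealing with, the number of letters in parentheses corresponds to the number of words to retrieve. I know this isn't a reliable method of getting abbreviations, but in my case it will be. For example: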

String = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'

Desired output: family health history (FHH), nurse practitioner (NP)

I know how to extract the parentheses from a string, but after that I am stuck. Any help is appreciated.

import re

a = ('Although family health history (FHH) is commonly accepted as an '
     'important risk factor for common, chronic diseases, it is rarely considered '
     'by a nurse practitioner (NP).')

# Extract every parenthesized group, including the parentheses themselves
x2 = re.findall(r'(\(.*?\))', a)

for x in x2:
    length = len(x)  # note: this count includes the two parentheses
    print(x, length)

Answer 1

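Use a regex match to find the position where each match starts. Then use Python string indexing to get the substring leading up to the start of the match. Split the substring into words and take the last n words, where n is the length of the abbreviation.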

import re
s = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'


for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()[-size:]
    definition = " ".join(words)

    print(abbr, definition)

This prints:

FHH family health history
NP nurse practitioner
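
To get the result in the format the question asks for, the same steps can be wrapped in a small function. This is only a sketch: the function name `expand_abbreviations` is my own, and it assumes, as the question does, that the letter count equals the word count.

```python
import re

def expand_abbreviations(text):
    """Return 'definition (ABBR)' strings, one per parenthesized abbreviation."""
    results = []
    for match in re.finditer(r"\((.*?)\)", text):
        abbr = match.group(1)
        if not abbr:
            # Empty parentheses: a slice of [-0:] would return every word, so skip
            continue
        # Words before the opening parenthesis; keep the last len(abbr) of them
        words = text[:match.start()].split()[-len(abbr):]
        results.append(f"{' '.join(words)} ({abbr})")
    return results

s = ('Although family health history (FHH) is commonly accepted as an important '
     'risk factor for common, chronic diseases, it is rarely considered by a '
     'nurse practitioner (NP).')

print(", ".join(expand_abbreviations(s)))
# family health history (FHH), nurse practitioner (NP)
```

Iterating with `finditer` keeps the position of each match, so a repeated abbreviation is paired with the words in front of that particular occurrence.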

Answer 2

Does this solve your problem?

a = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'
splitstr=a.replace('.','').split(' ')
output=''
for i,word in enumerate(splitstr):
    if '(' in word:
        w=word.replace('(','').replace(')','').replace('.','')
        for n in range(len(w)+1):
            output=splitstr[i-n]+' '+output

print(output)

Actually, Keatinge beat me to it.

Answer 3

Using re together with a list comprehension:

import re

# `a` is the string from the question
x_lst = [str(len(i[1:-1])) for i in re.findall(r'(\(.*?\))', a)]

[re.search(r'(\S+\s+){' + i + r'}\(.{' + i + r'}\)', a).group(0) for i in x_lst]
# ['family health history (FHH)', 'nurse practitioner (NP)']

Answer 4

This solution isn't particularly clever; it simply searches for the acronyms and then builds a pattern to extract the words preceding each one:

import re

string = "Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP)."

definitions = []

for acronym in re.findall(r'\(([A-Z]+?)\)', string):
    length = len(acronym)

    match = re.search(r'(?:\w+\W+){' + str(length) + r'}\(' + acronym + r'\)', string)

    definitions.append(match.group(0))

print(", ".join(definitions))

Output

> python3 test.py
family health history (FHH), nurse practitioner (NP)
>
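
One caveat with the approach above: `re.search` returns `None` when an acronym has fewer words in front of it than it has letters, and calling `.group(0)` on that raises an `AttributeError`. A slightly more defensive variant might look like the sketch below. The function name `acronym_phrases` and the sample sentence are made up for illustration, and `re.escape` is only a precaution in case the acronym pattern is later widened beyond `[A-Z]`.

```python
import re

def acronym_phrases(text):
    """Collect 'word word ... (ACRONYM)' phrases, skipping acronyms without enough words."""
    phrases = []
    for acronym in re.findall(r'\(([A-Z]+)\)', text):
        # One "\w+\W+" group per letter, followed by the literal "(ACRONYM)"
        pattern = r'(?:\w+\W+){%d}\(%s\)' % (len(acronym), re.escape(acronym))
        match = re.search(pattern, text)
        if match is not None:  # None means too few words precede the acronym
            phrases.append(match.group(0))
    return phrases

sample = "(ABC) starts the sentence, then family health history (FHH)."
print(acronym_phrases(sample))
# ['family health history (FHH)']
```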

Answer 5

An idea: use a recursive pattern with the PyPI regex module.

\b[A-Za-z]+\s+(?R)?\(?[A-Z](?=[A-Z]*\))\)?

See this pcre demo at regex101

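\b[A-Za-z]+\s+ matches a word boundary, one or more alpha characters, and one or more whitespace characters
(?R)? the recursive part: optionally pastes the pattern from the start
\(? the opening parenthesis needs to be optional so that the recursion fits, and so does \)?
[A-Z](?=[A-Z]*\)) matches an uppercase letter if it is followed by a closing ) with any number of A-Z in between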

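It does not check whether the first letter of each word actually matches the letter at the corresponding position in the abbreviation.
It does not check for an opening parenthesis before the abbreviation. To check, add a variable-length lookbehind: change [A-Z](?=[A-Z]*\)) to (?<=\([A-Z]*)[A-Z](?=[A-Z]*\)).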
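
If the first-letter check matters, it can be done as a separate step after extraction instead of inside the pattern. Here is a minimal sketch using only the standard library; the helper name `initials_match` is hypothetical, and it assumes a case-insensitive comparison of one initial per word.

```python
def initials_match(definition, abbr):
    """True if the first letter of each word spells out the abbreviation."""
    words = definition.split()
    return (len(words) == len(abbr)
            and all(w[0].upper() == c.upper() for w, c in zip(words, abbr)))

candidates = [
    ("family health history", "FHH"),
    ("nurse practitioner", "NP"),
    ("is rarely", "NP"),  # same word count, but the initials differ
]

for definition, abbr in candidates:
    print(abbr, definition, initials_match(definition, abbr))
# FHH family health history True
# NP nurse practitioner True
# NP is rarely False
```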
