Retrieve definition for parenthesized abbreviation, based on letter count
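I need to retrieve the definition of an acronym based on the number of letters in the parentheses. In the data I'm working with, the number of letters in the parentheses corresponds to the number of words to retrieve. I know this isn't a reliable way to get abbreviations in general, but in my case it is. For example:
String = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'
Desired output: family health history (FHH), nurse practitioner (NP)
I know how to extract the parentheses from the string, but after that I'm stuck. Any help is appreciated.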
import re
a = ('Although family health history (FHH) is commonly accepted as an '
     'important risk factor for common, chronic diseases, it is rarely considered '
     'by a nurse practitioner (NP).')
x2 = re.findall(r'(\(.*?\))', a)
for x in x2:
    # x still includes the parentheses, so length is the letter count plus 2
    length = len(x)
    print(x, length)
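Use the regex match to find the position where the match starts. Then use Python string indexing to get the substring leading up to the start of the match. Split that substring into words and take the last n words, where n is the length of the abbreviation.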
import re
s = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'
for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()[-size:]
    definition = " ".join(words)
    print(abbr, definition)
This prints:
FHH family health history
NP nurse practitioner
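If the output should match the format requested in the question, the same approach can be wrapped in a small function. This is only a sketch: the name find_definitions is made up for illustration, and the pattern is narrowed to letters only, which is an assumption about the data rather than something the answer above requires.

```python
import re

def find_definitions(text):
    """Return 'definition (ABBR)' strings, one per parenthesized abbreviation."""
    results = []
    # Assumption: abbreviations consist of letters only.
    for match in re.finditer(r"\(([A-Za-z]+)\)", text):
        abbr = match.group(1)
        # Words before the opening parenthesis; keep the last len(abbr) of them.
        words = text[:match.start()].split()[-len(abbr):]
        results.append(f"{' '.join(words)} ({abbr})")
    return results

s = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'
print(", ".join(find_definitions(s)))
# family health history (FHH), nurse practitioner (NP)
```

Returning a list instead of printing inside the loop leaves the joining and formatting to the caller.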
Does this solve your problem?
a = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'
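splitstr=a.replace('.','').split(' ')
output=''
for i,word in enumerate(splitstr):
    if '(' in word:
        # strip the parentheses to get the bare abbreviation
        w=word.replace('(','').replace(')','').replace('.','')
        # walk backwards over the abbreviation itself plus len(w) preceding words
        for n in range(len(w)+1):
            # each word is prepended, so later abbreviations end up first in output
            output=splitstr[i-n]+' '+output
print(output)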
Actually, Keatinge beat me to it.
Using re together with a list comprehension:
import re
# a is the example string from the question
x_lst = [str(len(i[1:-1])) for i in re.findall(r'(\(.*?\))', a)]
[re.search(r'(\S+\s+){' + i + r'}\(.{' + i + r'}\)', a).group(0) for i in x_lst]
# ['family health history (FHH)', 'nurse practitioner (NP)']
This solution isn't particularly clever. It simply searches for the acronyms and then builds a pattern to extract the words in front of each one:
import re
string = "Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP)."
definitions = []
for acronym in re.findall(r'\(([A-Z]+?)\)', string):
    length = len(acronym)
    match = re.search(r'(?:\w+\W+){' + str(length) + r'}\(' + acronym + r'\)', string)
    definitions.append(match.group(0))
print(", ".join(definitions))
OUTPUT
> python3 test.py
family health history (FHH), nurse practitioner (NP)
>
An idea: use a recursive pattern with the PyPI regex module.
\b[A-Za-z]+\s+(?R)?\(?[A-Z](?=[A-Z]*\))\)?
See this pcre demo at regex101
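- \b[A-Za-z]+\s+ matches a word boundary, one or more alphabetic characters, and one or more whitespace characters
- (?R)? is the recursive part: it optionally pastes in the whole pattern from the start
- \(? the parenthesis has to be optional so that the recursion fits in, and the same goes for \)?
- [A-Z](?=[A-Z]*\)) matches one uppercase letter if it is followed by a closing ) with only A-Z in between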
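Note that this pattern:
- Does not check whether the first letter of each word actually matches the letter at the corresponding position in the abbreviation.
- Does not check for an opening parenthesis in front of the abbreviation. To check, add a variable-length lookbehind: change [A-Z](?=[A-Z]*\)) to (?<=\([A-Z]*)[A-Z](?=[A-Z]*\)).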
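The first caveat (word initials are never compared with the abbreviation) can also be handled outside the regex. The following is a minimal sketch using only the standard re module instead of the recursive pattern. The helper names initials_match and checked_definitions are hypothetical, and the comparison is case-insensitive by assumption.

```python
import re

def initials_match(definition, abbr):
    """True if each word's first letter equals the abbreviation letter at that position."""
    words = definition.split()
    return len(words) == len(abbr) and all(
        word[0].upper() == letter.upper() for word, letter in zip(words, abbr)
    )

def checked_definitions(text):
    """Return (abbr, definition) pairs, keeping only those whose initials agree."""
    found = []
    # Requiring the literal "(" here also covers the opening-parenthesis check.
    for m in re.finditer(r"\(([A-Z]+)\)", text):
        abbr = m.group(1)
        definition = " ".join(text[:m.start()].split()[-len(abbr):])
        if initials_match(definition, abbr):
            found.append((abbr, definition))
    return found

s = 'Although family health history (FHH) is commonly accepted as an important risk factor for common, chronic diseases, it is rarely considered by a nurse practitioner (NP).'
print(checked_definitions(s))
# [('FHH', 'family health history'), ('NP', 'nurse practitioner')]
```

A phrase such as 'the rate of change (XYZ)' yields no pair, because the initials r, o, c do not match X, Y, Z.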