Program to grab abbreviations and definitions - trouble getting all lower case abbreviations
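I have a program that grabs abbreviations (i.e. it looks for words in parentheses) and then, based on the number of characters in the abbreviation, goes back that many words to build the definition. So far it works for definitions where all of the preceding words start with a capital letter, or where most of them do. In the latter case it skips lowercase words such as "in" and moves on to the next word. My problem is the case where the corresponding words are all lowercase.
Current output: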
All Awesome Dudes (AAD)
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
Trials (IMMPACT). Some patient prefer the usual care (UC)
Desired output:
All Awesome Dudes (AAD)
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
usual care (UC)
import re
s = """Too many people, but not All Awesome Dudes (AAD) only care about the
Initiative on Methods, Measurement, and Pain Assessment in Clinical
Trials (IMMPACT). Some patient prefer the usual care (UC) approach of
doing nothing"""
allabbre = []
# Every parenthesised group is treated as a candidate abbreviation.
for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()
    count = 0
    # Walk backwards and count only the words that start with a capital letter.
    for k, i in enumerate(words[::-1]):
        if i[0].isupper():
            count += 1
        if count == size:
            break
    words = words[-k-1:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + ")"
    pattern = '[A-Z]'
    # Keep only groups that contain a capital letter, and report each one once.
    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
            print(abbr_keywords)
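Match each word's first letter against the letters of the abbreviation, working backwards from the last letter. The flag is there for rare cases such as All are Awesome Dudes (AAD): when the word just before the parenthesis is capitalised, the comparison stays case-sensitive, so the lowercase "are" is skipped instead of being matched against an "A".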
import re
s = """Too many people, but not All Awesome Dudes (AAD) only care about the
Initiative on Methods, Measurement, and Pain Assessment in Clinical
Trials (IMMPACT). Some patient prefer the usual care (UC) approach of
doing nothing
"""
allabbre = []
for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()
    # Index of the abbreviation letter still to be matched, starting at the last one.
    count = size - 1
    # True when the word right before "(" is capitalised: compare case-sensitively.
    flag = words[-1][0].isupper()
    for k, i in enumerate(words[::-1]):
        first_letter = i[0] if flag else i[0].upper()
        if first_letter == abbr[count]:
            count -= 1
        if count == -1:
            break
    words = words[-k-1:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + ")"
    pattern = '[A-Z]'
    # Keep only groups that contain a capital letter, and report each one once.
    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
            print(abbr_keywords)
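For reuse, the same backward letter-matching idea can be wrapped in a function. This is only a sketch: the names find_abbreviations, case_sensitive and remaining are made up for illustration and are not part of the code above, and the early `continue` for a parenthesis with no preceding words is an added assumption.

```python
import re


def find_abbreviations(text):
    """Return 'definition (ABBR)' strings in order of appearance, without duplicates."""
    results = []
    for match in re.finditer(r"\((.*?)\)", text):
        abbr = match.group(1)
        words = text[:match.start()].split()
        # Assumption: ignore groups with no capital letter or nothing before them.
        if not words or not re.search(r"[A-Z]", abbr):
            continue
        remaining = len(abbr) - 1  # index of the abbreviation letter still to match
        # Compare case-sensitively only when the last word before "(" is capitalised.
        case_sensitive = words[-1][0].isupper()
        taken = 0
        for taken, word in enumerate(reversed(words), start=1):
            first = word[0] if case_sensitive else word[0].upper()
            if first == abbr[remaining]:
                remaining -= 1
            if remaining < 0:
                break
        entry = " ".join(words[-taken:]) + " (" + abbr + ")"
        if entry not in results:
            results.append(entry)
    return results


sample = """Too many people, but not All Awesome Dudes (AAD) only care about the
Initiative on Methods, Measurement, and Pain Assessment in Clinical
Trials (IMMPACT). Some patient prefer the usual care (UC) approach of
doing nothing"""

for entry in find_abbreviations(sample):
    print(entry)

# The rare case the flag exists for: "are" must not be matched against "A".
print(find_abbreviations("We think All are Awesome Dudes (AAD) rule"))
```

As with the loop above, if the letters of an abbreviation cannot all be matched, the whole preceding text ends up in the definition, so real input may need an extra length limit.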