Grab abbreviations based on the number of capitalized characters preceding the acronym
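I have a program that finds acronyms in a paragraph and defines each one from the words that precede it, taking as many words as the acronym has characters. However, my code has trouble with definitions that contain words like "in" and "and", which are not part of the acronym. Basically, I want a preceding word to count only if it starts with a capital letter.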
import re
s = "Too many people, but not All Awesome Dudes (AAD) only care about the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)."
allabbre = []
for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()[-size:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + "}"
    pattern = '[A-Z]'
    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
            print(abbr_keywords)
Current output:
All Awesome Dudes (AAD}
Measurement, and Pain Assessment in Clinical Trials (IMMPACT}
**Desired Output:**
```none
All Awesome Dudes (AAD}
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
```
import re

s = "Too many people, but not All Awesome Dudes (AAD) only care about the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)."
allabbre = []
for match in re.finditer(r"\((.*?)\)", s):
    start_index = match.start()
    abbr = match.group(1)
    size = len(abbr)
    words = s[:start_index].split()
    # Walk backwards through the preceding words, counting only those
    # that start with a capital letter, until one is found per character.
    count = 0
    for k, i in enumerate(words[::-1]):
        if i[0].isupper():
            count += 1
        if count == size:
            break
    # k is the offset from the end of the first word of the definition.
    words = words[-k-1:]
    definition = " ".join(words)
    abbr_keywords = definition + " " + "(" + abbr + ")"
    pattern = '[A-Z]'
    if re.search(pattern, abbr):
        if abbr_keywords not in allabbre:
            allabbre.append(abbr_keywords)
            print(abbr_keywords)
Output:
All Awesome Dudes (AAD)
Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)
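One thing to watch for in the loop above: if fewer capitalized words precede the parenthesis than the acronym has characters, the loop never hits `break` and the whole prefix of the string ends up as the definition. Below is a minimal sketch of the same backward walk wrapped in a function that skips such cases instead. The name `find_definitions` and the choice to skip (rather than report) incomplete matches are my own assumptions, not part of the original code.

```python
import re

def find_definitions(text):
    """Return 'Definition (ABBR)' strings in order of appearance, without duplicates.

    Hypothetical helper: same idea as the loop above, but an acronym is
    skipped when too few capitalized words precede it.
    """
    results = []
    for match in re.finditer(r"\(([^()]*)\)", text):
        abbr = match.group(1)
        # Ignore parentheticals that contain no capital letter at all.
        if not re.search(r"[A-Z]", abbr):
            continue
        words = text[:match.start()].split()
        needed = len(abbr)
        start = None
        # Walk backwards, counting only words that begin with a capital letter.
        for idx in range(len(words) - 1, -1, -1):
            if words[idx][0].isupper():
                needed -= 1
                if needed == 0:
                    start = idx
                    break
        if start is None:
            continue  # not enough capitalized words before this acronym
        entry = " ".join(words[start:]) + " (" + abbr + ")"
        if entry not in results:
            results.append(entry)
    return results

s = "Too many people, but not All Awesome Dudes (AAD) only care about the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)."
for line in find_definitions(s):
    print(line)
```

Like the original, this counts one capitalized word per character of the acronym, so it assumes every character stands for exactly one capitalized word.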
My take on the problem:
txt = "Too many people, but not All Awesome Dudes (AAD) only care about the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)."
import re
from itertools import groupby
t = [list(g) if v else list(g)[::-1] for v, g in groupby(re.findall(r'\b[A-Z].+?\b', txt)[::-1], lambda k: k.upper() == k)]
for a, b in zip(t[::2], t[1::2]):
    abbr, meaning = a[0], b[len(b) - len(a[0]):len(b) - len(a[0]) + len(a[0])]
    if all(c1 == c2[0] for c1, c2 in zip(abbr, meaning)):
        print(' '.join(meaning),'(' + abbr + ')')
Prints:
Initiative Methods Measurement Pain Assessment Clinical Trials (IMMPACT)
All Awesome Dudes (AAD)
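Note that the version above checks that the initials actually spell the acronym, but it drops the connecting words ("on", "and", "in") and prints the matches in reverse order. If you want both the initial check and the full phrase, one possible sketch is to walk backwards over the words and compare each capitalized initial against the acronym, letter by letter. The function name `match_by_initials` is hypothetical, and restricting the pattern to all-uppercase acronyms is an assumption on my part.

```python
import re

def match_by_initials(text):
    """Return (definition, abbr) pairs whose capitalized initials spell abbr.

    Hypothetical helper: lowercase connecting words are kept in the
    definition but do not consume a letter of the acronym.
    """
    pairs = []
    for m in re.finditer(r"\(([A-Z]+)\)", text):
        abbr = m.group(1)
        words = text[:m.start()].split()
        remaining = list(abbr)  # letters still to be matched, consumed from the end
        start = None
        for idx in range(len(words) - 1, -1, -1):
            first = words[idx][0]
            if not first.isupper():
                continue  # connecting word such as "in" or "and"
            if first != remaining[-1]:
                break  # initials stop matching, give up on this acronym
            remaining.pop()
            if not remaining:
                start = idx
                break
        if start is not None:
            pairs.append((" ".join(words[start:]), abbr))
    return pairs

txt = "Too many people, but not All Awesome Dudes (AAD) only care about the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT)."
for meaning, abbr in match_by_initials(txt):
    print(meaning, "(" + abbr + ")")
```

An acronym whose letters do not line up with the preceding capitalized words (for example a phrase where a lowercase word contributes a letter) is simply skipped by this sketch.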