Print the best match in a Python list where each element is internally delimited
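I created a Python list from the elements in a file: when an element of row[0] appears in row[3], both are appended to the list 'matches', and vice versa, when an element of row[3] appears in row[0], they are appended to 'matches'. The list looks like this: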
['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
I want to print only the first occurrence or the perfect match for each element, irrespective of case, as shown below:
['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']
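As you can see, each element of the list is delimited internally by ";". I tried to use that as the criterion for the comparison. I want only the first occurrence of each element based on the word or words after the ";", or the element whose words are the same on both sides. For example, for Peripheral Blood Mononuclear Cells the first entry is chosen, whereas for caucasian the second entry is chosen because it is a perfect match. I would really appreciate any help before you downvote.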
You need to keep track of every full string and every split substring you have seen, and only add the ones not seen before to res:
l=['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian', 'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures', 'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']
seen = set()
res = []
for ele in l:
    a, b = ele.split(";", 1)
    # keep ele if neither the full string nor its left/right substring has been seen,
    # or if both sides match exactly and that perfect match has not been added yet
    if (ele.lower() not in seen and not any(x.lower() in seen for x in (a, b))) or (a == b and ele not in seen):
        res.append(ele)
    # keep track of all full strings and left/right substrings
    seen.update([a.lower(), b.lower(), ele.lower()])
print(res)
['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic', 'Black;black', 'Asian;asian', 'caucasian;caucasian', 'Seizures;seizures', 'Abscess;abscess']
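As an alternative sketch (not the approach above), the rule described in the question can also be read as "group by the text after the ';', keep the first entry per group, and let a perfect match replace it". The function name `best_matches` and its `sep` parameter are made up for illustration, and the comparison is assumed to be case-insensitive. Unlike the loop above, this version does not track left-hand sides, so an entry such as 'Asian;caucasian' would be kept if no perfect match for 'caucasian' appeared later.

```python
def best_matches(pairs, sep=";"):
    """Keep one entry per right-hand side: the first seen, or a perfect match."""
    best = {}        # lowercased right-hand side -> chosen entry (dicts keep insertion order)
    perfect = set()  # right-hand keys whose chosen entry is already a perfect match
    for ele in pairs:
        left, right = ele.split(sep, 1)
        key = right.lower()
        is_perfect = left.lower() == key  # assumption: case-insensitive comparison
        if key not in best:
            best[key] = ele
            if is_perfect:
                perfect.add(key)
        elif is_perfect and key not in perfect:
            # a perfect match replaces an earlier imperfect first occurrence
            best[key] = ele
            perfect.add(key)
    return list(best.values())


matches = ['Peripheral Blood Mononuclear Cells;peripheral blood mononuclear cells',
           'Blood;peripheral blood mononuclear cells', 'Hispanic or Latino;hispanic',
           'Black;black', 'Black;black', 'Asian;asian', 'Asian;asian', 'Asian;caucasian',
           'caucasian;caucasian', 'caucasian;caucasian', 'Seizures;seizures',
           'Seizure;seizures', 'Seizures;seizures', 'Seizures;seizures', 'Abscess;abscess']

result = best_matches(matches)
print(result)
```

For the sample list this yields the same seven entries as the loop above. Replacing a value in an existing dict key keeps that key's original position, which is why 'caucasian;caucasian' still appears right after 'Asian;asian'.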