Converting a list into a dictionary of the form {name: [other names]} using a function

I currently have a list of captions:

print(valid_captions)

-> [' Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and Ron Iervolino ', ' Chuck Grodin ', ' Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth ', ' Kelly Murro and Tom Murro ', ' Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton ']

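I want to create a function that iterates over each element of the list and builds an adjacency list for each person, from which I can also get the unique names of everyone who appears in the dataset. I want to represent this adjacency list as a Python dictionary, with each name as a key and the list of names it appears alongside as the value.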

So the function would take a single caption and return a dictionary of the form {name: [other names in caption]} for each name, while also removing any titles such as Dr and Mayor.

As an example, I want this

[Dr .Ron Iervolino, Trish Iervolino, and Mayor.Russ Middleton]

to return

{'Ron Iervolino': ['Trish Iervolino', 'Russ Middleton'],
 'Trish Iervolino': ['Ron Iervolino', 'Russ Middleton'],
 'Russ Middleton': ['Ron Iervolino', 'Trish Iervolino']}

If someone appears in a caption alone, return {name: []}. So the caption 'Robb Stark' would return {'Robb Stark': []}.

I have a function that removes the titles, but the adjacency list I get is completely wrong.

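import re

def remove_title(names):
    removed_list = []
    for name in names:
        # Splitting on a title leaves an empty string where the title was.
        altered_name = re.split('Dr |Mayor ', name)
        removed_list += altered_name
    # Drop the empty strings; list.remove raises ValueError once none are left.
    try:
        while True:
            removed_list.remove('')
    except ValueError:
        pass
    return removed_list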

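I'm not sure exactly what format the input list has, or which separators other than 'and', ',' and ', and' can occur between the names in a caption.

First you need to create a clean nested list (a list of lists). You can then use it to create a list of dictionaries, one per caption, that maps each name to all the other names in that caption: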

import re


valid_captions = [' Les Lieberman, Barri Lieberman, Isabel Kallman, Trish Iervolino, and  Ron Iervolino ',
                  ' Chuck Grodin ',
                  ' Diana Rosario, Ali Sussman, Sarah Boll, Jen Zaleski, Alysse Brennan, and Lindsay Macbeth ',
                  ' Kelly Murro and Tom Murro ', 
                  ' Ron Iervolino, Trish Iervolino, Russ Middleton, and Lisa Middleton ']

# create cleaned list of lists
cleaned_captions = []
for cap in valid_captions:
    # Add more separators here, separated by |. If one pattern contains another, put the longer one first.
    # \b keeps "and" from matching inside a name such as "Sandra".
    names = re.split(r",\s*and\b|,|\band\b", cap)
    names = [n.strip() for n in names]
    names = [re.sub("Dr |Mayor ", "", n) for n in names]  # you can add more prefixes to remove here separated by |
    cleaned_captions.append(names)

# create list of dictionaries
other_names_dict_list = [{name: [oth for oth in lst if oth != name] for name in lst} for lst in cleaned_captions]

Which gives you:

print(cleaned_captions)
> [['Les Lieberman', 'Barri Lieberman', 'Isabel Kallman', 'Trish Iervolino', 'Ron Iervolino'], 
  ['Chuck Grodin'], 
  ['Diana Rosario', 'Ali Sussman', 'Sarah Boll', 'Jen Zaleski', 'Alysse Brennan', 'Lindsay Macbeth'], 
  ['Kelly Murro', 'Tom Murro'], 
  ['Ron Iervolino', 'Trish Iervolino', 'Russ Middleton', 'Lisa Middleton']]

print(other_names_dict_list)
> [{'Les Lieberman': ['Barri Lieberman', 'Isabel Kallman', 'Trish Iervolino', 'Ron Iervolino'], 'Barri Lieberman': ['Les Lieberman', 'Isabel Kallman', 'Trish Iervolino', 'Ron Iervolino'], 'Isabel Kallman': ['Les Lieberman', 'Barri Lieberman', 'Trish Iervolino', 'Ron Iervolino'], 'Trish Iervolino': ['Les Lieberman', 'Barri Lieberman', 'Isabel Kallman', 'Ron Iervolino'], 'Ron Iervolino': ['Les Lieberman', 'Barri Lieberman', 'Isabel Kallman', 'Trish Iervolino']}, 
  {'Chuck Grodin': []}, 
  {'Diana Rosario': ['Ali Sussman', 'Sarah Boll', 'Jen Zaleski', 'Alysse Brennan', 'Lindsay Macbeth'], 'Ali Sussman': ['Diana Rosario', 'Sarah Boll', 'Jen Zaleski', 'Alysse Brennan', 'Lindsay Macbeth'], 'Sarah Boll': ['Diana Rosario', 'Ali Sussman', 'Jen Zaleski', 'Alysse Brennan', 'Lindsay Macbeth'], 'Jen Zaleski': ['Diana Rosario', 'Ali Sussman', 'Sarah Boll', 'Alysse Brennan', 'Lindsay Macbeth'], 'Alysse Brennan': ['Diana Rosario', 'Ali Sussman', 'Sarah Boll', 'Jen Zaleski', 'Lindsay Macbeth'], 'Lindsay Macbeth': ['Diana Rosario', 'Ali Sussman', 'Sarah Boll', 'Jen Zaleski', 'Alysse Brennan']}, 
  {'Kelly Murro': ['Tom Murro'], 'Tom Murro': ['Kelly Murro']}, 
  {'Ron Iervolino': ['Trish Iervolino', 'Russ Middleton', 'Lisa Middleton'], 'Trish Iervolino': ['Ron Iervolino', 'Russ Middleton', 'Lisa Middleton'], 'Russ Middleton': ['Ron Iervolino', 'Trish Iervolino', 'Lisa Middleton'], 'Lisa Middleton': ['Ron Iervolino', 'Trish Iervolino', 'Russ Middleton']}]
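Since the question asks for a function that takes a single caption, the two steps above could be wrapped up roughly as follows. This is only a sketch: the name `caption_to_dict`, the list of titles and the set of separators are assumptions for illustration, not something the data guarantees.

```python
import re

# Assumed titles and separators; extend both patterns to fit the real data.
TITLE_RE = re.compile(r"\b(?:Dr|Mayor)\b\.?\s*")
SEP_RE = re.compile(r",\s*and\b|,|\band\b")


def caption_to_dict(caption):
    """Return {name: [other names in caption]} for one caption string."""
    # Split into raw pieces, then remove titles and surrounding whitespace.
    names = [TITLE_RE.sub("", piece).strip() for piece in SEP_RE.split(caption)]
    # Drop empty pieces left over from the split.
    names = [n for n in names if n]
    return {name: [oth for oth in names if oth != name] for name in names}


print(caption_to_dict(" Kelly Murro and Tom Murro "))
print(caption_to_dict(" Chuck Grodin "))
```

A caption with a single name yields an empty list for that name, which matches the {name: []} case from the question.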

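Below is my solution to the problem. I created a function that takes a caption and returns a dictionary of the form {name: [other names in caption]} for each name.

In the function, I first clean the caption with string operations, removing titles such as 'Mayor' and 'Dr' as well as the word 'and'. I then use strip() to remove any leading or trailing whitespace. I use try and except to handle the exception raised while removing the unwanted empty elements from the list, and for loops for the rest of the process.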

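import re

def format_caption(caption):
    # Split on titles (with an optional period), the word 'and', and commas.
    name_list = re.split(r'\b(?:Dr|Mayor)\b\s*\.?\s*|\band\b|,', caption)
    name_list = [name.strip() for name in name_list]
    name_dict = {}
    # Remove the empty strings left behind by the split.
    try:
        while True:
            name_list.remove('')
    except ValueError:
        pass
    # Start every name with an empty list of neighbours.
    for name in name_list:
        name_dict.update({name: []})
    # Add every other name in the caption to each name's list.
    for key, name_list_2 in name_dict.items():
        for name in name_list:
            if name != key:
                name_list_2.append(name)
    return name_dict

The resulting function gives me the caption in the format I was looking for:

caption = 'Dr .Ron Iervolino, Trish Iervolino, and Mayor.Russ Middleton'
print(format_caption(caption))

>{'Ron Iervolino': ['Trish Iervolino', 'Russ Middleton'],
'Trish Iervolino': ['Ron Iervolino', 'Russ Middleton'],
'Russ Middleton': ['Ron Iervolino', 'Trish Iervolino']}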
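The question also mentions wanting one adjacency list across the whole dataset, together with the unique names. One possible way to get there is to merge the names of every caption into a single dictionary of sets. The sketch below assumes the captions have already been cleaned into lists of names; `merge_adjacency` is a hypothetical helper name, and sorting the output is just a choice to make the result deterministic.

```python
from collections import defaultdict

# Already-cleaned captions (a subset of the sample data, one list per caption).
cleaned_captions = [
    ["Les Lieberman", "Barri Lieberman", "Isabel Kallman", "Trish Iervolino", "Ron Iervolino"],
    ["Chuck Grodin"],
    ["Kelly Murro", "Tom Murro"],
    ["Ron Iervolino", "Trish Iervolino", "Russ Middleton", "Lisa Middleton"],
]


def merge_adjacency(captions):
    """Combine all captions into one {name: [everyone they appear with]} dict."""
    adjacency = defaultdict(set)
    for names in captions:
        for name in names:
            # Touching adjacency[name] also registers people who appear alone.
            adjacency[name].update(n for n in names if n != name)
    # Sets remove duplicates; sort them so the output order is stable.
    return {name: sorted(others) for name, others in adjacency.items()}


adjacency = merge_adjacency(cleaned_captions)
unique_names = sorted(adjacency)  # every person in the dataset, once

print(adjacency["Ron Iervolino"])
print(adjacency["Chuck Grodin"])
```

Someone who appears in several captions, such as Ron Iervolino here, ends up with the union of everyone they were captioned with, while someone who only appears alone keeps an empty list.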