Python: map each word to its own text

I have a list of words like this:

 word_list=[{"word": "python",
    "repeted": 4},
    {"word": "awsome",
    "repeted": 3},
    {"word": "frameworks",
    "repeted": 2},
    {"word": "programing",
    "repeted": 2},
    {"word": "Whosebug",
    "repeted": 2},
    {"word": "work",
    "repeted": 1},
    {"word": "error",
    "repeted": 1},
    {"word": "teach",
    "repeted": 1}
    ]

which was built from another list of notes:

note_list = [{"note_id":1,
"note_txt":"A curated list of awesome Python frameworks"},
{"note_id":2,
"note_txt":"what is awesome Python frameworks"},
{"note_id":3,
"note_txt":"awesome Python is good to wok with it"},
{"note_id":4,
"note_txt":"use Whosebug to lern programing with python is awsome"},
{"note_id":5,
"note_txt":"error in programing is good to learn"},
{"note_id":6,
"note_txt":"Whosebug is very useful to share our knoloedge"},
{"note_id":7,
"note_txt":"teach, work"},
  ]

I want to know how to map each word to the notes it appears in:

maped_list=[{"word": "python",
        "notes_ids": [1,2,3,4]},
        {"word": "awsome",
        "notes_ids": [1,2,3]},
        {"word": "frameworks",
        "notes_ids": [1,2]},
        {"word": "programing",
        "notes_ids": [4,5]},
        {"word": "Whosebug",
        "notes_ids": [4,6]},
        {"word": "work",
        "notes_ids": [7]},
        {"word": "error",
        "notes_ids": [5]},
        {"word": "teach",
        "notes_ids": [7]}
        ]

My attempt:

import re

# I started by collecting all the note texts into one list
notes_test = []
for note in note_list:
    notes_test.append(note['note_txt'])
# count the repetitions of each word
dict = {}
for sentence in notes_test:
    for word in re.split(r'\s', sentence):  # split on whitespace
        try:
            dict[word] += 1
        except KeyError:
            dict[word] = 1
word_list= []
for key in dict.keys():
    word = {}
    word['word'] = key
    word['repeted'] = dict[key]
    word_list.append(word)

My questions:

  1. How can I map the word list to the note list to get the mapped list?
  2. What do you think of the quality of my code? Any comments?

You can use a list comprehension:

mapped_list = [{"word": w_dict["word"],
                "notes_ids": [n_dict["note_id"] for n_dict in note_list
                              if w_dict["word"].lower() in n_dict["note_txt"].lower()]
                } for w_dict in word_list]

The result will be:

[{'word': 'python', 'notes_ids': [1, 2, 3, 4]},
 {'word': 'awsome', 'notes_ids': [4]},
 {'word': 'frameworks', 'notes_ids': [1, 2]},
 {'word': 'programing', 'notes_ids': [4, 5]},
 {'word': 'Whosebug', 'notes_ids': [4, 6]},
 {'word': 'work', 'notes_ids': [1, 2, 7]},
 {'word': 'error', 'notes_ids': [5]},
 {'word': 'teach', 'notes_ids': [7]}]
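
Note that the `in` test in the comprehension is a substring match, which is why "work" is also reported for notes 1 and 2 (it occurs inside "frameworks"). If whole-word matching is what you want, here is a minimal sketch. It assumes a "word" is any run of characters matched by `\w+`, and the helper name `map_words_to_notes` is only illustrative, not something from the question:

```python
import re

def map_words_to_notes(word_list, note_list):
    """Map each word to the ids of the notes that contain it as a whole word."""
    # Tokenize every note once into a set of lowercase words.
    note_tokens = [
        (note["note_id"], set(re.findall(r"\w+", note["note_txt"].lower())))
        for note in note_list
    ]
    return [
        {"word": w["word"],
         "notes_ids": [note_id for note_id, tokens in note_tokens
                       if w["word"].lower() in tokens]}
        for w in word_list
    ]

# A small sample taken from the question's data.
sample_words = [{"word": "work", "repeted": 1}, {"word": "python", "repeted": 4}]
sample_notes = [
    {"note_id": 1, "note_txt": "A curated list of awesome Python frameworks"},
    {"note_id": 7, "note_txt": "teach, work"},
]
result = map_words_to_notes(sample_words, sample_notes)
print(result)
```

With this variant "work" no longer matches note 1, because "frameworks" is a different token. Misspelled words such as "awsome" will still only match notes that contain the same misspelling.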
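
  1. Try building maped_list at the same time as you build the dictionary, adding the note's id for each word as you iterate.
  2. Don't use dict as a variable name. It is the built-in name Python uses to create dicts, as in dict(), and assigning to it shadows the built-in. Also, your input contains no whitespace other than plain spaces, so you can use sentence.split(). Another thing you can do is convert all words to lowercase, so that capitalization makes no difference.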
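
The two suggestions above can be combined into a single pass over the notes. The following is only a sketch under some assumptions: the function name `build_word_index` is hypothetical, words are split on whitespace and lowercased, and surrounding punctuation is stripped with `string.punctuation` (so "teach," is counted as "teach"), which the original code does not do:

```python
import string
from collections import defaultdict

def build_word_index(note_list):
    """Count each word and record the ids of the notes it appears in."""
    counts = defaultdict(int)      # word -> number of occurrences
    note_ids = defaultdict(list)   # word -> ids of notes containing it
    for note in note_list:
        for raw in note["note_txt"].lower().split():
            word = raw.strip(string.punctuation)  # assumption: drop edge punctuation
            if not word:
                continue
            counts[word] += 1
            if note["note_id"] not in note_ids[word]:
                note_ids[word].append(note["note_id"])
    return [{"word": w, "repeted": counts[w], "notes_ids": note_ids[w]}
            for w in counts]

# A small sample; the first note is made up to show a repeated word.
sample_notes = [
    {"note_id": 1, "note_txt": "Python python"},
    {"note_id": 5, "note_txt": "error in programing is good to learn"},
    {"note_id": 7, "note_txt": "teach, work"},
]
index = build_word_index(sample_notes)
by_word = {entry["word"]: entry for entry in index}
print(by_word["python"])
```

Using `defaultdict` avoids the try/except KeyError pattern, and no variable shadows the built-in `dict`.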