Inverted index given a list of document tokens using Python?
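I am new to Python. Given a list of document tokens, I need to write a function that builds an inverted index. The index maps each unique word to a list of document IDs, sorted in increasing order.
My code: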
def create_index(tokens):
    inverted_index = {}  # word -> list of document IDs containing it
    wordCount = {}       # word -> total occurrences across all documents
    for k, v in tokens.items():
        for word in v.lower().split():
            wordCount[word] = wordCount.get(word, 0) + 1
            if inverted_index.get(word, False):
                # Word already indexed: add this document ID only once.
                if k not in inverted_index[word]:
                    inverted_index[word].append(k)
            else:
                inverted_index[word] = [k]
    return inverted_index, wordCount
Note: this works correctly when the input has the form {1: "Madam I am Adam", 2: "I have never been afraid of him"}.
The output I get for the example above:
{'madam': [1], 'afraid': [2], 'i': [1, 2], 'of': [2], 'never': [2], 'am': [1], 'been': [2], 'adam': [1], 'have': [2], 'him': [2]}
In my code, k and v are the key and value of each item in the input dict.
Expected output when create_index is called with a list of token lists instead:
>>> index = create_index([['a', 'b'], ['a', 'c']])
>>> sorted(index.keys())
['a', 'b', 'c']
>>> index['a']
[0, 1]
>>> index['b']
[0]
>>> index['c']
[1]
Something like this?
>>> from collections import defaultdict
>>> def create_index(data):
...     index = defaultdict(list)
...     for i, tokens in enumerate(data):
...         for token in tokens:
...             index[token].append(i)
...     return index
...
>>> create_index([['a', 'b'], ['a', 'c']])
defaultdict(<class 'list'>, {'b': [0], 'a': [0, 1], 'c': [1]})
>>> index = create_index([['a', 'b'], ['a', 'c']])
>>> index.keys()
dict_keys(['b', 'a', 'c'])
>>> index['a']
[0, 1]
>>> index['b']
[0]
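One caveat with the version above: if a token occurs more than once in the same document, its ID is appended once per occurrence, so the list is no longer a list of unique IDs. A minimal sketch of one way to handle that, assuming document IDs are simply the positions in the input list (the name create_index_unique is made up for this example):

```python
from collections import defaultdict


def create_index_unique(docs):
    """Map each token to the increasing list of IDs of documents containing it."""
    index = defaultdict(list)
    for doc_id, tokens in enumerate(docs):
        for token in tokens:
            postings = index[token]
            # doc_id only grows, so a repeat within the current document
            # can only match the last entry already stored.
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
    # Return a plain dict so a missing key raises KeyError instead of
    # silently creating an empty entry.
    return dict(index)


index = create_index_unique([['a', 'b', 'a'], ['a', 'c']])
print(sorted(index.keys()))
print(index['a'])
```

Because enumerate yields the IDs in increasing order, each list comes out sorted without an explicit sort. Converting to a plain dict at the end is optional; it only changes how lookups of unknown tokens behave.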