Reading multiple labels as a list or tuple in a dict with id as the key, i.e. {id: (cat1, cat2, ...)}
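I am building a model for multi-label text classification. Below is a snippet of my labels.txt file. I want to convert these records into a dictionary that maps each id to its categories in a tuple or list, i.e. {id: (cat1, cat2)}. The records are not separated by blank lines. I am stuck on how to convert this kind of data into a dictionary.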
B0027DQHA0
  Movies & TV, TV
  Music, Classical
0756400120
  Books, Literature & Fiction, Anthologies & Literary Collections, General
  Books, Literature & Fiction, United States
  Books, Science Fiction & Fantasy, Science Fiction, Anthologies
  Books, Science Fiction & Fantasy, Science Fiction, Short Stories
B0000012D5
  Music, Blues
  Music, Pop
  Music, R&B
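If the category lines are always indented with spaces and the ID lines are not, you can use that to tell them apart. Loop over the lines and append each category line to a list stored in the dict under the most recently seen ID: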
r = '''B0027DQHA0
  Movies & TV, TV
  Music, Classical
0756400120
  Books, Literature & Fiction, Anthologies & Literary Collections, General
  Books, Literature & Fiction, United States
  Books, Science Fiction & Fantasy, Science Fiction, Anthologies
  Books, Science Fiction & Fantasy, Science Fiction, Short Stories
B0000012D5
  Music, Blues
  Music, Pop
  Music, R&B'''
d = {}
for l in r.splitlines():
    if l.startswith(' '):
        # Indented line: a category belonging to the most recent ID
        d.setdefault(i, []).append(l.lstrip())
    else:
        # Unindented line: a new ID
        i = l
print(d)
This outputs:
{'B0027DQHA0': ['Movies & TV, TV', 'Music, Classical'], '0756400120': ['Books, Literature & Fiction, Anthologies & Literary Collections, General', 'Books, Literature & Fiction, United States', 'Books, Science Fiction & Fantasy, Science Fiction, Anthologies', 'Books, Science Fiction & Fantasy, Science Fiction, Short Stories'], 'B0000012D5': ['Music, Blues', 'Music, Pop', 'Music, R&B']}
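The question asks for tuples and for reading from labels.txt, so here is a minimal sketch of the same indentation-based approach wrapped in a function that accepts any iterable of lines (such as an open file) and returns tuples. The function name parse_labels and the in-memory sample are my own illustrative choices, not part of the original post. One difference from the loop above: an ID with no category lines is kept with an empty tuple rather than being left out of the dict.

```python
import io


def parse_labels(lines):
    """Map each unindented ID line to a tuple of the indented category lines after it."""
    labels = {}
    current_id = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            # Indented line: a category for the most recent ID
            if current_id is None:
                raise ValueError("category line appears before any ID")
            labels[current_id].append(line.strip())
        else:
            # Unindented line: start a new ID
            current_id = line.strip()
            labels.setdefault(current_id, [])
    # Freeze each list into a tuple, as the question requests
    return {key: tuple(cats) for key, cats in labels.items()}


# Hypothetical in-memory stand-in for labels.txt
sample = io.StringIO(
    "B0027DQHA0\n"
    "  Movies & TV, TV\n"
    "  Music, Classical\n"
    "B0000012D5\n"
    "  Music, Blues\n"
)
result = parse_labels(sample)
print(result)
```

For the real file, something like `with open('labels.txt') as f: result = parse_labels(f)` should work, assuming the indentation in the file matches the snippet in the question.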