Is there a better way to get (key, itemN) tuples for a Python dictionary that has a list as a value?
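I have a jsonlines file containing items with a node as the key and, as the value, a list of the other nodes it is connected to.
To add the edges to a networkx graph, I think I need tuples of the form (u, v).
I wrote a naive solution for this, but I feel it might be a bit slow for a large enough jsonl file. Does anyone have a better, more pythonic solution to suggest?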
dol = [{0: [1,2,3,4,5,6]},{1: [0,2,3,4,5,6]}]
for node in dol:
    #print(node)
    tpls = []
    key = list(node.keys())[0]
    tpls = [(key,v) for v in node[key]]
    print(tpls)
    # <iterate through each one in the list to add them to the graph>

[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
[(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
Only one key
If the dictionaries will never hold more than one item, you can do this:
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]
for node in dol:
    local_node = node.copy()  # only if dict shouldn't be modified in any way
    k, values = local_node.popitem()
    print([(k, value) for value in values])
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
# [(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
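
If copying the dict only to call popitem() feels heavy, one possible alternative is to read the single item with next(iter(node.items())), which leaves the dict untouched. This is a sketch, not a benchmarked improvement; the function name edges_for_single_key is made up for illustration, and it assumes every dict holds exactly one key.

```python
def edges_for_single_key(node):
    """Return (key, value) pairs for a dict assumed to hold exactly one key."""
    # next(iter(...)) reads the first item without copying or modifying the dict
    key, values = next(iter(node.items()))
    return [(key, value) for value in values]


dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]
single_key_edges = [edges_for_single_key(node) for node in dol]
print(single_key_edges[0])
```

If a dict ever has more than one key, this version silently ignores the extra keys, so it only fits the single-key case.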
Multiple keys
But if a dict may contain more than one key, you can use a while loop and keep going until the dict is empty:
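# input inferred from the output below; the original snippet did not show it
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6], 2: [0, 2, 3, 4, 5, 6]}]
for node in dol:
    local_node = node.copy()  # only if dict shouldn't be modified in any way
    while local_node:
        # popitem() removes the most recently inserted key first (LIFO, Python 3.7+)
        k, values = local_node.popitem()
        print([(k, value) for value in values])
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6)]
# [(2, 0), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6)]
# [(1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]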
Of course, if you need to keep the generated lists, append them to a list instead of just printing them.
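
As a rough illustration of storing the result rather than printing it, the sketch below collects every pair into one flat list. It iterates with items() instead of popitem(), so no copy is needed; the name all_edges and the sample data are invented for the example.

```python
# Hypothetical sample input: the second dict has two keys
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6], 2: [0, 1, 3, 4, 5, 6]}]

all_edges = []  # accumulated (key, value) pairs
for node in dol:
    # items() visits keys in insertion order and does not empty the dict
    for k, values in node.items():
        all_edges.extend((k, value) for value in values)

print(len(all_edges))
```

Unlike the popitem() loop, this keeps the keys of each dict in their original order.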
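Only one big dictionary
If your dol object can be a single dictionary, it gets even simpler. And if, as Yves Daoust said, you need an adjacency list or matrix, here are two examples:
Adjacency list in pure Python
An adjacency list: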
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}
adjacency_list = [(key, value) for key, values in dol.items() for value in values]
print(adjacency_list)
# [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 0), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6)]
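
If the data starts out as a list of small dicts, as in the question, one way to get to a single dictionary might look like the sketch below. The name merged is an assumption for illustration, and setdefault is used so that a key appearing in several dicts keeps all of its neighbours.

```python
dol = [{0: [1, 2, 3, 4, 5, 6]}, {1: [0, 2, 3, 4, 5, 6]}]

merged = {}
for node in dol:
    for key, values in node.items():
        # setdefault keeps neighbours from earlier dicts if a key repeats
        merged.setdefault(key, []).extend(values)

adjacency_list = [(key, value) for key, values in merged.items() for value in values]
print(adjacency_list)
```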
Adjacency matrix with pandas
An adjacency matrix:
import pandas
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}
adjacency_list = [(key, value) for key, values in dol.items() for value in values]
adjacency_df = pandas.DataFrame(adjacency_list)
adjacency_matrix = pandas.crosstab(adjacency_df[0], adjacency_df[1],
                                   rownames=['keys'], colnames=['values'])
print(adjacency_matrix)
# values  0  1  2  3  4  5  6
# keys
# 0       0  1  1  1  1  1  1
# 1       1  0  1  1  1  1  1
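
For cases where pandas is not available, a comparable 0/1 matrix can be sketched with the standard library alone. This is only an approximation of the crosstab result: the names columns and matrix are illustrative, and a neighbour listed twice would still produce 1 here, whereas crosstab counts occurrences.

```python
dol = {0: [1, 2, 3, 4, 5, 6],
       1: [0, 2, 3, 4, 5, 6]}

# Columns: every node that appears as a neighbour, sorted for a stable layout
columns = sorted({value for values in dol.values() for value in values})

# Rows: one per key; 1 if the key lists that node as a neighbour, else 0
matrix = {}
for key, values in dol.items():
    neighbours = set(values)
    matrix[key] = [int(col in neighbours) for col in columns]

for key, row in matrix.items():
    print(key, row)
```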
dol = [{0: [1,2,3,4,5,6]},{1: [0,2,3,4,5,6]}]
def process(item: dict):
    for key, values in item.items():
        for i in values:
            yield (key, i)
results = map(process, dol)
print([list(r) for r in results])
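Prefer yield where it fits.
With yield you get a generator that you can iterate over, which is more memory efficient than building the full list up front. The list() calls in the print are only there to show the output.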
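
To connect the generator idea back to the jsonlines file from the question, here is a minimal sketch that streams edges one line at a time. The function names are hypothetical, io.StringIO stands in for the real file, and the int() conversion assumes the node IDs are integers (JSON object keys are always strings).

```python
import io
import json
from itertools import chain


def edges_from_record(record):
    """Yield (key, value) pairs for one parsed JSON Lines record."""
    for key, values in record.items():
        for value in values:
            # JSON object keys are strings; assume integer node IDs here
            yield (int(key), value)


def edges_from_jsonl(lines):
    """Lazily yield edges from an iterable of JSON Lines strings."""
    records = (json.loads(line) for line in lines if line.strip())
    return chain.from_iterable(edges_from_record(r) for r in records)


# Stand-in for an open file; a real file object can be passed the same way
sample = io.StringIO('{"0": [1, 2, 3]}\n{"1": [0, 2, 3]}\n')
edges = edges_from_jsonl(sample)
first = next(edges)   # only the first line has been parsed at this point
rest = list(edges)    # consuming the rest reads the remaining lines
print(first, rest)
```

Nothing is held in memory beyond the current record until the caller chooses to materialize the edges.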
You can use a list comprehension:
dol = [{0: [1,2,3,4,5,6]},{1: [0,2,3,4,5,6]}]
tuples = [ (n1,n2) for d in dol for n1,ns in d.items() for n2 in ns ]
print(tuples)
[(0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (1, 0), (1, 2),
(1, 3), (1, 4), (1, 5), (1, 6)]