Create a Python list comprehension from sparse data
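I have two lists (keys and values) that define a sparse list, with a value at each key position. I want to convert this into a dense list that has an entry for every position. I can do this with a for loop, as shown below.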
keys = [1,3,5]
values = [1.0, 3.0, 5.0]
column = [None]*10
for i, k in enumerate(keys):
    column[k] = values[i]
column
>>> [None, 1.0, None, 3.0, None, 5.0, None, None, None, None]
Is it possible to create column using a list comprehension?
Simply put, you can use this one-line list comprehension:
column =[None if i not in dict(zip(keys,values)).keys() else dict(zip(keys,values))[i] for i in range(10)]
Output:
[None, 1.0, None, 3.0, None, 5.0, None, None, None, None]
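Note that the one-liner above calls dict(zip(keys, values)) twice for every index, so the mapping is rebuilt on each pass. A minimal sketch of the same idea that builds the mapping once and relies on dict.get returning None for missing keys (the name lookup is my own, not part of the answer):

```python
keys = [1, 3, 5]
values = [1.0, 3.0, 5.0]

# Build the key -> value mapping once, outside the comprehension.
lookup = dict(zip(keys, values))

# dict.get returns None when the index is not one of the keys.
column = [lookup.get(i) for i in range(10)]
print(column)
```

This produces the same dense list as the for loop in the question.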
keys = [1,3,5]
values = [1.0, 3.0, 5.0]
column = [values[keys.index(i)] if i in keys else None for i in range(10)]
print(column)
Output:
[None, 1.0, None, 3.0, None, 5.0, None, None, None, None]
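I accepted zimdero's answer above because it definitely answers the question. However, there is a timing issue with large lists. I timed each case with the code below, using a list size of 10K and a density of 0.4 (60% of the values are None).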
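For this particular case, the for loop was about 3 orders of magnitude faster than the list comprehension (roughly 1,000x) and more than 3 orders of magnitude faster than the list comprehension using zip (roughly 6,000x). See the results below.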
import time
import random

# setup test case
iterations = 10
sz = 10000
cutoff = sz*.60
a = random.sample(range(1,sz+1), sz)
# keep only values above the cutoff, so about 60% of entries are None
dense = [x if x > cutoff else None for x in a]
keys = [i for i, n in enumerate(dense) if n]
values = [x for x in dense if x]

# case 1 for loop
start_time = time.time()
for _ in range(iterations):
    column = [None]*sz
    for i, k in enumerate(keys):
        column[k] = values[i]
end_time = time.time()
print("For loop time :", end_time - start_time)

# case 2 list comprehension (linear search in keys for every index)
start_time = time.time()
for _ in range(iterations):
    column = [values[keys.index(i)] if i in keys else None for i in range(sz)]
end_time = time.time()
print("List comprehension time 1:", end_time - start_time)

# case 3 list comprehension using zip (rebuilds the dict for every index)
start_time = time.time()
for _ in range(iterations):
    column =[None if i not in dict(zip(keys,values)).keys() else
             dict(zip(keys,values))[i] for i in range(sz)]
end_time = time.time()
print("List comprehension time 2:", end_time - start_time)
For loop time : 0.00599980354309082
List comprehension time 1: 6.379000186920166
List comprehension time 2: 36.09299993515015
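The gap likely comes from how each comprehension looks things up: i in keys and keys.index(i) both scan the list, and the zip version rebuilds the whole dictionary for every index. A comprehension that builds the dictionary once avoids both costs. Below is a rough sketch for timing that variant against the for loop with timeit. The function names and the simplified setup are my own assumptions, and timings will vary by machine, so none are shown.

```python
import random
import timeit

sz = 10000
# Assumed setup: about 40% of positions carry a value, as in the test above.
keys = sorted(random.sample(range(sz), int(sz * 0.4)))
values = [float(k) for k in keys]

def with_for_loop():
    # Pre-fill with None, then assign each value at its key position.
    column = [None] * sz
    for k, v in zip(keys, values):
        column[k] = v
    return column

def with_dict_lookup():
    # Build the mapping once; each lookup is then constant time on average.
    lookup = dict(zip(keys, values))
    return [lookup.get(i) for i in range(sz)]

# Both approaches should produce the same dense list.
assert with_for_loop() == with_dict_lookup()

for fn in (with_for_loop, with_dict_lookup):
    elapsed = timeit.timeit(fn, number=10)
    print(f"{fn.__name__}: {elapsed:.4f} s for 10 runs")
```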