Is there a faster alternative to np.where()?

Question

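I have a set of 100 data files containing information about particles (ID, velocity, position, etc.). From each file I need to pick out 10000 specific particles with particular ID numbers. This is how I do it: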

for i in range(n_files+1):
    data = load_data_file(i, datatype="double_precision")
    for j in chosen_id_arr:
        my_index = np.where(particleID_in_data == j)
        identity.append(ID[my_index])
        x.append(x_component[my_index])
        y.append(y_component[my_index])
        z.append(z_component[my_index])

The list "chosen_id_arr" contains all of these IDs. The data files are structured according to list index.

For some reason this snippet runs very slowly, and I am looking for a faster, more efficient alternative. Thank you very much in advance. :)

Answer 1

With a dictionary you can store the position information keyed by particle ID and take advantage of the O(1) lookup scaling of dictionaries:

# What the data in a single file would look like:
data = {1:[0.5,0.1,1.], 4:[0.4,-0.2,0.1], ...}
# A lookup becomes very simple syntactically:
for ID in chosen_id_arr:
    x, y, z = data[ID]
    # Here you can process the obtained x,y,z.

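This is much faster than the numpy lookup, because every np.where() call scans the whole ID array once per chosen ID, whereas a dictionary lookup takes constant time on average. As for processing the position data inside the loop, you could consider keeping separate position lists for the different particle IDs, but I think that is outside the scope of the question. The pandas package could also help there.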
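As a minimal sketch of how such a dictionary could be built from the per-file arrays, the snippet below zips IDs and coordinates together once and then does the lookups. The small lists are hypothetical stand-ins for what a data file would contain (the name particle_ids is an assumption, not taken from the question), and the sketch assumes each ID occurs only once per file.

```python
# Hypothetical stand-ins for the arrays loaded from one data file.
particle_ids = [7, 1, 4, 9]
x_component = [0.7, 0.5, 0.4, 0.9]
y_component = [0.0, 0.1, -0.2, 0.3]
z_component = [0.2, 1.0, 0.1, -0.5]

# IDs we want to extract (10000 of them in the real problem).
chosen_id_arr = [4, 1]

# Build the ID -> (x, y, z) table with a single pass over the file's data.
# Assumes IDs are unique within a file; a repeated ID would overwrite the earlier entry.
positions = {
    pid: (px, py, pz)
    for pid, px, py, pz in zip(particle_ids, x_component, y_component, z_component)
}

identity, x, y, z = [], [], [], []
for pid in chosen_id_arr:
    px, py, pz = positions[pid]  # hash lookup, no scan over all particles
    identity.append(pid)
    x.append(px)
    y.append(py)
    z.append(pz)

print(identity, x, y, z)
```

Building the dictionary costs one pass per file, after which each of the chosen IDs is found without scanning the data again. If an ID might be missing from a file, positions[pid] raises KeyError; positions.get(pid) would return None instead.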

simulation

numpy

particles

python-3.x