Selecting second-column data based on a first-column match with another text file in Python

Question

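I know very little about numpy arrays and iteration. I have two input files. The first column of each file is the time in milliseconds. Input file 1 holds the reference (simulated) values; input file 2 holds the values obtained from testing. I want to compare (plot second vs. first) the second column of input file 2 against the second column of input file 1, but only for rows where the times in the first columns match. I have been trying to do this by iterating, but I haven't gotten the right result yet. How do I find the indices where the times match?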

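import numpy as np

# Load both files; column 0 is the time in ms, column 1 is the value.
my_file=np.genfromtxt('path/input1.txt')
Sim_val=np.genfromtxt('path/input2.txt')
inp1=my_file[:,0]
inp12=my_file[:,1]
inpt2=Sim_val[:,0]
inpt21=Sim_val[:,1]

# Rebuild a two-column (time, value) array for file 1.
xarray=np.array(inp1)
yarray=np.array(inp12)
data=np.array([xarray,yarray])
ldata=data.T

# Rebuild a two-column (time, value) array for file 2.
zarray=np.array(inpt2)
tarray=np.array(inpt21)
mdata=np.array([zarray,tarray])
kdata=mdata.T

# Attempt: positions in file 2's sorted times where file 1's times would be inserted.
i=np.searchsorted(kdata[:,0],ldata[:,0])
print(i)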

My input file 2 and input file 1 are:

Input file 2 (obtained values)    Input file 1 (simulated values)
time (ms)   value                 time (ms)   value
      0        5                        0        5
    100        6                       50        6
    200       10                      200       15
    300       12                      350       12
    400       15                      400       15
    500       20                      500       25
    600        0                      650        0
    700       11                      700       11
    800       12                      850        8
    900       19                      900       19
   1000       10                     1000        3

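Working with numpy arrays and iteration is really difficult for me. Could anyone suggest how to solve the problem above? I actually have other columns as well, but every operation depends on matching the first column (the time).

Thanks again.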

Answer 1

Do you mean something like this?

import numpy as np

simulated = np.array([
    (0, 5),
    (100, 6),
    (200, 10),
    (300, 12),
    (400, 15),
    (500, 20),
    (600, 0),
    (700, 11),
    (800, 12),
    (900, 19),
    (1000, 10)
])

actual = np.array([
    (0, 5),
    (50, 6),
    (200, 15),
    (350, 12),
    (400, 15),
    (500, 25),
    (650, 0),
    (700, 11),
    (850, 8),
    (900, 19),
    (1000, 3)
])


def indexes_where_match(A, B):
    """ an iterator that goes over the indexes of wherever the entries in A's first-col and B's first-col match """
    return (i for i, (a, b) in enumerate(zip(A, B)) if a[0] == b[0])


def main():
    for i in indexes_where_match(simulated, actual):
        print(simulated[i][1], 'should be compared to', actual[i][1])

if __name__ == '__main__':
    main()
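
The same row-by-row comparison can also be written without a Python loop by using a boolean mask. This is only a minimal sketch, and it makes the same assumption as the generator above: both arrays have the same number of rows, and matching times sit at the same row index, as they do in the sample data. The names `sim`, `act`, and `mask` are illustrative and not from the question.

```python
import numpy as np

# Same sample data as above, as (time_ms, value) rows.
sim = np.array([(0, 5), (100, 6), (200, 10), (300, 12), (400, 15), (500, 20),
                (600, 0), (700, 11), (800, 12), (900, 19), (1000, 10)])
act = np.array([(0, 5), (50, 6), (200, 15), (350, 12), (400, 15), (500, 25),
                (650, 0), (700, 11), (850, 8), (900, 19), (1000, 3)])

# True wherever the two time columns agree at the same row index.
mask = sim[:, 0] == act[:, 0]

matching_rows = np.flatnonzero(mask)  # row indexes where the times match
sim_matched = sim[mask, 1]            # second column of the first array
act_matched = act[mask, 1]            # second column of the second array
```

Indexing with `mask` keeps the two selections aligned, so `sim_matched[k]` and `act_matched[k]` always belong to the same time.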

You can also use column slicing, like this:

simulated_time, simulated_values = simulated[..., 0], simulated[..., 1:]
actual_time, actual_values = actual[..., 0], actual[..., 1:]

indexes_where_match = (i for i, (a, b) in enumerate(zip(simulated_time, actual_time)) if a == b)

for i in indexes_where_match:
    print(simulated_values[i], 'should be compared to', actual_values[i])


# outputs:
# [5] should be compared to [5]
# [10] should be compared to [15]
# [15] should be compared to [15]
# [20] should be compared to [25]
# [11] should be compared to [11]
# [19] should be compared to [19]
# [10] should be compared to [3]
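
Both versions above compare row i of one array with row i of the other, so they rely on the two files having the same length and on matching times lining up at the same position. If that is not guaranteed, one option is to match on the time values themselves with `np.intersect1d(..., return_indices=True)`. This is a hedged sketch, assuming each time occurs at most once per file and that NumPy is version 1.15 or newer. `match_by_time` is a hypothetical helper name, not part of NumPy, and the small arrays are made-up data chosen to show mismatched lengths.

```python
import numpy as np


def match_by_time(a, b):
    """Return (times, rows_a, rows_b) for times found in the first column of both arrays.

    Assumes each time occurs at most once per array.
    """
    # idx_a / idx_b are the row positions of each common time in a and b.
    times, idx_a, idx_b = np.intersect1d(a[:, 0], b[:, 0], return_indices=True)
    return times, a[idx_a], b[idx_b]


# Hypothetical data: different lengths, matching times at different row positions.
simulated = np.array([(0, 5), (100, 6), (200, 10), (300, 12), (400, 15)])
actual = np.array([(50, 6), (200, 15), (400, 15), (450, 1)])

times, sim_rows, act_rows = match_by_time(simulated, actual)

for t, s, a in zip(times, sim_rows[:, 1], act_rows[:, 1]):
    print(t, ':', s, 'should be compared to', a)
```

Because whole rows are returned, any additional columns stay aligned with the matched times. For the plot mentioned in the question, something like `plt.plot(sim_rows[:, 1], act_rows[:, 1], 'o')` with matplotlib would then be enough.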

