Find matching index based on multiple files and print

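I have the 3 files shown below. All 3 files have the same number of columns and rows (several hundred of each). What I want is this: find each col/row position where the number in File1 and the number in File2 both fall within a specific range, keep the number at that same index in File3, and set every other number to "0". For example, in File1 and File2 only the numbers at col2/row2 meet the criteria (0<88<100 and 0<6<10), so the number 8 in File3 is kept and all other numbers become "0". Can this be done with awk? Or with Python? Thanks.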

File 1:

-10 -10 9 
-20 88 106 
-30 300 120

File 2:

-6 0 -7
-5 6 1
-2 18 32

File 3:

4 3 5 
6 8 8
10 23 14

Expected output:

0 0 0
0 8 0
0 0 0
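For reference, the rule in the question can be written as a small plain-Python function. This is a minimal sketch, not one of the answers below: it assumes the three grids have already been parsed into lists of rows of numbers (the sample data is hard-coded here), and `filter_grid` is a hypothetical helper name.

```python
# Sample data from the question, hard-coded instead of read from File1..File3.
file1 = [[-10, -10, 9], [-20, 88, 106], [-30, 300, 120]]
file2 = [[-6, 0, -7], [-5, 6, 1], [-2, 18, 32]]
file3 = [[4, 3, 5], [6, 8, 8], [10, 23, 14]]


def filter_grid(g1, g2, g3):
    """Keep g3[r][c] only where 0 < g1[r][c] < 100 and 0 < g2[r][c] < 10; otherwise 0."""
    return [
        [v3 if 0 < v1 < 100 and 0 < v2 < 10 else 0
         for v1, v2, v3 in zip(row1, row2, row3)]
        for row1, row2, row3 in zip(g1, g2, g3)
    ]


result = filter_grid(file1, file2, file3)
for row in result:
    print(" ".join(map(str, row)))
```

Because `zip` walks the three grids in lockstep, each cell of File3 is compared only with the cells at the same row and column in File1 and File2.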

The following awk may help.

awk '
FNR==1   { count++ }                              ## FNR resets to 1 at the start of each file, so count is 1, 2, 3 while reading File1, File2, File3.
count==1 {                                        ## Reading File1.
  for(i=1;i<=NF;i++) {                            ## Loop over every field of the current line.
    if($i>0 && $i<100) { a[FNR,i]++ }             ## Value in range: add 1 to the counter for cell (line FNR, column i).
  }
}
count==2 {                                        ## Reading File2.
  for(i=1;i<=NF;i++) {
    if($i>0 && $i<10) { a[FNR,i]++ }              ## Same idea, using the File2 range 0 < value < 10.
  }
}
count==3 {                                        ## Reading File3.
  for(j=1;j<=NF;j++) { $j=(a[FNR,j]==2)?$j:0 }    ## A count of 2 means the cell was in range in both File1 and File2: keep the value, otherwise set it to 0.
  print                                           ## Print the (possibly modified) line.
}
' File1 File2 File3                               ## Input files, in this order.

This produces the following output:

0 0 0
0 8 0
0 0 0

Nice awk answer.

Here is how to do this in Python with numpy.

First, read the files:

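import numpy as np

arrays = []
for fn in ('File1', 'File2', 'File3'):
    with open(fn) as f:
        # Split each line on whitespace; dtype=float converts the strings to numbers.
        # Blank lines are skipped so a trailing empty line does not make the array ragged.
        arrays.append(np.array([line.split() for line in f if line.strip()], dtype=float))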
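If the files are plain whitespace-separated numeric grids (an assumption; headers or mixed content would need extra options), `np.loadtxt` can do the reading and parsing in one call. The sketch below writes the question's sample files into a temporary directory first so it runs on its own; with real data, point `folder` at the directory holding File1, File2 and File3.

```python
import os
import tempfile

import numpy as np

# Recreate the question's sample files in a temporary directory (illustration only).
folder = tempfile.mkdtemp()
samples = {
    "File1": "-10 -10 9 \n-20 88 106 \n-30 300 120\n",
    "File2": "-6 0 -7\n-5 6 1\n-2 18 32\n",
    "File3": "4 3 5 \n6 8 8\n10 23 14\n",
}
for name, text in samples.items():
    with open(os.path.join(folder, name), "w") as f:
        f.write(text)

# np.loadtxt splits on whitespace by default and returns a float array;
# ndmin=2 keeps a single-line file as a 2-D array instead of 1-D.
arrays = [np.loadtxt(os.path.join(folder, fn), ndmin=2)
          for fn in ("File1", "File2", "File3")]
for a in arrays:
    print(a.shape)
```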

Then create a mask matrix that captures the required conditions:

mask=(arrays[0]>0) & (arrays[0]<100) & (arrays[1]>0) & (arrays[1]<10)
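To see what the mask holds, here it is computed on the question's sample grids, entered directly rather than read from the files. Each comparison produces a boolean array of the same shape, and `&` combines them cell by cell; the parentheses matter because `&` binds more tightly than `>` and `<` in Python.

```python
import numpy as np

# File1 and File2 sample grids from the question.
a1 = np.array([[-10, -10, 9], [-20, 88, 106], [-30, 300, 120]], dtype=float)
a2 = np.array([[-6, 0, -7], [-5, 6, 1], [-2, 18, 32]], dtype=float)

# Element-wise range checks, combined with a bitwise AND per cell.
mask = (a1 > 0) & (a1 < 100) & (a2 > 0) & (a2 < 10)
print(mask)
```

Only the cell at row 2, column 2 is `True`, which matches the single kept value in the expected output.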

Then multiply the third array by the mask (arrays[2] is the third file):

>>> print(arrays[2] * mask.astype(float))
[[0. 0. 0.]
 [0. 8. 0.]
 [0. 0. 0.]]
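As an alternative to multiplying by the mask, `np.where` selects values directly and keeps the dtype of the input, so integer data stays integer. The sketch below, using the question's sample grids, also writes the result in the same space-separated layout as the expected output; it writes to an in-memory buffer here, and with real data you would pass a file name such as the hypothetical `"output.txt"` to `np.savetxt` instead.

```python
import io

import numpy as np

# Sample grids from the question; in practice these come from File1..File3.
a1 = np.array([[-10, -10, 9], [-20, 88, 106], [-30, 300, 120]])
a2 = np.array([[-6, 0, -7], [-5, 6, 1], [-2, 18, 32]])
a3 = np.array([[4, 3, 5], [6, 8, 8], [10, 23, 14]])

mask = (a1 > 0) & (a1 < 100) & (a2 > 0) & (a2 < 10)

# Take a3 where mask is True and 0 elsewhere; the result stays an integer array.
result = np.where(mask, a3, 0)

# fmt="%d" prints integers; the default delimiter is a single space.
buf = io.StringIO()
np.savetxt(buf, result, fmt="%d")
text = buf.getvalue()
print(text, end="")
```

If the real files contain non-integer values, a float format such as `fmt="%g"` would be the safer choice.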