Count the number of occurrences and compute values in two files using awk

Question

I have two files: file1 has about 1 million lines and file2 has about 1,000 lines.

file1 contains the fields tower_id, user_id, and signal_strength, and looks like this:

"0001","00abcde","0.65"
"0002","00abcde","0.35"
"0005","00bcdef","1.0"
"0001","00cdefg","0.1"
"0003","00cdefg","0.4"
"0008","00cdefg","0.3"
"0009","00cdefg","0.2"

file2 contains the fields tower_id, x_position, and y_position, and looks like this:

"0001","34","22"
"0002","78","56"
"0003","12","32"
"0004","79","45"
"0005","36","37"
"0006","87","99"
"0007","27","93"
"0008","55","04"
"0009","02","03"

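The signal_strength values for each user_id sum to 1. I need to compute each user's position from the signal strength of each tower: for every tower a user is listed with, multiply signal_strength by that tower's x_position and y_position, then sum the products per user, like this: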

"00abcde" --> 0.65*34+0.35*78, 0.65*22+0.35*56
"00bcdef" --> 1.0*36, 1.0*37
"00cdefg" --> 0.1*34+0.4*12+0.3*55+0.2*02, 0.1*22+0.4*32+0.3*04+0.2*03

So the output file should look like this (user_id, computed_x_position, computed_y_position):

00abcde,49.4,33.9
00bcdef,36,37
00cdefg,25.1,16.8

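My idea is to use awk with both file1 and file2 as input files, somehow keeping track of what has already been read (something like awk 'NR==FNR {some commands} {print some values}' file1 file2 > outputfile), but I don't know how to do it. Can anyone help me?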

Answer 1

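This might be what you want:

awk -F '[,"]+' '
    # With this separator each line starts with an empty $1, so the data are in $2..$4.
    # file2 (read first): store each tower x and y position, keyed by tower_id.
    NR==FNR { towx[$2] = $3; towy[$2] = $4; next }
    # file1: add the strength-weighted tower position to the totals for this user_id.
            { usrx[$3] += towx[$2] * $4; usry[$3] += towy[$2] * $4 }
    END     { for (usr in usrx) printf "%s,%.1f,%.1f\n",
                                       usr, usrx[usr], usry[usr] }
' file2 file1 # file2 precedes file1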
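
To try the command above on the sample data from the question, something like the following sketch may help. It assumes a POSIX shell with mktemp available; the scratch directory and the result variable are illustrative names, not part of the original answer. The output is piped through sort because awk does not guarantee any particular order for a for (usr in usrx) loop.

```shell
#!/bin/sh
# Hypothetical scratch directory; the file names mirror the question.
tmpdir=$(mktemp -d)

# Sample file1 from the question: tower_id, user_id, signal_strength.
cat > "$tmpdir/file1" <<'EOF'
"0001","00abcde","0.65"
"0002","00abcde","0.35"
"0005","00bcdef","1.0"
"0001","00cdefg","0.1"
"0003","00cdefg","0.4"
"0008","00cdefg","0.3"
"0009","00cdefg","0.2"
EOF

# Sample file2 from the question: tower_id, x_position, y_position.
cat > "$tmpdir/file2" <<'EOF'
"0001","34","22"
"0002","78","56"
"0003","12","32"
"0004","79","45"
"0005","36","37"
"0006","87","99"
"0007","27","93"
"0008","55","04"
"0009","02","03"
EOF

# file2 is listed first so the tower positions are loaded before file1 is read.
# The leading quote on each line is a separator, so $1 is empty and data start at $2.
result=$(awk -F '[,"]+' '
    NR==FNR { towx[$2] = $3; towy[$2] = $4; next }
            { usrx[$3] += towx[$2] * $4; usry[$3] += towy[$2] * $4 }
    END     { for (usr in usrx) printf "%s,%.1f,%.1f\n", usr, usrx[usr], usry[usr] }
' "$tmpdir/file2" "$tmpdir/file1" | sort)

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

On the sample data this prints 00abcde,49.4,33.9, then 00bcdef,36.0,37.0, then 00cdefg,25.1,16.8. The %.1f format always shows one decimal place, which is why the second line reads 36.0,37.0 rather than the 36,37 shown in the question.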
