Math on the second column of two files

Question

I have two files:

>cat foo.txt
QGP 1044
TGP 634
KGP 616
DGA 504
PGP 481
KGD 465
QGE 456
TGD 393
DGS 367
TGA 366

>cat bar.txt
QGP 748.6421
TGP 564.0048
KGP 568.7543
DGA 193.6391
PGP 405.1929
KGD 248.7047
QGE 287.7652
TGD 246.6278
DGS 143.6255
TGA 210.1166

Column 1 is identical in both files. I need to do the following math:

(foo.txt$column2 - bar.txt$column2)/sqrt(bar.txt$column2)

and output column 1 plus the result of the math as column 2. I don't know how to loop over each line with awk. Any help is greatly appreciated!

Answer 1

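The idiomatic technique: iterate over the first file and build a map from $1 to $2. Then iterate over the second file and use the current $1 to look up the value in that map: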

awk '
    NR == FNR { # this condition is true for the lines of the first file [1]
        foo[$1] = $2
        next
    }
    {
        print $1, (foo[$1] - $2) / sqrt($2)
    }
' foo.txt bar.txt

Output:

QGP 10.7947
TGP 2.94732
KGP 1.98107
DGA 22.3034
PGP 3.76599
KGD 13.7153
QGE 9.91737
TGD 9.32047
DGS 18.6388
TGA 10.754
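To try the two-file idiom end to end, here is a minimal sketch that recreates the question's sample files in a scratch directory (using mktemp -d is an assumption of this sketch; any writable location works) and runs the same awk program, capturing the output in a variable named result.

```shell
#!/bin/sh
# Scratch directory for the sample data (an assumption of this sketch).
dir=$(mktemp -d)
printf '%s\n' 'QGP 1044' 'TGP 634' 'KGP 616' 'DGA 504' 'PGP 481' \
              'KGD 465' 'QGE 456' 'TGD 393' 'DGS 367' 'TGA 366' > "$dir/foo.txt"
printf '%s\n' 'QGP 748.6421' 'TGP 564.0048' 'KGP 568.7543' 'DGA 193.6391' \
              'PGP 405.1929' 'KGD 248.7047' 'QGE 287.7652' 'TGD 246.6278' \
              'DGS 143.6255' 'TGA 210.1166' > "$dir/bar.txt"

# First file (NR == FNR): remember column 2 of foo.txt, keyed by column 1.
# Second file: look up the same key and apply (foo - bar) / sqrt(bar).
result=$(awk '
    NR == FNR { foo[$1] = $2; next }
    { print $1, (foo[$1] - $2) / sqrt($2) }
' "$dir/foo.txt" "$dir/bar.txt")

printf '%s\n' "$result"
rm -r "$dir"
```

awk prints the numbers using its default output format (OFMT, "%.6g"), which is why the values show six significant digits.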

[1]: NR == FNR

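FNR is the record number within the current file. NR is the total number of records seen so far across all files. The two are equal only while the first file is being read. This breaks when the first file is empty: then NR == FNR is true for the first file that has at least one line, so the second file is processed as if it were the first. A more reliable condition is: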

awk '
    FILENAME == ARGV[1] {
        # do stuff for the first file
        next
    }
    {
        # this action is for each subsequent file
    }
' file1 file2 ...
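The failure described in this footnote can be reproduced with a small sketch: an empty first file followed by a two-line file. The file names empty.txt and bar.txt and the counter n are hypothetical; each program counts how many lines it treats as belonging to the first file.

```shell
#!/bin/sh
dir=$(mktemp -d)
: > "$dir/empty.txt"                                   # empty first file
printf 'QGP 748.6421\nTGP 564.0048\n' > "$dir/bar.txt"

# NR == FNR: the empty file contributes no records, so bar.txt's lines
# are wrongly counted as "first file" lines.
by_nr=$(awk 'NR == FNR { n++; next } END { print n + 0 }' \
    "$dir/empty.txt" "$dir/bar.txt")

# FILENAME == ARGV[1]: only records read from the first argument match.
by_name=$(awk 'FILENAME == ARGV[1] { n++; next } END { print n + 0 }' \
    "$dir/empty.txt" "$dir/bar.txt")

echo "NR == FNR:           $by_nr"
echo "FILENAME == ARGV[1]: $by_name"
rm -r "$dir"
```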

Answer 2

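A Perl solution (it assumes the columns in both files are separated by tabs; for space-separated input, drop -F'\t' so that -a splits on whitespace):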

paste foo.txt bar.txt | \
  perl -F'\t' -lane 'print join "\t", $F[0], ( ($F[1] - $F[3]) / ($F[3])**0.5 );' > out.txt

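The Perl one-liner uses these command-line flags:
-e : tells Perl to look for the code in-line, instead of in a file.
-n : loops over the input one line at a time, assigning each line to $_ by default.
-l : strips the input line separator ("\n" by default on *NIX) before executing the in-line code, and appends it when printing.
-a : splits $_ into the array @F on whitespace, or on the regex given with the -F option.
-F'\t' : splits into @F on TAB instead of on whitespace. The array @F is zero-indexed.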

See also:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Answer 3

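You can use join (it expects both inputs to be sorted on the join field; if they are not, sort them first, e.g. join <(sort foo.txt) <(sort bar.txt) in bash):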

$ join foo.txt bar.txt | awk '{print $1, ($2 - $3)/sqrt($3)}'

Or (assuming both files list the keys in the same order) use awk to read a line from bar.txt for each line of foo.txt:

$ awk '{getline b < "bar.txt"; split(b, a); print $1, ($2 - a[2])/sqrt(a[2])}' foo.txt
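To see which fields the awk program in the join version receives, this sketch builds two-line versions of the sample files, sorts them, and joins them. The scratch directory and the .sorted file names are assumptions of the sketch; it uses temporary files instead of process substitution so it also runs under a plain POSIX sh.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'QGP 1044\nTGP 634\n' > "$dir/foo.txt"
printf 'QGP 748.6421\nTGP 564.0048\n' > "$dir/bar.txt"

# join needs both inputs sorted on the join field (column 1 by default).
sort "$dir/foo.txt" > "$dir/foo.sorted"
sort "$dir/bar.txt" > "$dir/bar.sorted"

# Each joined line is: key, foo's column 2, bar's column 2,
# so awk sees them as $1, $2 and $3.
joined=$(join "$dir/foo.sorted" "$dir/bar.sorted")
result=$(printf '%s\n' "$joined" | awk '{ print $1, ($2 - $3) / sqrt($3) }')

printf '%s\n' "$joined" "$result"
rm -r "$dir"
```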

Answer 4

Another way to write it:

$ awk '{
    if($1 in a)                       # if the key has been seen before, i.e. 2nd file
        print $1,(a[$1]-$2)/sqrt($2)  # compute and output
    else                              # else 1st file
        a[$1]=$2                      # hash the value
}' foo.txt bar.txt

Some of the output:

QGP 10.7947
TGP 2.94732
KGP 1.98107
...

Tags: math, bash, awk, loops