Conversion table replace all elements in other file

Question

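I am trying to convert all ICD codes in a tab-delimited file into Phecodes for a bioinformatics project, based on tab-delimited ICD-to-Phecode conversion tables. I found a good starting point in the code from the Stack Overflow post below: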

awk 'NR==1 { next } FNR==NR { a[$1]=$2; next } $1 in a { $1=a[$1] }1' TABLE OLD_FILE

Replacing values in large table using conversion table

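But I don't want "all values in the first column changed according to the conversion table" (which is what the code above does). I want all values in all columns of 002.txt to be changed according to the conversion tables ICD9toPhecode.txt and ICD10toPhecode.txt. So I changed the awk script to the following, but it doesn't work; it does nothing: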

awk 'NR==1 { next } FNR==NR { a[$1]=$2; next } $1 in a { for (i = 1; i <= $NR; ++i) $i=a[$1] }1' ICD9toPhecode.txt 002.txt
awk 'NR==1 { next } FNR==NR { a[$1]=$2; next } $1 in a { for (i = 1; i <= $NR; ++i) $i=a[$1] }1' ICD10toPhecode.txt 002.txt

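In ICD9toPhecode.txt and ICD10toPhecode.txt, the first column is the ICD9 or ICD10 code and the second column is the Phecode.

Every column in 002.txt is an ICD9 or ICD10 code.

Edit: it is still not working. How do I write to a file?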

Here is a sample of anonymized patient data with ICD10 codes, 002.txt (the OLD_FILE):

1   2   3   4   5   6   7   8
K40.9   K43.9   N20.0   N20.1   N23 N39.0   R69 Z88.1
B96.8   D12.6   E11.6   E87.6   I44.7   K40.9   K43.9   K52.9
NOT

Here is the conversion table (ICD10toPhecode.txt), i.e. the TABLE:

icd10cm phecode
K40.9   550.1
K43.9   550.5
N20.0   594.1
N20.1   594.3
N23 594.8
N39.0   591
R69 1019
Z88.1   960.1
B96.8   041
D12.6   208
E11.6   250.2
E87.6   276.14
I44.7   426.32
K40.9   550.1
K43.9   550.5
K52.9   558
XNO    17

This is what I should get (ICD10 codes converted to Phecodes) in 002_output.txt:

1   2   3   4   5   6   7   8
550.1   550.5   594.1   594.3   594.8   591 1019    960.1
041 208 250.2   276.14  426.32  550.1   550.5   558

But what I actually get in 002_output.txt is a duplicate of 002.txt.

What I need to know is how to change:

awk 'NR==1 { next } FNR==NR { a[$1]=$2; next } $1 in a { for (i = 1; i <= $NR; ++i) $i=a[$1] }1' ICD9toPhecode.txt 002.txt
awk 'NR==1 { next } FNR==NR { a[$1]=$2; next } $1 in a { for (i = 1; i <= $NR; ++i) $i=a[$1] }1' ICD10toPhecode.txt 002.txt

Specifically, the ICD10toPhecode.txt 002.txt part.

I need to write the output to 002_output.txt. It can't be as simple as

ICD10toPhecode.txt 002.txt > 002_output.txt

The output has the same content as 002.txt.

TESTABLE test case (for the tables, see the snippets I edited into the post above under these names):

awk '
   # Ignore header
   NR==1{ next }
   # Load first file
   FNR==NR { a[$1]=$2; next }
   {
      # Foreach value
      for (i = 1; i <= $NR; ++i) {
          # if the value is in second file
          if ($i in a) {         
                # then replace it
                $i = a[$i]       # NOTE - $i __not__ $1 !
          }
      }
      # print it!
      print
   }
' ICD10toPhecode.txt 002.txt > 002_output.txt

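Based on:

awk 'NR==1 { next } FNR==NR { a[$1]=$2; next } $1 in a { $1=a[$1] }1' TABLE OLD_FILE

I'm pretty sure I messed up the for loop in my TESTABLE test case, and maybe also FNR==NR { a[$1]=$2; next }. I need to link the $1 ICD codes to the $2 Phecodes in ICD10toPhecode.txt and replace the ICD codes with Phecodes in all fields of 002.txt (which has more than one column).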

Answer 1

The loop must be outside the condition, i.e. you want to check every column, not just $1 in a. Consider a more readable multi-line format.

awk '
   # Ignore header
   NR==1{ next }
   # Load first file
   FNR==NR { a[$1]=$2; next }
   {
      # Foreach value (NF is the number of fields in the current line)
      for (i = 1; i <= NF; ++i) {
          # if the value is a key in the conversion table
          if ($i in a) {
                # then replace it
                $i = a[$i]       # NOTE - $i __not__ $1 !
          }
      }
      # print it!
      print
   }
'
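To see why the loop bound matters, here is a minimal sketch on a throwaway file (nf_demo.txt is a made-up name, not one of the files from the question). NF is the number of fields on the current line, whereas $NR is the content of the field whose index equals the current line number. Once NR is larger than the number of columns, $NR is empty, so a loop written as i <= $NR should never run and the line is printed unchanged, which would be consistent with the "output is identical to 002.txt" symptom.

```shell
# Three lines, two tab-separated fields each (illustrative data only).
printf 'a\tb\nc\td\ne\tf\n' > nf_demo.txt

# Print the line number, the field count, and whatever $NR refers to.
nf_vs_nr=$(awk -F'\t' '{ printf "NR=%d NF=%d $NR=[%s]\n", NR, NF, $NR }' nf_demo.txt)
echo "$nf_vs_nr"
```

On the third line NR is 3 but there are only two fields, so $NR refers to a field that does not exist and prints as an empty string.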

Answer 2

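The mistakes I see in your code are using $NR instead of NF in your loop, skipping the first line of the second file instead of printing it as-is, and not using tabs as the input/output separators. This is apparently what you need: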

$ awk '
    BEGIN { FS=OFS="\t" }
    NR==FNR { map[$1]=$2; next }
    FNR>1 {
        for (i=1; i<=NF; i++) {
            if ($i in map) {
                $i = map[$i]
            }
        }
    }
    { print }
' ICD10toPhecode.txt 002_ICD.txt
1       2       3       4       5       6       7       8
550.1   550.5   594.1   594.3   594.8   591     1019    960.1
041     208     250.2   276.14  426.32  550.1   550.5   558
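Since the question also asks how to write the result to a file and mentions two conversion tables, here is a rough sketch of one way to do both in a single pass. It is an assumption on my part, not something from the answers above: every file except the last argument is treated as a conversion table (via FILENAME and ARGV), and the sample files are rebuilt from a few rows of the question's data so the snippet runs on its own.

```shell
# Recreate a small slice of the question's sample data (tab-separated).
printf 'icd10cm\tphecode\nK40.9\t550.1\nK43.9\t550.5\nN23\t594.8\n' > ICD10toPhecode.txt
printf '1\t2\t3\nK40.9\tK43.9\tN23\n' > 002.txt

awk '
    BEGIN { FS = OFS = "\t" }
    # Assumption: every file except the last argument is a conversion table.
    FILENAME != ARGV[ARGC-1] { map[$1] = $2; next }
    # Data file: leave the header row alone, convert every field of other rows.
    FNR > 1 {
        for (i = 1; i <= NF; i++)
            if ($i in map) $i = map[$i]
    }
    { print }
' ICD10toPhecode.txt 002.txt > 002_output.txt

cat 002_output.txt
```

With this layout, ICD9toPhecode.txt could be listed before ICD10toPhecode.txt on the same command line. If the same code string appears in both tables, the mapping from the later file wins, so check for overlaps before relying on it. The plain > 002_output.txt redirection is enough to write the file; it only looked broken in the question because no field was being replaced.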


bash

awk

sed

large-data