How to find a match to a partial string and then delete the string from the reference file using awk?

Question

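I have a problem that I have been trying to solve but can't figure out how to approach. I have a reference file that lists all the devices in my inventory by bar code.

Reference file: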

PTR10001,PRINTER,SN A
PTR10002,PRINTER,SN B
PTR10003,PRINTER,SN C 
MON10001,MONITOR,SN A
MON10002,MONITOR,SN B
MON10003,MONITOR,SN C
CPU10001,COMPUTER,SN A
CPU10002,COMPUTER,SN B
CPU10003,COMPUTER,SN C

What I want to do is create a second file that contains only the abbreviations I need. File 2 would look like this:

PTR
CPU
MON
MON

The desired output is a file that tells me, by bar code, which items I need to pull off the shelf.
Desired output file:

PTR10001
CPU10001
MON10001
MON10002

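As the output shows, since I can't have two of the same bar code, I need it to look in the reference file and find the first match. After copying the number to the output file, I would like to remove that number from the reference file so that it is not handed out again.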

I have tried several iterations of awk but have not been able to get the desired output.
The closest I have gotten is the following code:

awk -F'/' '{ key = substr($1,1,3) } NR==FNR {id[key]=$1; next} key in id { $1=id[key] } { print }' $file1 $file2 > $file3

I am writing this in ksh and would like to use awk, as I think that would be the best way to solve the problem. Thank you for your help.

Answer 1

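First solution:

Going by your detailed description, I think the order does not matter, since you just want to know what to take off the shelf. So you can do it the other way around: read file2 first, count how many of each abbreviation are needed, then go to the shelf (file1) and pick them.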

awk -F, 'FNR==NR{c[$0]++; next} c[substr($1,1,3)]-->0{print $1}' file2 file1

Output:

PTR10001
MON10001
MON10002
CPU10001
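
The question also asks for the picked bar codes to be removed from the reference file, which the one-liner above does not do. The following is only a sketch of one way to get there with the same counting idea: it writes the picked bar codes to one file and the untouched reference lines to another. The names `picked.txt` and `remaining.txt`, the array name `want`, and the scratch directory are my own placeholders, not part of the question.

```shell
# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
PTR10001,PRINTER,SN A
PTR10002,PRINTER,SN B
PTR10003,PRINTER,SN C
MON10001,MONITOR,SN A
MON10002,MONITOR,SN B
MON10003,MONITOR,SN C
CPU10001,COMPUTER,SN A
CPU10002,COMPUTER,SN B
CPU10003,COMPUTER,SN C
EOF
printf '%s\n' PTR CPU MON MON > "$dir/file2"

awk -F, -v picked="$dir/picked.txt" -v remaining="$dir/remaining.txt" '
    # file2 (read first): count how many items of each prefix are requested
    FNR == NR { want[$0]++; next }

    # file1: while requests for this prefix are outstanding, pick the bar code
    want[substr($1, 1, 3)]-- > 0 { print $1 > picked; next }

    # every other reference line stays on the shelf
    { print > remaining }
' "$dir/file2" "$dir/file1"

cat "$dir/picked.txt"
```

After checking `remaining.txt`, it could be moved over the original reference file so that the next run no longer sees the picked items.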

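Second solution:

Your awk is very close to what you want, but you need a second dimension in your array instead of overwriting the existing ID. We will use a pseudo two-dimensional array (GNU awk has true two-dimensional arrays, by the way) that stores the IDs for each prefix as a string such as PTR10001,PTR10002,PTR10003. We retrieve them with split, and then we also remove the one we handed out from the shelf.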

> cat tst.awk
BEGIN { FS="," }

NR==FNR {
    key=substr($1,1,3)
    ids[key] = (ids[key]? ids[key] "," $1: $1) #append new id.
    next
}

$0 in ids {
    split(ids[$0], tmp, ",")
    print(tmp[1])
    ids[$0]=substr(ids[$0],length(tmp[1])+2) #remove from shelf
}

Output

awk -f tst.awk file1 file2
PTR10001
CPU10001
MON10001
MON10002

Here we keep the order of file2, since this is based on the idea you tried.
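
The same order-preserving idea can also be written without joining and re-splitting a comma-separated string, by storing each bar code in its own numbered slot and keeping a per-prefix pointer to the next unused slot. This is a sketch in portable awk and not part of the answer above; the array names `slot`, `count` and `used` are my own. A request for a prefix with nothing left on the shelf is simply skipped.

```shell
# Sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
cat > "$dir/file1" <<'EOF'
PTR10001,PRINTER,SN A
PTR10002,PRINTER,SN B
PTR10003,PRINTER,SN C
MON10001,MONITOR,SN A
MON10002,MONITOR,SN B
MON10003,MONITOR,SN C
CPU10001,COMPUTER,SN A
CPU10002,COMPUTER,SN B
CPU10003,COMPUTER,SN C
EOF
printf '%s\n' PTR CPU MON MON > "$dir/file2"

result=$(awk -F, '
    NR == FNR {                          # first file: the reference list
        key = substr($1, 1, 3)
        slot[key, ++count[key]] = $1     # queue bar codes per prefix, in file order
        next
    }
    used[$0] < count[$0] {               # second file: something is still on the shelf
        print slot[$0, ++used[$0]]       # hand out the next bar code and mark it used
    }
' "$dir/file1" "$dir/file2")

printf '%s\n' "$result"
```

The `slot[key, n]` subscript is the standard awk way of simulating two dimensions: the two parts are joined with `SUBSEP` into a single string key.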

Answer 2

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
FNR==NR{
  iniVal[$0]++
  next
}
{
  counter=substr($0,1,3)
}
counter in iniVal{
  if(++currVal[counter]<=iniVal[counter]){
     print $1
     if(currVal[counter]==iniVal[counter]){ delete iniVal[counter] }
  }
}
' Input_file2  FS="," Input_file1

Explanation: adding a detailed explanation of the above.

awk '                                           ##Start of the awk program.
FNR==NR{                                        ##FNR==NR is true only while the first file, Input_file2, is being read.
  iniVal[$0]++                                  ##Count how many times each abbreviation (the whole line) is requested.
  next                                          ##Skip the remaining rules for this line.
}
{
  counter=substr($0,1,3)                        ##counter holds the first 3 characters of the current line of Input_file1.
}
counter in iniVal{                              ##Proceed only if this prefix was requested in Input_file2.
  if(++currVal[counter]<=iniVal[counter]){      ##Increment the matches seen for this prefix; continue while it does not exceed the requested count.
     print $1                                   ##Print the 1st field (the bar code) of the current line.
     if(currVal[counter]==iniVal[counter]){     ##Once the requested count has been reached,
       delete iniVal[counter]                   ##delete the prefix from iniVal so later lines with it are skipped.
     }
  }
}
' Input_file2  FS="," Input_file1               ##Input files; FS="," takes effect only for Input_file1.
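
The `FS=","` placed between the two file names above is a command-line variable assignment: awk applies it when it reaches that argument, so Input_file2 is read with the default separator and Input_file1 with a comma. A small demonstration of that behaviour, using made-up file names and contents:

```shell
# Two files with identical content, read with different field separators.
dir=$(mktemp -d)
printf '%s\n' 'a b,c' > "$dir/first"
printf '%s\n' 'a b,c' > "$dir/second"

# "first" is split on whitespace (default FS), "second" on commas.
result=$(awk '{ print $1 }' "$dir/first" FS="," "$dir/second")

printf '%s\n' "$result"
```

The first line printed is `a` (split on whitespace) and the second is `a b` (split on the comma).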
