Pass parallel variable "{}" as awk variable

Question

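I would like to extract all the lines of ids.ped that match a list of words (the second column of list_of_words), keeping the output in the same order as the list.

The ids.ped file: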

2425 NA19901 0
2472 NA20291 0
2476 NA20298 0
1328 NA06989 0
...

I would like to do this with awk and parallel.

I tried the following:

cut -f2 list_of_words |
    parallel -j35 --keep-order \
    awk -v id={} 'BEGIN{FS=" "}{if($2 == id){print $2,$3}}' ids.ped

However, I get this error:

/bin/bash: -c: line 0: syntax error near unexpected token `('
/bin/bash: -c: line 0: `awk -v id= BEGIN{FS=" "}{if($2 == id){print $2,$3}} ids.ped'

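It seems I cannot pass {} like this.

Notes:

ids.ped is large, which is why I want to parallelize this.
I want to use awk because I want to extract lines from ids.ped based on the second column.

For some reason I don't understand, grep -w extracts some lines twice, which is one of the reasons I would rather use awk.

Any other answer that solves this efficiently is welcome. Thanks.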

Answer 1

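I can't reproduce your argument-passing problem (is there an empty column at the start of the file?), but I do get the syntax error, because of the way parallel interprets its arguments.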

/opt/local/bin/bash: -c: line 0: syntax error near unexpected token `('
/opt/local/bin/bash: -c: line 0: `awk -v id=NA20291 BEGIN{FS=" "}{if($2 == id){print $2,$3}} foo.txt'
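
To see where the quotes go, here is a rough simulation of the mechanism. It assumes, as a simplification, that parallel joins its arguments with spaces and hands the resulting string to a new shell; join_like_parallel is a hypothetical helper, not part of parallel, and the awk fields ($2, $3) are assumed from the question's description.

```shell
#!/bin/bash
# Hypothetical stand-in for what parallel does with its arguments:
# join them with single spaces into one command string.
join_like_parallel() {
    local IFS=' '
    printf '%s\n' "$*"
}

# By the time the helper runs, the calling shell has already removed
# the single quotes around the awk program.
cmd=$(join_like_parallel awk -v id=NA20291 'BEGIN{FS=" "}{if($2 == id){print $2,$3}}' ids.ped)
printf '%s\n' "$cmd"

# bash -n only parses the string and does not run it; parsing fails
# because the parenthesis is no longer quoted.
if bash -n -c "$cmd" 2>/dev/null; then
    verdict="parses"
else
    verdict="syntax error"
fi
echo "$verdict"
```

The first line printed is the same unquoted command that shows up in the error message.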

You have three options for fixing this. You can add the -q option to parallel to "protect against evaluation by the subshell":

cut -f2 list_of_words |
    parallel -j35 -q --keep-order \
    awk -v id="{}" 'BEGIN{FS=" "}{if($2 == id){print $2,$3}}' ids.ped
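
As an illustration of the effect of quoting every argument, the sketch below uses bash's printf %q as a stand-in. That is an assumption about what -q achieves, not parallel's actual quoting code, and the sample lines are taken from the question's ids.ped.

```shell
#!/bin/bash
# Two sample lines from the question's ids.ped, written to a scratch file.
ped=$(mktemp)
printf '%s\n' '2425 NA19901 0' '2472 NA20291 0' > "$ped"

# Quote each argument separately (assumed to approximate what -q does),
# then join the quoted pieces into one command string.
cmd=$(printf '%q ' awk -v id=NA20291 'BEGIN{FS=" "}{if($2 == id){print $2,$3}}' "$ped")

# The quoted string survives a second round of shell parsing,
# so awk receives its program as a single argument.
result=$(bash -c "$cmd")
echo "$result"
rm -f "$ped"
```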

You can move the awk code into a separate file; the rest of the command is then simple enough that nothing needs escaping:

cut -f2 list_of_words |
    parallel -j35 --keep-order awk -v id={} -f foo.awk ids.ped

Contents of foo.awk:

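#!/usr/bin/awk -f
# Fields are separated by whitespace.
BEGIN {
    FS=" "
}

# Print columns 2 and 3 of every line whose second column equals id.
{
    if($2 == id){
        print $2,$3
    }
}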
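
Before involving parallel, it may help to check the script on its own. The sketch below assumes foo.awk compares column 2 with id and prints columns 2 and 3; it writes the script and a few lines from the question's ids.ped to a scratch directory, then makes the same awk call that parallel would make for one value of {}.

```shell
#!/bin/bash
# Scratch copies of the script and of some sample data.
dir=$(mktemp -d)
cat > "$dir/foo.awk" <<'EOF'
BEGIN { FS = " " }
{ if ($2 == id) { print $2, $3 } }
EOF
printf '%s\n' '2425 NA19901 0' '2472 NA20291 0' '1328 NA06989 0' > "$dir/ids.ped"

# One invocation, with NA06989 standing in for {}.
result=$(awk -v id=NA06989 -f "$dir/foo.awk" "$dir/ids.ped")
echo "$result"
rm -rf "$dir"
```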

Or you can work out how to escape the command. The parallel manual says "most people will never need more quoting than putting '\' in front of the special characters."

cut -f2 list_of_words |
    parallel -j35 --keep-order \
    awk -v id="{}" \''BEGIN{FS=" "}{if($2 == id){print $2,$3}}'\' ids.ped
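
A further way to sidestep the quoting, not one of the three options above, is to wrap the awk call in an exported bash function, which GNU parallel can run like any other command. This is only a sketch: lookup is a hypothetical name, and the fields are assumed from the question.

```shell
#!/bin/bash
# Hypothetical helper: print columns 2 and 3 of every line of file $2
# whose second column equals $1.
lookup() {
    awk -v id="$1" '$2 == id { print $2, $3 }' "$2"
}
# Export the function so the shells started by parallel can see it.
export -f lookup

# Intended use, assuming GNU parallel and the question's files:
#   cut -f2 list_of_words | parallel -j35 --keep-order lookup {} ids.ped
```

The awk program is quoted once, inside the function, so the parallel command line contains no special characters.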

Answer 2

As @miken32 says, supplying the awk script as an argument to parallel can be tricky, but here is one way:

parallel -j1 --keep-order \
  awk -v id="{}" "'"'{ if ( $2 == id ) { print $2, $3 }}'"'" ids.ped
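
The "'" wrapping and the \' wrapping from the other answer should produce the same argument: the awk program with literal single quotes around it. A small check, with the awk fields assumed from the question:

```shell
#!/bin/bash
# The same awk program, wrapped in literal single quotes in two ways.
a=\''{ if ( $2 == id ) { print $2, $3 }}'\'
b="'"'{ if ( $2 == id ) { print $2, $3 }}'"'"
printf '%s\n' "$a"
if [ "$a" = "$b" ]; then same="identical"; else same="different"; fi
echo "$same"

# A second shell strips those quotes again and sees one single argument.
inner=$(bash -c "printf '[%s]' $a")
echo "$inner"
```

The literal quotes are what protect the program when a second shell parses the command.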

The original question does not include a sample of list_of_words, but here is a script that illustrates using parallel with awk:

$ cat check
#!/bin/bash

# Stand-in for list_of_words: two tab-separated columns, so that cut -f2 works.
function DATA {
    printf '1328\tNA06989\n'
    printf '2425\tNA19901\n'
}

DATA | cut -f2 |
    parallel -j2 --keep-order awk -v id="{}" "'"'{ if ( $2 == id ) { print $2, $3 }}'"'" ids.ped

$ ./check
NA06989 0
NA19901 0


$ parallel --version
GNU parallel 20160122

[Tested on a Mac with /usr/bin/awk, gawk, and mawk.]
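
Since the question invites other approaches: a single awk pass over both files avoids rereading ids.ped once per word, and needs no parallel at all. This is a minimal sketch, assuming list_of_words is tab-separated with the id in column 2 and ids.ped is space-separated with the id in column 2; extract_in_list_order is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical helper: $1 is the word list, $2 is the ped file.
extract_in_list_order() {
    awk '
        # First file: remember the wanted ids and the order they came in.
        FNR == NR { split($0, f, "\t"); order[++n] = f[2]; want[f[2]]; next }
        # Second file: keep columns 2 and 3 of lines whose id is wanted.
        $2 in want { hits[$2] = hits[$2] $2 " " $3 ORS }
        # Emit the kept lines in list order.
        END { for (i = 1; i <= n; i++) printf "%s", hits[order[i]] }
    ' "$1" "$2"
}

# Small demonstration with the sample data from this thread.
dir=$(mktemp -d)
printf '1328\tNA06989\n2425\tNA19901\n' > "$dir/list_of_words"
printf '%s\n' '2425 NA19901 0' '2472 NA20291 0' '2476 NA20298 0' '1328 NA06989 0' > "$dir/ids.ped"
result=$(extract_in_list_order "$dir/list_of_words" "$dir/ids.ped")
echo "$result"
rm -rf "$dir"
```

Only the matching lines are held in memory, so this stays workable for a large ids.ped as long as the word list itself is modest.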

Tags: bash, awk, gnu-parallel