Replace each nth occurrence of 'foo' and 'bar' in two distinct columns with the corresponding nth line of a supplied file

Question

I have a source.txt file like the one below, containing two columns of data. Each column is enclosed in [ ] (square brackets), as in my source.txt:

[hot] [water]
[16] [boots and, juice]

I also have a target.txt file, which contains blank lines and a period at the end of each line:

the weather is today (foo) but we still have (bar). 

= (

the next bus leaves at (foo) pm, we can't forget to take the (bar).

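I want to replace the foo on the nth line of target.txt with the corresponding content of the first column of source.txt, and likewise replace the bar on the nth line of target.txt with the corresponding content of the second column of source.txt.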

I searched other sources to work out how to do this. I already have a command that performs a replacement, but I could not adapt it:

awk 'NR==FNR {a[NR]=$0; next} /foo/{gsub("foo", a[++i])} 1' source.txt target.txt > output.txt;

I remember seeing a way to use gsub with two columns of data, but I don't remember exactly what the difference was.

EDIT: The text in target.txt sometimes contains symbols such as =, ( and ). I added these symbols to the sample because some answers will not work if they are present in target.txt.

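Note: the number of lines in target.txt, and therefore the number of occurrences of bar and foo in the file, may vary; I have only shown a sample. However, foo and bar each occur exactly once per line.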

Answer 1

awk '
    NR==FNR { # build lookup

        # delete gumph
        gsub(/(^[[:space:]]*\[)|(\][[:space:]]*$)/, "")

        # split
        split($0, a, /\][[:space:]]+\[/)

        # store
        foo[FNR] = a[1]
        bar[FNR] = a[2]

        next
    }

    !/[^[:space:]]/ { next } # ignore blank lines

    { # do replacements
        VFNR++ # FNR - (ignored lines)

        # can use sub if foo/bar only appear once
        gsub(/\<foo\>/, foo[VFNR])
        gsub(/\<bar\>/, bar[VFNR])

        print
    }
' source.txt target.txt

Note: \< and \> are not in POSIX, but they are accepted by some versions of awk (e.g. gawk). I'm not sure whether POSIX awk regular expressions have a "word boundary" at all.
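
For awks that do not accept \< and \>, one possible workaround is to emulate the word boundary by hand. The following is a minimal sketch, not part of the original answer: the function name replace_word is hypothetical, a "word character" is assumed to be a letter, digit or underscore, the word is assumed to contain no regex metacharacters, and the awk is assumed to treat ^ and $ as anchors inside a group.

```shell
# Hypothetical helper: replace whole-word occurrences of WORD with REPL on stdin.
# usage: replace_word WORD REPL
replace_word() {
    awk -v word="$1" -v repl="$2" '
    {
        out = ""
        rest = $0
        # WORD must be preceded and followed by a non-word character or a line edge
        while (match(rest, "(^|[^A-Za-z0-9_])" word "([^A-Za-z0-9_]|$)")) {
            s = RSTART
            # If the match begins with a delimiter, the word starts one character later
            if (substr(rest, s, length(word)) != word)
                s++
            # Splice the replacement in by concatenation, so & in REPL stays literal
            out = out substr(rest, 1, s - 1) repl
            rest = substr(rest, s + length(word))
        }
        print out rest
    }'
}

printf '%s\n' 'the (foo) is not food or foo_bar but foo.' | replace_word foo 'hot'
```

Because the replacement is concatenated rather than passed to sub(), this sketch leaves "food" and "foo_bar" alone and does not give & any special meaning.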

Answer 2

Based on the samples you have shown, please try the following answer. Written and tested in GNU awk.

awk -F'\[|\] \[|\]' '
FNR==NR{
  foo[FNR]=$2
  bar[FNR]=$3
  next
}
NF{
  gsub(/\<foo\>/,foo[++count])
  gsub(/\<bar\>/,bar[count])
}
1
' source.txt FS=" " target.txt

Explanation: a detailed explanation of the above.

awk -F'\[|\] \[|\]' '       ##Setting field separator as [ OR ] [ OR ] here.
FNR==NR{                        ##Checking condition FNR==NR which will be TRUE when source.txt will be read.
  foo[FNR]=$2                   ##Creating foo array with index of FNR and value of 2nd field here.
  bar[FNR]=$3                   ##Creating bar array with index of FNR and value of 3rd field here.
  next                          ##next will skip all further statements from here.
}
NF{                             ##If line is NOT empty then do following.
  gsub(/\<foo\>/,foo[++count])  ##Globally substituting foo with array foo value, whose index is count.
  gsub(/\<bar\>/,bar[count])    ##Globally substituting bar with array of bar with index of count.
}
1                               ##printing line here.
' source.txt FS=" " target.txt  ##Mentioning Input_files names here.

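EDIT: Also adding the following solution, which handles any number of [...] occurrences in the source and matches them against the target file as well. I am adding it here because it is the solution that worked for the OP (confirmed in the comments). Fair warning: this will fail when source.txt contains &.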

awk '
FNR==NR{
  while(match($0,/\[[^]]*\]/)){
    arr[++count]=substr($0,RSTART+1,RLENGTH-2)
    $0=substr($0,RSTART+RLENGTH)
  }
  next
}
{
  line=$0
  while(match(line,/\(?[[:space:]]*(\<foo\>|\<bar\>)[[:space:]]*\)?/)){
    val=substr(line,RSTART,RLENGTH)
    sub(val,arr[++count1])
    line=substr(line,RSTART+RLENGTH)
  }
}
1
' source.txt target.txt
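
The & problem arises because sub() and gsub() treat & in the replacement string as "the matched text". One way to sidestep it, sketched here as an illustration rather than as part of the original answer, is to splice the value in with match() and substr() instead of passing it to sub(). The function name splice_demo and the sample value are hypothetical.

```shell
# Hypothetical demo: replace the first "bar" on each input line with VALUE,
# inserting VALUE by concatenation so that & is never interpreted by sub().
# (awk -v still processes backslash escapes in VALUE.)
splice_demo() {
    awk -v val="$1" '
    {
        if (match($0, /bar/)) {
            # Rebuild the line around the match instead of calling sub()
            $0 = substr($0, 1, RSTART - 1) val substr($0, RSTART + RLENGTH)
        }
        print
    }'
}

printf '%s\n' 'we still have (bar).' | splice_demo 'salt & pepper'
```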

Answer 3

Using any awk in any shell on every Unix box:

$ cat tst.awk
BEGIN {
    FS="[][]"
    tags["foo"]
    tags["bar"]
}
NR==FNR {
    map["foo",NR] = $2
    map["bar",NR] = $4
    next
}
{
    found = 0
    head = ""
    while ( match($0,/\([^)]+)/) ) {
        tag = substr($0,RSTART+1,RLENGTH-2)
        if ( tag in tags ) {
            if ( !found++ ) {
                lineNr++
            }
            val = map[tag,lineNr]
        }
        else {
            val = substr($0,RSTART,RLENGTH)
        }
        head = head substr($0,1,RSTART-1) val
        $0 = substr($0,RSTART+RLENGTH)
    }
    print head $0
}

$ awk -f tst.awk source.txt target.txt
the weather is today hot but we still have water.

= (

the next bus leaves at 16 pm, we can't forget to take the boots and, juice.


Tags: string, awk, text-processing, replace, gsub