How do I concatenate 2 files following a pattern?

All I want to do is concatenate 2 files, as in the example below:

file 1        file 2
C1            O1             
C3            O3
..            O5
              O7
              O9
              O11
              O13
              O15
              O17
              O19
              ..

The desired output file is:

file 3
C1
O1
O9
O17
C3
O3
O11
O19
..
..

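So the pattern is: first C1 and O1; then skip 3 lines of file 2 and print the next one (O9); then skip another 3 lines of file 2 and print the next one (O17). Then print C3 and O3, skip 3 lines of file 2 and print O11, skip 3 more lines and print O19; then C5 ... and so on.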

I tried doing something with cat | paste - - - ... but it didn't work :(
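The paste idea does expose the stride. paste - - - - joins four consecutive lines per row, so column k of its output holds lines k, k+4, k+8, ... of file 2, and those are the lines wanted after the k-th line of file 1 in the first example. Below is a minimal sketch on that example. The helper name example_file2 is hypothetical and used only for illustration.

```shell
# Hypothetical helper: the first example's file 2, one O label per line.
example_file2() { printf '%s\n' O1 O3 O5 O7 O9 O11 O13 O15 O17 O19; }

# paste - - - - reads four lines per output row and joins them with tabs,
# so successive rows start at lines 1, 5, 9, ... of file 2.
example_file2 | paste - - - -

# Column k holds lines k, k+4, k+8: the group wanted after the k-th line of file 1.
example_file2 | paste - - - - | cut -f1   # O1, O9, O17  (follows C1)
example_file2 | paste - - - - | cut -f2   # O3, O11, O19 (follows C3)
```

With four columns this covers at most the first four lines of file 1, and it does not interleave the two files by itself. It shows the pattern, but it is not a full solution.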

Any suggestions?

Thanks a lot

EDIT

I forgot to mention that these are big files. :)

Here are my input files:

cat file 1
C             18     -2.182951850        -0.000000000        -6.517815410
C             20     -4.127401075         0.000000000        -0.446529291
C             22     -3.314258919        -2.494999886       -15.624910016
C             24     -6.071850300         0.000000000         5.624757806
C             26     -2.023950100         0.000000000         5.624757806
C             28     -4.286402584        -0.000000000       -12.589102506
C             30     -6.230851809        -0.000000000        -6.517815410
C             32     -0.079500634         0.000000000        -0.446529291

cat file 2
O             34     -1.393125174        -0.640765928        -5.738276269
O             36     -3.337574640        -0.640765928         0.333010828
O             38     -2.524270589         1.854234106       -14.845370570
O             40     -5.282024106        -0.640765928         6.404297925
O             42     -2.182951850         1.281531856        -6.517815410
O             44     -4.127401075         1.281531856        -0.446529291
O             46     -3.314258919        -1.213468178       -15.624910016
O             48     -6.071850300         1.281531856         5.624757806
O             50     -2.972778044        -0.640765928        -7.297355528
O             52     -4.917227269        -0.640765928        -1.226068432
O             54     -4.104085113         1.854234106       -16.404449463
O             56     -6.861676614        -0.640765928         4.845217687
O             58     -2.813776294         0.640765779         4.845217687
O             60     -5.076228778         0.640765779       -13.368642136
O             62     -7.020678123         0.640765779        -7.297355528
O             64     -0.869326828         0.640765779        -1.226068432
O             66     -2.023950100        -1.281531708         5.624757806
O             68     -4.286402584        -1.281531708       -12.589102506
O             70     -6.230851809        -1.281531708        -6.517815410
O             72     -0.079500634        -1.281531708        -0.446529291
O             74     -1.234123906         0.640765779         6.404297925
O             76     -3.496576390         0.640765779       -11.809563365
O             78     -5.441025615         0.640765779        -5.738276269
O             80      0.710325077         0.640765779         0.333010828

C18 must be followed by O34, O42 and O50. Then C20 is followed by O36, O44 and O52, and so on:

cat file 3
C             18     -2.182951850        -0.000000000        -6.517815410 
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
..             ..      ............        .............       .........

Tom's code produces this output:

Tom output
C             18     -2.182951850        -0.000000000        -6.517815410
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
O             74     -1.234123906         0.640765779         6.404297925
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
O             76     -3.496576390         0.640765779       -11.809563365
C             22     -3.314258919        -2.494999886       -15.624910016
O             38     -2.524270589         1.854234106       -14.845370570
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
O             62     -7.020678123         0.640765779        -7.297355528
O             70     -6.230851809        -1.281531708        -6.517815410
O             78     -5.441025615         0.640765779        -5.738276269
and     so   on

Any suggestions?

Thanks

I'd suggest using awk for this:

# first file
NR == FNR { 
    a[NR] = $0  # save each line into an array
    ++len
    next        # skip the remaining blocks for this line
}

{ b[FNR] = $0 } # save each line from the 2nd file into an array

END {
    # loop through and print; FNR still holds the line count of the 2nd file here
    for (i = 1; i <= len; ++i) {
        print a[i]
        for (j = i; j <= FNR; j += 4) print b[j]   # every 4th line, starting at line i
    }
}

You can run the script with awk -f script.awk file1 file2.
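The Tom output in the question shows a problem with this script. The inner loop keeps printing every 4th line until file 2 runs out, so with the sample data each C line gets six O lines instead of three. Below is a minimal variant that stops after three. It assumes the k-th C line should be followed by lines k, k+4 and k+8 of file 2. The wrapper name interleave3 is hypothetical.

```shell
# Hypothetical wrapper around a variant of the script above.
# usage: interleave3 file1 file2
interleave3() {
    awk '
        NR == FNR { a[NR] = $0; len = NR; next }   # 1st file: remember every C line
        { b[FNR] = $0; blen = FNR }                # 2nd file: remember every O line
        END {
            for (i = 1; i <= len; ++i) {
                print a[i]
                # at most three O lines per C line: lines i, i+4 and i+8
                for (n = 0; n < 3 && i + 4 * n <= blen; ++n) print b[i + 4 * n]
            }
        }
    ' "$1" "$2"
}
```

Run it as interleave3 file1 file2 > file3. Like the original, it holds both files in memory, which matters for big inputs.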

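What you describe (as confirmed in the comments) is a pattern that

  • consists of one C line, and
  • samples a set of nine O lines, starting at the O line with the same offset as the C line.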

To handle this, I'd use awk with a 9-line "sliding window" as a buffer.

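Rather than Tom's solution, which points awk at the two files in sequence and reads them into arrays, I suggest reading from both files at the same time, so you don't use much memory holding an array.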

Here's what I mean, as a one-liner:

awk '{a[NR]=$0;delete a[NR-10];} NR>9{getline Cline < "fileC";print Cline;print a[NR-9]; print a[NR-5]; print a[NR-1];}' fileO

Split up for readability (and comments), it looks like this:

awk '
  {
    a[NR]=$0;        # Store our current "O" line in an array
    delete a[NR-10]; # Clean the array as we step through the file
  }

  NR>9 {
    getline Cline < "fileC";  # Get the next "C" line...
    print Cline;              # ... and print it
    print a[NR-9];            # \ 
    print a[NR-5];            #  > Print the three "O" lines for this C line
    print a[NR-1];            # /
  }
' fileO

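Make sure you have the right number of "O" lines: if the last group of "O" lines is incomplete, it will not be printed.

The output for your sample data looks like this: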
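The memory saving comes from the store-then-delete step in the code above. Each new line goes into the array and the entry ten lines back is deleted, so the array never holds more than ten lines (the current one plus the nine before it), however long the file is. Below is a small check of that claim. The helper name max_window is hypothetical.

```shell
# Hypothetical helper: push stdin through the same store-then-delete pattern
# and report the largest number of entries the array ever held.
max_window() {
    awk '
        { a[NR] = $0; delete a[NR - 10] }        # store newest line, drop the one 10 back
        {
            n = 0
            for (k in a) n++                     # count the entries still stored
            if (n > max) max = n
        }
        END { print max + 0 }
    '
}

# 10000 input lines, yet at most 10 are ever held at once.
awk 'BEGIN { for (i = 1; i <= 10000; i++) print i }' | max_window   # prints 10
```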

C             18     -2.182951850        -0.000000000        -6.517815410
O             34     -1.393125174        -0.640765928        -5.738276269
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
C             20     -4.127401075         0.000000000        -0.446529291
O             36     -3.337574640        -0.640765928         0.333010828
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
C             22     -3.314258919        -2.494999886       -15.624910016
O             38     -2.524270589         1.854234106       -14.845370570
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
C             24     -6.071850300         0.000000000         5.624757806
O             40     -5.282024106        -0.640765928         6.404297925
O             48     -6.071850300         1.281531856         5.624757806
O             56     -6.861676614        -0.640765928         4.845217687
C             26     -2.023950100         0.000000000         5.624757806
O             42     -2.182951850         1.281531856        -6.517815410
O             50     -2.972778044        -0.640765928        -7.297355528
O             58     -2.813776294         0.640765779         4.845217687
C             28     -4.286402584        -0.000000000       -12.589102506
O             44     -4.127401075         1.281531856        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
O             60     -5.076228778         0.640765779       -13.368642136
C             30     -6.230851809        -0.000000000        -6.517815410
O             46     -3.314258919        -1.213468178       -15.624910016
O             54     -4.104085113         1.854234106       -16.404449463
O             62     -7.020678123         0.640765779        -7.297355528
C             32     -0.079500634         0.000000000        -0.446529291
O             48     -6.071850300         1.281531856         5.624757806
O             56     -6.861676614        -0.640765928         4.845217687
O             64     -0.869326828         0.640765779        -1.226068432
C             32     -0.079500634         0.000000000        -0.446529291
O             50     -2.972778044        -0.640765928        -7.297355528
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
C             32     -0.079500634         0.000000000        -0.446529291
O             52     -4.917227269        -0.640765928        -1.226068432
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
C             32     -0.079500634         0.000000000        -0.446529291
O             54     -4.104085113         1.854234106       -16.404449463
O             62     -7.020678123         0.640765779        -7.297355528
O             70     -6.230851809        -1.281531708        -6.517815410
C             32     -0.079500634         0.000000000        -0.446529291
O             56     -6.861676614        -0.640765928         4.845217687
O             64     -0.869326828         0.640765779        -1.226068432
O             72     -0.079500634        -1.281531708        -0.446529291
C             32     -0.079500634         0.000000000        -0.446529291
O             58     -2.813776294         0.640765779         4.845217687
O             66     -2.023950100        -1.281531708         5.624757806
O             74     -1.234123906         0.640765779         6.404297925
C             32     -0.079500634         0.000000000        -0.446529291
O             60     -5.076228778         0.640765779       -13.368642136
O             68     -4.286402584        -1.281531708       -12.589102506
O             76     -3.496576390         0.640765779       -11.809563365
C             32     -0.079500634         0.000000000        -0.446529291
O             62     -7.020678123         0.640765779        -7.297355528
O             70     -6.230851809        -1.281531708        -6.517815410
O             78     -5.441025615         0.640765779        -5.738276269

Is this what you meant?
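In the output above, C 32 repeats from the ninth group on. Once fileC is exhausted, getline returns 0 and leaves Cline holding its last value, while the main loop carries on through the remaining "O" lines. Below is a sketch that checks getline's return value and stops at that point. It uses the same indices as the one-liner, and the wrapper name interleave_stream is hypothetical.

```shell
# Hypothetical wrapper around the one-liner above, with a getline check added.
# usage: interleave_stream fileC fileO
interleave_stream() {
    awk -v cfile="$1" '
        { a[NR] = $0; delete a[NR - 10] }            # keep a window of the last 10 O lines
        NR > 9 {
            if ((getline cline < cfile) <= 0) exit   # no C line left (or read error): stop
            print cline
            print a[NR - 9]; print a[NR - 5]; print a[NR - 1]
        }
    ' "$2"
}
```

Run it as interleave_stream fileC fileO > file3. With the sample data it should print the first eight groups shown above, ending with C 32 followed by O 48, O 56 and O 64, and then stop.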