How do I concatenate 2 files following a pattern?
All I want to do is concatenate 2 files, as in the following example:
file 1    file 2
C1        O1
C3        O3
..        O5
          O7
          O9
          O11
          O13
          O15
          O17
          O19
          ..
The desired output file is:
file 3
C1
O1
O9
O17
C3
O3
O11
O19
..
..
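So the pattern is: first C1 and O1, then skip 3 lines in file 2 (so print O9), then skip another 3 lines in file 2 (so print O17). Then print C3 and O3, skip 3 lines in file 2 (O11), skip another 3 lines in file 2 (O19); then C5, and so on.
I tried to do something with cat | paste - - - ... but it did not work :(
Any suggestions?
Thanks a lot
EDIT
I forgot to tell you that they are big files. :)
These are my input files: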
cat file 1
C 18 -2.182951850 -0.000000000 -6.517815410
C 20 -4.127401075 0.000000000 -0.446529291
C 22 -3.314258919 -2.494999886 -15.624910016
C 24 -6.071850300 0.000000000 5.624757806
C 26 -2.023950100 0.000000000 5.624757806
C 28 -4.286402584 -0.000000000 -12.589102506
C 30 -6.230851809 -0.000000000 -6.517815410
C 32 -0.079500634 0.000000000 -0.446529291
cat file 2
O 34 -1.393125174 -0.640765928 -5.738276269
O 36 -3.337574640 -0.640765928 0.333010828
O 38 -2.524270589 1.854234106 -14.845370570
O 40 -5.282024106 -0.640765928 6.404297925
O 42 -2.182951850 1.281531856 -6.517815410
O 44 -4.127401075 1.281531856 -0.446529291
O 46 -3.314258919 -1.213468178 -15.624910016
O 48 -6.071850300 1.281531856 5.624757806
O 50 -2.972778044 -0.640765928 -7.297355528
O 52 -4.917227269 -0.640765928 -1.226068432
O 54 -4.104085113 1.854234106 -16.404449463
O 56 -6.861676614 -0.640765928 4.845217687
O 58 -2.813776294 0.640765779 4.845217687
O 60 -5.076228778 0.640765779 -13.368642136
O 62 -7.020678123 0.640765779 -7.297355528
O 64 -0.869326828 0.640765779 -1.226068432
O 66 -2.023950100 -1.281531708 5.624757806
O 68 -4.286402584 -1.281531708 -12.589102506
O 70 -6.230851809 -1.281531708 -6.517815410
O 72 -0.079500634 -1.281531708 -0.446529291
O 74 -1.234123906 0.640765779 6.404297925
O 76 -3.496576390 0.640765779 -11.809563365
O 78 -5.441025615 0.640765779 -5.738276269
O 80 0.710325077 0.640765779 0.333010828
C 18 must be followed by O 34, O 42 and O 50; then C 20 by O 36, O 44 and O 52, and so on:
cat file 3
C 18 -2.182951850 -0.000000000 -6.517815410
O 34 -1.393125174 -0.640765928 -5.738276269
O 42 -2.182951850 1.281531856 -6.517815410
O 50 -2.972778044 -0.640765928 -7.297355528
C 20 -4.127401075 0.000000000 -0.446529291
O 36 -3.337574640 -0.640765928 0.333010828
O 44 -4.127401075 1.281531856 -0.446529291
O 52 -4.917227269 -0.640765928 -1.226068432
.. .. ............ ............. .........
The output generated by Tom's code looks like this:
Tom output
C 18 -2.182951850 -0.000000000 -6.517815410
O 34 -1.393125174 -0.640765928 -5.738276269
O 42 -2.182951850 1.281531856 -6.517815410
O 50 -2.972778044 -0.640765928 -7.297355528
O 58 -2.813776294 0.640765779 4.845217687
O 66 -2.023950100 -1.281531708 5.624757806
O 74 -1.234123906 0.640765779 6.404297925
C 20 -4.127401075 0.000000000 -0.446529291
O 36 -3.337574640 -0.640765928 0.333010828
O 44 -4.127401075 1.281531856 -0.446529291
O 52 -4.917227269 -0.640765928 -1.226068432
O 60 -5.076228778 0.640765779 -13.368642136
O 68 -4.286402584 -1.281531708 -12.589102506
O 76 -3.496576390 0.640765779 -11.809563365
C 22 -3.314258919 -2.494999886 -15.624910016
O 38 -2.524270589 1.854234106 -14.845370570
O 46 -3.314258919 -1.213468178 -15.624910016
O 54 -4.104085113 1.854234106 -16.404449463
O 62 -7.020678123 0.640765779 -7.297355528
O 70 -6.230851809 -1.281531708 -6.517815410
O 78 -5.441025615 0.640765779 -5.738276269
and so on
Any suggestions?
Thanks
I would suggest using awk to do this:
# first file
NR == FNR {
    a[NR] = $0    # save each line into array
    ++len
    next          # skip further blocks
}
{ b[FNR] = $0 }   # save each line from 2nd file into array
END {
    # loop through and print
    for (i = 1; i <= len; ++i) {
        print a[i]
        for (j = i; j <= FNR; j += 4) print b[j]
    }
}
The script can be run like this: awk -f script.awk file1 file2
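As the Tom output in the question shows, the inner loop above keeps stepping by 4 until the end of file 2, so each C line collects more than three O lines. A minimal sketch of a capped variant follows. It assumes, which the question does not state outright, that file 2 is laid out in blocks of 12 lines serving 4 C lines each, so that the fifth C line starts again at line 13. The function name interleave_blocks and the per and stride variables are made up for illustration.

```shell
# Hypothetical helper: interleave_blocks C_FILE O_FILE
# Assumption (not confirmed by the question): the O file comes in blocks
# of per*stride lines, and each block serves stride consecutive C lines.
interleave_blocks() {
    awk -v per=3 -v stride=4 '
        NR == FNR { c[++len] = $0; next }   # first file: keep every C line
        { o[FNR] = $0 }                     # second file: keep every O line
        END {
            for (i = 1; i <= len; ++i) {
                block = int((i - 1) / stride)   # which block of O lines
                base  = block * per * stride + (i - 1) % stride + 1
                print c[i]
                for (k = 0; k < per; ++k)       # exactly per O lines per C line
                    if ((base + k * stride) in o) print o[base + k * stride]
            }
        }
    ' "$1" "$2"
}
```

It would be called as interleave_blocks file1 file2 > file3. Like the script above, it holds both files in memory, which may matter for big files.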
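What you've described (with confirmation in the comments) is a pattern that
- consists of one C line
- samples from a set of nine O lines, starting with the one at the same offset as the C line.
To handle this, I'd use awk with a 9-line "sliding window" as a buffer.
Rather than using Tom's solution, which points awk at the two files in sequence and reads the input into arrays, I'd suggest reading from both files at the same time, so that you don't use up as much memory holding arrays.
Here's what I mean, as a one-liner: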
awk '{a[NR]=$0;delete a[NR-10];} NR>9{getline Cline < "fileC";print Cline;print a[NR-9]; print a[NR-5]; print a[NR-1];}' fileO
Split out for easier reading (and commenting), it looks like this:
awk '
  {
    a[NR]=$0;                  # Store our current "O" line in an array
    delete a[NR-10];           # Clean the array as we step through the file
  }
  NR>9 {
    getline Cline < "fileC";   # Get the next "C" line...
    print Cline;               # ... and print it
    print a[NR-9];             # \
    print a[NR-5];             #  > Print the three "O" lines for this "C" line
    print a[NR-1];             # /
  }
' fileO
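Take care that you have the right number of "O" lines, because the last group of "O" lines will not be printed if it is incomplete.
The output for your sample data looks like this: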
C 18 -2.182951850 -0.000000000 -6.517815410
O 34 -1.393125174 -0.640765928 -5.738276269
O 42 -2.182951850 1.281531856 -6.517815410
O 50 -2.972778044 -0.640765928 -7.297355528
C 20 -4.127401075 0.000000000 -0.446529291
O 36 -3.337574640 -0.640765928 0.333010828
O 44 -4.127401075 1.281531856 -0.446529291
O 52 -4.917227269 -0.640765928 -1.226068432
C 22 -3.314258919 -2.494999886 -15.624910016
O 38 -2.524270589 1.854234106 -14.845370570
O 46 -3.314258919 -1.213468178 -15.624910016
O 54 -4.104085113 1.854234106 -16.404449463
C 24 -6.071850300 0.000000000 5.624757806
O 40 -5.282024106 -0.640765928 6.404297925
O 48 -6.071850300 1.281531856 5.624757806
O 56 -6.861676614 -0.640765928 4.845217687
C 26 -2.023950100 0.000000000 5.624757806
O 42 -2.182951850 1.281531856 -6.517815410
O 50 -2.972778044 -0.640765928 -7.297355528
O 58 -2.813776294 0.640765779 4.845217687
C 28 -4.286402584 -0.000000000 -12.589102506
O 44 -4.127401075 1.281531856 -0.446529291
O 52 -4.917227269 -0.640765928 -1.226068432
O 60 -5.076228778 0.640765779 -13.368642136
C 30 -6.230851809 -0.000000000 -6.517815410
O 46 -3.314258919 -1.213468178 -15.624910016
O 54 -4.104085113 1.854234106 -16.404449463
O 62 -7.020678123 0.640765779 -7.297355528
C 32 -0.079500634 0.000000000 -0.446529291
O 48 -6.071850300 1.281531856 5.624757806
O 56 -6.861676614 -0.640765928 4.845217687
O 64 -0.869326828 0.640765779 -1.226068432
C 32 -0.079500634 0.000000000 -0.446529291
O 50 -2.972778044 -0.640765928 -7.297355528
O 58 -2.813776294 0.640765779 4.845217687
O 66 -2.023950100 -1.281531708 5.624757806
C 32 -0.079500634 0.000000000 -0.446529291
O 52 -4.917227269 -0.640765928 -1.226068432
O 60 -5.076228778 0.640765779 -13.368642136
O 68 -4.286402584 -1.281531708 -12.589102506
C 32 -0.079500634 0.000000000 -0.446529291
O 54 -4.104085113 1.854234106 -16.404449463
O 62 -7.020678123 0.640765779 -7.297355528
O 70 -6.230851809 -1.281531708 -6.517815410
C 32 -0.079500634 0.000000000 -0.446529291
O 56 -6.861676614 -0.640765928 4.845217687
O 64 -0.869326828 0.640765779 -1.226068432
O 72 -0.079500634 -1.281531708 -0.446529291
C 32 -0.079500634 0.000000000 -0.446529291
O 58 -2.813776294 0.640765779 4.845217687
O 66 -2.023950100 -1.281531708 5.624757806
O 74 -1.234123906 0.640765779 6.404297925
C 32 -0.079500634 0.000000000 -0.446529291
O 60 -5.076228778 0.640765779 -13.368642136
O 68 -4.286402584 -1.281531708 -12.589102506
O 76 -3.496576390 0.640765779 -11.809563365
C 32 -0.079500634 0.000000000 -0.446529291
O 62 -7.020678123 0.640765779 -7.297355528
O 70 -6.230851809 -1.281531708 -6.517815410
O 78 -5.441025615 0.640765779 -5.738276269
Is this what you mean?
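In the sample output above the C 32 line repeats because the return value of getline is not checked: once fileC is exhausted, Cline simply keeps its last value. Below is a hedged sketch that reads both files in step, stops when the C file runs out, and buffers only one block of O lines at a time. It rests on an assumption the question does not confirm, namely that the O file comes in blocks of 12 lines, each serving 4 consecutive C lines. The names merge_stream, per and stride are illustrative and not part of the answer above.

```shell
# Hypothetical helper: merge_stream C_FILE O_FILE
# Assumption (not confirmed by the question): every per*stride O lines
# form one block that belongs to the next stride C lines.
merge_stream() {
    awk -v cfile="$1" -v per=3 -v stride=4 '
        { buf[(NR - 1) % (per * stride)] = $0 }   # slot 0..11 inside the current block
        NR % (per * stride) == 0 {                 # a full block of O lines is buffered
            for (off = 0; off < stride; ++off) {
                if ((getline cline < cfile) <= 0)  # C file exhausted or unreadable
                    exit
                print cline
                for (k = 0; k < per; ++k)
                    print buf[off + k * stride]
            }
        }
    ' "$2"
}
```

It would be called as merge_stream fileC fileO > file3. Only 12 O lines are held at any time, and an incomplete final block of O lines is not printed, the same caveat as above.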