How to fix line breaks in the text cdr file?
I have the following output from a cdr file, as shown in the snippet below.
2021-10-27 00:06:53,203:16344:5:0:4:573192000019::6:0:4:573160001511:*999#:1:1:573160032001:1:6:732123904909775:1:1:573156068892:::::SUCCESS:PULL:2021-10-2700:06:53.203:101630:20076482:28389:2
,2:aa8c2b31-ac49-4c16-9e2f-f8a83ba63cd6
2021-10-27 00:06:57,120:16344:5:0:4:573192000019::6:0:4:573160001511:*111#:1:1:573160032002:1:6:732123907508180:1:1:573134396303:::::SUCCESS:PULL:2021-10-27 00:06:57.12:101631:26706476:11566:3,3,3192244169:d21e7dca-6dfa-43e6-8bcd-b95ebd35cdea
As the snippet shows, line 2 should be part of line 1, but a line break splits the transaction across two lines. The required output is:
2021-10-27 00:06:53,203:16344:5:0:4:573192000019::6:0:4:573160001511:*999#:1:1:573160032001:1:6:732123904909775:1:1:573156068892:::::SUCCESS:PULL:2021-10-2700:06:53.203:101630:20076482:28389:2,2:aa8c2b31-ac49-4c16-9e2f-f8a83ba63cd6
2021-10-27 00:06:57,120:16344:5:0:4:573192000019::6:0:4:573160001511:*111#:1:1:573160032002:1:6:732123907508180:1:1:573134396303:::::SUCCESS:PULL:2021-10-27 00:06:57.12:101631:26706476:11566:3,3,3192244169:d21e7dca-6dfa-43e6-8bcd-b95ebd35cdea
How can I do this on a file that has already been generated? The cdr file has thousands of similar lines that need correcting. TIA.
Would you please try the following:
awk '
/^[[:digit:]]{4}-[[:digit:]]{2}-[[:digit:]]{2}/ {  # if the record starts with a date
    if (line) print line    # flush the line buffer
    line = ""
}
{
    line = line $0          # append the current record to the line buffer
}
END {
    if (line) print line    # flush the line buffer at the end of the file
}
' file.txt
Output for the provided sample:
2021-10-27 00:06:53,203:16344:5:0:4:573192000019::6:0:4:573160001511:*999#:1:1:573160032001:1:6:732123904909775:1:1:573156068892:::::SUCCESS:PULL:2021-10-2700:06:53.203:101630:20076482:28389:2,2:aa8c2b31-ac49-4c16-9e2f-f8a83ba63cd6
2021-10-27 00:06:57,120:16344:5:0:4:573192000019::6:0:4:573160001511:*111#:1:1:573160032002:1:6:732123907508180:1:1:573134396303:::::SUCCESS:PULL:2021-10-27 00:06:57.12:101631:26706476:11566:3,3,3192244169:d21e7dca-6dfa-43e6-8bcd-b95ebd35cdea
$ awk '{printf "%s%s", (/^[0-9]{4}(-[0-9]{2}){2}/ ? ors : ""), $0; ors=ORS} END{print ""}' file
2021-10-27 00:06:53,203:16344:5:0:4:573192000019::6:0:4:573160001511:*999#:1:1:573160032001:1:6:732123904909775:1:1:573156068892:::::SUCCESS:PULL:2021-10-2700:06:53.203:101630:20076482:28389:2,2:aa8c2b31-ac49-4c16-9e2f-f8a83ba63cd6
2021-10-27 00:06:57,120:16344:5:0:4:573192000019::6:0:4:573160001511:*111#:1:1:573160032002:1:6:732123907508180:1:1:573134396303:::::SUCCESS:PULL:2021-10-27 00:06:57.12:101631:26706476:11566:3,3,3192244169:d21e7dca-6dfa-43e6-8bcd-b95ebd35cdea
awk -F ':' 'NF!=37 { curr=$0; getline; $0 = curr $0 }1' file
Output:
2021-10-27 00:06:53,203:16344:5:0:4:573192000019::6:0:4:573160001511:*999#:1:1:573160032001:1:6:732123904909775:1:1:573156068892:::::SUCCESS:PULL:2021-10-2700:06:53.203:101630:20076482:28389:2,2:aa8c2b31-ac49-4c16-9e2f-f8a83ba63cd6
2021-10-27 00:06:57,120:16344:5:0:4:573192000019::6:0:4:573160001511:*111#:1:1:573160032002:1:6:732123907508180:1:1:573134396303:::::SUCCESS:PULL:2021-10-27 00:06:57.12:101631:26706476:11566:3,3,3192244169:d21e7dca-6dfa-43e6-8bcd-b95ebd35cdea
I'm assuming that every line should have 37 colon-separated columns.
See: 8 Powerful Awk Built-in Variables – FS, OFS, RS, ORS, NR, NF, FILENAME, FNR
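The one-liner above joins exactly one following line onto a short record. If a record could be broken in more than one place (an assumption; the sample only shows a single break), a sketch along these lines keeps appending input lines until the buffer holds the expected number of colon-separated fields. The function name `join_by_field_count` and its argument order are made up for illustration.

```shell
# Hypothetical helper: join physical lines until a record has at least
# WANT colon-separated fields. Usage: join_by_field_count WANT [file...]
join_by_field_count() {
    want=$1
    shift
    awk -v want="$want" '
        { buf = buf $0 }                      # append the current line to the buffer
        split(buf, parts, ":") >= want {      # enough fields: the record is complete
            print buf
            buf = ""
        }
        END { if (buf != "") print buf }      # flush an incomplete trailing record
    ' "$@"
}
```

For the file in the question this would be called as `join_by_field_count 37 file`. Note that it cannot detect a break that falls inside the last field, because the record already has 37 fields before the tail arrives; the date-prefix approaches handle that case.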
This might work for you (GNU sed):
sed ':a;N;/\n....-..-.. ..:..:..,/!s/\n//;ta;P;D' file
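Open a 2-line window in the pattern space.
If the second line does not begin with a date and time, remove the newline (appending line 2 to line 1), append the next line, and repeat until the substitution fails.
Print/delete the first of the two lines and repeat.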
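With thousands of lines to correct, it may be worth counting the broken lines before and after the fix. The sketch below counts lines that do not begin with a `YYYY-MM-DD` date, on the assumption (taken from the sample) that every real record starts with one. The function name `count_continuation_lines` is made up for illustration.

```shell
# Hypothetical helper: count lines that do NOT start with a YYYY-MM-DD date.
# Reads the named files, or standard input when no file is given.
count_continuation_lines() {
    # grep -c still prints the count when it is 0 but exits with status 1,
    # so mask the exit status to keep callers using "set -e" happy.
    grep -vc '^[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}' "$@" || true
}
```

Running `count_continuation_lines file` on the original file gives the number of stray continuation lines; running it on the corrected output should give 0.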