How to search and replace a comma between " " in Unix
Input:
20000000,"xxxxxxxxxxxxx,xxxxxxxxxxx",192.168.3.2
Exchange subsidary,Passed,00021423SNG,R-JAM-05-03,US (First Exchange),20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.212.12/30,00052312SNG,R-JPODIU-023-07,US (First Exchange) ,20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.224.213/30
Desired output:
20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2
Exchange subsidary,Passed,00021423SNG,R-JAM-05-03,US (First Exchange),20000000,"JUDICIARY STATE COURTS (STATE COURTS)",112.78.212.12/30,00052312SNG,R-JPODIU-023-07,US (First Exchange) ,20000000,"JUDICIARY STATE COURTS (STATE COURTS)",112.78.224.213/30
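How can I remove the commas between the quotes?
There are also lines with no comma between the quotes.
I need to remove the comma in ,"JUDICIARY, STATE COURTS (STATE COURTS)" (it occurs twice on the same line).
Some lines also have several fields with commas between double quotes.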
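Here is a script that demonstrates how to do it; welcome to the world of goto in sed. It was written using BSD sed, which uses -E to enable extended regular expressions; GNU sed uses -r for the same task (recent GNU sed versions accept -E as well).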
sed -E -e 's/^/A: /p; s/^A: /B: /' \
    -e ':again' \
    -e 's/^(([^"]*|"[^",]*")*)("[^"]*),([^"]*")/\1\3\4/' \
    -e 't again' \
    data
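This assumes the data is in a file named data. The first -e simply echoes the original input with an A: prefix and then changes the prefix to B:. This is debugging material and can be dropped in real use. The second -e creates a label, again, that can be jumped to. The fourth -e jumps to the again label if the previous step made a substitution.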
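All the excitement is in the third -e. The pattern looks for the start of the line, followed by zero or more occurrences of either a sequence of 'not double quote' characters or a double quote followed by zero or more characters that are neither a double quote nor a comma and then a double quote. After that prefix come a double quote, a sequence of 'not double quote' characters, a comma, more 'not double quote' characters, and a double quote. The whole match is replaced by the prefix, the part between the double quotes before the comma, and the part between the double quotes after the comma (captured groups 1, 3, and 4). Each pass removes one comma, which is why the loop is needed.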
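To make the roles of the captured groups easier to see, here is a minimal sketch of the same loop in Python. It is an illustration, not part of the original answer: the function name strip_quoted_commas is made up, the inner group is non-capturing (so the groups are numbered 1, 2, 3 rather than 1, 3, 4), and the prefix matches [^"] one character at a time instead of [^"]*, which matches the same text but avoids heavy backtracking in Python's regex engine.

```python
import re

# Python rendering of the pattern in the third -e.
#   group 1: prefix of unquoted characters and quoted fields without commas
#   group 2: opening quote plus the text before one comma in that field
#   group 3: the text after that comma, up to and including the closing quote
PATTERN = re.compile(r'^((?:[^"]|"[^",]*")*)("[^"]*),([^"]*")')


def strip_quoted_commas(line: str) -> str:
    """Remove one quoted comma per pass until no substitution happens."""
    while True:
        line, count = PATTERN.subn(r'\1\2\3', line)
        if count == 0:  # nothing replaced: this is where 't again' stops looping
            return line


if __name__ == "__main__":
    for sample in ('2000,"xx,xx,xx",192.16.3.2',
                   '201,"x,x",192.168.3.2,"y,y"'):
        print(strip_quoted_commas(sample))
```

The while loop plays the part of the again label and the t command: it repeats the substitution for as long as the previous attempt changed the line.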
Given a data file:
2000,"xxxx,xxxx",192.168.3.2
2000,"xx,xx,xx",192.16.3.2
2000,"xxxxxxxx",192.168.3.2
20000000,"xxxxxxxxxxxx,xxxxxxxxxxxx",192.168.3.2,"yyyyy,yyyyy"
20000000,"xxxxxxxxxxxxx,xxxxxxxxxxx",192.168.3.2
20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2
201,"x,x",192.168.3.2,"y,y","aaaa,cccc,dddd",192,"zzzz",234
201,"x,x",192.168.3.2,"yyy"
201,"xx",192.168.3.2,"yyy",2211
201,"xxx",192.168.3.2,"y,y"
201,"xxx",192.168.3.2,"yyy"
201,"x,x",192.168.3.2,"y,y"
Exchange subsidary,Passed,00021423SNG,R-JAM-05-03,US (First Exchange),20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.212.12/30,00052312SNG,R-JPODIU-023-07,US (First Exchange) ,20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.224.213/30
The script produces the output:
A: 2000,"xxxx,xxxx",192.168.3.2
B: 2000,"xxxxxxxx",192.168.3.2
A: 2000,"xx,xx,xx",192.16.3.2
B: 2000,"xxxxxx",192.16.3.2
A: 2000,"xxxxxxxx",192.168.3.2
B: 2000,"xxxxxxxx",192.168.3.2
A: 20000000,"xxxxxxxxxxxx,xxxxxxxxxxxx",192.168.3.2,"yyyyy,yyyyy"
B: 20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2,"yyyyyyyyyy"
A: 20000000,"xxxxxxxxxxxxx,xxxxxxxxxxx",192.168.3.2
B: 20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2
A: 20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2
B: 20000000,"xxxxxxxxxxxxxxxxxxxxxxxx",192.168.3.2
A: 201,"x,x",192.168.3.2,"y,y","aaaa,cccc,dddd",192,"zzzz",234
B: 201,"xx",192.168.3.2,"yy","aaaaccccdddd",192,"zzzz",234
A: 201,"x,x",192.168.3.2,"yyy"
B: 201,"xx",192.168.3.2,"yyy"
A: 201,"xx",192.168.3.2,"yyy",2211
B: 201,"xx",192.168.3.2,"yyy",2211
A: 201,"xxx",192.168.3.2,"y,y"
B: 201,"xxx",192.168.3.2,"yy"
A: 201,"xxx",192.168.3.2,"yyy"
B: 201,"xxx",192.168.3.2,"yyy"
A: 201,"x,x",192.168.3.2,"y,y"
B: 201,"xx",192.168.3.2,"yy"
A: Exchange subsidary,Passed,00021423SNG,R-JAM-05-03,US (First Exchange),20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.212.12/30,00052312SNG,R-JPODIU-023-07,US (First Exchange) ,20000000,"JUDICIARY, STATE COURTS (STATE COURTS)",112.78.224.213/30
B: Exchange subsidary,Passed,00021423SNG,R-JAM-05-03,US (First Exchange),20000000,"JUDICIARY STATE COURTS (STATE COURTS)",112.78.212.12/30,00052312SNG,R-JPODIU-023-07,US (First Exchange) ,20000000,"JUDICIARY STATE COURTS (STATE COURTS)",112.78.224.213/30
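Note: this is hard. If you have a choice, use a tool that understands the CSV format. For example, Python ships with a csv module; Perl has Text::CSV (and its adjunct modules Text::CSV_PP and Text::CSV_XS), which can handle this; and there are custom tools for manipulating CSV files.
Also note that the notation Microsoft supports differs slightly from RFC 4180, which is the Internet world's attempt to rationalize what Microsoft uses (to a first approximation).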
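As a rough illustration of the CSV-aware route, the sketch below uses Python's standard csv module. The function name strip_commas_csv is hypothetical, and the default dialect is assumed to be close enough to the data shown here. One difference from the sed script: the writer only quotes fields that still need quoting, so a field that was quoted in the input comes back unquoted once its comma is gone.

```python
import csv
import io


def strip_commas_csv(text: str) -> str:
    """Parse text as CSV, drop the commas inside every field, write it back."""
    out = io.StringIO()
    # Default quoting (QUOTE_MINIMAL) re-quotes only fields that still need it.
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text, newline="")):
        writer.writerow([field.replace(",", "") for field in row])
    return out.getvalue()


if __name__ == "__main__":
    print(strip_commas_csv('2000,"xxxx,xxxx",192.168.3.2\n'), end="")
```

If the original quoting must be preserved exactly, the sed loop (or an equivalent regex loop) is still the closer fit; the parser route is safer when the input may contain escaped quotes or embedded newlines.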