Handling commas inside double quotes + awk
This is my file:
$ cat -v test2
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I"," Plan"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY STD - TRIAL - #16"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - [=10=] -"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G"," CARRYOVER PLAN"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - [=10=] - #2"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD - - #4"
This command adds a column at the end:
$ awk -F, -v OFS=, -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{$8=$4; gsub(/"/,"",$8); $8= q $8/(1024*1024)q}1' test2 | cat -v
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description","Data_Volume_MB"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I"," Plan","0.131383"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY STD - TRIAL - #16","0"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - [=11=] -","0"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan","46.5744"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,"0.139818",OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G"," CARRYOVER PLAN","108.486"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - [=11=] - #2","18.9218"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD - - #4","0"
My problem is that this line
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)"
changes to this
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,"0.139818",OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)"
The "0.139818" is in the wrong place, so the result is unlike the other lines. The problem seems to be the commas inside the double quotes of that column:
"OPPO X9076,OPPO R6006,"0.139818",OPPO N5116,OPPO X9006"
What is the best way to achieve this? This is the line I want, just like the other lines:
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)","0.139818"
Maybe I need to tidy up the data, this line in particular, before it goes into awk.
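To see why the value lands in the wrong place, it can help to count how many fields awk sees on each line when it splits on a bare comma. The following is a minimal sketch that uses two rows copied from the sample data instead of the full test2 file; the function name `count_fields` is only an illustrative choice.

```shell
# Print the number of comma-separated fields awk sees on each input line.
count_fields() {
    awk -F, '{ print NF }'
}

# Two sample rows: an ordinary one and the row with commas inside quotes.
printf '%s\n' \
    '"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I"," Plan"' \
    '"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)"' |
    count_fields
```

The first row has 7 fields, while the second has 11, because each comma inside the quoted device list also counts as a separator. Field 8 on that row is therefore OPPO R6001, which is the piece the command overwrites.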
EDIT1: solved, based on the answer
Change the delimiter from , to ; and add the new column at the end:
$ sed 's/","/";"/g' < test2 | awk -F';' -v OFS=';' -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{n=$4; gsub(/"/,"",n); $8= q n/(1024*1024)q}1'
"Rec Open Date";"MSISDN";"IMEI";"Data Volume (Bytes)";"Device Manufacturer";"Device Model";"Product Description";"Data_Volume_MB"
"2015-10-06";"427";"060";"137765";"Samsung Korea";"Samsung SM-G900I";" Plan";"0.131383"
"2015-10-06";"592";"620";"0";"Apple Inc";"Apple iPhone 6 (A1586)";"PREPAY STD - TRIAL - #16";"0"
"2015-10-06";"007";"290";"0";"Apple Inc";"Apple iPhone 6 (A1586)";"PREPAY PLUS - [=13=] -";"0"
"2015-10-06";"592";"050";"48836832";"Apple Inc";"Apple iPhone 5S (A1530)";"Talk and Text Connect Flexi Plan";"46.5744"
"2016-04-27";"498";"220";"146610";"Guangdong Oppo Mobile Telecommunications Corp Ltd";"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006";".95 Carryover Plan (1GB)";"0.139818"
"2015-10-06";"409";"720";"113755347";"Samsung Korea";"Samsung SM-G360G";" CARRYOVER PLAN";"108.486"
"2015-10-06";"742";"620";"19840943";"Apple Inc";"Apple iPhone S (A1530)";"PREPAY STD - [=13=] - #2";"18.9218"
"2015-10-06";"387";"180";"0";"HUAWEI Technologies Co Ltd";"HUAWEI HUAWEI G526-L11";"PREPAY STD - - #4";"0"
Change the delimiter from , to | and add the new column at the end:
$ sed 's/","/"|"/g' < test2 | awk -F'|' -v OFS='|' -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{n=$4; gsub(/"/,"",n); $8= q n/(1024*1024)q}1'
"Rec Open Date"|"MSISDN"|"IMEI"|"Data Volume (Bytes)"|"Device Manufacturer"|"Device Model"|"Product Description"|"Data_Volume_MB"
"2015-10-06"|"427"|"060"|"137765"|"Samsung Korea"|"Samsung SM-G900I"|" Plan"|"0.131383"
"2015-10-06"|"592"|"620"|"0"|"Apple Inc"|"Apple iPhone 6 (A1586)"|"PREPAY STD - TRIAL - #16"|"0"
"2015-10-06"|"007"|"290"|"0"|"Apple Inc"|"Apple iPhone 6 (A1586)"|"PREPAY PLUS - [=14=] -"|"0"
"2015-10-06"|"592"|"050"|"48836832"|"Apple Inc"|"Apple iPhone 5S (A1530)"|"Talk and Text Connect Flexi Plan"|"46.5744"
"2016-04-27"|"498"|"220"|"146610"|"Guangdong Oppo Mobile Telecommunications Corp Ltd"|"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006"|".95 Carryover Plan (1GB)"|"0.139818"
"2015-10-06"|"409"|"720"|"113755347"|"Samsung Korea"|"Samsung SM-G360G"|" CARRYOVER PLAN"|"108.486"
"2015-10-06"|"742"|"620"|"19840943"|"Apple Inc"|"Apple iPhone S (A1530)"|"PREPAY STD - [=14=] - #2"|"18.9218"
"2015-10-06"|"387"|"180"|"0"|"HUAWEI Technologies Co Ltd"|"HUAWEI HUAWEI G526-L11"|"PREPAY STD - - #4"|"0"
Change the delimiter from , to ; and insert the new column before the second-to-last column:
$ sed 's/","/";"/g' < test2 | awk -F';' -v OFS=';' -v q='"' 'NR==1{$(NF-1)=q"Data_Volume_MB"q FS $(NF-1)} NR>1{n=$4; gsub(/"/,"",n); $(NF-1)= q n/(1024*1024)q FS $(NF-1)}1'
"Rec Open Date";"MSISDN";"IMEI";"Data Volume (Bytes)";"Device Manufacturer";"Data_Volume_MB";"Device Model";"Product Description"
"2015-10-06";"427";"060";"137765";"Samsung Korea";"0.131383";"Samsung SM-G900I";" Plan"
"2015-10-06";"592";"620";"0";"Apple Inc";"0";"Apple iPhone 6 (A1586)";"PREPAY STD - TRIAL - #16"
"2015-10-06";"007";"290";"0";"Apple Inc";"0";"Apple iPhone 6 (A1586)";"PREPAY PLUS - [=15=] -"
"2015-10-06";"592";"050";"48836832";"Apple Inc";"46.5744";"Apple iPhone 5S (A1530)";"Talk and Text Connect Flexi Plan"
"2016-04-27";"498";"220";"146610";"Guangdong Oppo Mobile Telecommunications Corp Ltd";"0.139818";"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006";".95 Carryover Plan (1GB)"
"2015-10-06";"409";"720";"113755347";"Samsung Korea";"108.486";"Samsung SM-G360G";" CARRYOVER PLAN"
"2015-10-06";"742";"620";"19840943";"Apple Inc";"18.9218";"Apple iPhone S (A1530)";"PREPAY STD - [=15=] - #2"
"2015-10-06";"387";"180";"0";"HUAWEI Technologies Co Ltd";"0";"HUAWEI HUAWEI G526-L11";"PREPAY STD - - #4"
I suggest changing your field separator first, like this (here I change it from , to |):
sed 's/","/"|"/g' < test2 > newfile
Then use your awk code on newfile.
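Swapping the separator only helps if the replacement character does not already occur in the data, so a quick check before running sed may be worthwhile. This is a small sketch under that assumption; the helper name `delimiter_is_free` is hypothetical, and the two sample rows are made up for the demonstration.

```shell
# Succeed only if the candidate delimiter does not occur anywhere in the input.
# $1: candidate delimiter, matched as a fixed string; stdin: the data to check.
delimiter_is_free() {
    ! grep -qF -- "$1"
}

# Demonstration on two made-up rows: the first is safe for |, the second is not.
if printf '%s\n' '"a","b,c","d"' | delimiter_is_free '|'; then
    echo "row 1: | is free"
fi
if ! printf '%s\n' '"a|b","c","d"' | delimiter_is_free '|'; then
    echo "row 2: | already occurs in the data"
fi
```

With the real file this could be run as `delimiter_is_free '|' < test2` before the sed step, picking another character if the check fails.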
You can of course put all of this on one line (here I use my own awk code as an example rather than yours):
sed 's/","/"|"/g' < test2 | awk 'BEGIN{FS="|"} {print }'
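As a variation on the one-liner above, awk can split on the three-character sequence `","` directly, which avoids the sed step altogether. This is a different technique from the one in this answer, sketched under the assumptions that every field is quoted, that no field contains `","` itself, and that the byte count is the fourth column; the function name `add_mb_column` is illustrative only.

```shell
# Append a Data_Volume_MB column, splitting on the sequence "," instead of a bare comma.
add_mb_column() {
    awk -F'","' -v OFS='","' '
        # Remove the closing quote of the last field so a new field can be appended.
        { sub(/"$/, "") }
        NR == 1 { print $0 OFS "Data_Volume_MB\""; next }
        {
            # With this separator, field 4 carries no quotes and can be used as a number.
            mb = $4 / (1024 * 1024)
            print $0 OFS mb "\""
        }'
}

# Demonstration on a made-up header and one row with commas inside a quoted field.
printf '%s\n' \
    '"Date","MSISDN","IMEI","Bytes","Model"' \
    '"2016-04-27","498","220","1048576","OPPO X9076,OPPO R6006"' |
    add_mb_column
```

Because a separator longer than one character is treated as a regular expression by awk, the commas inside the quoted model list are left alone and the output stays comma-separated.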
In response to the OP's comment, do run your command (note that I changed -F, to -F"|"):
sed 's/","/"|"/g' < test2 | awk -F"|" -v OFS=, -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{$8=$4; gsub(/"/,"",$8); $8= q $8/(1024*1024)q}1'
With your data, this is my result:
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description","Data_Volume_MB"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I"," Plan","0.131383"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY STD - TRIAL - #16","0"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - [=13=] -","0"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan","46.5744"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$ Carryover Plan (1GB)","0.139818"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G"," CARRYOVER PLAN","108.486"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - [=13=] - #2","18.9218"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD - - #4","0"