Handling commas inside double quotes with awk

This is my file:

$ cat -v test2
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I"," Plan"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY  STD - TRIAL - #16"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - [=10=] -"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G"," CARRYOVER PLAN"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - [=10=] - #2"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD -  - #4"

This command adds a column at the end:

$ awk -F, -v OFS=, -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{$8=$4; gsub(/"/,"",$8); $8 = q $8/(1024*1024)q}1' test2 | cat -v
"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description","Data_Volume_MB"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I"," Plan","0.131383"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY  STD - TRIAL - #16","0"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - [=11=] -","0"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan","46.5744"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,"0.139818",OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G"," CARRYOVER PLAN","108.486"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - [=11=] - #2","18.9218"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD -  - #4","0"

My problem is that this line

"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)"

turns into this

"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,"0.139818",OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)"

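The "0.139818" is in the wrong place, so the result does not look like the other lines. The problem seems to be the commas inside the double quotes in that column: with -F, each of those commas starts a new field, so field 8 is a piece of the device list rather than a new last column: "OPPO X9076,OPPO R6006,"0.139818",OPPO N5116,OPPO X9006"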
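
To see the mis-split directly, one can count the fields awk reports. This is only a minimal sketch on a shortened, made-up record (the variable names sample, naive_nf and naive_f3 are hypothetical and not part of the original file):

```shell
# A shortened, made-up record: three quoted fields, the second one
# containing commas of its own.
sample='"146610","OPPO X9076,OPPO R6006,OPPO R6001","Plan"'

# With -F, every comma is a separator, including those inside the quotes.
naive_nf=$(printf '%s\n' "$sample" | awk -F, '{print NF}')

# Field 3 is therefore a fragment of the quoted list, not the third column.
naive_f3=$(printf '%s\n' "$sample" | awk -F, '{print $3}')

echo "fields seen: $naive_nf"
echo "field 3: $naive_f3"
```

The record has three logical columns, but awk reports five fields, which is why assigning to a fixed field number overwrites part of the quoted text.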

What is the best way to achieve this? This is the line I want, in the same shape as the other lines:

"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO N5116,OPPO X9006",".95 Carryover Plan (1GB)","0.139818"

Maybe I need to clean up the data, especially this line, before it goes into awk.


EDIT1: solved thanks to the answer

Change the delimiter from , to ; and add the new column at the end:

$ sed 's/","/";"/g' < test2 | awk -F';' -v OFS=';' -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{n=$4; gsub(/"/,"",n); $8 = q n/(1024*1024)q}1'
"Rec Open Date";"MSISDN";"IMEI";"Data Volume (Bytes)";"Device Manufacturer";"Device Model";"Product Description";"Data_Volume_MB"
"2015-10-06";"427";"060";"137765";"Samsung Korea";"Samsung SM-G900I";" Plan";"0.131383"
"2015-10-06";"592";"620";"0";"Apple Inc";"Apple iPhone 6 (A1586)";"PREPAY  STD - TRIAL - #16";"0"
"2015-10-06";"007";"290";"0";"Apple Inc";"Apple iPhone 6 (A1586)";"PREPAY PLUS - [=13=] -";"0"
"2015-10-06";"592";"050";"48836832";"Apple Inc";"Apple iPhone 5S (A1530)";"Talk and Text Connect Flexi Plan";"46.5744"
"2016-04-27";"498";"220";"146610";"Guangdong Oppo Mobile Telecommunications Corp Ltd";"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006";".95 Carryover Plan (1GB)";"0.139818"
"2015-10-06";"409";"720";"113755347";"Samsung Korea";"Samsung SM-G360G";" CARRYOVER PLAN";"108.486"
"2015-10-06";"742";"620";"19840943";"Apple Inc";"Apple iPhone S (A1530)";"PREPAY STD - [=13=] - #2";"18.9218"
"2015-10-06";"387";"180";"0";"HUAWEI Technologies Co Ltd";"HUAWEI HUAWEI G526-L11";"PREPAY STD -  - #4";"0"

Change the delimiter from , to | and add the new column at the end:

$ sed 's/","/"|"/g' < test2 | awk -F'|' -v OFS='|' -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{n=$4; gsub(/"/,"",n); $8 = q n/(1024*1024)q}1'
"Rec Open Date"|"MSISDN"|"IMEI"|"Data Volume (Bytes)"|"Device Manufacturer"|"Device Model"|"Product Description"|"Data_Volume_MB"
"2015-10-06"|"427"|"060"|"137765"|"Samsung Korea"|"Samsung SM-G900I"|" Plan"|"0.131383"
"2015-10-06"|"592"|"620"|"0"|"Apple Inc"|"Apple iPhone 6 (A1586)"|"PREPAY  STD - TRIAL - #16"|"0"
"2015-10-06"|"007"|"290"|"0"|"Apple Inc"|"Apple iPhone 6 (A1586)"|"PREPAY PLUS - [=14=] -"|"0"
"2015-10-06"|"592"|"050"|"48836832"|"Apple Inc"|"Apple iPhone 5S (A1530)"|"Talk and Text Connect Flexi Plan"|"46.5744"
"2016-04-27"|"498"|"220"|"146610"|"Guangdong Oppo Mobile Telecommunications Corp Ltd"|"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006"|".95 Carryover Plan (1GB)"|"0.139818"
"2015-10-06"|"409"|"720"|"113755347"|"Samsung Korea"|"Samsung SM-G360G"|" CARRYOVER PLAN"|"108.486"
"2015-10-06"|"742"|"620"|"19840943"|"Apple Inc"|"Apple iPhone S (A1530)"|"PREPAY STD - [=14=] - #2"|"18.9218"
"2015-10-06"|"387"|"180"|"0"|"HUAWEI Technologies Co Ltd"|"HUAWEI HUAWEI G526-L11"|"PREPAY STD -  - #4"|"0"

Change the delimiter from , to ; and insert the new column before the second-to-last column:

$ sed 's/","/";"/g' < test2 | awk -F';' -v OFS=';' -v q='"' 'NR==1{$(NF-1)=q"Data_Volume_MB"q FS $(NF-1)} NR>1{n=$4; gsub(/"/,"",n); $(NF-1)= q n/(1024*1024)q FS $(NF-1)}1'
"Rec Open Date";"MSISDN";"IMEI";"Data Volume (Bytes)";"Device Manufacturer";"Data_Volume_MB";"Device Model";"Product Description"
"2015-10-06";"427";"060";"137765";"Samsung Korea";"0.131383";"Samsung SM-G900I";" Plan"
"2015-10-06";"592";"620";"0";"Apple Inc";"0";"Apple iPhone 6 (A1586)";"PREPAY  STD - TRIAL - #16"
"2015-10-06";"007";"290";"0";"Apple Inc";"0";"Apple iPhone 6 (A1586)";"PREPAY PLUS - [=15=] -"
"2015-10-06";"592";"050";"48836832";"Apple Inc";"46.5744";"Apple iPhone 5S (A1530)";"Talk and Text Connect Flexi Plan"
"2016-04-27";"498";"220";"146610";"Guangdong Oppo Mobile Telecommunications Corp Ltd";"0.139818";"OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006";".95 Carryover Plan (1GB)"
"2015-10-06";"409";"720";"113755347";"Samsung Korea";"108.486";"Samsung SM-G360G";" CARRYOVER PLAN"
"2015-10-06";"742";"620";"19840943";"Apple Inc";"18.9218";"Apple iPhone S (A1530)";"PREPAY STD - [=15=] - #2"
"2015-10-06";"387";"180";"0";"HUAWEI Technologies Co Ltd";"0";"HUAWEI HUAWEI G526-L11";"PREPAY STD -  - #4"

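I suggest changing your field separator first, like this (here I change it from , to |; only the "," sequence between fields is rewritten, so commas inside a quoted field are left alone):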

sed 's/","/"|"/g' < test2 > newfile

Then use your awk code on newfile.

You can of course put all of this on one line (I am not using your awk code here, just my own awk code as an example):

sed 's/","/"|"/g' < test2 | awk 'BEGIN{FS="|"} {print  }'
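
Before relying on the new delimiter, it may be worth checking that every row now splits into the same number of fields. A minimal sketch on two made-up records (sample, comma_kinds and pipe_kinds are hypothetical names); it assumes the data itself never contains the | character:

```shell
# Two made-up records; the second has commas inside a quoted field.
sample=$(printf '%s\n' '"a","b","c"' '"d","e,f,g","h"')

# Number of distinct field counts when splitting on every comma.
comma_kinds=$(printf '%s\n' "$sample" |
    awk -F, '!seen[NF]++ { n++ } END { print n }')

# Same check after rewriting the "," between fields to "|".
pipe_kinds=$(printf '%s\n' "$sample" | sed 's/","/"|"/g' |
    awk -F'|' '!seen[NF]++ { n++ } END { print n }')

echo "distinct field counts with ',': $comma_kinds"
echo "distinct field counts with '|': $pipe_kinds"
```

A value of 1 means all rows agree on the number of fields; anything larger suggests some row is still being split inside a quoted value.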

In response to the OP's comment: make sure you run your command like this (note that I changed -F, to -F"|"):

    sed 's/","/"|"/g' < test2 | awk -F"|" -v OFS=, -v q='"' 'NR==1{$8=q"Data_Volume_MB"q} NR>1{$8=$4; gsub(/"/,"",$8); $8 = q $8/(1024*1024)q}1'

With your data, this is my result:

"Rec Open Date","MSISDN","IMEI","Data Volume (Bytes)","Device Manufacturer","Device Model","Product Description","Data_Volume_MB"
"2015-10-06","427","060","137765","Samsung Korea","Samsung SM-G900I"," Plan","0.131383"
"2015-10-06","592","620","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY  STD - TRIAL - #16","0"
"2015-10-06","007","290","0","Apple Inc","Apple iPhone 6 (A1586)","PREPAY PLUS - [=13=] -","0"
"2015-10-06","592","050","48836832","Apple Inc","Apple iPhone 5S (A1530)","Talk and Text Connect Flexi Plan","46.5744"
"2016-04-27","498","220","146610","Guangdong Oppo Mobile Telecommunications Corp Ltd","OPPO X9076,OPPO R6006,OPPO R6001,OPPO N5116,OPPO X9006","$ Carryover Plan (1GB)","0.139818"
"2015-10-06","409","720","113755347","Samsung Korea","Samsung SM-G360G"," CARRYOVER PLAN","108.486"
"2015-10-06","742","620","19840943","Apple Inc","Apple iPhone S (A1530)","PREPAY STD - [=13=] - #2","18.9218"
"2015-10-06","387","180","0","HUAWEI Technologies Co Ltd","HUAWEI HUAWEI G526-L11","PREPAY STD -  - #4","0"