Add double quotes in .CSV comma delimited file using awk
Hi, I need to process a large CSV file (20 million rows), adding double quotes around every comma-separated field. The file has 8 comma-separated fields, like this:
'2016-03-12','12393659','134',,'35533605',189348,9798,gmail.com;live_com.com
'2016-03-12','12390103','138',,'35438006',5133,1897,google.com
'2016-03-12','45616164','139',,'01318800',10945593,596633,facebook.com;tumblr.com;t.co
'2016-03-12','45673436','38',,'86441702',4350985,150327,serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net
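As you can see, the first 3 fields are in single quotes, the 4th is blank, the 5th is in single quotes, and the 6th to 8th are separated by commas only.
I would like to get the following result (the 4th field also needs double quotes, even though it is empty):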
"2016-03-12","12393659","134","","35533605","189348","9798","gmail.com;live_com.com"
"2016-03-12","12390103","138","","35438006","5133","1897","google.com"
"2016-03-12","45616164","139","","01318800","10945593","596633","facebook.com;tumblr.com;t.co"
"2016-03-12","45673436","38","","86441702","4350985","150327","serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net"
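I got a partial result by combining sed and awk:
sed -e s/\'//g inpu.csv > output.csv   # eliminate quotes
awk '{gsub(/[^,]+/,"\"&\"")}1' output.csv > output1.csv   # add double quotes
But the 4th field does not get double quotes, and I need to keep the processing time as low as possible.
Any help doing all of this in awk, with better performance and with the 4th field double-quoted, would be welcome.
Thanks very much for your help. M.Tave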
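The attempt above leaves field 4 bare because `[^,]+` needs at least one non-comma character, so an empty field is never matched by gsub(). A minimal reproduction, reading from standard input instead of a temporary file; the function name `attempt` is made up for illustration:

```shell
# Hypothetical helper: the sed + awk pipeline from the question, fed from stdin.
attempt() {
  # Step 1: remove every single quote.
  # Step 2: wrap each run of non-comma characters in double quotes.
  #         An empty field contains no such run, so it stays unquoted.
  sed -e "s/'//g" | awk '{ gsub(/[^,]+/, "\"&\"") } 1'
}
```

Feeding it the line `'a',,'b',1` gives `"a",,"b","1"`: the empty second field is untouched.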
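Try this awk one-liner:
awk -F, -v OFS="," -v re="^'?|'?$" -v q='"' '{for(i=1;i<=NF;i++)if($i)gsub(re,q,$i);else $i=q$i q}7' file
The idea is to use gsub() to add double quotes to the non-empty fields. For the empty fields, just put a " at the head and the tail. The regex for the substitution is defined as an awk variable outside the script, to avoid escaping.
It works on the data you posted here: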
kent$ awk -F, -v OFS="," -v re="^'?|'?$" -v q='"' '{for(i=1;i<=NF;i++)if($i)gsub(re,q,$i);else $i=q$i q}7' f
"2016-03-12","12393659","134","","35533605","189348","9798","gmail.com;live_com.com"
"2016-03-12","12390103","138","","35438006","5133","1897","google.com"
"2016-03-12","45616164","139","","01318800","10945593","596633","facebook.com;tumblr.com;t.co"
"2016-03-12","45673436","38","","86441702","4350985","150327","serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net"
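Spelled out over several lines, the same per-field idea might look like the sketch below. It is not the one-liner itself: it strips the optional single quotes with two sub() calls and then wraps every field, so empty and non-empty fields share one code path. The function name `quote_fields` and the variables `sq` and `dq` are made up for illustration, and the sketch assumes no field contains an embedded comma or newline.

```shell
# Hypothetical helper: double-quote every comma-separated field.
quote_fields() {
  awk -F, -v OFS=, -v sq="'" -v dq='"' '
  {
    for (i = 1; i <= NF; i++) {
      f = $i
      sub("^" sq, "", f)   # drop one leading single quote, if present
      sub(sq "$", "", f)   # drop one trailing single quote, if present
      $i = dq f dq         # wrap the field, empty or not, in double quotes
    }
    print                  # $0 was rebuilt with OFS by the assignments above
  }' "$@"
}
```

Called as `quote_fields inpu.csv > output1.csv`, it reads the named file; with no argument it reads standard input.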
If your data really is that simple, with no embedded quotes or newlines or anything else, then all you need is:
$ awk -F"'?,'?" -v OFS='","' '{$1=$1; gsub(/^.|$/,"\"")} 1' file
"2016-03-12","12393659","134","","35533605","189348","9798","gmail.com;live_com.com"
"2016-03-12","12390103","138","","35438006","5133","1897","google.com"
"2016-03-12","45616164","139","","01318800","10945593","596633","facebook.com;tumblr.com;t.co"
"2016-03-12","45673436","38","","86441702","4350985","150327","serving-sys.com;chartboost.com;admarvel.com;mydas.mobi;adap.tv;cloudfront.net"
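This version relies on the first field starting with a single quote and the last field having none, which is the layout in the question: `^.` overwrites whatever the first character happens to be. A hedged variant, assuming some rows might break that pattern, removes a leading or trailing single quote only when one is present. The name `quote_record` and the variable `sq` are hypothetical.

```shell
# Hypothetical helper: same field-separator trick, tolerant of unquoted first
# fields and quoted last fields.
quote_record() {
  awk -F"'?,'?" -v OFS='","' -v sq="'" '
  {
    $1 = $1              # force awk to rebuild $0, joining fields with ","
    sub("^" sq, "")      # drop a leading single quote only if there is one
    sub(sq "$", "")      # likewise for a trailing single quote
    print "\"" $0 "\""   # add the outer double quotes
  }' "$@"
}
```

The separator regex still swallows the single quotes next to each comma, so only the two ends of the record need separate handling.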