Linux command to remove word wrap (line breaks inside fields) for an entire CSV file

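Here is sample data from my CSV file. As you can see, for ID = '51126' one column contains wrapped text: the data was entered using Alt+Enter, so the field has embedded line breaks. I need to remove these line breaks so that every record sits on a single line, across the entire CSV file. There are many such wrapped fields in the file!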

ID,OPPORTUNITY ID,CREATED_DATE,TIR NAME,MS Rep,SRC_SSR_REP,REGION,HP PBM NAME,COMPANY NAME,COMPANY ADDRESS,COMPANY CITY,COMPANY STATE,COMPANY ZIPCODE,COMPANY AMID,COMPANY USER CONTACT NAME,COMPANY USER TITLE,COMPANY USER PHONE,COMPANY USER EMAIL,PARTNER COMPANY NAME,PARTNER REP NAME,PARTNER REP EMAIL,PARTNER LID,WHOLESALER,PURCHASEDGE AC NUMBER,USAGE PERIOD,DEAL TYPE,CLWB WORKED ON,DEAL NUMBER,NAMED TERRITORY SLED,MONO HP SHARE %,COLOR HP SHARE %,TOTAL HP TONER SHARE %,DEAL VALUE MONO,DEAL VALUE COLOR,TOTAL TONER DEAL VALUE,EST DISCOUNT VALUE,REBATE TYPE MONO,REBATE TYPE COLOR,DISCOUNT TYPE,DEAL START DATE,DEAL END DATE,DEAL EXTENDED END DATE,DEAL POSITION,ECLIPSE ID,ECLIPSE DEAL STATUS,ECLIPSE APPROVED DATE,ECLIPSE DEAL APPROVED BY,LOST REASON,USAGE FILE LOCATION,CREARTED BY,MODIFIED BY,MODIFIED DATE,FINALISATION_RECEIVED_DATE,FINALISATION_WORKED_DATE,DEAL_PROCESSED_BY,DEAL_FINALISED_BY,FUNNEL_COMMENT,AV_SENT_DATE,PL_REMAN_VALUE,PL_REMAN_SHARE,FINALISATION_DOC_PATH,TIME ELAPSING ON,APPROVAL SENT DATE,APPROVAL RECEIVED DATE,SECONDARY_WHOLESALER,PREVIOUSECLIPSE_ID,PurchasEdge_(Y/N),HP_TONER_UNITS,PL_REMAN_UNITS,FINALISATION_COMMENTS,RENEWAL_POSITION,PROGRAM_NAME,CUSTOMERONBOARDEDON
51128,OPP-048699,3/23/2020 21:02,Adam Dohm,Cheryl Glenn,Tiffany Debose,MARKET SOURCE,,"Flathead Valley School District (Kalispell, Whitefish, Columbia Falls)",233 1st Ave E,Kalispell,MT,59901,,Joe Biangone,Purchasing,406-758-8392,biangonej@sd5.k12.mt.us,TONERPORT INCORPORATED, ,,10293955,ESSENDANT,,12 months,Renewal,,CL091515474R4-A,SLED,97,100,98,21592,16781,38373,2452,Defend,Defend,Defend,4/15/2020 0:00,4/14/2021 0:00,4/14/2021 0:00,Won,42921984,,,,,/E/Data/Funnel/Submit/FLATHEAD VALLEY SCHOOL DISTRICT USAGE_51128.xlsx,Tiffany Debose,Tiffany Debose,3/26/2020 14:49,3/26/2020 0:00,,Bhavana P V,,,,613.97,1.6,,,,,NA,42085906,N,179,3,3/26 - Deal added on eclipse ,,SMBA,
51126,OPP-048697,3/23/2020 19:52,Xavier Weems,,Tiffany Debose,EAST,Vladimir Jaksic,"Gray Television, Inc.","​Gray Television, Inc.
4370 Peachtree Rd, NE.
​Atlanta, Ga  30319
​

",,GA,30319,DN042973875,Dottie Boudreau,Manager,404-266-8333,dottie@gray.tv,"STAPLES, INC", ,,"10264576,10252948",NA,,12 months,New,,CL200351126,Commercial - Named,84,89,86,16143,7335,23478,3149,Defend,Defend,Defend,,,,AV summary and PPT sent,,,,,,"/E/Data/Funnel/Submit/GRAY TELEVISION, INC USAGE_51126.xlsb",Tiffany Debose,Tiffany Debose,3/26/2020 8:55,,,Deepthi K,,,3/26/2020 0:00,3239.96,13.8,,6/24/2020 0:00,,,NA,,N,168,27,3/24/2020 - sent for specialist approval 3/26/2020 - aV sent,,MCBigDeal,
51125,OPP-048696,3/23/2020 18:01,Xavier Weems,,Tiffany Debose,WEST,Jenni HoGlin,STURM FINANCIAL GROUP,3033 East First Avenue,Denver,CO,80206,,,,,,"STAPLES, INC", ,,"10264576,10252948",NA,,12 months,New,,CL200351125,Commercial - Non Named,42,87,65,10201,14198,24399,6369,Winback,Defend,Winback,,,,AV summary and PPT sent,,,,,,/E/Data/Funnel/Submit/STURM FINANCIAL GROUP USAGE_51125.xlsx,Tiffany Debose,Tiffany Debose,3/24/2020 7:49,,,Teja Ravi,,,3/24/2020 0:00,8417.66,34.5,,6/22/2020 0:00,,,NA,,N,127,67,3/24-AV Summary and PPT sent,,SMBA,

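The output should look like the following. I have only shown ID=51126 and 51125 for reference; 51128 should be there as well! There are 73 columns in total!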

"ID","OPPORTUNITY ID","CREATED_DATE","TIR NAME","MS Rep","SRC_SSR_REP","REGION","HP PBM NAME","COMPANY NAME","COMPANY ADDRESS","COMPANY CITY","COMPANY STATE","COMPANY ZIPCODE","COMPANY AMID","COMPANY USER CONTACT NAME","COMPANY USER TITLE","COMPANY USER PHONE","COMPANY USER EMAIL","PARTNER COMPANY NAME","PARTNER REP NAME","PARTNER REP EMAIL","PARTNER LID","WHOLESALER","PURCHASEDGE AC NUMBER","USAGE PERIOD","DEAL TYPE","CLWB WORKED ON","DEAL NUMBER","NAMED TERRITORY SLED","MONO HP SHARE %","COLOR HP SHARE %","TOTAL HP TONER SHARE %","DEAL VALUE MONO","DEAL VALUE COLOR","TOTAL TONER DEAL VALUE","EST DISCOUNT VALUE","REBATE TYPE MONO","REBATE TYPE COLOR","DISCOUNT TYPE","DEAL START DATE","DEAL END DATE","DEAL EXTENDED END DATE","DEAL POSITION","ECLIPSE ID","ECLIPSE DEAL STATUS","ECLIPSE APPROVED DATE","ECLIPSE DEAL APPROVED BY","LOST REASON","USAGE FILE LOCATION","CREARTED BY","MODIFIED BY","MODIFIED DATE","FINALISATION_RECEIVED_DATE","FINALISATION_WORKED_DATE","DEAL_PROCESSED_BY","DEAL_FINALISED_BY","FUNNEL_COMMENT","AV_SENT_DATE","PL_REMAN_VALUE","PL_REMAN_SHARE","FINALISATION_DOC_PATH","TIME ELAPSING ON","APPROVAL SENT DATE","APPROVAL RECEIVED DATE","SECONDARY_WHOLESALER","PREVIOUSECLIPSE_ID","PurchasEdge_(Y/N)","HP_TONER_UNITS","PL_REMAN_UNITS","FINALISATION_COMMENTS","RENEWAL_POSITION","PROGRAM_NAME","CUSTOMERONBOARDEDON"
"51126","OPP-048697","3/23/2020 19:52",Xavier Weems","","Tiffany Debose","EAST","Vladimir Jaksic","Gray Television, Inc.","​Gray Television, Inc. 4370 Peachtree Rd, NE. Atlanta, Ga  30319","","GA","30319","DN042973875","Dottie Boudreau","Manager","404-266-8333","dottie@gray.tv","STAPLES, INC","","","10264576,10252948","NA","","12 months","New","","CL200351126","Commercial - Named","84","89","86","16143","7335","23478","3149","Defend","Defend","Defend","","","","AV summary and PPT sent","","","","","","/E/Data/Funnel/Submit/GRAY TELEVISION, INC USAGE_51126.xlsb","Tiffany Debose","Tiffany Debose","3/26/2020 8:55","","","Deepthi K","","","3/26/2020 0:00","3239.96","13.8","","6/24/2020 0:00","","","NA","","N","168","27","3/24/2020 - sent for specialist approval 3/26/2020 - aV sent","","MCBigDeal",""
"51125","OPP-048696","3/23/2020 18:01","Xavier Weems","","Tiffany Debose","WEST","Jenni HoGlin","STURM FINANCIAL GROUP","3033 East First Avenue","Denver","CO","80206","","","","","","STAPLES, INC","","","10264576,10252948","NA","","12 months","New","","CL200351125","Commercial - Non Named","42","87","65","10201","14198","24399","6369","Winback","Defend","Winback","","","","AV summary and PPT sent","","","","","","/E/Data/Funnel/Submit/STURM FINANCIAL GROUP USAGE_51125.xlsx","Tiffany Debose","Tiffany Debose","3/24/2020 7:49","","","Teja Ravi","","","3/24/2020 0:00","8417.66","34.5","","6/22/2020 0:00","","","NA","","N","127","67","3/24-AV Summary and PPT sent","","SMBA",""

I tried the code below to remove the word wrap:

awk -F '"[^"]+"' 'NF<73{s = s $0; next} s{print s; s=""} 1; END{if (s) print s}' file

and

awk -F, 'NF!=73&&!line{line=$0;next} NF!=73&&line{line=line $0} {n=split(line, a, ",")} n==73{print line;line=""}' file.csv

Neither seems to have any effect!

Please suggest Linux code that does not rely on any external Unix packages.

Try this (it needs GNU awk for RT):

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }'

Demo:

$ gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' < file1.txt
ID,OPPORTUNITY ID,CREATED_DATE,TIR NAME,MS Rep,SRC_SSR_REP,REGION,HP PBM NAME,COMPANY NAME,COMPANY ADDRESS,COMPANY CITY,COMPANY STATE,COMPANY ZIPCODE,COMPANY AMID,COMPANY USER CONTACT NAME,COMPANY USER TITLE,COMPANY USER PHONE,COMPANY USER EMAIL,PARTNER COMPANY NAME,PARTNER REP NAME,PARTNER REP EMAIL,PARTNER LID,WHOLESALER,PURCHASEDGE AC NUMBER,USAGE PERIOD,DEAL TYPE,CLWB WORKED ON,DEAL NUMBER,NAMED TERRITORY SLED,MONO HP SHARE %,COLOR HP SHARE %,TOTAL HP TONER SHARE %,DEAL VALUE MONO,DEAL VALUE COLOR,TOTAL TONER DEAL VALUE,EST DISCOUNT VALUE,REBATE TYPE MONO,REBATE TYPE COLOR,DISCOUNT TYPE,DEAL START DATE,DEAL END DATE,DEAL EXTENDED END DATE,DEAL POSITION,ECLIPSE ID,ECLIPSE DEAL STATUS,ECLIPSE APPROVED DATE,ECLIPSE DEAL APPROVED BY,LOST REASON,USAGE FILE LOCATION,CREARTED BY,MODIFIED BY,MODIFIED DATE,FINALISATION_RECEIVED_DATE,FINALISATION_WORKED_DATE,DEAL_PROCESSED_BY,DEAL_FINALISED_BY,FUNNEL_COMMENT,AV_SENT_DATE,PL_REMAN_VALUE,PL_REMAN_SHARE,FINALISATION_DOC_PATH,TIME ELAPSING ON,APPROVAL SENT DATE,APPROVAL RECEIVED DATE,SECONDARY_WHOLESALER,PREVIOUSECLIPSE_ID,PurchasEdge_(Y/N),HP_TONER_UNITS,PL_REMAN_UNITS,FINALISATION_COMMENTS,RENEWAL_POSITION,PROGRAM_NAME,CUSTOMERONBOARDEDON
51128,OPP-048699,3/23/2020 21:02,Adam Dohm,Cheryl Glenn,Tiffany Debose,MARKET SOURCE,,"Flathead Valley School District (Kalispell, Whitefish, Columbia Falls)",233 1st Ave E,Kalispell,MT,59901,,Joe Biangone,Purchasing,406-758-8392,biangonej@sd5.k12.mt.us,TONERPORT INCORPORATED, ,,10293955,ESSENDANT,,12 months,Renewal,,CL091515474R4-A,SLED,97,100,98,21592,16781,38373,2452,Defend,Defend,Defend,4/15/2020 0:00,4/14/2021 0:00,4/14/2021 0:00,Won,42921984,,,,,/E/Data/Funnel/Submit/FLATHEAD VALLEY SCHOOL DISTRICT USAGE_51128.xlsx,Tiffany Debose,Tiffany Debose,3/26/2020 14:49,3/26/2020 0:00,,Bhavana P V,,,,613.97,1.6,,,,,NA,42085906,N,179,3,3/26 - Deal added on eclipse ,,SMBA,
51126,OPP-048697,3/23/2020 19:52,Xavier Weems,,Tiffany Debose,EAST,Vladimir Jaksic,"Gray Television, Inc.","​Gray Television, Inc.4370 Peachtree Rd, NE.​Atlanta, Ga  30319​",,GA,30319,DN042973875,Dottie Boudreau,Manager,404-266-8333,dottie@gray.tv,"STAPLES, INC", ,,"10264576,10252948",NA,,12 months,New,,CL200351126,Commercial - Named,84,89,86,16143,7335,23478,3149,Defend,Defend,Defend,,,,AV summary and PPT sent,,,,,,"/E/Data/Funnel/Submit/GRAY TELEVISION, INC USAGE_51126.xlsb",Tiffany Debose,Tiffany Debose,3/26/2020 8:55,,,Deepthi K,,,3/26/2020 0:00,3239.96,13.8,,6/24/2020 0:00,,,NA,,N,168,27,3/24/2020 - sent for specialist approval 3/26/2020 - aV sent,,MCBigDeal,
51125,OPP-048696,3/23/2020 18:01,Xavier Weems,,Tiffany Debose,WEST,Jenni HoGlin,STURM FINANCIAL GROUP,3033 East First Avenue,Denver,CO,80206,,,,,,"STAPLES, INC", ,,"10264576,10252948",NA,,12 months,New,,CL200351125,Commercial - Non Named,42,87,65,10201,14198,24399,6369,Winback,Defend,Winback,,,,AV summary and PPT sent,,,,,,/E/Data/Funnel/Submit/STURM FINANCIAL GROUP USAGE_51125.xlsx,Tiffany Debose,Tiffany Debose,3/24/2020 7:49,,,Teja Ravi,,,3/24/2020 0:00,8417.66,34.5,,6/22/2020 0:00,,,NA,,N,127,67,3/24-AV Summary and PPT sent,,SMBA,
$

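Assuming the line breaks within fields and at the end of each record are both \n (if they were \n within fields and \r\n at the end of each record, as exported by MS-Excel, this would be trivial), the following uses GNU awk for several extensions (multi-char RS, RT, FPAT, \s).

This will join the wrapped lines:

awk -v RS='"[^"]+"' -v ORS= '{
    gsub(/\n/,"",RT)
    print $0 RT
}'

This will remove leading/trailing white space and enclose every field in quotes: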

awk -v FPAT='[^,]*|"[^"]+"' -v OFS=',' '{
    for (i=1;i<=NF;i++) {
        gsub(/^"?\s*|\s*"?$/,"",$i)
        printf "\"%s\"%s", $i, (i<NF ? OFS : ORS)
    }
}'

So you can use them together in a pipeline:

$ awk -v RS='"[^"]+"' -v ORS= '{gsub(/\n/,"",RT); print $0 RT}' file |
    awk -v FPAT='[^,]*|"[^"]+"' -v OFS=',' '{for (i=1;i<=NF;i++) {gsub(/^"?\s*|\s*"?$/,"",$i); printf "\"%s\"%s", $i, (i<NF ? OFS : ORS)} }'
"ID","OPPORTUNITY ID","CREATED_DATE","TIR NAME","MS Rep","SRC_SSR_REP","REGION","HP PBM NAME","COMPANY NAME","COMPANY ADDRESS","COMPANY CITY","COMPANY STATE","COMPANY ZIPCODE","COMPANY AMID","COMPANY USER CONTACT NAME","COMPANY USER TITLE","COMPANY USER PHONE","COMPANY USER EMAIL","PARTNER COMPANY NAME","PARTNER REP NAME","PARTNER REP EMAIL","PARTNER LID","WHOLESALER","PURCHASEDGE AC NUMBER","USAGE PERIOD","DEAL TYPE","CLWB WORKED ON","DEAL NUMBER","NAMED TERRITORY SLED","MONO HP SHARE %","COLOR HP SHARE %","TOTAL HP TONER SHARE %","DEAL VALUE MONO","DEAL VALUE COLOR","TOTAL TONER DEAL VALUE","EST DISCOUNT VALUE","REBATE TYPE MONO","REBATE TYPE COLOR","DISCOUNT TYPE","DEAL START DATE","DEAL END DATE","DEAL EXTENDED END DATE","DEAL POSITION","ECLIPSE ID","ECLIPSE DEAL STATUS","ECLIPSE APPROVED DATE","ECLIPSE DEAL APPROVED BY","LOST REASON","USAGE FILE LOCATION","CREARTED BY","MODIFIED BY","MODIFIED DATE","FINALISATION_RECEIVED_DATE","FINALISATION_WORKED_DATE","DEAL_PROCESSED_BY","DEAL_FINALISED_BY","FUNNEL_COMMENT","AV_SENT_DATE","PL_REMAN_VALUE","PL_REMAN_SHARE","FINALISATION_DOC_PATH","TIME ELAPSING ON","APPROVAL SENT DATE","APPROVAL RECEIVED DATE","SECONDARY_WHOLESALER","PREVIOUSECLIPSE_ID","PurchasEdge_(Y/N)","HP_TONER_UNITS","PL_REMAN_UNITS","FINALISATION_COMMENTS","RENEWAL_POSITION","PROGRAM_NAME","CUSTOMERONBOARDEDON"
"51128","OPP-048699","3/23/2020 21:02","Adam Dohm","Cheryl Glenn","Tiffany Debose","MARKET SOURCE","","Flathead Valley School District (Kalispell, Whitefish, Columbia Falls)","233 1st Ave E","Kalispell","MT","59901","","Joe Biangone","Purchasing","406-758-8392","biangonej@sd5.k12.mt.us","TONERPORT INCORPORATED","","","10293955","ESSENDANT","","12 months","Renewal","","CL091515474R4-A","SLED","97","100","98","21592","16781","38373","2452","Defend","Defend","Defend","4/15/2020 0:00","4/14/2021 0:00","4/14/2021 0:00","Won","42921984","","","","","/E/Data/Funnel/Submit/FLATHEAD VALLEY SCHOOL DISTRICT USAGE_51128.xlsx","Tiffany Debose","Tiffany Debose","3/26/2020 14:49","3/26/2020 0:00","","Bhavana P V","","","","613.97","1.6","","","","","NA","42085906","N","179","3","3/26 - Deal added on eclipse","","SMBA",""
"51126","OPP-048697","3/23/2020 19:52","Xavier Weems","","Tiffany Debose","EAST","Vladimir Jaksic","Gray Television, Inc.","Gray Television, Inc.4370 Peachtree Rd, NE.​Atlanta, Ga  30319","","GA","30319","DN042973875","Dottie Boudreau","Manager","404-266-8333","dottie@gray.tv","STAPLES, INC","","","10264576,10252948","NA","","12 months","New","","CL200351126","Commercial - Named","84","89","86","16143","7335","23478","3149","Defend","Defend","Defend","","","","AV summary and PPT sent","","","","","","/E/Data/Funnel/Submit/GRAY TELEVISION, INC USAGE_51126.xlsb","Tiffany Debose","Tiffany Debose","3/26/2020 8:55","","","Deepthi K","","","3/26/2020 0:00","3239.96","13.8","","6/24/2020 0:00","","","NA","","N","168","27","3/24/2020 - sent for specialist approval 3/26/2020 - aV sent","","MCBigDeal",""
"51125","OPP-048696","3/23/2020 18:01","Xavier Weems","","Tiffany Debose","WEST","Jenni HoGlin","STURM FINANCIAL GROUP","3033 East First Avenue","Denver","CO","80206","","","","","","STAPLES, INC","","","10264576,10252948","NA","","12 months","New","","CL200351125","Commercial - Non Named","42","87","65","10201","14198","24399","6369","Winback","Defend","Winback","","","","AV summary and PPT sent","","","","","","/E/Data/Funnel/Submit/STURM FINANCIAL GROUP USAGE_51125.xlsx","Tiffany Debose","Tiffany Debose","3/24/2020 7:49","","","Teja Ravi","","","3/24/2020 0:00","8417.66","34.5","","6/22/2020 0:00","","","NA","","N","127","67","3/24-AV Summary and PPT sent","","SMBA",""

Otherwise, what you want can also be done with a single call to any awk.
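As a rough illustration of a single-call approach that needs no GNU extensions, the sketch below joins physical lines until the double quotes balance. The function name `join_wrapped` is hypothetical, and the sketch assumes every double quote in the file is either a field delimiter or a doubled quote (`""`), so a complete record always contains an even number of quotes. It only joins the lines; it does not add quotes around every field.

```shell
# Hypothetical helper: join physical lines until the double quotes balance.
# Assumption: a complete CSV record always has an even number of '"'.
join_wrapped() {
    awk '
    {
        # Append this physical line to the pending record; no separator is
        # added, so the embedded line break is simply dropped.
        rec = rec $0
        # gsub returns the number of replacements, so running it on a copy
        # counts the quotes without modifying rec.
        tmp = rec
        if (gsub(/"/, "", tmp) % 2 == 0) {   # balanced: record is complete
            print rec
            rec = ""
        }
    }
    END { if (rec != "") print rec }         # flush an unterminated record
    ' "$@"
}

printf '1,"a\nb",2\n3,c,4\n' | join_wrapped
# 1,"ab",2
# 3,c,4
```

With a real file this would be called as `join_wrapped file.csv > joined.csv`. To keep a space where the line break used to be, one option is to append `" " $0` instead of `$0` whenever `rec` is non-empty.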

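This might work for you (GNU sed):

sed -E ':a;N;s/^([^"]*("[^"]*"[^"]*)*"[^"\n]*)\n/\1/;ta;P;D' file |
sed -E ':a;s/^([^"]*("[^"]*"[^"]*)*"[^",]*),/\1\n/;ta;s/"//g;s/[^,]*/"&"/g;y/\n/,/'

The solution has two parts:

  1. Remove all newlines that fall between double quotes.
  2. Enclose every comma-separated field in double quotes.

The first sed invocation appends the following line (removing the newline in between) until the accumulated text has a balanced set of double quotes. It then prints the first of the lines in the pattern space and processes the remainder together with the next line, until all lines have been printed.

The second invocation replaces every comma inside double quotes with a newline, removes all double quotes, and encloses every run of non-comma characters (that is, every field) in double quotes. Finally, the newlines are translated back into commas.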
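To see the two steps on something smaller than the 73-column file, here is a minimal sketch that wraps both calls in a hypothetical function, `unwrap_and_quote`, and feeds it a made-up two-record input. It assumes GNU sed and keeps the captured text in each replacement (`\1`), so that only the newline or comma is replaced.

```shell
# Hypothetical wrapper around the two GNU sed calls.
# Step 1: while the text read so far ends inside an open quote, append the
#         next line (N) and drop the newline, keeping the captured text (\1).
# Step 2: hide quoted commas as newlines, strip all quotes, quote every
#         comma-separated field, then turn the hidden newlines back into commas.
unwrap_and_quote() {
    sed -E ':a;N;s/^([^"]*("[^"]*"[^"]*)*"[^"\n]*)\n/\1/;ta;P;D' "$@" |
    sed -E ':a;s/^([^"]*("[^"]*"[^"]*)*"[^",]*),/\1\n/;ta;s/"//g;s/[^,]*/"&"/g;y/\n/,/'
}

# Made-up sample: record 1 has a quoted comma, a wrapped field and an empty
# trailing field; record 2 has no quotes at all.
printf '1,"A, Inc.","12 Main\nTown",\n2,B,3 Elm,x\n' | unwrap_and_quote
# "1","A, Inc.","12 MainTown",""
# "2","B","3 Elm","x"
```

Note that the wrapped pieces are joined with nothing in between ("12 MainTown"), just as in the awk output above. If a space is wanted there, one option is to use `\1 ` instead of `\1` in the first call.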