awk for text processing a CSV file
I have some large *.csv text files that look like this:
Word,Tag,Lemma
Off,aa,off
short,aa,short
and,sfg3eþ,and
tall,sþghen,tall
deers,aþ,deer
in,never,in
Africa,nc,Africa
frv.,aa,frv.
---,ta,---
,,
All,nhfn,all
allowed,lhfnsf,allow
personell,c,personell
aggr.,lheþsf,aggr.
with,aþ,with
23,ta,23
as.,nvfn,as.
sillable.,lheþsf,sillable.
,,
Á,aþ,á
I need to process this file so that the first column ends up in a list like this:
{[Off short and tall deers in Africa frv],[All allowed personnel aggr. with 23 as syllable.],[Á......],...n]}
The output needs a closing ]} at the end.
What I have tried:
awk 'BEGIN {FS=",";print"{["} /",,"/ {print"],["} END {print"]}"}' 079.cvs
This only prints:
{[
]}
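One likely reason the attempt prints only the brackets: inside an awk regex, the double quotes in /",,"/ are literal characters, so the pattern looks for the text ",," including the quotes, and no rule prints $1 at all. A small sketch of the difference, where the function name show_separator_matches is made up for illustration:

```shell
# Hypothetical helper: reports which of two patterns matches each input line.
show_separator_matches() {
  awk '
    /",,"/ { print NR ": quoted pattern matched" }    # needs literal quote characters in the line
    /^,,$/ { print NR ": anchored pattern matched" }  # matches a line consisting only of two commas
  ' "$@"
}

# Two sample rows: a data row and an empty separator row.
printf 'Off,aa,off\n,,\n' | show_separator_matches
```

Only the anchored pattern fires, and only on the second line.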
I also found this:
cat 080.csv | cut -d ',' -f3 >>D.txt
This is actually useful:
Off
short
and
tall
....
But the result is a "deep" file (one word per line) and is missing the list elements.
Update
awk -F, 'NR==1{printf "{["; next} /^--/||!$1{if(a)printf "],["; a=0; next} {printf "%s ",$1; a=1} END{printf "]}"}' file
{[Off short and tall deers in Africa frv. ],[All allowed personell aggr. with 23 as. sillable. ],[Á ]}
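The output above keeps a trailing space before each closing bracket. If that matters, one possible variant is to buffer each group and emit it only when a separator row is reached. This is a sketch, not a definitive solution: the function name first_column_groups is made up for illustration, and it assumes that separator rows either start with -- or have an empty first field, as in the sample data.

```shell
# Hypothetical helper: prints first-column words grouped as {[...],[...]}
first_column_groups() {
  awk -F, '
    NR == 1 { next }                 # skip the header row (Word,Tag,Lemma)
    /^--/ || $1 == "" {              # separator row: close the current group, if any
      if (group != "") { out = out sep "[" group "]"; sep = ","; group = "" }
      next
    }
    {                                # data row: append the word with a single space
      group = (group == "") ? $1 : (group " " $1)
    }
    END {
      if (group != "") out = out sep "[" group "]"   # flush the last group
      print "{" out "}"
    }
  ' "$@"
}
```

It can be called with a file name, for example first_column_groups 079.cvs, or fed from a pipe. Comparing $1 against the empty string instead of using !$1 also keeps a word such as 0 from being treated as a separator.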