Awk command to split nth field text and insert as new rows
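This is a continuation of my previous question; I am checking whether I can handle this pattern as well.
I have a huge CSV file in which field 11 has a variable length, for example: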
"xx","x",x,x,x,xx,xx,"x",x,11,"00000aaaaD00000bbbbD00000abcdD00000dwasD00000dedsD00000ddfgD00000dsdfD00000snfjD00000djffD00000wedfD00000asdfZ"
"xx","x",x,x,x,xx,xx,"x",x,5,"00000aaaaD00000bbbbD00000abcdD00000dwasD00000dedsD"
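After splitting field 11 into chunks of 10 characters, I need characters 6-9 of each chunk. Each extracted value then has to be inserted as a new row.
I need output like the following: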
"xx","x",x,x,x,xx,xx,"x",x,11,"aaaa"
"xx","x",x,x,x,xx,xx,"x",x,11,"bbbb"
"xx","x",x,x,x,xx,xx,"x",x,11,"abcd"
.
.
.
"xx","x",x,x,x,xx,xx,"x",x,11,"asdf"
"xx","x",x,x,x,xx,xx,"x",x,5,"djff"
.
.
"xx","x",x,x,x,xx,xx,"x",x,5,"deds"
while read -r line1; do
    icount=$((icount + 1))
    col_11=$(echo "$line1" | cut -d',' -f11)
    col_10=$(echo "$line1" | cut -d',' -f1,2,3,4,5,7,10)
    # Strip the surrounding double quotes from field 11
    col_11_trim=$(echo "$col_11" | tr -d '"')
    # Split field 11 into 10-character chunks, one per line
    echo "$col_11_trim" | fold -w10 > "$path/col_11_extract"
    while read -r line2; do
        ocount=$((ocount + 1))
        # Keep characters 6-9 of each chunk
        strng_cut=$(echo "$line2" | cut -c6-9)
        echo "$col_10",\""$strng_cut"\" >> "$path/final_out"
    done < "$path/col_11_extract"
done < "$input"
With awk:
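awk 'BEGIN{FS=OFS=","}
{
    eleven=$11;                 # keep the original field 11, quotes included
    len=length(eleven);
    for(i=2; i<len-1; i=i+10){
        # replace field 11 with characters 6-9 of the current 10-character chunk
        $11="\"" substr(eleven, i+5, 4) "\"";
        print;
    }
}' file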
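The for loop starts at position 2 and stops before len-1 because field 11 is enclosed in double quotes, so the first and last characters of the field are skipped.
Output: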
"xx","x",x,x,x,xx,xx,"x",x,11,"aaaa"
"xx","x",x,x,x,xx,xx,"x",x,11,"bbbb"
"xx","x",x,x,x,xx,xx,"x",x,11,"abcd"
"xx","x",x,x,x,xx,xx,"x",x,11,"dwas"
"xx","x",x,x,x,xx,xx,"x",x,11,"deds"
"xx","x",x,x,x,xx,xx,"x",x,11,"ddfg"
"xx","x",x,x,x,xx,xx,"x",x,11,"dsdf"
"xx","x",x,x,x,xx,xx,"x",x,11,"snfj"
"xx","x",x,x,x,xx,xx,"x",x,11,"djff"
"xx","x",x,x,x,xx,xx,"x",x,11,"wedf"
"xx","x",x,x,x,xx,xx,"x",x,11,"asdf"
"xx","x",x,x,x,xx,xx,"x",x,5,"aaaa"
"xx","x",x,x,x,xx,xx,"x",x,5,"bbbb"
"xx","x",x,x,x,xx,xx,"x",x,5,"abcd"
"xx","x",x,x,x,xx,xx,"x",x,5,"dwas"
"xx","x",x,x,x,xx,xx,"x",x,5,"deds"
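The same idea can be written with the chunk size and the wanted character range passed in as parameters. This is only a sketch, not the original answer: the function name `expand_field11` and its three arguments are made up for illustration, it removes the quotes with `gsub` instead of skipping them by index, and it assumes that no quoted field contains a comma, since awk splits on every `,` here.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are assumptions for this sketch):
#   expand_field11 CHUNK OFFSET WIDTH < input.csv
# CHUNK  = size of each block inside field 11 (10 in the question)
# OFFSET = 1-based start of the wanted text inside a block (6)
# WIDTH  = number of characters to keep (4, i.e. characters 6-9)
expand_field11() {
    awk -v chunk="$1" -v offset="$2" -v width="$3" '
    BEGIN { FS = OFS = "," }
    {
        value = $11
        gsub(/"/, "", value)    # drop the surrounding quotes
        n = length(value)
        # Visit the start of every complete block: 1, 1+chunk, 1+2*chunk, ...
        for (start = 1; start + chunk - 1 <= n; start += chunk) {
            $11 = "\"" substr(value, start + offset - 1, width) "\""
            print
        }
    }'
}

# Example call for the layout in the question:
#   expand_field11 10 6 4 < file
```

Because the quotes are stripped before the loop, the positions inside the loop refer directly to the data, which may be easier to adapt if the block size or the wanted range changes. An incomplete trailing block (shorter than CHUNK) is skipped by the loop condition.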