How to add field separators based on header lengths?
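I am trying to add separators to the following text format (the actual file has many more fields).
The length of each field is given by the length of the block of dashes (------------) under its header.
Input: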
NAME                  ADDRESS                                                      PHONE
--------------------- ------------------------------------------------------------ ------------
CLARK KENT            344 Clinton Street, Apartment 3D, midtown Metropolis         11111111
TONY STARK            Malibu Point 10880, 902XX                                    22222222
PETER PARKER          15th Street, Queens, New York City, New York                 33333333
Desired output:
NAME |ADDRESS |PHONE
CLARK KENT |344 Clinton Street, Apartment 3D, midtown Metropolis |11111111
TONY STARK |Malibu Point 10880, 902XX |22222222
PETER PARKER |15th Street, Queens, New York City, New York |33333333
My attempt so far prints the length of each header's dash block, but I don't know how to add the field separator | at those positions:
$ awk 'FNR == 2 {for(i=1; i<=NF; i++) {print length($i)}}'
21
60
12
Please help me solve this.
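You can use this awk, which works with any version of awk:
awk -v OFS='|' '
NR == 1 {
   h = $0                  # hold the header until the widths are known
   next
}
NR == 2 {
   n = NF - 1              # number of separators to place
   # w[i] is the column of the single space that follows field i
   for (i = 1; i <= n; i++)
      w[i] = (i == 1 ? 1 : w[i-1] + 1) + length($i)
   $0 = h                  # print the header in place of the dashes
}
{
   # overwrite each separating space with OFS
   for (i = 1; i <= n; i++)
      $0 = substr($0, 1, w[i] - 1) OFS substr($0, w[i] + 1)
} 1' file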
NAME |ADDRESS |PHONE
CLARK KENT |344 Clinton Street, Apartment 3D, midtown Metropolis |11111111
TONY STARK |Malibu Point 10880, 902XX |22222222
PETER PARKER |15th Street, Queens, New York City, New York |33333333
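A variant of the same idea is to record where each column starts and how wide it is, then rebuild every row with substr() instead of editing the line in place. This is only a sketch for POSIX awk: the sample rows, the `sample` temp file, and the names `start`, `width`, and `out` are made up for the illustration, and it assumes exactly one space between dash blocks.

```shell
# Build a small fixed-width sample (hypothetical data, not the asker's file).
sample=$(mktemp)
printf '%s\n' \
  'ID   CITY       CODE' \
  '---- ---------- ----' \
  'a1   Metropolis 0001' \
  'b22  Malibu     0002' > "$sample"

out=$(awk '
NR == 1 { hdr = $0; next }          # hold the header until widths are known
NR == 2 {
    n = NF
    pos = 1
    for (i = 1; i <= n; i++) {      # start column and width of every field
        start[i] = pos
        width[i] = length($i)
        pos += width[i] + 1         # +1 skips the single separating space
    }
    $0 = hdr                        # emit the header in place of the dashes
}
{
    line = $0
    row = ""
    for (i = 1; i <= n; i++)        # slice each field and join with |
        row = row (i > 1 ? "|" : "") substr(line, start[i], width[i])
    print row
}' "$sample")

printf '%s\n' "$out"
rm -f "$sample"
```

Because each field is cut to the width of its dash block, the separating space is dropped rather than kept in front of the |.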
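Older solution, based on the sample data provided:
You can try this sed, which matches a run of 2 or more blanks followed by 1 non-blank character and inserts | between them:
sed -nE '/^-{3,}/! {s/([[:blank:]]{2,})([^[:blank:]])/\1|\2/gp;}' file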
NAME |ADDRESS |PHONE
CLARK KENT |344 Clinton Street, Apartment 3D, midtown Metropolis |11111111
TONY STARK |Malibu Point 10880, 902XX |22222222
PETER PARKER |15th Street, Queens, New York City, New York |33333333
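The reason the sed approach only suits the sample data: the pattern needs at least two blanks before the next column, so a value that fills its whole column (leaving only the single separator space) gets no |. A small check, using two made-up lines rather than the asker's file:

```shell
# First line has spare padding before the second column; the second line
# has a value that fills its column, so only one space separates the fields.
out=$(printf '%s\n' 'ab   cd' 'abcd ef' |
  sed -E 's/([[:blank:]]{2,})([^[:blank:]])/\1|\2/g')
printf '%s\n' "$out"
```

The first line gets its separator, while the second comes back unchanged.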
Using GNU awk:
wid=$(awk '
    NR == 2 {
        for (i=1; i<=NF; i++) printf "%d ", 1 + length($i)
        exit
    }
' file)
gawk -v FIELDWIDTHS="$wid" '
    NR != 2 {
        for (i=1; i<NF; i++) printf "%s|", $i
        print $NF
    }
' file
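To see what the first command passes to FIELDWIDTHS, it can be run on a tiny header on its own. The two-line input and its widths (4, 10, 4) are invented for the illustration; each number printed is the dash-block length plus one for the separating space.

```shell
# Compute the width list from the dash line of a hypothetical header.
wid=$(printf '%s\n' 'ID   CITY       CODE' '---- ---------- ----' |
  awk 'NR == 2 {
      for (i = 1; i <= NF; i++) printf "%d ", 1 + length($i)
      exit
  }')
printf '[%s]\n' "$wid"   # brackets make the trailing space visible
```

Since every width includes the separator, each field gawk hands back ends with that space, and the | is printed after it.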
Setting FIELDWIDTHS in place (GNU awk):
$ awk -v OFS='|' 'NR==1 {h=$0; next}
                  NR==2 {for(i=1;i<=NF;i++) f=f FS 1+length($i);
                         FIELDWIDTHS=f;
                         $0=h}
                  {$1=$1}1' file
NAME |ADDRESS |PHONE
CLARK KENT |344 Clinton Street, Apartment 3D, midtown Metropolis |11111111
TONY STARK |Malibu Point 10880, 902XX |22222222
PETER PARKER |15th Street, Queens, New York City, New York |33333333
Using GNU awk for FIELDWIDTHS:
$ cat tst.awk
BEGIN { OFS="|" }
NR==1 { hdr=$0; next }
NR==2 {
    nf = split($0,f)
    for (i=1; i<=nf; i++) {
        FIELDWIDTHS = (i>1 ? FIELDWIDTHS " 1 " : "") length(f[i])
    }
    $0 = hdr
}
{
    for (i=1; i<=NF; i+=2) {
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}
$ awk -f tst.awk file
NAME |ADDRESS |PHONE
CLARK KENT |344 Clinton Street, Apartment 3D, midtown Metropolis |11111111
TONY STARK |Malibu Point 10880, 902XX |22222222
PETER PARKER |15th Street, Queens, New York City, New York |33333333