How to add field separator based on headers length?

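I'm trying to add a delimiter to the following text format (the actual file has many more fields).

As far as I can see, the length of each field is given by the length of the dashed block ------------ below each header.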

Input:

NAME                  ADDRESS                                                      PHONE       
--------------------- ------------------------------------------------------------ ------------
CLARK KENT            344 Clinton Street, Apartment 3D, midtown Metropolis         11111111    
TONY STARK            Malibu Point 10880, 902XX                                    22222222    
PETER PARKER          15th Street, Queens, New York City, New York                 33333333

Desired output:

NAME                 |ADDRESS                                                     |PHONE       
CLARK KENT           |344 Clinton Street, Apartment 3D, midtown Metropolis        |11111111    
TONY STARK           |Malibu Point 10880, 902XX                                   |22222222    
PETER PARKER         |15th Street, Queens, New York City, New York                |33333333

My attempt so far prints the length of each dashed block under the headers, but I don't know how to add the field separator | at those positions:

$ awk 'FNR == 2 {for(i=1; i<=NF; i++) {print length($i)}}'
21
60
12

Any help with this would be appreciated.

You may use this awk solution, which should work with any version of awk:

awk -v OFS='|' '
NR == 1 {
   h = $0
   next
}
NR == 2 {
   for(i=1; i<NF; i++)
      w[i] = (i == 1 ? 1 : w[i-1] + 1) + length($i)
   $0 = h
}
{
   for(i=1; i<=length(w); i++)
     $0 = substr($0, 1, w[i]) "|" substr($0, w[i]+i)
} 1' file

NAME                  |ADDRESS                                                     |PHONE
CLARK KENT            |344 Clinton Street, Apartment 3D, midtown Metropolis        |11111111
TONY STARK            |Malibu Point 10880, 902XX                                   |22222222
PETER PARKER          |15th Street, Queens, New York City, New York                |33333333
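
If the goal is to turn the single blank column between fields into |, as in the desired output in the question, one option is to overwrite that column rather than insert next to it. The following is only a sketch under stated assumptions: columns are separated by exactly one blank, line 2 is the dashed ruler, and add_sep is a hypothetical helper name that does not appear in the original answer.

```shell
# Hypothetical helper: reads a fixed-width table on stdin (line 2 is the
# dashed ruler) and writes it with "|" in place of each separator column.
add_sep() {
  awk '
  NR == 1 { hdr = $0; next }            # hold the header until widths are known
  NR == 2 {
      n = split($0, dash, " ")          # one dashed block per column
      pos = 0
      for (i = 1; i < n; i++) {
          pos += length(dash[i]) + 1    # column of the blank after field i
          sep[i] = pos
      }
      nsep = n - 1
      $0 = hdr                          # fall through and treat the header as a row
  }
  {
      line = $0
      for (i = 1; i <= nsep; i++)
          if (length(line) >= sep[i])   # leave rows that end early untouched
              line = substr(line, 1, sep[i] - 1) "|" substr(line, sep[i] + 1)
      print line
  }'
}
```

It would be called as add_sep < file. Because each | overwrites one character, the line length never changes, so the positions computed from the ruler stay valid for every column without further adjustment.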

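Older solution, based on the sample data provided

You may try this sed, which matches a run of 2 or more blanks followed by 1 non-blank character and inserts | between them: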

sed -nE '/^-{3,}/! {s/([[:blank:]]{2,})([^[:blank:]])/\1|\2/gp;}' file

NAME                  |ADDRESS                                                      |PHONE
CLARK KENT            |344 Clinton Street, Apartment 3D, midtown Metropolis         |11111111
TONY STARK            |Malibu Point 10880, 902XX                                    |22222222
PETER PARKER          |15th Street, Queens, New York City, New York                 |33333333
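
A caveat worth illustrating: this approach keys on runs of blanks rather than on the ruler widths, so it only fits data where no field value contains two consecutive blanks. A small sketch, using a made-up row and a hypothetical helper name (split_on_runs), shows what happens otherwise:

```shell
# Hypothetical helper wrapping the same substitution (without -n and p,
# since this one-row demo has no ruler line to filter out).
split_on_runs() {
  sed -E 's/([[:blank:]]{2,})([^[:blank:]])/\1|\2/g'
}

# Made-up row whose first field, "ANN  LEE", contains two consecutive blanks.
printf '%s\n' 'ANN  LEE    12 Oak Rd' | split_on_runs
# prints: ANN  |LEE    |12 Oak Rd
```

The two-column row comes out with two separators, which is why the width-based solutions are the safer choice for real data.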

Using GNU awk:

wid=$(awk '
  NR == 2 {
    for (i=1; i<=NF; i++) printf "%d ", 1 + length($i)
    exit
  }
' file)

gawk -v FIELDWIDTHS="$wid" '
  NR != 2 {
    for (i=1; i<NF; i++) printf "%s|", $i
    print $NF
  }
' file
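
To make the intermediate value concrete, here is a small sketch of what the first pass produces. It uses a made-up two-column table and a hypothetical helper name (ruler_widths); each width is the dashed block length plus one, so that it also covers the blank after the column.

```shell
# Hypothetical helper mirroring the first pass: print each ruler block
# length plus one, space-separated, then stop reading.
ruler_widths() {
  awk 'NR == 2 { for (i = 1; i <= NF; i++) printf "%d ", 1 + length($i); exit }'
}

# Made-up table with ruler blocks of 3 and 5 dashes.
printf '%s\n' 'ID  NAME' '--- -----' '1   alice' | ruler_widths
# prints "4 6 " (with a trailing space and no newline)
```

For the sample file in the question, the same pass gives 22 61 13, which is the string handed to FIELDWIDTHS in the second command.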

Setting FIELDWIDTHS in place, within a single awk invocation (FIELDWIDTHS requires GNU awk):

 $ awk -v OFS='|' 'NR==1 {h=$0; next} 
                   NR==2 {for(i=1;i<=NF;i++) f=f FS 1+length($i); 
                          FIELDWIDTHS=f; 
                          $0=h} 
                         {$1=$1}1' file

NAME                  |ADDRESS                                                      |PHONE
CLARK KENT            |344 Clinton Street, Apartment 3D, midtown Metropolis         |11111111
TONY STARK            |Malibu Point 10880, 902XX                                    |22222222
PETER PARKER          |15th Street, Queens, New York City, New York                 |33333333

Using GNU awk for FIELDWIDTHS:

$ cat tst.awk
BEGIN { OFS="|" }
NR==1 { hdr=$0; next }
NR==2 {
    nf = split($0,f)
    for (i=1; i<=nf; i++) {
        FIELDWIDTHS = (i>1 ? FIELDWIDTHS " 1 " : "") length(f[i])
    }
    $0 = hdr
}
{
    for (i=1; i<=NF; i+=2) {
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}

$ awk -f tst.awk file
NAME                 |ADDRESS                                                     |PHONE
CLARK KENT           |344 Clinton Street, Apartment 3D, midtown Metropolis        |11111111
TONY STARK           |Malibu Point 10880, 902XX                                   |22222222
PETER PARKER         |15th Street, Queens, New York City, New York                |33333333