I have a tab-separated file in Unix which has data issues

I have to make sure every row has 4 columns, but the input data is quite messy:

ID 字段和 "god bless me" 最后一列 PNumber 不是空字段。

正如你所见,第 4 行因为 "Description column" 中的换行符而变得混乱,它跨越了多行。

ID  Name    Description Phnumber
1051    John    5674 I am doing good, is this task we need to fix   908342
1065    Rohit               9876246
10402   rob I am    
    doing good, 
    is this task we need to fix     908341
105552  "Julin rob hain"    i know what to do just let me do it     
    "
    "
    "
    "
    "
    " 
908452   1051   Dave    I am doing reporting this week  88889999
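A quick way to list the physical lines that break the 4-column rule is to count tab-separated fields with awk. This is a minimal sketch, assuming the real file is genuinely tab-delimited (the sample above may have had its tabs turned into spaces when pasted); the function name `check_cols` is hypothetical.

```shell
# check_cols: print "line-number: field-count" for every line whose
# tab-separated field count differs from the expected number of columns.
# Usage: check_cols NCOL [file...]   (reads stdin when no file is given)
check_cols() {
    ncol=$1
    shift
    # FS is a single tab, so spaces inside Description do not split fields
    awk -v n="$ncol" 'BEGIN { FS = "\t" } NF != n { print NR ": " NF " fields" }' "$@"
}

# Example: lines 2 and 3 are two fragments of one broken record
printf 'ID\tName\tDescription\tPhnumber\n10402\trob\tI am\n doing good\t908341\n' |
    check_cols 4
# 2: 3 fields
# 3: 2 fields
```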

Maybe a screenshot would make the problem easier to see.

Every record starts with a number and ends with a number, and each record should have 4 columns.
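The rule that every record ends with a number can be used directly to glue fragments back together: keep appending physical lines until the accumulated text ends in a purely numeric field, then emit it. This is only a sketch under assumptions: the function name `join_until_number` is hypothetical, a single input file with one header line is assumed, and a Description fragment that happens to end a physical line with a bare number would be taken as the end of the record. It also cannot split two records that were merged onto one physical line, as on the last line of the sample.

```shell
# join_until_number: glue physical lines together until the accumulated
# text ends in a purely numeric field (the phone number), then print it
# as one record. Fragments are joined with a single space.
# Usage: join_until_number [file]
join_until_number() {
    awk '
    NR == 1 { print; next }                # header passes through untouched
    {
        sub(/[ \t\r]+$/, "")               # drop trailing blanks and CR
        buf = (buf == "") ? $0 : buf " " $0
        if (buf ~ /[ \t][0-9]+$/) {        # record now ends with a number
            print buf
            buf = ""
        }
    }
    END { if (buf != "") print buf }       # flush an unterminated record
    ' "$@"
}

# join_until_number file_name > file_name.fixed
```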

Expected output

ID      Name    Description                                         Phnumber
1051    John    5674 I am doing good, is this task we need to fix    908342
1065    Rohit                                                        9876246
10402   rob   I am doing good, 563 is this task we need to fix       908341
105552  "Julin rob hain" i know what to do just let me do it         908452   
1051    Dave    I am doing reporting this week                      88889999

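This is sample data; the actual file has 12 columns. And yes, the columns in between can contain numbers, and a few of them are date fields (e.g. 2017-03-02).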
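Because numbers and dates can appear in the middle columns, "ends with a number" is not a reliable boundary for the real 12-column file. A more robust sketch is to count tab-separated fields instead and keep appending lines until the record has at least NCOL of them. It assumes the line breaks fall inside a text field (so the fragments are rejoined with a space) and that no tab is lost; the function name `join_by_field_count` is hypothetical.

```shell
# join_by_field_count: glue physical lines until the record holds at least
# NCOL tab-separated fields. A line break inside a text field is replaced
# by a single space, so the broken field is rejoined rather than split.
# Numbers and dates in the middle columns do not matter here.
# Usage: join_by_field_count NCOL [file...]
join_by_field_count() {
    ncol=$1
    shift
    awk -v n="$ncol" '
    {
        sub(/\r$/, "")                          # tolerate CRLF line endings
        buf = (buf == "") ? $0 : buf " " $0
        if (split(buf, parts, "\t") >= n) {     # record is complete
            print buf
            buf = ""
        }
    }
    END { if (buf != "") print buf }            # leftover partial record
    ' "$@"
}

# join_by_field_count 12 file_name > file_name.fixed
```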

awk to the rescue!

Assuming that no field other than the first and the last one is numeric:

awk 'NR==1;
     NR>1 {for(i=1;i<=NF;i++)
             {if($i~/^[0-9]+$/) s=!s; printf "%s", $i (s?OFS:RS)}}' file


ID  Name    Description Phnumber
1051 John I am doing good, is this task we need to fix 908342
10423 rob I am doing good, is this task we need to fix 908341
1052 Julin rob hain i know what to do just let me do it " " " " " " 908452
1051 Dave I am doing reporting this week 88889999

You may want to set OFS to \t to keep more structure.

This worked:

cat file_name | perl -0pe 's/\n(?!([0-9]{6}|$)\t)//g' | perl -0pe 's/\r(?!([0-9]{6}|$)\t)//g' | sed '/^$/d'
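The perl pipeline deletes every newline (and carriage return) that is not immediately followed by a 6-digit ID and a tab, so "line starts with a 6-digit ID plus tab" becomes the only record boundary. The same idea can be sketched in awk, which processes the file line by line instead of slurping it with `-0`. Like the perl regex, it assumes every real ID is exactly six digits; fragments are appended with nothing in between, and blank lines are absorbed into the current record. The function name `join_on_id` is hypothetical, and this is a re-expression of the idea rather than a byte-for-byte equivalent of the perl commands.

```shell
# join_on_id: start a new output line only where a physical line begins
# with a 6-digit ID followed by a tab; every other line is appended to the
# current record with nothing in between, mirroring the perl substitution.
# Usage: join_on_id [file]
join_on_id() {
    awk '
    { gsub(/\r/, "") }                              # drop CRs from CRLF files
    NR > 1 && /^[0-9][0-9][0-9][0-9][0-9][0-9]\t/ { printf "\n" }
    { printf "%s", $0 }
    END { if (NR > 0) printf "\n" }                 # terminate the last record
    ' "$@"
}

# join_on_id file_name > file_name.fixed
```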