Linux

Question

I want to convert a flat data file into a new file according to the following requirements:

1) Change tab delimited to pipe delimited "|".

2) Remove any leading and trailing SPACE on each "column".

3) Some columns are NULL, and I want to keep the NULLs, e.g., A||B. (2nd column is NULL.)

Example:

The original file (test.dat) has one line of tab-delimited data with 7 columns; the 2 columns after "NY" are NULL:

 A  New York    NY          Meal - Seafood       Grocery Department

Note that some fields have leading/trailing spaces:

(" A ", "Meal - Seafood   ", "  Grocery Department   ")

This is the final version I want in the new file:

A|New York|NY|||Meal - Seafood|Grocery Department

Can anyone write sample code or a shell script that I can use on Linux to output a new file?

Thanks!

Answer 1

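You can use the POSIX character class [[:space:]] together with * to match zero or more whitespace characters. The literal \t matches a tab character. For example: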

$ sed "s/^[[:space:]]*//" test.dat | sed "s/[[:space:]]*\t[[:space:]]*/|/g" \
     | sed "s/[[:space:]]*$//"
A|New York|NY|Meal - Seafood|Grocery Department

$ cat test.dat
A   New York    NY   Meal - Seafood Grocery Department
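
One caveat with the command above: [[:space:]] also matches tab characters, so a run of consecutive tabs (empty columns) is swallowed into a single delimiter, and the NULL columns from the question would be lost. A minimal sketch that trims only plain spaces, so that every tab still becomes its own |, might look like the following. The function name trim_fields_sed is hypothetical, and the tab is supplied through printf on the assumption that portability matters, since \t inside a sed pattern is a GNU extension rather than POSIX.

```shell
#!/bin/sh
# trim_fields_sed is a hypothetical helper name, not part of the original answer.
# It reads the named files (or stdin) and writes pipe-delimited lines to stdout.
trim_fields_sed() {
    # A literal tab character; avoids relying on \t inside the sed pattern.
    tab=$(printf '\t')
    # 1) strip leading spaces, 2) strip trailing spaces,
    # 3) turn each tab, plus any spaces around it, into a single "|".
    # Only spaces are trimmed, so consecutive tabs stay separate delimiters.
    sed -e 's/^ *//' \
        -e 's/ *$//' \
        -e "s/ *${tab} */|/g" "$@"
}
```

Assuming the file from the question, trim_fields_sed test.dat > new.dat would write the converted lines to new.dat.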

Answer 2

You can use awk.

Given:

$ cat -t file.tsv
 A ^INew York^INY^I^I^IMeal - Seafood   ^I  Grocery Department

(The tabs are shown as ^I there.)

$ awk 'BEGIN{FS="\t"; OFS="|"} 
     {for (i=1; i<=NF;i++) {
         gsub(/^[ ]+/,"",$i); gsub(/[ ]+$/,"",$i) 
        }
  } 1' file.tsv
A|New York|NY|||Meal - Seafood|Grocery Department
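
Since the question asks for a new file, the same awk approach can be wrapped so that it writes to one. This is only a sketch: convert_tsv and its two arguments (input path, output path) are hypothetical names. It also adds $1 = $1, on the assumption that the awk in use (gawk behaves this way) only rebuilds the record with OFS when a field actually changes, so a line with nothing to trim would otherwise come out still tab-delimited.

```shell
#!/bin/sh
# convert_tsv is a hypothetical helper: convert_tsv INPUT OUTPUT
convert_tsv() {
    awk 'BEGIN { FS = "\t"; OFS = "|" }
         {
             for (i = 1; i <= NF; i++) {
                 gsub(/^ +| +$/, "", $i)   # trim spaces at both ends of the field
             }
             $1 = $1                       # force the record to be rebuilt with OFS
             print
         }' "$1" > "$2"
}
```

For example, convert_tsv test.dat new.dat should leave the original file untouched and put the pipe-delimited version in new.dat; empty fields are kept because FS is a single tab.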

Linux - convert tab delimited to pipe delimited AND remove leading & trailing space

shell

notepad++

delimiter

trim