Linux - convert tab delimited to pipe delimited and remove leading & trailing spaces
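I want to convert a flat data file into a new file according to the following requirements:
1) Change tab delimited to pipe delimited "|".
2) Remove any leading and trailing spaces in each "column".
3) Some columns are NULL and I want to keep them empty, e.g. A||B (the 2nd column is NULL).
Example:
The original file (test.dat) has one line of tab-delimited data with 7 columns; the 2 columns after "NY" are NULL: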
A New York NY Meal - Seafood Grocery Department
Note that some fields have leading/trailing spaces:
(" A ", "Meal - Seafood ", " Grocery Department ")
This is the final version I want in the new file:
A|New York|NY|||Meal - Seafood|Grocery Department
Can anyone write sample code or a shell script that I can use in Linux to output a new file?
Thanks!
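You can use sed. A space followed by * matches zero or more spaces, and \t matches a tab (in GNU sed). Match literal spaces here rather than the POSIX character class [[:space:]]: that class also matches tabs, so a run of consecutive tabs would collapse into a single "|" and the NULL columns would be lost. For example:
$ sed 's/^ *//; s/ *\t */|/g; s/ *$//' test.dat
A|New York|NY|||Meal - Seafood|Grocery Department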
$ cat test.dat
A New York NY Meal - Seafood Grocery Department
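The sed approach can also be written without relying on \t inside the pattern, which is a GNU extension, and with the result redirected to a new file. The following is a minimal sketch under stated assumptions: the output name test_out.dat is hypothetical, and the printf line only recreates the one-line sample (including a leading and a trailing space) so that the snippet runs on its own.

```shell
# Recreate the one-line sample; skip this line if test.dat already exists.
printf ' A \tNew York\tNY\t\t\tMeal - Seafood \t Grocery Department \n' > test.dat

# Hold a real tab character in a variable so the pattern does not depend on \t.
tab=$(printf '\t')

# Trim spaces at line start, around every tab, and at line end,
# then write the result to a new file (test_out.dat is an assumed name).
sed -e 's/^ *//' -e "s/ *${tab} */|/g" -e 's/ *$//' test.dat > test_out.dat

cat test_out.dat
```

Because only literal spaces are matched around each tab, every tab becomes exactly one "|", so empty columns are preserved.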
You can use awk.
Given:
$ cat -t file.tsv
A ^INew York^INY^I^I^IMeal - Seafood ^I Grocery Department
(Tabs are shown as ^I there.)
$ awk 'BEGIN { FS = "\t"; OFS = "|" }
{
    for (i = 1; i <= NF; i++) {
        gsub(/^ +| +$/, "", $i)   # trim leading and trailing spaces
    }
    $1 = $1                       # rebuild $0 with OFS even if nothing was trimmed
} 1' file.tsv
A|New York|NY|||Meal - Seafood|Grocery Department
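Since the question asks for a new file, the awk approach can be wrapped in a small shell function that takes an input and an output path. This is only a sketch: the function name tsv_to_psv and the file names are hypothetical, and the two printf lines just build sample input (the second line has no spaces to trim, to show that the delimiter is still converted).

```shell
# Hypothetical helper: tsv_to_psv INPUT OUTPUT
tsv_to_psv() {
    awk 'BEGIN { FS = "\t"; OFS = "|" }
    {
        # Trim leading and trailing spaces from every field.
        for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)
        $1 = $1   # rebuild the record with OFS even if nothing was trimmed
        print
    }' "$1" > "$2"
}

# Sample input: the line from the question plus a line with an empty middle column.
printf ' A \tNew York\tNY\t\t\tMeal - Seafood \t Grocery Department \n' > file.tsv
printf 'X\t\tZ\n' >> file.tsv

tsv_to_psv file.tsv file.psv
cat file.psv
```

The explicit `$1 = $1` matters for lines like the second one: in some awk implementations a gsub that changes nothing does not rebuild the record, so without it the tabs could be printed unchanged.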