Detect and convert encoding for list of files

Question

I have a directory containing both ISO-8859 and UTF-8 encoded files. I want to convert all the ISO files to UTF-8 and leave the UTF-8 files untouched. So far I have this:

for isoFile in `file exports/invoice/* | grep "ISO-8859"`; do iconv -f iso-8859-1 -t utf-8 "$isoFile" -o "$isoFile"; done

The problem is that file exports/invoice/* | grep "ISO-8859" returns a list of files in this format:

exports/invoice/2014.03547.html:                 HTML document, ISO-8859 text, with very long lines, with CRLF, LF line terminators

This of course doesn't work with iconv. I need to extract the filename from this string and run it through iconv.

Answer 1

You can extract the filename from this string with the following commands:

cut -d' ' -f1    # select the first column

rev | cut -c 2- | rev    # remove the trailing ':' from the name

So the whole command to extract the filename looks like this:

file exports/invoice/* | grep "ISO-8859" | cut -d' ' -f1 | rev | cut -c 2- | rev

It returns: exports/invoice/2014.03547.html
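To see what each stage of that pipeline does, here is a small sketch that runs it on the sample line from the question (with the padding spaces shortened). The variable names `line`, `with_colon`, `name` and `name2` are only for illustration. Note that splitting on spaces assumes the file names themselves contain no spaces.

```shell
# Sample line copied from the question (padding shortened).
line='exports/invoice/2014.03547.html: HTML document, ISO-8859 text, with very long lines'

# Stage 1: keep the first space-separated column (it still ends in ':').
with_colon=$(printf '%s\n' "$line" | cut -d' ' -f1)

# Stage 2: reverse the string, drop its first character, reverse it back.
name=$(printf '%s\n' "$with_colon" | rev | cut -c 2- | rev)

# Alternative for stage 2: shell parameter expansion strips one trailing ':'.
name2=${with_colon%:}

printf '%s\n' "$with_colon"   # exports/invoice/2014.03547.html:
printf '%s\n' "$name"         # exports/invoice/2014.03547.html
```

The parameter expansion form gives the same result without spawning `rev` and `cut`, if the name is already in a shell variable.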

Answer 2

awk makes this easy:

file exports/invoice/* | grep "ISO-8859" | awk -F':' '{print $1}'
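Putting the pieces together, a possible sketch of the full conversion could look like the following. The helper names `iso_names`, `to_utf8` and `convert_all` are made up for this example. It assumes file names contain no ':' or newline characters, and it writes through a temporary file because having iconv write its output directly onto its own input file is not something I would rely on.

```shell
# Hypothetical helper: read `file` output on stdin, print names of ISO-8859 files.
# Assumes file names contain no ':' characters.
iso_names() {
    grep "ISO-8859" | awk -F':' '{print $1}'
}

# Hypothetical helper: convert one ISO-8859-1 file to UTF-8 in place.
to_utf8() {
    tmp=$(mktemp) || return 1
    status=0
    # Convert into the temporary file first; copy back only if iconv succeeded.
    iconv -f iso-8859-1 -t utf-8 "$1" > "$tmp" && cat "$tmp" > "$1" || status=1
    rm -f "$tmp"
    return "$status"
}

# Hypothetical driver: convert every ISO-8859 file among the given paths.
convert_all() {
    file "$@" | iso_names | while IFS= read -r isoFile; do
        to_utf8 "$isoFile"
    done
}
```

It would be called as `convert_all exports/invoice/*`. Reading names with `while IFS= read -r` instead of a `for` loop over backticks keeps names with spaces intact, and files reported as UTF-8 are never touched because they do not pass the grep.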
