How to use the shell to count Chinese characters in a UTF-8 encoded file

Question

Running cat doc.txt shows the following text:

你好 Hello!
这是中文。This is a Chinese doc.

I can use the command

wc -w doc.txt

but it prints:

8 doc.txt

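This command treats 你好 and 这是中文 as one word each, when in fact 你好 is two Chinese characters and 这是中文 is four.

What I want is for each Chinese character to count as a word, so the example should give 12 words. Can anyone help?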

Answer 1

You can use the -m (or --chars) option, which counts characters instead of words:

$ echo -n "你好" | wc -m

Output (in a UTF-8 locale):

2