Sorting the output text file in Hadoop: is there a way to view the output without sorting it, or to use a different sorting method?

Question

I used MapReduce to count the words in a text file that I stored in Hadoop, and now I want to view the output.

So far this is the only command I have found online:

bin/hadoop fs -cat output/part-r-00000 | sort -k 2 -n -r | less

I am confused by this command. Does it just sort the output? Can I view the output without sorting it?

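Does this command sort by word count, and is everything shown in alphabetical order otherwise? Is there any other way you would recommend to sort the saved text file (a novel)?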

Also, can I just look at the wordcount output file without sorting it?

Answer 1

Can I view the output without sorting it?

Just use -cat:

bin/hadoop fs -cat output/part-r-00000 | less

Or copy the output file from HDFS to the local file system with:

bin/hadoop fs -get output/part-r-00000  /tmp/output
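As a rough sketch of the "copy it locally, then look at it" approach: the two helper functions below are hypothetical names introduced only for illustration, and the `bin/hadoop` path and `output` directory are taken from the question. `-getmerge` concatenates every file in the HDFS directory into one local file, which may be handy if the job ran with more than one reducer. Note that the reducer already writes its keys in sorted order, so the "unsorted" view will usually still appear ordered by word.

```shell
# fetch_output (hypothetical helper): merge all files under an HDFS
# directory into a single local file. Requires a working Hadoop install.
fetch_output() {
    bin/hadoop fs -getmerge "$1" "$2"
}

# peek (hypothetical helper): show the first N lines (default 20) of a
# local file exactly as stored, with no sorting applied.
peek() {
    head -n "${2:-20}" "$1"
}

# Example usage (left commented out because it needs a running cluster):
# fetch_output output /tmp/wordcount.txt && peek /tmp/wordcount.txt 10
```

Once the file is local, ordinary tools such as less, head, or grep work on it directly.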

Is this command sorting by word count, and is everything displayed in alphabetical order otherwise?

sort -k 2 -n -r sorts on the second column (-k 2), numerically (-n), in reverse order (-r).

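Assuming the second column contains the counts, this orders the words from most frequent to least frequent. As for other ways of sorting, I think this is the better one. If you want to sort the content alphabetically, just use sort. See the sort manual.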
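To see what the different sort invocations do without touching HDFS, here is a small sketch on made-up data. The sample words and counts are invented stand-ins for output/part-r-00000, assuming the usual word, tab, count layout; LC_ALL=C is set only to make the ordering locale-independent.

```shell
# Invented sample in the assumed "word<TAB>count" layout.
sample=$(printf 'the\t120\nwhale\t45\nahab\t30\nsea\t9\n')

# By count, highest first: the same flags as the command in the question.
by_count=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 2 -n -r)

# Alphabetically by word: plain sort compares whole lines as text.
by_word=$(printf '%s\n' "$sample" | LC_ALL=C sort)

# Only the two most frequent words.
top2=$(printf '%s\n' "$sample" | LC_ALL=C sort -k 2 -n -r | head -n 2)

printf 'By count:\n%s\n\nBy word:\n%s\n' "$by_count" "$by_word"
```

Dropping -r gives ascending counts, and piping through head, as in top2, keeps the listing short for a long text such as a novel.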
