Grep command: file names and the number of repetitions of a certain word in each file, printed in two columns

Question

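I have a folder containing files, and I want to write a shell script that prints the name of each file together with the number of times a certain word is repeated in it.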

My output should look like this:

filename 3
filename 12
filename 24
…

The file name should be the bare name only, without the path and without the extension.
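If the names are produced in a shell loop rather than parsed out of grep's output, bash parameter expansion can strip the path and the extension without starting cut or sed. A minimal sketch, assuming a path of the form folder/filename.txt with a single extension:

```shell
# Strip the directory and the extension with bash parameter expansion.
f='folder/filename.txt'
name=${f##*/}           # drop the longest prefix ending in '/': filename.txt
name=${name%.*}         # drop the shortest suffix starting with '.': filename
printf '%s\n' "$name"   # filename
```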

I managed to do it with a for loop, but I don't think its execution time is very efficient, so my other idea was to use the grep command:

grep -c "word" */*.txt

The output I get looks like this:

folder/filename.txt:3
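One caveat worth checking: grep -c counts matching lines, not occurrences, so a line that contains the word twice is counted once. The sketch below uses a throwaway sample file (its contents are made up for illustration) to compare the two counts:

```shell
# Hypothetical sample file: "word" appears three times, on two lines.
tmp=$(mktemp -d)
printf 'word\nno\nword word\n' > "$tmp/sample.txt"

grep -c 'word' "$tmp/sample.txt"           # 2 (matching lines)
grep -o 'word' "$tmp/sample.txt" | wc -l   # 3 (occurrences; BSD wc pads with spaces)
```

Add -w to either command if substrings such as "sword" should not be counted.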

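I tried using the cut command, but I don't know how to keep it from cutting away the number of times the word appears in each file. There also has to be a space between the file name and the number.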

grep -c "word" */*.txt | cut -d'/' -f2 | cut -d'.' -f1
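To see where the count disappears, run the pipeline on one sample line in the format grep prints: after the first cut, the count is still glued to the extension field (filename.txt:3), so cutting on '.' and keeping only field 1 throws it away.

```shell
# One line in the format grep -c prints when it is given several files.
line='folder/filename.txt:3'
printf '%s\n' "$line" | cut -d'/' -f2                   # filename.txt:3
printf '%s\n' "$line" | cut -d'/' -f2 | cut -d'.' -f1   # filename (count lost)
```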

Any idea how to do this with grep, or with some alternative?

Answer 1

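You did well to use cut. When you can solve a problem with cut, you have usually found a solid, fast solution.
In this case, though, you would have to fix the cut commands, and the fix is ugly: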

# Ugly cutting
grep -c "word" */*.txt | cut -d'/' -f2 | tr ':' '.' | cut -d"." -f1,3 | tr '.' ' '
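Running the same sample line through each stage shows how this works: the colon becomes a dot so that cut can keep field 1 (the name) and field 3 (the count). The last command illustrates an assumption baked into the pipeline: a name that contains an extra dot shifts the fields.

```shell
line='folder/filename.txt:3'
printf '%s\n' "$line" | cut -d'/' -f2                                  # filename.txt:3
printf '%s\n' "$line" | cut -d'/' -f2 | tr ':' '.'                     # filename.txt.3
printf '%s\n' "$line" | cut -d'/' -f2 | tr ':' '.' | cut -d'.' -f1,3   # filename.3
printf '%s\n' "$line" | cut -d'/' -f2 | tr ':' '.' | cut -d'.' -f1,3 | tr '.' ' '  # filename 3

# Caveat: an extra dot in the file name breaks the field numbering.
printf '%s\n' 'folder/my.notes.txt:3' | cut -d'/' -f2 | tr ':' '.' | cut -d'.' -f1,3 | tr '.' ' '  # my txt
```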

Fixing cut is the wrong approach here, but you can learn something cool along the way:

# going weird
# Combine the first column
grep -c "word" */*.txt | cut -d'/' -f2 | cut -d"." -f1
# with the second column
grep -c "word" */*.txt | cut -d'/' -f2 | cut -d":" -f2
# using paste and process substitution
paste -d" " <(grep -c "word" */*.txt | cut -d'/' -f2 | cut -d"." -f1) <(grep -c "word" */*.txt | cut -d'/' -f2 | cut -d":" -f2)
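paste -d' ' joins line N of every input with a space, and bash process substitution <( ... ) hands each command's output to paste as if it were a file. A toy example with literal values in place of the grep output:

```shell
# paste pairs up the lines of its inputs: line 1 with line 1, and so on.
paste -d' ' <(printf '%s\n' filename other) <(printf '%s\n' 3 12)
# filename 3
# other 12
```

In the grep version both halves rely on the glob expanding in the same order, which it does within one shell, but every file is read twice. That is part of why this is not the way to go.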

No, that is not the way to solve this. Use sed instead:

grep -c "word" */*.txt | sed 's#.*/##;s#\..*:# #'
# or shorter
grep -c "word" */*.txt | sed 's#.*/\([^.]*\).*:#\1 #'
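A self-contained way to try the sed version, as a sketch: the function name count_word_sed and the sample files are hypothetical. It adds -H so that grep still prints the file name when the glob matches only one file (GNU and BSD grep both accept -H). Note that [^.]* stops at the first dot, so a file named my.notes.txt would be reported as my.

```shell
# Print "name count" for every */*.txt file below the current directory.
# -H forces the "path:count" format even when only one file matches.
count_word_sed() {
    grep -H -c -- "$1" */*.txt | sed 's#.*/\([^.]*\).*:#\1 #'
}

# Usage: build a throwaway folder and run the function inside it.
tmp=$(mktemp -d)
mkdir "$tmp/folder"
printf 'word\nno\nword word\n' > "$tmp/folder/a.txt"
printf 'nothing\n'             > "$tmp/folder/b.txt"
(cd "$tmp" && count_word_sed word)
# a 2
# b 0
```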
