GREP command in loop

Question

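I have around 3000 files in a folder. The files contain data like this:

VISITERM_0 VISITERM_20 VISITERM_35 ..... and so on

Not every file has the same values as above. They range from 0 to 99.

I want to find out how many files in the folder contain each VISITERM. For example, if VISITERM_0 is present in 300 files in the folder, then I need it to print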

VISITERM_0  300

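Similarly, if 1000 files contain VISITERM_1, I need it to print

VISITERM_1  1000

So I want to print every VISITERM from VISITERM_0 to VISITERM_99, along with the number of files that contain it.

I used the grep command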

 grep VISITERM_0 * -l | wc -l

But this only handles a single term, and I want to loop from VISITERM_0 to VISITERM_99. Please help!

Answer 1

#!/bin/bash
# ^^- the above is important; #!/bin/sh would allow only POSIX syntax

# use a C-style for loop, which is a bash extension
for ((i=0; i<100; i++)); do
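  # Calculate number of matching files; -w matches whole words only,
  # so VISITERM_1 does not also count VISITERM_10 or VISITERM_100...
  num_matches=$(find . -type f -exec grep -l -w -e "VISITERM_$i" '{}' + | wc -l)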
  # ...and print the result.
  printf 'VISITERM_%d\t%d\n' "$i" "$num_matches"
done
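
The loop above scans every file 100 times. If that turns out to be slow with 3000 files, a single pass might be enough. The following is only a sketch, not part of the answer above: the function name count_visiterms is made up for illustration, and it assumes GNU or BSD grep (for -r, -H, -o and -w) and file names without newlines.

```bash
#!/bin/bash
# Hypothetical helper: one pass over the files instead of 100 separate scans.
count_visiterms() {
  # -o prints every match on its own line, -w keeps whole words only,
  # -H prefixes each match with its file name, -r recurses like "find .".
  grep -rHow -e 'VISITERM_[0-9][0-9]*' -- "${1:-.}" \
    | sort -u \
    | awk -F: '
        { term = $NF; n = substr(term, 10) }   # n = digits after "VISITERM_"
        n + 0 < 100 { files[term]++ }          # keep 0..99 only
        END { for (t in files) printf "%s\t%d\n", t, files[t] }' \
    | sort -t_ -k2,2n
}
```

Calling count_visiterms . in the folder should print one tab-separated line per term that occurs. The sort -u step keeps one line per (file, term) pair, so a term repeated inside one file is still counted once. Unlike the loop, terms that appear in no file are simply omitted instead of being printed with 0.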

Answer 2

Here is a GNU awk solution (GNU because of the multi-character RS) that should do it:

awk -v RS=" |\n" '{n=split($0,a,"VISITERM_");if (n==2 && a[2]<100) b[a[2]]++} END {for (i in b) print "VISITERM_"i,b[i]}' *

Example:

cat file1
VISITERM_0 VISITERM_320 VISITERM_35

cat file2
VISITERM_0 VISITERM_20 VISITERM_32
VISITERM_20 VISITERM_42 VISITERM_11

Gives:

awk -v RS=" |\n" '{n=split($0,a,"VISITERM_");if (n==2 && a[2]<100) b[a[2]]++} END {for (i in b) print "VISITERM_"i,b[i]}' file*
VISITERM_0 2
VISITERM_11 1
VISITERM_20 2
VISITERM_32 1
VISITERM_35 1
VISITERM_42 1

How it works:

awk -v RS=" |\n" '              # Set record separator to space or newline
    {n=split($0,a,"VISITERM_")  # Split record using "VISITERM_" as separator and store the number of pieces in "n"
    if (n==2 && a[2]<100)       # If "n" is 2 (record contains "VISITERM_") and the number is below 100
        b[a[2]]++}              # Count the hit for each number and store it in array "b"
END {for (i in b)               # Walk through array "b"
    print "VISITERM_"i,b[i]}    # Print the hits
' file*                         # Read the files

PS
If everything is on one line, change to RS=" ". Then it should work with most awk versions.
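
Note that the one-liner counts occurrences, not files: in the example, VISITERM_20 is reported as 2 although both hits are in file2. If the number of files is what is needed, as in the question, something along these lines might work. It is a sketch only: the function name count_files_per_term is made up for illustration, and it relies on awk's default whitespace field splitting instead of RS, so it should not need GNU awk.

```bash
#!/bin/bash
# Hypothetical helper: count each term at most once per input file.
count_files_per_term() {
  awk '
    {
      for (f = 1; f <= NF; f++) {
        # Accept only VISITERM_0 .. VISITERM_99; seen[] remembers
        # which (file, term) pairs were already counted.
        if ($f ~ /^VISITERM_[0-9][0-9]?$/ && !seen[FILENAME, $f]++)
          files[$f]++
      }
    }
    END { for (t in files) print t, files[t] }
  ' "$@" | sort -t_ -k2,2n
}
```

With the two example files, count_files_per_term file* should report VISITERM_20 as 1 instead of 2, and the final sort puts the terms in numeric order, which "for (t in files)" alone does not guarantee.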
