GREP command in loop
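I have around 3000 files in a folder. The files contain data like this:
VISITERM_0 VISITERM_20 VISITERM_35 ..... and so on
Not every file contains the same values as shown above. The numbers range from 0 to 99.
I want to find out how many files in the folder contain each VISITERM. For example, if VISITERM_0 is present in 300 files in the folder, I need it to print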
VISITERM_0 300
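Similarly, if 1000 files contain VISITERM_1, I need it to print
VISITERM_1 1000
So I want to print every VISITERM from VISITERM_0 to VISITERM_99, along with the number of files that contain it.
I used the grep command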
grep VISITERM_0 * -l | wc -l
However, this handles only a single term, and I want to loop from VISITERM_0 to VISITERM_99. Please help!
#!/bin/bash
# ^^- the above is important; #!/bin/sh would allow only POSIX syntax

# use a C-style for loop, which is a bash extension
for ((i=0; i<100; i++)); do
    # Calculate number of matching files. -w matches whole words only, so that
    # VISITERM_1 does not also match VISITERM_10..VISITERM_19 (-w is not POSIX,
    # but GNU and BSD grep both support it).
    num_matches=$(find . -type f -exec grep -l -w -e "VISITERM_$i" '{}' + | wc -l)
    # ...and print the result.
    printf 'VISITERM_%d\t%d\n' "$i" "$num_matches"
done
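The loop above runs grep over the whole tree 100 times. If that turns out to be slow, a single-pass variant could look roughly like the sketch below. It is not part of the original answer: the function name `count_visiterm_files` is made up for illustration, it only looks at files directly inside the given folder, it assumes a grep that supports `-o`, `-H` and `-w` (GNU and BSD grep do, POSIX does not require them), and it assumes file names contain no colons or newlines.

```shell
#!/bin/bash
# Sketch only: count, for each VISITERM_<0-99>, how many files contain it,
# reading every file once. count_visiterm_files is a hypothetical helper name.
count_visiterm_files() {
    # $1: folder to scan (non-recursive)
    (
        cd "$1" || exit 1
        # -o prints each match on its own line, -H prefixes the file name,
        # -w rejects partial matches such as the VISITERM_32 inside VISITERM_320.
        grep -oHw 'VISITERM_[0-9]\{1,2\}' -- * 2>/dev/null |
            sort -u |                               # one line per (file, term) pair
            cut -d: -f2 |                           # drop the file name
            sort | uniq -c |                        # number of files per term
            awk '{ printf "%s\t%d\n", $2, $1 }' |   # reorder as "term<TAB>count"
            sort -t_ -k2,2n                         # order by the numeric suffix
    )
}
```

Call it as `count_visiterm_files /path/to/folder`. Unlike the loop, it prints only the terms that actually occur, so terms found in zero files are omitted.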
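Here is a GNU awk solution (GNU is needed because RS holds more than one character) that should do it. Note that it counts occurrences rather than files, so a term that appears twice in one file is counted twice:
awk -v RS=" |\n" '{n=split($0,a,"VISITERM_");if (n==2 && a[2]<100) b[a[2]]++} END {for (i in b) print "VISITERM_"i,b[i]}' *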
Example:
cat file1
VISITERM_0 VISITERM_320 VISITERM_35
cat file2
VISITERM_0 VISITERM_20 VISITERM_32
VISITERM_20 VISITERM_42 VISITERM_11
Gives:
awk -v RS=" |\n" '{n=split($0,a,"VISITERM_");if (n==2 && a[2]<100) b[a[2]]++} END {for (i in b) print "VISITERM_"i,b[i]}' file*
VISITERM_0 2
VISITERM_11 1
VISITERM_20 2
VISITERM_32 1
VISITERM_35 1
VISITERM_42 1
How it works:
awk -v RS=" |\n" '              # Set the record separator to a space or a newline
{n=split($0,a,"VISITERM_")      # Split the record on "VISITERM_" and store the number of pieces in "n"
if (n==2 && a[2]<100)           # If "n" is 2 (the record contains "VISITERM_") and the number is less than 100
    b[a[2]]++}                  # Count the hit for that number in array "b"
END {for (i in b)               # Walk through array "b"
    print "VISITERM_"i,b[i]}    # Print the hits
' file*                         # Read the files
PS
If everything is on one line, change it to RS=" ". It should then work with most awk implementations.
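Since the question asks for the number of files rather than the number of occurrences, here is a rough sketch of a per-file variant. It is not the answer's original code: the function name `count_files_per_term` and the array names `seen` and `files` are made up for illustration. It relies on awk's default whitespace field splitting instead of a regex RS, so it should not need GNU awk.

```shell
#!/bin/sh
# Sketch only: for each VISITERM_0..VISITERM_99, count the files that contain it.
# count_files_per_term is a hypothetical helper name.
count_files_per_term() {
    # "$@": the files to scan
    awk '
    {
        for (i = 1; i <= NF; i++) {
            # Accept only whole fields of the form VISITERM_<one or two digits>,
            # and count each (file, term) pair the first time it is seen.
            if ($i ~ /^VISITERM_[0-9][0-9]?$/ && !seen[FILENAME, $i]++)
                files[$i]++
        }
    }
    END {
        # Print in numeric order, skipping terms that never occurred.
        for (n = 0; n < 100; n++) {
            term = "VISITERM_" n
            if (term in files) print term, files[term]
        }
    }' "$@"
}
```

Called as `count_files_per_term file*` on the two example files, this reports VISITERM_20 with a count of 1 instead of 2, because both occurrences are in file2.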