Regex for word repetition

Question

I need a regular expression for sed (sed only, please) that tells me whether some word appears 3 times in a line, and if so prints that line...

Suppose this is the file:

abc abc gh abc
abcabc abc
 ab ab cd ab xx ab
ababab cc ababab
abab abab cd abab

So the output should be:

abc abc gh abc
 ab ab cd ab xx ab
abab abab cd abab

This is what I'm trying:

sed -n '/\([^ ]\+\)[ ]+/p'

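It doesn't work... :/ What am I doing wrong?

It doesn't matter whether the word is at the start of the line, and the occurrences don't need to be consecutive.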

Answer 1

You need to capture the word and add .* between the \1 back-references:

$ sed -n '/\b\([^ ]\+\)\b.*\b\1\b.*\b\1\b/p' file
abc abc gh abc
 ab ab cd ab xx ab
abab abab cd abab

I'm assuming your input contains only spaces and word characters.
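
As a rough, self-contained way to try this out, the sketch below wraps the expression in a small shell function and runs it against the sample lines from the question. It assumes GNU sed, since \b and \+ are GNU extensions to basic regular expressions. The function name triple_word_lines and the temporary file are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed (\b and \+ are GNU extensions).
# triple_word_lines is a hypothetical helper name, not part of sed.
triple_word_lines() {
    # \([^ ]\+\) captures one word; each \1 must match that same word again,
    # and .* lets any other text sit between the occurrences.
    sed -n '/\b\([^ ]\+\)\b.*\b\1\b.*\b\1\b/p' "$1"
}

# Recreate the sample file from the question in a temporary location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
abc abc gh abc
abcabc abc
 ab ab cd ab xx ab
ababab cc ababab
abab abab cd abab
EOF

triple_word_lines "$sample"
rm -f "$sample"
```

For comparison, the attempt in the question has no \1 at all, so nothing forces a repeat, and its unescaped + in [ ]+ is treated as a literal plus sign in a basic regular expression.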

Answer 2

I know the question asks for sed, but every system I've seen with sed also has awk, so here is an awk solution:

awk -F"[^[:alnum:]]" '{delete a;for (i=1;i<=NF;i++) a[$i]++;for (i in a) if (a[i]>2) {print $0;next}}' file
abc abc gh abc
 ab ab cd ab xx ab
abab abab cd abab

This may be easier to understand than the regex solution.

# Set the field separator to any character that is not a letter or digit.
awk -F"[^[:alnum:]]" '
{
    delete a                # Clear array "a" for each new line
    for (i=1;i<=NF;i++)     # Loop through the words one by one
        a[$i]++             # Count the occurrences of each word in array "a"
    for (i in a)            # Loop through array "a"
        if (a[i]>2) {       # If a word is found more than two times:
            print $0        # Print the line
            next            # Skip to the next line, so it is not printed twice if another word also occurs three times
        }
}' file                     # Read the file
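
One thing the awk command above does not guard against is empty fields: because the separator is a single character, two adjacent separators (a double space, for example) produce an empty field, and three empty fields on one line would count as a repeated word. A possible variation that skips empty fields is sketched below. The function name repeated_word_lines is hypothetical, and the ASCII-only class [^A-Za-z0-9] is used here as a stand-in for [^[:alnum:]] on the assumption that some older awk builds lack POSIX character classes.

```shell
#!/bin/sh
# Sketch only: repeated_word_lines is a hypothetical helper name.
# [^A-Za-z0-9] is an ASCII-only stand-in for [^[:alnum:]].
repeated_word_lines() {
    awk -F'[^A-Za-z0-9]' '
    {
        split("", count)            # Portable way to clear the array
        for (i = 1; i <= NF; i++)
            if ($i != "")           # Ignore empty fields from adjacent separators
                count[$i]++
        for (w in count)
            if (count[w] > 2) {     # Some word occurs at least three times
                print $0
                next                # Print each matching line only once
            }
    }' "$@"
}

# The first line has only double spaces and no repeated word; the second repeats x.
printf 'a  b  c  d\nx y x z x\n' | repeated_word_lines
```

With no file argument the function reads standard input, so it can be used at the end of a pipeline as well as with a file name.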
