How to delete line with matching pattern from another file?

Question

I want to delete the lines in FILE1 that contain a pattern listed in FILE2.
How can I do this using shell/bash or Tcl?

For example:

File 1:

This is ECO_01  
This is ECO_02  
This is ECO_03  
This is ECO_04

File 2:

ECO_02  
ECO_04

Output:

This is ECO_01   
This is ECO_03

Answer 1

The most general solution is

$ grep -vf file2 file1

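Note that any substring match anywhere on the line counts. To restrict this to an exact match on one particular field (here assumed to be the last one), use: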

$ awk 'NR==FNR{a[$0]; next} !($NF in a)' file2 file1
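Because grep -f treats every line of file2 as a regular expression, a pattern containing characters such as . or * may match more than intended. If the entries are meant as literal strings, a minimal sketch along these lines may help. The function name filter_fixed is made up for illustration and is not part of the answer above.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this sketch): print the lines of
# file "$2" that contain none of the literal strings listed, one per
# line, in file "$1".
filter_fixed() {
    # -F  treat each pattern as a fixed string, not a regular expression
    # -v  invert the match, i.e. keep only non-matching lines
    # -f  read the patterns from a file
    grep -vFf "$1" "$2"
}
```

It would be called as filter_fixed file2 file1. An empty line in the pattern file still matches every input line, so blank lines should be removed from file2 first.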

Answer 2

You just need to use the sed command (as shown below) to delete the matching lines from FILE1.

macOS:

for i in `cat FILE2.txt`
do
sed -i '' "/$i/d" FILE1.txt
done

Linux:

for i in `cat FILE2.txt`
do
sed -i "/$i/d" FILE1.txt
done
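The loops above run sed once per pattern, depend on the platform's -i syntax, and break if a pattern contains a slash. As a sketch of one way around that, the pattern file can be turned into a single sed script and applied through a temporary file, which should behave the same with BSD and GNU sed. The function name delete_matching is hypothetical, and the patterns are still interpreted as basic regular expressions.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this sketch): delete from file "$2"
# every line matching any pattern listed, one per line, in file "$1".
delete_matching() {
    patterns=$1
    target=$2
    script=$(mktemp) || return 1
    out=$(mktemp) || { rm -f "$script"; return 1; }

    # Build one sed command per pattern:
    #   /^$/d        skip empty pattern lines
    #   s|/|\\/|g    escape slashes so they cannot end the address early
    #   s|.*|/&/d|   wrap the pattern as /pattern/d
    sed -e '/^$/d' -e 's|/|\\/|g' -e 's|.*|/&/d|' "$patterns" > "$script"

    # Apply all deletions in one pass, then copy the result back so the
    # target keeps its original permissions.
    sed -f "$script" "$target" > "$out" && cat "$out" > "$target"
    status=$?

    rm -f "$script" "$out"
    return $status
}
```

A call such as delete_matching FILE2.txt FILE1.txt edits FILE1.txt in place.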

Answer 3

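In Tcl, you would load the patterns file and then use the patterns to do the filtering. It is probably simplest to keep the main filtering flow going from standard input to standard output; you can easily redirect those from and to files. Since you appear to want "is the pattern a substring of the line" as the matching rule, you can do that with string first, which leads to this code: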

# Load in the patterns from the file named by the first argument
set f [open [lindex $argv 0]]
set patterns [split [string trimright [read $f] \n] \n]
close $f

# Factor out the actual matching
proc matches {theString} {
    global patterns
    foreach pat $patterns {
        # Change the next line to use other matching rules
        if {[string first $pat $theString] >= 0} {
            return true
        }
    }
    return false
}

# Read all input lines and print all non-matching lines
while {[gets stdin line] >= 0} {
    if {![matches $line]} {
        puts $line
    }
}

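I find it helpful to factor the key bits out into procedures, such as "does this line match any of my patterns?" You would probably invoke the above code like this: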

tclsh doFiltering.tcl patterns.txt <input.txt >output.txt
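For comparison, the same structure (a separate matching routine plus a stdin-to-stdout filter loop) can be sketched in plain POSIX shell. This is only an illustration of the idea, not part of the Tcl answer; the names matches and filter_stdin are chosen here, and the pattern file is re-read for every input line, so it would be slow on large inputs.

```shell
#!/bin/sh
# Hypothetical sketch: return success if line "$1" contains, as a literal
# substring, any non-empty line of the pattern file "$2".
matches() {
    while IFS= read -r pat || [ -n "$pat" ]; do
        [ -n "$pat" ] || continue      # ignore empty patterns
        case $1 in
            *"$pat"*) return 0 ;;      # quoted, so matched literally
        esac
    done < "$2"
    return 1
}

# Read lines from standard input and print those that match no pattern
# in the file named by "$1".
filter_stdin() {
    while IFS= read -r line || [ -n "$line" ]; do
        matches "$line" "$1" || printf '%s\n' "$line"
    done
}
```

With these definitions loaded, the call filter_stdin patterns.txt <input.txt >output.txt mirrors the Tcl invocation.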

Answer 4

Another Tcl solution:

set fid [open file2 r]
set patterns [lmap line [split [read -nonewline $fid] \n] {string trim $line}]
close $fid

set fid [open file1 r]
set lines [split [read -nonewline $fid] \n]
close $fid

set wanted [lsearch -inline -all -regexp -not $lines [join $patterns "|"]]
puts [join $wanted \n]

This is ECO_01  
This is ECO_03

Reference: lsearch man page
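The idea of trimming the patterns and joining them with "|" into one regular expression carries over to the shell as well. The following is a rough sketch under the assumption that the patterns contain no characters that are special in extended regular expressions; the function name filter_alternation is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this sketch): print the lines of
# file "$2" that match none of the patterns in file "$1", using a single
# alternation regex.
filter_alternation() {
    # Trim leading and trailing whitespace, drop empty lines, then join
    # the remaining patterns with "|".
    regex=$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1" |
            paste -sd '|' -)

    if [ -z "$regex" ]; then
        # No usable patterns: nothing to delete, pass the file through.
        cat "$2"
    else
        grep -vE "$regex" "$2"
    fi
}
```

Calling filter_alternation file2 file1 on the example files builds the regex ECO_02|ECO_04 and prints only the lines that do not match it.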
