Compare two files using first column, print diff while appending column to output
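I have two tab-separated files that I want to compare.
I want to find the col1 values in file1 that are missing from col1 of file2 (prefixed with "left"), and the col1 values in file2 that are missing from col1 of file1 (prefixed with "joined"). For those rows, I want to print col1 and col8. My diff command fails on rows where the col1 values are equal but the col8 values differ.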
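As a minimal illustration of what "col1 and col8" means for these rows, here is a Python sketch of the extraction `cut -f1,8` performs. It assumes single tabs between fields, as the question states; the sample rows below were pasted with their tabs turned into spaces, so the tab positions in the hypothetical `row` are an assumption. The helper name `col1_col8` is also hypothetical. Columns are 1-based in the question, so col1 and col8 are indices 0 and 7. Splitting on tabs rather than on any whitespace matters for the Char6 row, whose col7 and col8 contain spaces.

```python
def col1_col8(line):
    """Return (col1, col8) from one tab-separated line, like `cut -f1,8`."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[7]  # 1-based col1 and col8 -> indices 0 and 7


# Hypothetical row: tab placement is assumed, since the pasted sample lost its tabs.
row = "\t".join([
    "Char6", "65", "Druid", "Champion", "A", "12/12/21",
    "The Lair of the Splitpaw", "Char6(Main) VT emp time",
    "off", "off", "0", "Char6(Main) VT emp time",
]) + "\n"

print(col1_col8(row))  # ('Char6', 'Char6(Main) VT emp time')
print(row.split()[7])  # Lair  (splitting on any whitespace shifts the columns)
```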
File 1:
Char1 55 Necromancer Knight A 11/21/21 Zone Char1(Main) off off 0 Char1(Main)
Char2 28 Druid Squire A 12/08/21 Zone Char1 off off 0 Char1
Char3 44 Enchanter Recruit A 08/07/21 Zone Char3(Main) off off 0 Char3(Main)
Char4 56 Enchanter Knight A 11/06/21 Zone Char4(Main) off off 0 Char4(Main)
Char5 10 Magician Recruit A 10/29/21 Zone Char1 off off 0 Char1
Char6 65 Druid Champion A 12/12/21 The Lair of the Splitpaw Char6(Main) VT emp time off off 0 Char6(Main) VT emp time
File 2:
Char1 55 Necromancer Knight A 11/21/21 Zone Char1(Main) off off 0 Char1(Main)
Char2 28 Druid Squire A 12/08/21 Zone Char1 off off 0 Char1
Char3 44 Enchanter Recruit A 08/07/21 Zone Char3(Main) off off 0 Char3(Main)
Char4 56 Enchanter Knight A 11/06/21 Zone Char4(Main) off off 0 Char4(Main)
Char5a 10 Magician Recruit A 10/29/21 Zone Char1 off off 0 Char1
Char6 65 Druid Champion A 12/21/21 Zone Char6(Main) Emp/VT/Time off off 0 Char6(Main) Emp/VT/Time
The diff command that produces the output:
diff --new-line-format="joined %L" --old-line-format="left %L" --unchanged-line-format="" <(cut -f1,8 "$file1" | sort) <(cut -f1,8 "$file2" | sort) | sort
Current output:
joined Char5a Char1
joined Char6 Char6(Main) Emp/VT/Time
left Char5 Char1
left Char6 Char6(Main) VT emp time
Desired output:
joined Char5a Char1
left Char5 Char1
Any help is much appreciated, thanks!
Hopefully this Python script does what you expect:
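To see why the diff command keeps the Char6 lines, here is a Python sketch that compares the same (col1, col8) pairs two ways, using the values from the `cut -f1,8` output shown above. diff compares entire lines, so a record whose col8 changed counts as removed on one side and added on the other; comparing on col1 alone drops it. The helper `report` is hypothetical and only mimics the `left`/`joined` formatting plus the final `sort`.

```python
# (col1, col8) pairs copied from the question's `cut -f1,8` output.
file1 = {"Char1": "Char1(Main)", "Char2": "Char1", "Char3": "Char3(Main)",
         "Char4": "Char4(Main)", "Char5": "Char1",
         "Char6": "Char6(Main) VT emp time"}
file2 = {"Char1": "Char1(Main)", "Char2": "Char1", "Char3": "Char3(Main)",
         "Char4": "Char4(Main)", "Char5a": "Char1",
         "Char6": "Char6(Main) Emp/VT/Time"}


def report(only_left, only_right):
    """Format 'left'/'joined' lines like the diff line formats, then sort."""
    lines = [f"left {k}\t{v}" for k, v in only_left]
    lines += [f"joined {k}\t{v}" for k, v in only_right]
    return sorted(lines)


# 1) Whole-record comparison (what diff effectively does): Char6 appears on
#    both sides because its col8 differs.
rec1, rec2 = set(file1.items()), set(file2.items())
by_record = report(rec1 - rec2, rec2 - rec1)

# 2) Key-only comparison: Char6 exists in both files, so it drops out.
by_key = report(
    [(k, file1[k]) for k in file1.keys() - file2.keys()],
    [(k, file2[k]) for k in file2.keys() - file1.keys()],
)

print("\n".join(by_record))
print("---")
print("\n".join(by_key))
```

The first list reproduces the current output and the second the desired output. Python's `sorted` uses code-point order, which can differ from `sort` under some locales.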
#!/bin/bash
python3 -c '
import re, sys
def read_into_dict(regex, file):
    # Map the first regex match (here: col1) to the whole line
    return { re.findall(regex, line)[0]: line for line in open(file) }
regex, file1, file2 = sys.argv[1:]
dict1 = read_into_dict(regex, file1)
dict2 = read_into_dict(regex, file2)
keys1 = set(dict1.keys())
keys2 = set(dict2.keys())
# Lines keep their trailing newline, so suppress the extra one from print
for key in (keys1 - keys2):
    print(f"left {dict1[key]}", end="")
for key in (keys2 - keys1):
    print(f"joined {dict2[key]}", end="")
' '^(\S+)' <(cut -f1,8 "$file1" | sort) <(cut -f1,8 "$file2" | sort) | sort
Diff is meant for simple differences. Your problem is easier to solve with awk.
You don't need to sort the files; just read both files and count how often each key in column 1 occurs.
Prepare the potential output while counting the keys. When NR==FNR, you are in the first file and want to use "left".
After reading both files, look for keys that occur only once and print the prepared output for those lines.
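The same counting idea as a Python sketch, which can help check the logic before writing it in awk. Like the awk answer, it assumes each col1 key occurs at most once per file. The names `unique_keys` and `make_row` are hypothetical, and `make_row` fills columns 2 to 7 with placeholders because only col1 and col8 matter here.

```python
from collections import Counter


def make_row(col1, col8):
    # Hypothetical filler for columns 2-7; only col1 and col8 are used.
    return "\t".join([col1, "-", "-", "-", "-", "-", "-", col8]) + "\n"


def unique_keys(file1_lines, file2_lines):
    """Count col1 keys across both files; keep lines whose key occurs once."""
    count = Counter()
    prepared = {}
    for prefix, lines in (("left", file1_lines), ("joined", file2_lines)):
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            key = fields[0]
            count[key] += 1
            # Like value[$1]=... in awk: a later line overwrites an earlier
            # one, which only happens for keys whose count ends up above 1.
            prepared[key] = f"{prefix} {key}\t{fields[7]}"
    return [prepared[k] for k, n in count.items() if n == 1]


f1 = [make_row("Char5", "Char1"), make_row("Char6", "Char6(Main) VT emp time")]
f2 = [make_row("Char5a", "Char1"), make_row("Char6", "Char6(Main) Emp/VT/Time")]
for line in sorted(unique_keys(f1, f2)):
    print(line)
```

Because no sorting is needed to count, each file is read once in its original order; only the final report is sorted.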
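awk -F'\t' '
{
    # count how often each col1 key occurs across both files
    key[$1]++
    # prepare the output line: prefix, col1 and col8 (NR==FNR only in file1)
    value[$1]=(NR==FNR ? "left " : "joined ") $1 "\t" $8
}
END {
    # a key seen exactly once exists in only one file (assuming no key repeats within a file)
    for (i in key) {
        if (key[i]==1) {
            print value[i]
        }
    }
}' file1 file2 | sort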
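One limitation of counting: a key that appears twice in file1 and never in file2 gets a count of 2 and is silently dropped. If repeated keys within one file are possible, a variant that records which file each key came from avoids this. This is a hypothetical Python sketch, not part of the answer above; `unique_keys_by_file` and `make_row` are illustrative names, and the first occurrence of a repeated key is the one reported.

```python
def make_row(col1, col8):
    # Hypothetical filler for columns 2-7; only col1 and col8 are used.
    return "\t".join([col1, "-", "-", "-", "-", "-", "-", col8]) + "\n"


def unique_keys_by_file(file1_lines, file2_lines):
    """Report keys found in exactly one file, even if repeated there."""
    files_seen = {}  # key -> set of file numbers (1 or 2) it appeared in
    prepared = {}    # (key, file number) -> output line; first occurrence wins
    for num, prefix, lines in ((1, "left", file1_lines), (2, "joined", file2_lines)):
        for line in lines:
            fields = line.rstrip("\n").split("\t")
            key = fields[0]
            files_seen.setdefault(key, set()).add(num)
            prepared.setdefault((key, num), f"{prefix} {key}\t{fields[7]}")
    result = []
    for key, nums in files_seen.items():
        if len(nums) == 1:
            (num,) = nums  # the only file this key appeared in
            result.append(prepared[(key, num)])
    return result


f1 = [make_row("Char5", "Char1"), make_row("Char5", "Char1"),
      make_row("Char6", "Char6(Main) VT emp time")]
f2 = [make_row("Char6", "Char6(Main) Emp/VT/Time")]
print(unique_keys_by_file(f1, f2))  # ['left Char5\tChar1']
```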