How to find differences in CSV files?

Question

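I have two CSV files that look exactly the same when opened in TextEdit. However, when I run the 'diff' command in the terminal, it reports a difference that I cannot pin down.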

MacBook-Pro-2 Desktop % diff goodcsv.csv completecsv.csv
1c1
< AccountNo.,PromptPay ID,Account Name,Amount,Description
---
> AccountNo.,PromptPay ID,Account Name,Amount,Description

Answer 1

I'm pretty sure it's a BOM (byte order mark) at the beginning of one of the files.

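diff's 1c1 tells you that line 1 of the first file differs from line 1 of the second. Since the two lines look identical on screen, the difference has to be a character that isn't displayed.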
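
As a cross-check, cmp compares byte by byte and reports the first offset at which two files differ, which narrows things down further than diff's line number. A minimal sketch, using throwaway files (a.csv and b.csv are made-up names, and the shortened header is just for illustration):

```shell
# Work in a scratch directory so nothing real gets overwritten.
tmp=$(mktemp -d)

# Two one-line files that differ only by a leading UTF-8 BOM
# (octal 357 273 277 is hex EF BB BF).
printf 'AccountNo.,Amount\n' > "$tmp/a.csv"
printf '\357\273\277AccountNo.,Amount\n' > "$tmp/b.csv"

# cmp exits non-zero when the files differ and names the first differing byte.
cmp "$tmp/a.csv" "$tmp/b.csv" || echo "files differ"

# cmp -l lists differing bytes: offset, then the byte from each file in octal.
cmp -l "$tmp/a.csv" "$tmp/b.csv" 2>/dev/null | head -n 3
```

If the first differing byte is byte 1 of line 1, an invisible prefix such as a BOM is a likely suspect.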

I copied your header and made up a first row:

AccountNo.,PromptPay ID,Account Name,Amount,Description
1,2,Acme,2000.00,For a foo

Then I added a BOM to it and saved the result as another file:

% gocsv clean --add-bom input.csv > input_bom.csv

Now when I compare the two, I get your result:

% diff input.csv input_bom.csv
1c1
< AccountNo.,PromptPay ID,Account Name,Amount,Description
---
> AccountNo.,PromptPay ID,Account Name,Amount,Description

On BSD (the macOS terminal), you can use less or hexdump to make the BOM visible. I find the output of less more direct:

% less input_bom.csv
<U+FEFF>AccountNo.,PromptPay ID,Account Name,Amount,Description
1,2,Acme,2000.00,For a foo

U+FEFF is the Unicode code point of the BYTE ORDER MARK character.

hexdump will give you the complete, unvarnished truth about what is in the file:

% hexdump -C input_bom.csv
00000000  ef bb bf 41 63 63 6f 75  6e 74 4e 6f 2e 2c 50 72  |...AccountNo.,Pr|
00000010  6f 6d 70 74 50 61 79 20  49 44 2c 41 63 63 6f 75  |omptPay ID,Accou|
00000020  6e 74 20 4e 61 6d 65 2c  41 6d 6f 75 6e 74 2c 44  |nt Name,Amount,D|
00000030  65 73 63 72 69 70 74 69  6f 6e 0a 31 2c 32 2c 41  |escription.1,2,A|
00000040  63 6d 65 2c 32 30 30 30  2e 30 30 2c 46 6f 72 20  |cme,2000.00,For |
00000050  61 20 66 6f 6f 0a                                 |a foo.|
00000056

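The first three bytes, ef bb bf, are the UTF-8 encoding of the BOM; you can also tell from the three leading dots in ...AccountNo. that something is there but is not printable.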
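
If you need to check many files, the same three-byte test can be scripted. This is only a sketch: has_bom is a hypothetical helper name, not a standard command, and the demo files are made up.

```shell
# has_bom FILE: succeed if FILE starts with the UTF-8 BOM bytes EF BB BF.
# (has_bom is a hypothetical helper, not a standard tool.)
has_bom() {
  # Dump the first 3 bytes as hex and strip od's padding before comparing.
  [ "$(head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]')" = "efbbbf" ]
}

# Demo on throwaway files; octal 357 273 277 is hex EF BB BF.
tmp=$(mktemp -d)
printf 'a,b\n1,2\n' > "$tmp/plain.csv"
printf '\357\273\277a,b\n1,2\n' > "$tmp/bom.csv"

for f in "$tmp/plain.csv" "$tmp/bom.csv"; do
  if has_bom "$f"; then echo "$f: BOM"; else echo "$f: no BOM"; fi
done
```

head -c, od and tr are available on both macOS and Linux, so this does not depend on any extra tools.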

I used GoCSV's clean command to add the BOM; you can use its --strip-bom option to get rid of it. If you're doing anything with CSVs on the command line, GoCSV is a fantastic tool; it's prebuilt for macOS Intel and ARM.
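
If you'd rather not install anything, the BOM can also be dropped with standard tools by skipping the first three bytes. A rough sketch, assuming the file is UTF-8; strip_bom is a hypothetical helper name and the demo files are made up:

```shell
# strip_bom FILE: write FILE to stdout without a leading UTF-8 BOM, if any.
# (strip_bom is a hypothetical helper, not a standard tool.)
strip_bom() {
  if [ "$(head -c 3 "$1" | od -An -tx1 | tr -d '[:space:]')" = "efbbbf" ]; then
    tail -c +4 "$1"   # start output at byte 4, skipping the 3 BOM bytes
  else
    cat "$1"          # no BOM: pass the file through unchanged
  fi
}

# Demo on throwaway files; octal 357 273 277 is hex EF BB BF.
tmp=$(mktemp -d)
printf 'a,b\n1,2\n' > "$tmp/plain.csv"
printf '\357\273\277a,b\n1,2\n' > "$tmp/bom.csv"

strip_bom "$tmp/bom.csv" > "$tmp/stripped.csv"
cmp "$tmp/plain.csv" "$tmp/stripped.csv" && echo "identical after stripping"
```

The function writes to stdout rather than editing in place, so redirect to a new file instead of back onto the input.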
