Reading large excel file with special characters in pandas

I have a 500MB+ file that was generated by saving a large Excel spreadsheet as Unicode text. I am running Windows 7.

I need to open the file with Python pandas. Until now I have used Notepad++ to convert files from ANSI to UTF-8, but this file is now too large to open in Notepad++.

The data contains Hebrew, French, Swedish, Norwegian, and Danish special characters.

Pandas' read_excel is just too slow: I let it run for several minutes without seeing any output.
iconv: apparently I cannot get the encoding right; I just get a list of tab-separated nulls when I try:

iconv -f "CP858" -t "UTF-8" file1.txt > file2.txt

iconv -f "windows-1252" -t "UTF-8" file1.txt > file2.txt

Edit

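iconv -f "UTF-16le" -t "UTF-8" file1.txt > file2.txt leads to very weird behaviour: a row somewhere in the middle of the file gets cut off. Everything looks fine, except that only 80K rows are actually converted.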
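
For comparison, the same UTF-16 to UTF-8 conversion can be sketched in plain Python, reading the file in chunks so it never has to fit in memory. This is only a minimal sketch: the function name `transcode` and the chunk size are made up for illustration, and it assumes the file starts with a byte-order mark, so the generic "utf-16" codec can detect the byte order and drop the mark.

```python
def transcode(src_path, dst_path, src_encoding="utf-16",
              dst_encoding="utf-8", chunk_chars=1 << 20):
    """Re-encode a text file chunk by chunk (hypothetical helper)."""
    # newline="" disables newline translation, so the original
    # line endings (e.g. \r\n) are copied through unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)  # at most chunk_chars characters
            if not chunk:                  # empty string means end of file
                break
            dst.write(chunk)


if __name__ == "__main__":
    # file1.txt / file2.txt are the names used in the iconv commands above.
    transcode("file1.txt", "file2.txt")
```

Counting the lines of the input and output afterwards is a quick way to check whether any rows were lost in the conversion.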

Edit 2

read_csv with encoding='utf-16le' reads the file properly. However, I still don't understand why iconv messes it up.

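
A minimal sketch of that read_csv approach, for anyone with a similar file. It assumes the export is tab-separated (as the iconv output suggested) and starts with a byte-order mark, which the "utf-16" codec consumes. The helper name `load_unicode_text` and the chunk size are illustrative choices, not part of pandas.

```python
import pandas as pd


def load_unicode_text(path, chunksize=100_000):
    """Read a tab-separated UTF-16 text file in chunks (hypothetical helper)."""
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of parsing the whole file in one call.
    chunks = pd.read_csv(path, sep="\t", encoding="utf-16",
                         chunksize=chunksize)
    # Concatenate the pieces; each chunk could instead be filtered or
    # aggregated here if the full table does not fit in memory.
    return pd.concat(chunks, ignore_index=True)


if __name__ == "__main__":
    df = load_unicode_text("file1.txt")  # file name from the question
    print(df.shape)
```

Reading in chunks is optional for a file of this size if enough RAM is available, but it makes it easy to print progress and to see how many rows were actually parsed.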