Pandas format - How to save a DataFrame float64 column (with NaNs) as int?

Question

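My DataFrame has about 20 columns of mixed types; one of them is an ID number with 15 to 18 digits. Some rows have no ID number (the column contains NaN). When the .csv is read back, the ID number is written in scientific notation, which defeats the purpose of having an ID number...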

I am trying to find a way to save the DataFrame as a csv (using .to_csv) while keeping this ID number in its full int form.

The closest thing I found is "Format / Suppress Scientific Notation from Python Pandas Aggregation Results", but it changes all columns, and I only want to change one column.

Thanks for your help!

Answer 1

You can use float_format when calling to_csv():

df.to_csv(filepath, index=False, sep='\t', float_format='%.6f')

The full answer is here:

For your ID, you can try changing the 6 to a 0 (that is, float_format='%.0f').
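As a minimal sketch of that suggestion, the snippet below writes a small made-up DataFrame with float_format='%.0f'. The column names and values are assumptions for illustration only. Note that float_format applies to every float column, so the hypothetical score column loses its decimals as well, which is the limitation the question mentions.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: an ID column stored as float64 (because of the NaN)
# plus another float column that we would prefer to leave untouched.
df = pd.DataFrame({"Id": [1e18, np.nan, 4e18], "score": [0.25, 1.75, 2.0]})

# Write to an in-memory buffer instead of a file so the result is easy to inspect.
buf = io.StringIO()
df.to_csv(buf, index=False, sep="\t", float_format="%.0f")
csv_text = buf.getvalue()

# The Id values are written in full, but the score column is rounded too.
print(csv_text)
```

Because the format string is global, this approach only fits when all float columns can share the same formatting.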

Answer 2

As MaxU said in the comments, the best approach is probably to use a placeholder for the NaNs.

I got rid of the NaNs in my column with .fillna(-9999), after which it was easy to represent the ID as an int (using .astype(int) or dtype).

Problem solved.
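A minimal sketch of the placeholder approach, assuming a made-up DataFrame with an Id column and a hypothetical name column. The sentinel -9999 is the one used above; any value that cannot be a real ID would work. Only the Id column is converted, so the other columns keep their formatting.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: the NaN forces the Id column to float64.
df = pd.DataFrame({"Id": [1e18, np.nan, 4e18], "name": ["a", "b", "c"]})

# Replace missing IDs with a sentinel, then cast only this column to a 64-bit integer.
df["Id"] = df["Id"].fillna(-9999).astype("int64")

# Write to an in-memory buffer so the output can be checked without touching disk.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Whoever reads the file later has to know that -9999 means "no ID", so the sentinel should be documented alongside the data.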

Answer 3

As of pandas 0.24 (January 2019), you can represent the data as an arrays.IntegerArray, which corresponds to nullable integers. This lets you get the result you want while sticking to idiomatic pandas.

For example, suppose the following is what you get with floats:

In [99]: df.Id
Out[99]:
0    1.000000e+18
1    2.000000e+18
2    3.000000e+18
3             NaN
4    4.000000e+18
Name: Id, dtype: float64

In [100]: df.Id.to_csv('output.csv')

In [101]: !cat output.csv
0,1e+18
1,2e+18
2,3e+18
3,
4,4e+18

Then, using the dtype 'Int64', you get the following:

In [102]: df.Id.astype('Int64')
Out[102]:
0    1000000000000000000
1    2000000000000000000
2    3000000000000000000
3                    NaN
4    4000000000000000000
Name: Id, dtype: Int64

In [103]: df.Id.astype('Int64').to_csv('output.csv')

In [104]: !cat output.csv
0,1000000000000000000
1,2000000000000000000
2,3000000000000000000
3,
4,4000000000000000000
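The same idea can be applied to a single column of a larger DataFrame, which is what the question asks for. The sketch below uses made-up data and a hypothetical score column. Reading the file back with dtype={"Id": "Int64"} is an addition not covered in the answer above; it assumes a pandas version in which read_csv accepts the nullable integer dtype.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: Id is float64 because of the NaN, score is an ordinary float column.
df = pd.DataFrame({"Id": [1e18, 2e18, np.nan], "score": [0.5, 1.25, 2.0]})

# Convert only the Id column to the nullable integer dtype; the missing value stays missing.
df["Id"] = df["Id"].astype("Int64")

# Write to an in-memory buffer; the score column keeps its normal float formatting.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)

# Assumption: read the column back as Int64 so it never passes through float64.
restored = pd.read_csv(io.StringIO(csv_text), dtype={"Id": "Int64"})
print(restored.dtypes)
```

Requesting Int64 at read time matters for long IDs, since float64 cannot represent every integer above 2**53 exactly.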
