Pandas dataframe with float64 incorrectly exported by to_json() function

Question

This question is about exporting a dataframe with the float64 dtype using the to_json() function in Pandas. The source code is attached below.

import pandas

if __name__ == "__main__":
    d = {'col1': [11111111.84, 123456.55], 'col2': [3, 4]}
    df = pandas.DataFrame(data=d)

    print(df)
    print(df.dtypes)

    output_file_path = '/test.csv'
    df.to_csv(output_file_path, index=False, encoding='UTF-8')
    output_file_path = '/test.json'
    df.to_json(output_file_path, orient="records", lines=True)

The print() output is correct before the dataframe is exported to the JSON or CSV file. The output is shown below.

          col1  col2
0  11111111.84     3
1    123456.55     4
col1    float64
col2      int64
dtype: object

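The data exported in CSV format (test.csv) looks correct.

The data exported in JSON format (test.json) has incorrect decimal places, as in col1, row 1 (11111111.8399999999). This only happens for some values, since col1, row 2 is correct (123456.55).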

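I found that this can be worked around by passing another argument, double_precision, to to_json(). With it, the result comes out correct (tested).

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_json.html

However, specifying double_precision limits the number of decimal places for all columns. That is not a good approach when each column needs a different number of decimal places.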

I also found the topic below, but I am not sure whether it is related to my problem.

Link:

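I am trying to understand the root cause of this issue and to find a solution. It is strange that the problem only happens with to_json(), while to_csv() works.

Any help is appreciated!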

Answer 1

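pandas to_json may be doing something odd with precision. As you explained, the canonical solution is to pass the precision you want via double_precision, but that does not let you selectively round specific columns to a desired precision.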
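To illustrate where the extra digits can come from, here is a minimal sketch using only the standard library. It assumes that to_json() formats floats to a fixed number of decimal places (the documentation gives 10 as the default for double_precision); I have not checked the encoder's source, so treat the fixed-format call as a stand-in rather than what pandas does internally.

```python
from decimal import Decimal

value = 11111111.84

# Exact binary value stored for the literal; it is slightly below 11111111.84.
exact = Decimal(value)
print(exact < Decimal("11111111.84"))  # True

# repr() picks the shortest string that round-trips to the same float.
shortest = repr(value)
print(shortest)  # 11111111.84

# Fixed formatting with 10 decimal places exposes the stored error
# (stand-in for the assumed default behaviour of to_json).
fixed_10 = format(value, ".10f")
print(fixed_10)  # 11111111.8399999999

# The second value is smaller in magnitude, so its representation error
# falls beyond the 10th decimal place and the digits look clean.
fixed_10_small = format(123456.55, ".10f")
print(fixed_10_small)  # 123456.5500000000
```

This would also explain why only some values look wrong: the larger the number, the fewer bits are left for the fractional part, so the error is more likely to show up within 10 decimal places.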

Another option is to cut out the middleman, df.to_json, and use Python's built-in json.dump instead:

import json

# Convert to a JSON string; json.dumps writes each float using its repr()
json.dumps(df.to_dict())
# '{"col1": {"0": 11111111.84, "1": 123456.55}, "col2": {"0": 3, "1": 4}}'

# Save to a file
with open('/test.json', 'w') as f:
    json.dump(df.to_dict(), f)

As you can see, this does not change the printed precision. The standard floating point caveats still apply.
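Note that df.to_dict() with no arguments produces a column-oriented layout, whereas the question used orient="records" with lines=True (one JSON object per line). A rough sketch of that layout with the json module follows. The helper name to_json_lines is hypothetical, and the list of dicts is a stand-in for what df.to_dict(orient="records") would return for the dataframe in the question, assuming the values come back as native Python numbers.

```python
import json


def to_json_lines(records):
    """Serialize an iterable of dicts as one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Stand-in for df.to_dict(orient="records") on the dataframe from the question.
records = [
    {"col1": 11111111.84, "col2": 3},
    {"col1": 123456.55, "col2": 4},
]

text = to_json_lines(records)
print(text, end="")
# {"col1": 11111111.84, "col2": 3}
# {"col1": 123456.55, "col2": 4}
```

With a real dataframe you would pass df.to_dict(orient="records") instead of the hand-written list and write text to the output file. Each column keeps as many digits as its float repr needs, so no single precision is imposed on all columns.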
