Shorten decimal numbers of a Pandas DataFrame as with Python's decimal module

Question

I have several DataFrames like this one:

import numpy as np
import pandas as pd

data = {'col1': [3.1415926535, 28, -0.0000000000000000618, 1.100000001],
        'col2': ['string1', 'string2', 'string3', 'string4'],
        'col3': [9876543210, 0, 333.3333333, np.nan],
        'col4': [np.nan] * 4}
df = pd.DataFrame(data, index=[1001, 1002, 1003, 1004])

print(df)
               col1     col2          col3  col4
1001   3.141593e+00  string1  9.876543e+09   NaN
1002   2.800000e+01  string2  0.000000e+00   NaN
1003  -6.180000e-17  string3  3.333333e+02   NaN
1004   1.100000e+00  string4           NaN   NaN

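They all have thousands of rows and hundreds of columns, and they are stored as CSV files. To save storage space, I would like to reduce the precision of every value in these DataFrames before saving them to CSV.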

With a precision of 4 significant digits, this could give the following result:

           col1     col2       col3  col4
1001  3.141e+00  string1  9.876e+09   NaN
1002    2.8e+01  string2      0e+00   NaN
1003  -6.18e-17  string3  3.333e+02   NaN
1004    1.1e+00  string4        NaN   NaN

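Values within a single column sometimes span a very wide range, so "round" does not work in my case: it cannot keep a similar precision for both large and small values. I also tried the "float_format" parameter of "df.to_csv()", but that does not meet my needs either.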
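The following minimal sketch shows why "round" falls short here. "Series.round(4)" keeps four decimal places rather than four significant digits. As a result, the tiny value from col1 collapses to zero, and the ten-digit value from col3 is not shortened at all.

```python
import pandas as pd

# Values taken from col1 and col3 of the example above.
s = pd.Series([3.1415926535, 28, -0.0000000000000000618, 1.100000001, 9876543210])

# round() fixes the number of decimal places, whatever the magnitude of each value.
rounded = s.round(4)
print(rounded.tolist())
# [3.1416, 28.0, -0.0, 1.1, 9876543210.0]
```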

Python's Decimal library meets this need, but I have not managed to apply it to a DataFrame. Is there an efficient way to apply this kind of processing?
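Below is one possible sketch of applying the decimal module element-wise, under a few assumptions. The helper names "to_significant" and "shorten_numeric_columns" are hypothetical. The sketch uses the standard-library methods "decimal.Context.create_decimal_from_float", which rounds to the context's precision, and "Decimal.normalize", which strips trailing zeros. Numeric columns are turned into strings, and missing values are left as NaN, so "to_csv()" still writes them as empty fields.

The default rounding mode is ROUND_HALF_EVEN, so 3.1415926535 becomes 3.142 rather than the 3.141 shown in the expected output. Passing "rounding=decimal.ROUND_DOWN" to "Context" would truncate instead. Because the work is done value by value in Python, it will be slower than vectorized pandas operations.

```python
from decimal import Context

import numpy as np
import pandas as pd


def to_significant(x, ctx):
    """Hypothetical helper: round one number to ctx.prec significant digits."""
    if pd.isna(x):
        return x  # keep NaN so to_csv() writes an empty field
    # Applies the context's precision and rounding mode during conversion.
    d = ctx.create_decimal_from_float(float(x))
    # normalize() strips trailing zeros, e.g. 1.100 -> 1.1, -6.180E-17 -> -6.18E-17.
    return str(d.normalize())


def shorten_numeric_columns(df, digits=4):
    """Hypothetical helper: copy of df with numeric columns as shortened strings."""
    ctx = Context(prec=digits)
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        out[col] = out[col].map(lambda x: to_significant(x, ctx))
    return out


data = {'col1': [3.1415926535, 28, -0.0000000000000000618, 1.100000001],
        'col2': ['string1', 'string2', 'string3', 'string4'],
        'col3': [9876543210, 0, 333.3333333, np.nan],
        'col4': [np.nan] * 4}
df = pd.DataFrame(data, index=[1001, 1002, 1003, 1004])

short = shorten_numeric_columns(df, digits=4)
print(short.to_csv())
```

The decimal module chooses the notation itself. Small and moderate values stay in plain form ("3.142", "28", "1.1"), and very large or very small ones switch to exponent form ("9.877E+9", "-6.18E-17").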

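Note: where the expected value is "1.1e+00", I would also accept "1.1", which takes less space in a CSV file. However, that plain format may not suit numbers that are very large in magnitude or very close to zero, so scientific notation seems better suited to my case.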
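The trade-off in this note can be seen with Python's built-in format specifications. In the comparison below, the value list simply repeats the example data. With the "g" presentation type, four significant digits are kept, trailing zeros are dropped, and the output switches to scientific notation only for very large or very small magnitudes. With ".3e", four significant digits are always written in scientific notation, and trailing zeros are kept.

```python
values = [3.1415926535, 28.0, -0.0000000000000000618, 1.100000001,
          9876543210.0, 0.0, 333.3333333]

# '.4g': 4 significant digits; exponent form only when the magnitude is extreme.
as_g = [format(v, ".4g") for v in values]
# '.3e': 1 digit before the point + 3 after = 4 significant digits, always scientific.
as_e = [format(v, ".3e") for v in values]

for g, e in zip(as_g, as_e):
    print(f"{g:>10}  {e:>11}")

# Total characters each style would write for these values.
print(sum(map(len, as_g)), sum(map(len, as_e)))
```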

Answer 1

Run this command before displaying the DataFrame:

pd.options.display.float_format = "{:,.4E}".format
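Note that "display.float_format" only changes how pandas prints a DataFrame. The stored values and the text written by "to_csv()" are unaffected. Also, ".4E" prints five significant digits: one before the point and four after. The quick check below uses "pd.option_context", a standard pandas helper that sets an option only inside a "with" block, so the global setting is not left modified.

```python
import pandas as pd

df = pd.DataFrame({"col1": [3.1415926535, 28, -0.0000000000000000618, 1.100000001]},
                  index=[1001, 1002, 1003, 1004])

# Apply the answer's display format temporarily; it is restored on exit.
with pd.option_context("display.float_format", "{:,.4E}".format):
    shown = df.to_string()

# CSV output ignores display options and keeps full precision.
written = df.to_csv()

print(shown)
print(written)
```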

Or, if you only need this for a few columns, for example:

df['col1'] = df['col1'].map('{:,.4E}'.format)
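Here is a sketch of extending the per-column idea to every float column before writing the CSV. It assumes that 4 significant digits are wanted, which is why it uses ".3E" rather than ".4E". It also passes "na_action='ignore'" to "Series.map", so missing values stay NaN instead of being turned into strings.

```python
import numpy as np
import pandas as pd

data = {'col1': [3.1415926535, 28, -0.0000000000000000618, 1.100000001],
        'col2': ['string1', 'string2', 'string3', 'string4'],
        'col3': [9876543210, 0, 333.3333333, np.nan],
        'col4': [np.nan] * 4}
df = pd.DataFrame(data, index=[1001, 1002, 1003, 1004])

fmt = "{:.3E}".format  # 1 digit before the point + 3 after = 4 significant digits

out = df.copy()
for col in out.select_dtypes(include="float").columns:
    # Format only non-missing values; NaN is passed through unchanged.
    out[col] = out[col].map(fmt, na_action="ignore")

print(out.to_csv())
```

This keeps trailing zeros (for example "2.800E+01"), which costs a few characters per value compared with the shortest form.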
