pandas.DataFrame.to_markdown converts large integers to floats

Question

pandas.DataFrame.to_markdown converts large ints to floats. Is this a bug or a feature? Is there a workaround?

>>> df = pd.DataFrame({"A": [123456, 123456]})
>>> print(df.to_markdown())
|    |      A |
|---:|-------:|
|  0 | 123456 |
|  1 | 123456 |

>>> df = pd.DataFrame({"A": [1234567, 1234567]})
>>> print(df.to_markdown())
|    |           A |
|---:|------------:|
|  0 | 1.23457e+06 |
|  1 | 1.23457e+06 |

>>> print(df)
         A
0  1234567
1  1234567

>>> print(df.A.dtype)
int64

Answer 1

At first I only found a workaround, not an explanation: convert the column to strings.

>>> df = pd.DataFrame({"A": [1234567, 1234567]})
>>> df["A"] = df.A.astype(str)
>>> print(df.to_markdown())
|    |       A |
|---:|--------:|
|  0 | 1234567 |
|  1 | 1234567 |

Update:

I think this is caused by two factors:

The first is the _column_type function in tabulate, which infers a type for each column:

def _column_type(strings, has_invisible=True, numparse=True):
    """The least generic type all column values are convertible to.

This can be worked around by passing tablefmt="pretty", which disables the conversion:

print(df.to_markdown(tablefmt="pretty"))
+---+---------+
|   |    A    |
+---+---------+
| 0 | 1234567 |
| 1 | 1234567 |
+---+---------+
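
To illustrate why 123456 came out unchanged while 1234567 turned into 1.23457e+06, here is a minimal sketch in plain Python. It assumes the column has been typed as float and is rendered with the "g" format (tabulate's default floatfmt), which keeps 6 significant digits. The helper name format_like_float_column is hypothetical and is not part of tabulate or pandas.

```python
# Assumption: a float-typed column is rendered with the "g" presentation
# type, which keeps 6 significant digits before switching to exponent form.
def format_like_float_column(value, floatfmt="g"):
    """Hypothetical helper: format a value as a float-typed column would."""
    return format(float(value), floatfmt)


six_digits = format_like_float_column(123456)            # fits in 6 significant digits
seven_digits = format_like_float_column(1234567)         # needs 7 digits
fixed_point = format_like_float_column(1234567, ".0f")   # explicit fixed-point format

print(six_digits)
print(seven_digits)
print(fixed_point)
```

Since to_markdown forwards its keyword arguments to tabulate, passing a floatfmt such as ".0f" may be another way to avoid the exponent form, though that has not been tried in this thread.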

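The second factor applies when there are multiple columns and one of them contains floats. tabulate extracts the data with df.values, which converts the DataFrame to a numpy.array, so all values are cast to a common dtype (float). This has been reported as an issue.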

>>> df = pd.DataFrame({"A": [1234567, 1234567], "B": [0.1, 0.2]})
>>> print(df)
         A    B
0  1234567  0.1
1  1234567  0.2

>>> print(df.A.dtype)
int64

>>> print(df.to_markdown(tablefmt="pretty"))
+---+-----------+-----+
|   |     A     |  B  |
+---+-----------+-----+
| 0 | 1234567.0 | 0.1 |
| 1 | 1234567.0 | 0.2 |
+---+-----------+-----+

>>> df.values
array([[1.234567e+06, 1.000000e-01],
       [1.234567e+06, 2.000000e-01]])
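
The upcast can be seen without tabulate at all. The following is a rough sketch, assuming only pandas and NumPy, that compares df.values on the mixed frame with the same frame cast to object or to str first. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1234567, 1234567], "B": [0.1, 0.2]})

# Mixed int64/float64 columns collapse into a single float64 array.
mixed = df.values
print(mixed.dtype)

# Casting to object first keeps each cell's own value in the 2-D array.
as_object = df.astype(object).values
print(as_object.dtype, as_object[0, 0])

# Casting to str applies the string workaround from above to every column.
as_text = df.astype(str).values
print(as_text[0, 0], as_text[0, 1])
```

Casting the whole frame with astype(str) before calling to_markdown is the same idea as the single-column workaround, but tabulate may still re-parse numeric-looking strings unless number parsing is disabled, so check the rendered table.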

Answer 2

If you check the pandas options, the default number of significant digits is 6.

import pandas as pd

pd.describe_option()

display.precision : int
    Floating point output precision (number of significant digits). This is
    only a suggestion
    [default: 6] [currently: 6]
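
To read just this one option instead of scanning the full describe_option() output, something like the sketch below should work. It only shows how to inspect and temporarily change display.precision. Whether this option affects to_markdown at all is not established in this thread.

```python
import pandas as pd

# Read the current value of the single option.
precision = pd.get_option("display.precision")
print(precision)

# Change it temporarily; the previous value is restored on exit.
with pd.option_context("display.precision", 10):
    inside = pd.get_option("display.precision")
    print(inside)

restored = pd.get_option("display.precision")
print(restored)
```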

