Pandas: apply result_type="expand": wrong dtypes

Question

I want to add multiple columns to a DataFrame:

import pandas as pd

df = pd.DataFrame(
    [
        (0, 1),
        (1, 1),
        (1, 2),
    ],
    columns=['a', 'b']
)


def apply_fn(row) -> (int, float):
    return int(row.a + row.b), float(row.a / row.b)


df[['c', 'd']] = df.apply(apply_fn, result_type='expand', axis=1)

Result:

>>> df
   a  b    c    d
0  0  1  1.0  0.0
1  1  1  2.0  1.0
2  1  2  3.0  0.5

>>> df.dtypes
a      int64
b      int64
c    float64
d    float64
dtype: object

Why is column c not of dtype int? Can I specify the dtypes somehow? Something like .apply(..., dtypes=[int, float])?

Answer 1

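I believe this happens because result_type='expand' expands each row's return value into a Series of its own: the first row's result goes into one Series, the next row's into another, and so on. Because a Series can only have a single dtype, the integers get upcast to floats.

For example, look at this: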

>>> pd.Series([1, 0.0])
0    1.0
1    0.0
dtype: float64
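To see the same effect through apply itself, here is a small sketch that contrasts a function returning only ints with one returning an int and a float. The lambdas and the names all_ints and mixed are made up for illustration, and the behaviour is what I would expect from recent pandas versions rather than a documented guarantee.

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (1, 1), (1, 2)], columns=['a', 'b'])

# Every returned value is an int, so nothing forces an upcast.
all_ints = df.apply(
    lambda row: (int(row.a + row.b), int(row.a * row.b)),
    axis=1,
    result_type='expand',
)

# Each row returns one int and one float, so the pair needs a common dtype.
mixed = df.apply(
    lambda row: (int(row.a + row.b), float(row.a / row.b)),
    axis=1,
    result_type='expand',
)

print(all_ints.dtypes)
print(mixed.dtypes)
```

If the explanation above is right, the first frame should keep integer columns while both columns of the second come out as floats.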

One workaround is to call tolist on the result of apply and wrap it in a call to DataFrame:

>>> df[['c', 'd']] = pd.DataFrame(df.apply(apply_fn, axis=1).tolist())
>>> df
   a  b  c    d
0  0  1  1  0.0
1  1  1  2  1.0
2  1  2  3  0.5
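One caveat: the workaround builds a new DataFrame with a default 0..n-1 index, and the assignment aligns on the index. That is fine for the example df, but if your frame has a different index, you probably want to pass it along. A minimal sketch, assuming a made-up index of 10, 20, 30 (the names rows and expanded are mine):

```python
import pandas as pd

# Same data as in the question, but with a non-default index.
df = pd.DataFrame(
    [(0, 1), (1, 1), (1, 2)],
    columns=['a', 'b'],
    index=[10, 20, 30],
)


def apply_fn(row):
    return int(row.a + row.b), float(row.a / row.b)


# Without result_type, apply returns a Series of tuples.
rows = df.apply(apply_fn, axis=1).tolist()

# Reuse the original index so the assignment lines up row by row,
# and let the DataFrame constructor infer one dtype per column.
expanded = pd.DataFrame(rows, index=df.index, columns=['c', 'd'])
df[['c', 'd']] = expanded

print(df.dtypes)
```

Leaving out index=df.index here would leave the new frame with labels 0, 1, 2, which do not match 10, 20, 30, so the new columns would be filled with NaN.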

Answer 2

You can chain astype:

df.apply(apply_fn, axis=1, result_type='expand').astype({0:'int', 1:'float'})
Out[147]: 
   0    1
0  1  0.0
1  2  1.0
2  3  0.5
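To get this back into the original frame as columns c and d, something along these lines should work. It is only a sketch: the name out is mine, and I spell the dtypes as 'int64' and 'float64' to keep them explicit.

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (1, 1), (1, 2)], columns=['a', 'b'])


def apply_fn(row):
    return int(row.a + row.b), float(row.a / row.b)


# The expanded result has columns 0 and 1, both float64 at this point.
out = df.apply(apply_fn, axis=1, result_type='expand')
out.columns = ['c', 'd']

# Cast each column back to the dtype it was meant to have.
df[['c', 'd']] = out.astype({'c': 'int64', 'd': 'float64'})

print(df.dtypes)
```

Keep in mind that the integers take a detour through float64 before being cast back, so very large values (beyond 2**53) could lose precision on the way.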
