PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array

Question

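I'm trying to create a new column in a Koalas DataFrame, df. The DataFrame has 2 columns: col1 and col2. I need to create a new column, newcol, holding the median of the col1 and col2 values.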

import numpy as np
import databricks.koalas as ks

# df is Koalas dataframe
df = df.assign(newcol=lambda x: np.median(x.col1, x.col2).astype(float))

But I get the following error:

PandasNotImplementedError: The method pd.Series.__iter__() is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.

I also tried:

df.newcol = df.apply(lambda x: np.median(x.col1, x.col2), axis=1)

But that didn't work either.
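One likely contributor, offered as a hedged note rather than a confirmed diagnosis: `np.median` takes a single array-like as its first argument and `axis` as its second, so `np.median(x.col1, x.col2)` passes col2 as the axis rather than as data. The sketch below uses plain NumPy arrays (not Koalas; the sample values are made up for illustration) to show a call shape that computes a row-wise median over two columns.

```python
import numpy as np

# Hypothetical sample values standing in for col1 and col2.
col1 = np.array([1, 2, 3])
col2 = np.array([4, 5, 6])

# Stack the columns side by side: shape (3, 2), one row per record.
stacked = np.column_stack([col1, col2])

# axis=1 takes the median across the two columns of each row.
row_medians = np.median(stacked, axis=1)
print(row_medians)
```

This only illustrates the NumPy call itself. Applying it to a distributed Koalas column would still require collecting the data first, which is what the error message is hinting at with `to_numpy()`.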

Answer 1

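I ran into the same problem. One caveat: I'm using pyspark.pandas rather than Koalas, but my understanding is that pyspark.pandas is derived from Koalas, so my solution may still help. I tried to test it with Koalas but couldn't get a cluster running with a suitable version.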

import pyspark.pandas as ps

data = {"col_1": [1, 2, 3], "col_2": [4, 5, 6]}
df = ps.DataFrame(data)

# With axis=1, each row reaches the lambda as a pandas Series.
median_series = df[["col_1", "col_2"]].apply(lambda x: x.median(), axis=1)
median_series.name = "median"

# Join the medians back onto df by index.
df = ps.merge(df, median_series, left_index=True, right_index=True, how='left')

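Inside apply, the lambda argument x is a pandas.Series for each row, so I used its median method. Annoyingly, I couldn't get any form of assignment to work; the only approach I found was this ugly merge. Also, how='left' guarantees that df keeps the same number of rows, though 'inner' may be fine depending on the context.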
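A possible simplification, sketched under assumptions: with exactly two columns, the row-wise median equals the row-wise mean, so plain column arithmetic may be enough and would avoid both apply and the merge. The example below uses plain pandas as a stand-in so it runs without a Spark cluster. That the same expressions behave identically on a pyspark.pandas or Koalas DataFrame is an assumption and was not tested here.

```python
import pandas as pd

# Plain pandas stand-in for the pandas-on-Spark DataFrame in the answer.
df = pd.DataFrame({"col_1": [1, 2, 3], "col_2": [4, 5, 6]})

# Row-wise median via apply, mirroring the answer's lambda.
median_via_apply = df[["col_1", "col_2"]].apply(lambda row: row.median(), axis=1)

# With exactly two values per row, the median is simply their mean.
median_via_mean = (df["col_1"] + df["col_2"]) / 2

# Attach the result as a new column without a merge.
df = df.assign(median=median_via_mean)
print(df)
```

The shortcut only holds for two columns. With three or more, the median and mean generally differ and a row-wise apply (or an equivalent) is still needed.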
