value error: setting an array element with a sequence

Question

I am fitting a scikit-learn model (an ExtraTreesRegressor) in order to do supervised feature selection.

To make this as clear as possible, I built a toy example. Here is the toy code:

import pandas as pd
import numpy as np
from  sklearn.ensemble import ExtraTreesRegressor
from itertools import chain

# Original Dataframe
df = pd.DataFrame({"A": [[10,15,12,14],[20,30,10,43]], "R":[2,2] ,"C":[2,2] , "CLASS":[1,0]})
X = np.array([np.array(df.A).reshape(1,4) , df.C , df.R])
Y = np.array(df.CLASS)

# prints
X = np.array([np.array(df.A), df.C , df.R])
Y = np.array(df.CLASS)

print("X",X)
print("Y",Y) 
print(df)
df['A'].apply(lambda x: print("ORIGINAL SHAPE",np.array(x).shape,"field:",x))
df['A'] = df['A'].apply(lambda x: np.array(x).reshape(4,1),"field:",x)
df['A'].apply(lambda x: print("RESHAPED SHAPE",np.array(x).shape,"field:",x))
model = ExtraTreesRegressor()
model.fit(X,Y)
model.feature_importances_

X [[[10, 15, 12, 14] [20, 30, 10, 43]]
 [2 2]
 [2 2]]

Y [1 0]

                   A  C  CLASS  R
0  [10, 15, 12, 14]  2      1  2
1  [20, 30, 10, 43]  2      0  2
ORIGINAL SHAPE (4,) field: [10, 15, 12, 14]
ORIGINAL SHAPE (4,) field: [20, 30, 10, 43]
---------------------------

This is the exception that is raised:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-5a36c4c17ea0> in <module>()
      7 print(df)
      8 model = ExtraTreesRegressor()
----> 9 model.fit(X,Y)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/ensemble/forest.py in fit(self, X, y, sample_weight)
    210         """
    211         # Validate or convert input data
--> 212         X = check_array(X, dtype=DTYPE, accept_sparse="csc")
    213         if issparse(X):
    214             # Pre-sort indices to avoid that each individual tree of the

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    371                                       force_all_finite)
    372     else:
--> 373         array = np.array(array, dtype=dtype, order=order, copy=copy)
    374 
    375         if ensure_2d:

ValueError: setting an array element with a sequence.

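I noticed that np.arrays are involved. So I tried fitting another toy DataFrame, the most basic one possible, containing only scalars, and no error occurred. I then kept the same code and modified that toy DataFrame by adding another field that contains one-dimensional arrays, and now the same exception is raised.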

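I have looked around, but so far I have not found a solution, even after trying some reshaping, converting to lists, np.array and so on, and converting to matrices in my real problem. For now I am still working in that direction.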

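I have also seen that this kind of problem usually occurs when the arrays differ in length between samples, but that is not the case in the toy example.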

Does anyone know how to handle this structure/exception? Thanks in advance for your help.

Answer 1

To convert a pandas DataFrame column into a NumPy matrix:

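import numpy as np
import pandas as pd

def df2mat(df):
    # df is a single column (a Series) whose cells hold equal-length sequences
    a = df.to_numpy()  # as_matrix() was removed in pandas 1.0
    n = a.shape[0]     # number of rows
    m = len(a[0])      # length of the sequence in each cell
    b = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            b[i, j] = a[i][j]
    return b

df = pd.DataFrame({"A": [[1, 2], [3, 4]]})
b = df2mat(df.A)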

After that, concatenate the result with the other columns.
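The concatenation step might look like the sketch below, which uses the toy frame from the question. Here `b` is built with `np.array(df["A"].tolist())` as a stand-in for the output of `df2mat`, and the names `scalars` and `X` are only illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame from the question: "A" holds lists, "C" and "R" hold scalars.
df = pd.DataFrame({
    "A": [[10, 15, 12, 14], [20, 30, 10, 43]],
    "R": [2, 2],
    "C": [2, 2],
    "CLASS": [1, 0],
})

# Stand-in for df2mat(df.A): a (2, 4) float matrix built from the list column.
b = np.array(df["A"].tolist(), dtype=float)

# The scalar columns are already rectangular, shape (2, 2).
scalars = df[["C", "R"]].to_numpy(dtype=float)

# Join along the feature axis so that each row is one sample.
X = np.hstack([b, scalars])
print(X.shape)  # (2, 6)
```

The point of the design is that every block passed to `np.hstack` is two-dimensional with the same number of rows, so the result is a plain numeric matrix rather than an object array.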

Answer 2

Take a closer look at your X:

>>> X
array([[[10, 15, 12, 14], [20, 30, 10, 43]],
       [2, 2],
       [2, 2]], dtype=object)
>>> type(X[0,0])
<class 'list'>

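Note that it is dtype=object and that one of those objects is a list, hence "setting an array element with a sequence". Part of the problem is that np.array(df.A) does not correctly create a 2D array: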

>>> np.array(df.A)
array([[10, 15, 12, 14], [20, 30, 10, 43]], dtype=object)
>>> _.shape
(2,)  # oops!

But using np.stack(df.A) fixes that.
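The difference can be reproduced with NumPy alone. The sketch below builds by hand an object array shaped like `np.array(df.A)`, shows that a numeric cast fails the same way `check_array` does in the traceback, and then stacks it. The variable names (`col`, `stacked`, `cast_failed`) are illustrative only.

```python
import numpy as np

# Build the same kind of array that np.array(df.A) gives here:
# 1-D, dtype=object, each element a Python list.
col = np.empty(2, dtype=object)
col[0] = [10, 15, 12, 14]
col[1] = [20, 30, 10, 43]
print(col.shape)  # (2,)

# Asking for a numeric array fails: each cell is a sequence, not a scalar.
try:
    np.array(col, dtype=np.float32)
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print("ValueError:", exc)

# np.stack treats the elements as separate arrays and joins them on a new axis.
stacked = np.stack(col)
print(stacked.shape)  # (2, 4)
```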

You are looking for:

>>> X = np.concatenate([
        np.stack(df.A),                 # condense A to (N, 4)
        np.expand_dims(df.C, axis=-1),  # expand C to (N, 1)
        np.expand_dims(df.R, axis=-1),  # expand R to (N, 1)
    ], axis=-1)
>>> X
array([[10, 15, 12, 14,  2,  2],
       [20, 30, 10, 43,  2,  2]], dtype=int64)
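Putting it together, a minimal end-to-end sketch of the fit from the question could look like this. It uses `np.column_stack` as an equivalent way to join the columns, and `n_estimators=10` and `random_state=0` are arbitrary choices made here for a small, repeatable run, not values from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesRegressor

# Toy frame from the question.
df = pd.DataFrame({
    "A": [[10, 15, 12, 14], [20, 30, 10, 43]],
    "R": [2, 2],
    "C": [2, 2],
    "CLASS": [1, 0],
})

# One row per sample: the four values of A followed by C and R.
X = np.column_stack([
    np.stack(df["A"].to_numpy()),  # (N, 4)
    df["C"].to_numpy(),            # (N,) becomes one column
    df["R"].to_numpy(),            # (N,) becomes one column
])
y = df["CLASS"].to_numpy()

# Arbitrary small settings, chosen only to keep the example quick and repeatable.
model = ExtraTreesRegressor(n_estimators=10, random_state=0)
model.fit(X, y)

importances = model.feature_importances_
print(X.shape, importances.shape)  # (2, 6) (6,)
```

Each entry of `importances` now corresponds to one flattened column, so the four components of A get separate scores instead of a single score for the whole field.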

python

arrays

numpy

data-fitting

scikit-learn