ValueError: operands could not be broadcast together with shapes when concatenating arrays across pandas columns

I am working with a pandas DataFrame that looks like this:

col1    col2    col3    col_num
0   [-0.20447069290738076, 0.4159556680196389, -0....   [-0.10935000772973974, -0.04425263358067333, -...   [51.0834196, 10.4234469]    3160
1   [-0.42439951483476124, -0.3135960467759942, 0....   [0.3842614765721414, -0.06756644506033657, 0.4...   [45.5643442, 17.0118954]    3159
3   [0.3158755226012898, -0.007057682056994253, 0....   [-0.33158941456615376, 0.09637640660002277, -0...   [50.6402809, 4.6667145] 3157
5   [-0.011089723491692679, -0.01649481399305317, ...   [-0.02827408211098023, 0.00019040943944721592,...   [53.45733965, -2.22695880505223]    3157

I want to concatenate the vectors within each row like this: df['col1'] + df['col2'] + df['col3'] + df['col_num'].transform(lambda item: [item])

But I get the following error:

/opt/conda/lib/python3.6/site-packages/pandas/core/ops.py in <lambda>(x)
    708                 if is_object_dtype(lvalues):
    709                     return libalgos.arrmap_object(lvalues,
--> 710                                                   lambda x: op(x, rvalues))
    711             raise
    712 

ValueError: operands could not be broadcast together with shapes (30,) (86597,) 

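It looks like, for some reason, it gets stuck when concatenating the third column, whose vectors have only 2 dimensions. The data is 86597 rows long. How can I fix this error?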

You can convert the problematic column to lists, for example:

df['col1'] + df['col2'] + df['col3'].apply(list) + df['col_num'].transform(lambda x: [x])
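
To illustrate why the conversion helps, here is a minimal sketch on hypothetical toy data. It assumes (the question does not confirm this) that col3 holds NumPy arrays while the other columns hold plain Python lists. With a list on one side and an array on the other, + means element-wise addition rather than concatenation, so operands of different lengths fail to broadcast:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: col1 holds Python lists, col3 holds NumPy arrays (assumption).
df = pd.DataFrame({
    "col1": [[1, 2, 3], [4, 5, 6]],
    "col3": [np.array([0.5, 1.5]), np.array([2.5, 3.5])],
    "col_num": [10, 20],
})

# list + ndarray is element-wise addition, so lengths 3 and 2 cannot be broadcast.
try:
    [1, 2, 3] + np.array([0.5, 1.5])
    failed = False
except ValueError:
    failed = True

# Once every cell is a list, + concatenates the lists row by row.
combined = (
    df["col1"]
    + df["col3"].apply(list)
    + df["col_num"].apply(lambda x: [x])
)
print(failed)
print(combined.tolist())
```

If all your columns already hold lists, the original expression should work unchanged, so it is worth checking type(df['col3'].iloc[0]) first.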

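Another solution, if the lists in each column all have the same length, is to convert every column to a 2D NumPy array and join them with hstack. This is preferable because working with lists in pandas is not a good idea: you lose the vectorized functionality that comes with NumPy arrays held in contiguous memory blocks: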

import numpy as np
import pandas as pd

np.random.seed(123)
N = 10
df = pd.DataFrame({
        "col1": [np.random.randint(10, size=3) for i in range(N)],
        "col2": [np.random.randint(10, size=3) for i in range(N)],
        "col3": [np.random.randint(10, size=2) for i in range(N)],
        'col_num': range(N)
        })
print (df)
        col1       col2    col3  col_num
0  [2, 2, 6]  [9, 3, 4]  [2, 4]        0
1  [1, 3, 9]  [6, 1, 5]  [8, 1]        1
2  [6, 1, 0]  [6, 2, 1]  [2, 1]        2
3  [1, 9, 0]  [8, 3, 5]  [1, 3]        3
4  [0, 9, 3]  [0, 2, 6]  [5, 9]        4
5  [4, 0, 0]  [2, 4, 4]  [0, 8]        5
6  [4, 1, 7]  [6, 3, 0]  [1, 6]        6
7  [3, 2, 4]  [6, 4, 7]  [3, 3]        7
8  [7, 2, 4]  [6, 7, 1]  [5, 9]        8
9  [8, 0, 7]  [5, 7, 9]  [7, 9]        9

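# convert each column of equal-length lists to a 2D array
a = np.array(df['col1'].values.tolist())
b = np.array(df['col2'].values.tolist())
c = np.array(df['col3'].values.tolist())
# reshape the scalar column to an N x 1 array so it can be stacked
d = df['col_num'].values[:, None]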

arr = np.hstack((a,b,c, d))
print (arr)
[[2 2 6 9 3 4 2 4 0]
 [1 3 9 6 1 5 8 1 1]
 [6 1 0 6 2 1 2 1 2]
 [1 9 0 8 3 5 1 3 3]
 [0 9 3 0 2 6 5 9 4]
 [4 0 0 2 4 4 0 8 5]
 [4 1 7 6 3 0 1 6 6]
 [3 2 4 6 4 7 3 3 7]
 [7 2 4 6 7 1 5 9 8]
 [8 0 7 5 7 9 7 9 9]]

df = pd.DataFrame(arr)
print (df)
   0  1  2  3  4  5  6  7  8
0  2  2  6  9  3  4  2  4  0
1  1  3  9  6  1  5  8  1  1
2  6  1  0  6  2  1  2  1  2
3  1  9  0  8  3  5  1  3  3
4  0  9  3  0  2  6  5  9  4
5  4  0  0  2  4  4  0  8  5
6  4  1  7  6  3  0  1  6  6
7  3  2  4  6  4  7  3  3  7
8  7  2  4  6  7  1  5  9  8
9  8  0  7  5  7  9  7  9  9
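
Note that the hstack approach only produces clean 2D arrays when the lists within each column share one length. A quick way to check this before stacking, as a sketch (has_uniform_length is a hypothetical helper name, not a pandas function, and the data is made up):

```python
import pandas as pd

def has_uniform_length(series):
    """Return True if every list-like cell in the Series has the same length."""
    return bool(series.map(len).nunique() <= 1)

# Hypothetical toy data: col3 is deliberately ragged.
df = pd.DataFrame({
    "col1": [[1, 2, 3], [4, 5, 6]],
    "col3": [[7, 8], [9]],
})

uniform = {name: has_uniform_length(df[name]) for name in df.columns}
print(uniform)
```

Columns that fail the check are better handled with the list-based concatenation shown first.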