1d numpy array with enough elements not resizing as expected

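I've just started using numpy arrays together with pandas DataFrames, and I'm working on a practice project but have run into a problem. I have a pandas DataFrame whose rows I pass to a function for some processing. The function takes two other arrays, labeled best and worst, and builds a new vector so that the sums can be compared. It then returns either the current row that pandas.apply passed in or the new vector, whichever has the lower sum(). The result should be a new array that ends up as a 20x5 matrix. The function works fine, but the returned object needs to be converted to a numpy array of shape (20, 5) for further work, and calling np.array() on it produces an array of shape (20,). I assumed that .reshape(20,5) would work since there are enough elements, but it doesn't; it just fails at runtime. Any help is appreciated, as I can't find anything that explains why this happens.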

(As many will have guessed from the above, the error is: "cannot reshape array of size 20 into shape (20,5)")

A code excerpt from my program that reproduces it (it runs on its own):

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=22)
df = pd.DataFrame(rng.random((20,5)))

def new_vectors(current, best, worst):
    #convert current to numpy array 
    current = current.to_numpy()

    #construct a new vector to check
    new = np.add(current, np.subtract((rng.random()*(np.subtract(best, np.absolute(current)))), ((rng.random()*(np.subtract(worst, np.absolute(current)))))))

    #get the new sum for the new and old vectors
    summed = current.sum()
    newsummed = new.sum()

    #return the smallest one
    return np.add(((newsummed < summed)*(new)), ((newsummed > summed)*(current))).flatten()


z = np.array(df.apply(new_vectors, args=(df.iloc[0].to_numpy(), df.iloc[11].to_numpy()), axis=1))
z.reshape(20,5) #I know reshape() creates a copy, just here to show it doesn't work regardless

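You can do the reshaping manually.

  1. Remove z.reshape(20,5). reshape does not work on a 1-D array whose elements are themselves arrays.

  2. After applying the function, use this instead: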

    # Create a zero-filled matrix with the desired shape
    matrix = np.zeros(shape=(20,5))
    # Iterate over z and assign each array to a row of the matrix
    for i, arr in enumerate(z):
        matrix[i] = arr
    

If you don't know the required matrix size in advance, create the matrix with matrix = np.zeros(shape=df.shape).

All of the code used:

import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=22)
df = pd.DataFrame(rng.random((20,5)))

def new_vectors(current, best, worst):
    #convert current to numpy array 
    current = current.to_numpy()

    #construct a new vector to check
    new = np.add(current, np.subtract((rng.random()*(np.subtract(best, np.absolute(current)))), ((rng.random()*(np.subtract(worst, np.absolute(current)))))))

    #get the new sum for the new and old vectors
    summed = current.sum()
    newsummed = new.sum()

    #return the smallest one
    return np.add(((newsummed < summed)*(new)), ((newsummed > summed)*(current))).flatten()


z = np.array(df.apply(new_vectors, args=(df.iloc[0].to_numpy(), df.iloc[11].to_numpy()), axis=1))

matrix = np.zeros(shape=df.shape)

for i, arr in enumerate(z):
    matrix[i] = arr

Your original DataFrame, shortened for display purposes:

In [628]: df = pd.DataFrame(rng.random((4,5)))
In [629]: df
Out[629]: 
          0         1         2         3         4
0  0.891169  0.134904  0.515261  0.975586  0.150426
1  0.834185  0.671914  0.072134  0.170696  0.923737
2  0.065445  0.356001  0.034787  0.257711  0.213964
3  0.790341  0.080620  0.111369  0.542423  0.199517

Next, the result of apply:

In [631]: df1=df.apply(new_vectors, args=(df.iloc[0].to_numpy(), df.iloc[3].to_numpy()), axis=1)
In [632]: df1
Out[632]: 
0    [0.891168725430691, 0.13490384333565053, 0.515...
1    [0.834184861872087, 0.6719141503303373, 0.0721...
2    [0.065444520313796, 0.35600115939269394, 0.034...
3    [0.7903408924058509, 0.08061955595765169, 0.11...
dtype: object

Note that it has 1 column, which contains arrays. Make an array from it:

In [633]: df1.to_numpy()
Out[633]: 
array([array([0.89116873, 0.13490384, 0.51526113, 0.97558562, 0.15042584]),
       array([0.83418486, 0.67191415, 0.07213404, 0.17069617, 0.92373724]),
       array([0.06544452, 0.35600116, 0.03478695, 0.25771129, 0.21396367]),
       array([0.79034089, 0.08061956, 0.1113691 , 0.54242262, 0.19951741])],
      dtype=object)

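That is a (4,) array with object dtype. The dtype matters. Even though each element itself holds 5 numbers, reshape cannot cross that "object" boundary: as far as numpy is concerned the array has only 4 elements, so it cannot be reshaped to (4,5).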
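To see the object boundary in isolation, here is a minimal sketch that does not depend on the question's function. The names rows, obj and reshape_error are made up for illustration, and the object array is built by hand to mimic what df1.to_numpy() returns.

```python
import numpy as np

# Four separate 1-D float arrays, each with 5 elements (illustrative data)
rows = [np.arange(5, dtype=float) + i for i in range(4)]

# Build a 1-D object array by hand; each slot holds a reference to one row
obj = np.empty(4, dtype=object)
for i, row in enumerate(rows):
    obj[i] = row

print(obj.shape, obj.dtype)  # 4 elements of dtype object, not 20 floats

# reshape only counts the 4 outer elements, so asking for (4, 5) fails
reshape_error = None
try:
    obj.reshape(4, 5)
except ValueError as err:
    reshape_error = str(err)
    print(reshape_error)
```

The inner arrays are ordinary float arrays, but numpy treats each one as a single opaque Python object when it is stored in an object array.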

But we can concatenate those arrays:

In [636]: np.vstack(df1.to_numpy())
Out[636]: 
array([[0.89116873, 0.13490384, 0.51526113, 0.97558562, 0.15042584],
       [0.83418486, 0.67191415, 0.07213404, 0.17069617, 0.92373724],
       [0.06544452, 0.35600116, 0.03478695, 0.25771129, 0.21396367],
       [0.79034089, 0.08061956, 0.1113691 , 0.54242262, 0.19951741]])
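As a self-contained check of the stacking approach, the sketch below uses a hypothetical stand-in function, double_row, in place of new_vectors. It also shows one possible alternative that avoids the object array altogether: passing result_type="expand" to DataFrame.apply, which asks pandas to spread list-like results across columns. Whether that suits the real function is an assumption; it should, as long as every row returns an array of the same length.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=22)
df = pd.DataFrame(rng.random((4, 5)))

def double_row(row):
    # Hypothetical stand-in for new_vectors: returns one 1-D array per row
    return row.to_numpy() * 2

# Default behaviour: a Series of object dtype whose values are arrays
as_series = df.apply(double_row, axis=1)

# Option 1: stack the per-row arrays into a regular 2-D float array
stacked = np.vstack(as_series.to_numpy())

# Option 2: let pandas expand each returned array into columns
expanded = df.apply(double_row, axis=1, result_type="expand").to_numpy()

print(as_series.dtype, stacked.shape, expanded.shape)
```

Both results are plain float arrays with one row per DataFrame row, so any further reshaping works as usual.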