operands could not be broadcast together with shapes (100,3) (100,), why?

Question

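This is my first question on Stack Overflow. My English is poor, so thank you to everyone who puts up with it and helps me ^_^

My question is about broadcasting. [image in the original post] What I want to do is multiply every row of X by the number in the same row of B.

X is a (100,3) array and XW is a column vector with shape (100,). Why can't they be broadcast together?

After I add "XW = XW.reshape((X.shape[0],1))", they can be broadcast. Why? Is there a difference between (100,1) and (100,)?

I think my picture already describes my problem clearly. My code is long, and I don't think it is convenient to read.

Here is the code: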

import numpy as np
import matplotlib.pyplot as plt

class MyFirstMachineLeaningAlgorithm():
    def StochasticGradientDescent(self, W, X, count=100, a=0.1):

        n = X.shape[0]
        for i in range(count):  # run count learning iterations
            gradient = np.zeros(3)
            for j in range(n):
                gradient += X[j, :] * (1 - 2 * (X[j, :] @ W))

            W = W + a * gradient
            # renormalize W to unit length
            W = W / np.sqrt((W @ W))

        return W

    def BatchGraidentDescent(self, W, X, count=100, a=0.1):
        for i in range(count):
            XW = X @ W
            XW = 1 - 2 * XW

            #XW = XW.reshape((X.shape[0],1))
            gradient = X*XW
            gradient = np.sum(gradient,axis = 0)

            W = W + a * gradient
            # renormalize W to unit length
            W = W / np.sqrt((W @ W))

        return W  # train() assigns the result to self.W, so W must be returned

    def train(self, count=100):
        self.W = self.BatchGraidentDescent(self.W, self.X, count)

    def draw(self):
        draw_x = np.arange(-120, 120, 0.01)
        draw_y = -self.W[0] / self.W[1] * draw_x
        draw_y = [-self.W[2] / self.W[1] + draw_y[i] for i in range(len(draw_y))]
        plt.plot(draw_x, draw_y)
        plt.show()

    def __init__(self):
        array_size = (50, 2)
        array1 = np.random.randint(50, 100, size=array_size)
        array2 = np.random.randint(-100, -50, size=array_size)
        array = np.vstack((array1, array2))
        column = np.ones(100)
        self.X = np.column_stack((array, column))
        plt.scatter(array[:, 0], array[0:, 1])
        self.W = np.array([1, 2, 3])
        self.W = self.W / np.sqrt((self.W @ self.W))

g = MyFirstMachineLeaningAlgorithm()
g.train()
g.draw()

Answer 1

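I solved this problem before posting. But I think it may help other people, so I am posting it anyway.

XW comes from X @ W, so I expected it to be a 100x1 matrix. But because W is a 1-D array, the result of X @ W is also a 1-D array rather than an nx1 matrix. A 1-D array has shape (n,), while a 2-D matrix has shape (n,1) or (1,n). That is the difference between them.

When NumPy broadcasts, a 1-D array is treated like a row vector of shape (1,n). So XW with shape (100,) cannot be broadcast with X. After the reshape it becomes a (100,1) matrix, and then they can be broadcast.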
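A minimal sketch of that difference, using stand-in arrays of ones rather than the data from the question:

```python
import numpy as np

X = np.ones((100, 3))            # stand-in for the (100, 3) data matrix
W = np.array([1.0, 2.0, 3.0])    # 1-D weights, shape (3,)

XW = X @ W
print(XW.shape)                  # (100,)

try:
    X * XW                       # (100, 3) vs (100,): trailing axes 3 and 100 differ
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print(err)

XW_col = XW.reshape((X.shape[0], 1))
product = X * XW_col             # (100, 3) vs (100, 1): each row is scaled by its own number
print(product.shape)             # (100, 3)
```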

Answer 2

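It is better to copy and paste the error message into the post rather than an image. Still, an image is better than nothing.

So the error occurs on the last line of this snippet: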

        XW = X @ W
        XW = 1 - 2 * XW

        #XW = XW.reshape((X.shape[0],1))
        gradient = X*XW

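The shapes of X and W are not evident from the function definition alone. Evidently X is 2-D, (100,n). If W is (n,), then XW will be (100,), with the sum of products taken over the n dimension. If that is not clear, read the np.matmul docs.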
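As an illustration of that np.matmul behaviour, here is a small sketch with made-up arrays (the names W_1d and W_2d are only for this example):

```python
import numpy as np

X = np.ones((100, 3))

W_1d = np.array([1.0, 2.0, 3.0])   # shape (3,)
W_2d = W_1d.reshape(3, 1)          # shape (3, 1)

out_1d = X @ W_1d   # the 1-D operand's axis is summed over and dropped
out_2d = X @ W_2d   # ordinary matrix product keeps both outer axes

print(out_1d.shape)  # (100,)
print(out_2d.shape)  # (100, 1)
```

The values are the same in both cases; only the number of dimensions differs.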

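By the rules of broadcasting (look them up), if one array has fewer dimensions than the other, leading dimensions are added to it as needed. So (100,) can become (1,100). But to avoid ambiguity, trailing dimensions are never added automatically. You have to supply them yourself. So the last line should become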

 gradient = X * XW[:,None]

or use XW.reshape(-1,1), or the equivalent from your version.
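The rule can be checked without doing any arithmetic. The sketch below assumes NumPy 1.20 or later, where np.broadcast_shapes is available; the array XW here is a made-up stand-in.

```python
import numpy as np

# (3,) gains a leading axis -> (1, 3), which stretches to (100, 3).
print(np.broadcast_shapes((100, 3), (3,)))       # (100, 3)

# An explicit trailing axis of length 1 also stretches.
print(np.broadcast_shapes((100, 3), (100, 1)))   # (100, 3)

# (100,) gains a leading axis -> (1, 100); 100 does not match 3.
try:
    np.broadcast_shapes((100, 3), (100,))
    mismatch = False
except ValueError:
    mismatch = True
print(mismatch)                                  # True

# Three equivalent ways to add the trailing axis yourself.
XW = np.arange(100.0)
a = XW[:, None]
b = XW[:, np.newaxis]
c = XW.reshape(-1, 1)
print(a.shape, b.shape, c.shape)                 # (100, 1) (100, 1) (100, 1)
```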

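Because arrays can be 1-D (or even 0-D), terms like row vector and column vector are of limited value. In some cases a 1-D array can be thought of as a row vector, namely the cases where this automatic leading dimension applies.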
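A short sketch of why the row/column vocabulary only goes so far (all arrays here are made up for illustration):

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
print(v.ndim, v.shape)      # 1 (3,)
print(v.T.shape)            # (3,)  transposing a 1-D array changes nothing

s = np.array(5.0)
print(s.ndim, s.shape)      # 0 ()

M = np.ones((2, 3))
print((M * v).shape)        # (2, 3)  v is treated like a (1, 3) row here
print((M * s).shape)        # (2, 3)  a 0-D array broadcasts against any shape

row = v[None, :]            # explicit row, shape (1, 3)
col = v[:, None]            # explicit column, shape (3, 1)
print((row * col).shape)    # (3, 3)
```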

In __init__,

    self.X = np.column_stack((array, column))
    self.W = np.array([1, 2, 3])

X is (100,3) and W is (3,). X@W is then (100,).

In [45]: X=np.ones((100,3)); W=np.array([1,2,3])
In [46]: (X@W).shape
Out[46]: (100,)
In [47]: X * (1+(X@W)[:,None]);
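As a side note, multiplying each row and then summing over axis 0 gives the same numbers as a single matrix product with the transpose, which avoids the broadcast altogether. This is only a sketch on random stand-in data, not a change the question requires:

```python
import numpy as np

rng = np.random.default_rng(0)        # seeded only so the sketch is repeatable
X = rng.normal(size=(100, 3))
W = np.array([1.0, 2.0, 3.0])

XW = 1 - 2 * (X @ W)                  # shape (100,)

grad_broadcast = np.sum(X * XW[:, None], axis=0)   # scale each row, then sum the rows
grad_matmul = X.T @ XW                             # (3, 100) @ (100,) -> (3,)

print(grad_broadcast.shape, grad_matmul.shape)     # (3,) (3,)
print(np.allclose(grad_broadcast, grad_matmul))    # True
```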

Tags: python, numpy, broadcasting