Why are dot products backwards in Stanford's cs231n SVM?

I'm watching the Stanford cs231n videos on YouTube and trying to do the assignments as exercises. While implementing the SVM, I ran into the following code snippet:

import numpy as np

def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W) # This line
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        loss += margin

  # Average the loss over the minibatch and add L2 regularization.
  # (Computing the gradient dW is left as an exercise in the assignment,
  # so it is still all zeros at this point.)
  loss /= num_train
  loss += reg * np.sum(W * W)

  return loss, dW
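
For reference, here is a minimal sketch of calling the function above on random toy data; the sizes N=5, D=3, C=4 and the regularization strength are arbitrary values chosen for illustration:

N, D, C = 5, 3, 4
W = np.random.randn(D, C)            # weights, shape (D, C)
X = np.random.randn(N, D)            # minibatch of N samples, shape (N, D)
y = np.random.randint(C, size=N)     # one label in [0, C) per sample

loss, dW = svm_loss_naive(W, X, y, reg=0.1)
print(loss)        # a single float
print(dW.shape)    # (D, C), same shape as W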

Here is the line I'm having trouble with:

scores = X[i].dot(W) 

This computes the product x·W. Shouldn't it be W·x instead, i.e. W.dot(X[i])?

Because the arrays W and X have shapes (D, C) and (N, D) respectively, you can't take the dot product in that order directly; you would have to transpose both first (for matrix multiplication, the shapes would need to line up as (C, D)·(D, N)).
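
To make the shape constraint concrete, here is a small sketch (the sizes D=3, C=4, N=5 are arbitrary toy values):

import numpy as np

D, C, N = 3, 4, 5
W = np.random.randn(D, C)   # weights, shape (D, C)
X = np.random.randn(N, D)   # minibatch, shape (N, D)

scores = X.dot(W)           # OK: (N, D) · (D, C) -> (N, C)
# W.dot(X)                  # ValueError: shapes (D, C) and (N, D) not aligned
scores_t = W.T.dot(X.T)     # also OK after transposing: (C, D) · (D, N) -> (C, N)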

Since W.T.dot(X[i]) == X[i].dot(W) for a single 1-D sample (and X.dot(W) == W.T.dot(X.T).T for the whole minibatch), the implementation simply reverses the order of the dot product instead of transposing each array. Ultimately this comes down to a decision about how to arrange the inputs: here, the (somewhat arbitrary) choice was to lay out samples and features in the more intuitive way, at the cost of the dot product reading x·W rather than W·x.
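
A quick numerical check of that equivalence, again with arbitrary toy sizes:

import numpy as np

D, C, N = 3, 4, 5
W = np.random.randn(D, C)    # weights, shape (D, C)
X = np.random.randn(N, D)    # minibatch, shape (N, D)

# For one 1-D sample, reversing the dot product order equals transposing W:
print(np.allclose(X[0].dot(W), W.T.dot(X[0])))   # True
# For the whole batch, X.dot(W) is the transpose of W.T.dot(X.T):
print(np.allclose(X.dot(W), W.T.dot(X.T).T))     # True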