Why are dot products backwards in Stanford's cs231n SVM?

I'm watching the Stanford cs231n videos on YouTube and trying to do the assignments as exercises. While implementing the SVM, I ran into the following code snippet:

import numpy as np

def svm_loss_naive(W, X, y, reg):
  """
  Structured SVM loss function, naive implementation (with loops).

  Inputs have dimension D, there are C classes, and we operate on minibatches
  of N examples.

  Inputs:
  - W: A numpy array of shape (D, C) containing weights.
  - X: A numpy array of shape (N, D) containing a minibatch of data.
  - y: A numpy array of shape (N,) containing training labels; y[i] = c means
    that X[i] has label c, where 0 <= c < C.
  - reg: (float) regularization strength

  Returns a tuple of:
  - loss as single float
  - gradient with respect to weights W; an array of same shape as W
  """
  dW = np.zeros(W.shape) # initialize the gradient as zero

  # compute the loss and the gradient
  num_classes = W.shape[1]
  num_train = X.shape[0]
  loss = 0.0
  for i in range(num_train):
    scores = X[i].dot(W) # This line
    correct_class_score = scores[y[i]]
    for j in range(num_classes):
      if j == y[i]:
        continue
      margin = scores[j] - correct_class_score + 1 # note delta = 1
      if margin > 0:
        loss += margin

  # Average the loss over the minibatch and add L2 regularization.
  # (Computing the gradient dW is left as an exercise in the assignment,
  # so it is still all zeros at this point.)
  loss /= num_train
  loss += reg * np.sum(W * W)

  return loss, dW
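
For reference, here is a minimal sketch of calling the function above on random toy data; the sizes N=5, D=3, C=4 and the regularization strength are arbitrary values chosen for illustration:

N, D, C = 5, 3, 4
W = np.random.randn(D, C)            # weights, shape (D, C)
X = np.random.randn(N, D)            # minibatch of N samples, shape (N, D)
y = np.random.randint(C, size=N)     # one label in [0, C) per sample

loss, dW = svm_loss_naive(W, X, y, reg=0.1)
print(loss)        # a single float
print(dW.shape)    # (D, C), same shape as W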

Here is the line I'm having trouble with:

scores = X[i].dot(W) 

This computes the product x·W. Shouldn't it be W·x instead, i.e. W.dot(X[i])?

Because the arrays W and X have shapes (D, C) and (N, D) respectively, you can't take the dot product in that order directly; you would have to transpose both first (for matrix multiplication, the shapes would need to line up as (C, D)·(D, N)).
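
To make the shape constraint concrete, here is a small sketch (the sizes D=3, C=4, N=5 are arbitrary toy values):

import numpy as np

D, C, N = 3, 4, 5
W = np.random.randn(D, C)   # weights, shape (D, C)
X = np.random.randn(N, D)   # minibatch, shape (N, D)

scores = X.dot(W)           # OK: (N, D) · (D, C) -> (N, C)
# W.dot(X)                  # ValueError: shapes (D, C) and (N, D) not aligned
scores_t = W.T.dot(X.T)     # also OK after transposing: (C, D) · (D, N) -> (C, N)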

Since W.T.dot(X[i]) == X[i].dot(W) for a single 1-D sample (and X.dot(W) == W.T.dot(X.T).T for the whole minibatch), the implementation simply reverses the order of the dot product instead of transposing each array. Ultimately this comes down to a decision about how to arrange the inputs: here, the (somewhat arbitrary) choice was to lay out samples and features in the more intuitive way, at the cost of the dot product reading x·W rather than W·x.
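
A quick numerical check of that equivalence, again with arbitrary toy sizes:

import numpy as np

D, C, N = 3, 4, 5
W = np.random.randn(D, C)    # weights, shape (D, C)
X = np.random.randn(N, D)    # minibatch, shape (N, D)

# For one 1-D sample, reversing the dot product order equals transposing W:
print(np.allclose(X[0].dot(W), W.T.dot(X[0])))   # True
# For the whole batch, X.dot(W) is the transpose of W.T.dot(X.T):
print(np.allclose(X.dot(W), W.T.dot(X.T).T))     # True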