Why are dot products backwards in Stanford's cs231n SVM?
I'm watching Stanford's cs231n lectures on YouTube and trying to do the assignments as exercises. While working through the SVM one, I ran into the following snippet:
import numpy as np


def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).
    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.
    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength
    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    dW = np.zeros(W.shape)  # initialize the gradient as zero
    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)  # This line
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss += margin
This is the line I'm having trouble with:
scores = X[i].dot(W)
This is computing the product x·W; shouldn't it be W·x, i.e. W.dot(X[i])? Since the arrays W and X have shapes (D, C) and (N, D) respectively, the dot product can't be taken in that order directly; they would have to be transposed first (for the matrix multiplication to work they would have to be (C, D)·(D, N)).
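For concreteness, here is a quick shape check of what I mean (the dimensions D=4, C=3, N=2 are made up purely for this example):

import numpy as np

# Toy dimensions, chosen only for illustration: D features, C classes, N examples.
D, C, N = 4, 3, 2
W = np.zeros((D, C))   # weights, shape (D, C)
X = np.zeros((N, D))   # minibatch, shape (N, D)

print(X[0].shape)           # (4,)
print(X[0].dot(W).shape)    # (3,) -- this is what the assignment code computes
# W.dot(X[0]) raises ValueError: shapes (4,3) and (4,) not aligned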
Since X[i].dot(W) == W.T.dot(X[i]) (and, for the whole minibatch, X.dot(W) == W.T.dot(X.T).T), the implementation simply reverses the order of the dot product instead of transposing each array. In effect, this comes down to a decision about how to lay out the inputs. Here, the (somewhat arbitrary) choice was to arrange the samples and features in the more intuitive row-per-example way, at the cost of the dot product being x·W rather than W·x.
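A minimal numerical check of that equivalence (the dimensions and random data below are made up just for illustration):

import numpy as np

# Toy dimensions, only for illustration: D features, C classes, N examples.
D, C, N = 4, 3, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((D, C))   # weights, shape (D, C)
X = rng.standard_normal((N, D))   # minibatch, shape (N, D)

# Per-example: reversing the order is the same as transposing W.
print(np.allclose(X[0].dot(W), W.T.dot(X[0])))   # True
# Whole minibatch: transposing both operands and swapping the order transposes the result.
print(np.allclose(X.dot(W), W.T.dot(X.T).T))     # True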