Average error and standard deviation of error within epoch not correctly updating - PyTorch
I'm trying to use stochastic gradient descent, but I'm not sure why my error/loss isn't decreasing. The information I use from the train dataframe is the index (each sequence) and the binding affinity, and the goal is to predict binding affinity. Here is the head of the dataframe:
For training, I take a sequence and compute a score with another matrix; the goal is for that score to be as close as possible to the binding affinity (for any given peptide). How I calculate the score, along with my training loop, is shown in the code below, but I don't think a full explanation of it is necessary to see why my error isn't decreasing.
import statistics
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable as Var

# ONE-HOT ENCODING
AA = ['A','R','N','D','C','Q','E','G','H','I','L','K','M','F','P','S','T','W','Y','V']
loc = ['N','2','3','4','5','6','7','8','9','10','11','C']
aa = "ARNDCQEGHILKMFPSTWYV"

def p_one_hot(seq):
    c2i = dict((c, i) for i, c in enumerate(aa))
    int_encoded = [c2i[char] for char in seq]
    onehot_encoded = list()
    for value in int_encoded:
        letter = [0 for _ in range(len(aa))]
        letter[value] = 1
        onehot_encoded.append(letter)
    return torch.Tensor(np.transpose(onehot_encoded))

# INITIALIZE TENSORS
a = Var(torch.randn(20, 1), requires_grad=True)  # initialize similarity matrix - random column of 20 numbers
freq_m = Var(torch.randn(12, 20), requires_grad=True)
freq_m.data = (freq_m.data - freq_m.min().data) / (freq_m.max().data - freq_m.min().data)  # 0-to-1 scaling
optimizer = optim.SGD([torch.nn.Parameter(a), torch.nn.Parameter(freq_m)], lr=1e-6)
loss = nn.MSELoss()
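(Editor's sketch, not part of the original post.) The encoder above builds one column per residue, with rows indexed by amino acid, so for the 12-position peptides used here it should return a 20×12 tensor with exactly one 1 per column:

```python
import numpy as np
import torch

aa = "ARNDCQEGHILKMFPSTWYV"

def p_one_hot(seq):
    # Same encoder as in the post, condensed: one-hot each residue, then transpose
    c2i = dict((c, i) for i, c in enumerate(aa))
    onehot = [[1 if i == c2i[ch] else 0 for i in range(len(aa))] for ch in seq]
    return torch.Tensor(np.transpose(onehot))

enc = p_one_hot("ARNDCQEGHILK")  # a hypothetical 12-residue peptide
print(enc.shape)                 # torch.Size([20, 12])
assert enc.sum(dim=0).eq(1).all()  # exactly one hot entry per position
```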
# TRAINING LOOP
epochs = 100
for i in range(epochs):
    # RANDOMLY SAMPLE DATA
    train = all_seq.sample(frac=.03)
    names = train.index.values.tolist()
    affinities = train['binding_affinity']
    print('Epoch: ' + str(i))
    # forward pass
    iteration_loss = []
    for j, seq in enumerate(names):
        sm = torch.mm(a, a.t())  # make similarity matrix square symmetric
        freq_m.data = freq_m.data / freq_m.data.sum(1, keepdim=True)  # each row must sum to 1 (probabilities of each amino acid at each position)
        affin_score = affinities[j]
        new_m = torch.mm(p_one_hot(seq), freq_m)
        tss_m = new_m * sm
        tss_score = tss_m.sum()
        sms = sm
        fms = freq_m
        error = loss(tss_score, torch.Tensor([affin_score]))
        iteration_loss.append(error.item())
        optimizer.zero_grad()
        error.backward()
        optimizer.step()
    mean = statistics.mean(iteration_loss)
    stdev = statistics.stdev(iteration_loss)
    print('Epoch Average Error: ' + str(mean) + '. Epoch Standard Deviation: ' + str(stdev))
    iteration_loss.clear()
After each epoch, I print the mean of all the errors for that epoch along with the standard deviation. Each epoch runs through roughly 45,000 sequences. However, after 10 epochs I'm still not seeing any improvement in my error, and I'm not sure why. Here is the output I see:
Any ideas on what I'm doing wrong? I'm new to PyTorch, and any help is greatly appreciated. Thanks!
It turns out that wrapping the optimizer arguments in torch.nn.Parameter() keeps the underlying tensors from receiving the updates. Removing it, the error now decreases.
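A minimal sketch of the fix (stand-in tensors with the same shapes as the post; the point is only the optimizer construction): pass the leaf tensors themselves to the optimizer. Wrapping them in torch.nn.Parameter() creates *new* tensors, so SGD updates those copies while the originals used in the forward pass stay frozen.

```python
import torch
from torch import optim

torch.manual_seed(0)

# Stand-ins for the post's tensors, same shapes
a = torch.randn(20, 1, requires_grad=True)
freq_m = torch.randn(12, 20, requires_grad=True)

# Pass the leaf tensors directly -- no torch.nn.Parameter() wrapping
optimizer = optim.SGD([a, freq_m], lr=1e-2)

before = a.detach().clone()
score = torch.mm(a, a.t()).sum()  # same similarity-matrix construction as the post
optimizer.zero_grad()
score.backward()
optimizer.step()
assert not torch.equal(before, a.detach())  # `a` itself was updated
```

With the Parameter-wrapped version, the same assertion fails: `a` never changes, which is exactly the symptom of a flat loss curve across epochs.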