Is it possible to expand nodes of a frozen neural network model by width?
I want to expand the number of nodes of a frozen neural network model width-wise in PyTorch. I want to do something like the figure below, where gray denotes the frozen weights and green the newly added trainable weights.
[figure: gray = frozen weights, green = newly added trainable nodes]
I have an initial model that takes 3 inputs and returns a single output; it also has two hidden layers with h1=5 and h2=3 nodes, respectively. I created the model in PyTorch and froze the weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(3, 5)
        self.fc2 = nn.Linear(5, 3)
        self.fc3 = nn.Linear(3, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = self.fc2(x)
        x = self.fc3(x)
        return x

print(Net())
model = Net()
X = torch.rand(5, 3)
y = model(X)
print(y)

# Freeze layers
for param in model.parameters():
    param.requires_grad = False
Now I want to expand the model by adding trainable nodes so that h1=5+2, h2=3+1 and output=1+1. Only the newly added nodes should be trainable; all other weights should stay frozen, and those frozen weights should keep the same values as in the parent model. Can this be done in PyTorch or TensorFlow?
Two things need to be done.
1. Expand the layers
You really should be using ModuleList or ModuleDict to create your layers, because that lets you handle them with loops. I know eval or setattr would also work, but they tend to break other things, so I'd rather not use them.
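For reference, here is a minimal sketch of what the same network could look like built on ModuleList, so layers can be iterated over, expanded, or masked in a loop. This is my own illustration using the sizes from the question, not code from the question itself:

import torch
import torch.nn as nn

class NetList(nn.Module):
    # Same architecture as Net above (3 -> 5 -> 3 -> 1), but the layers live
    # in a ModuleList so they can be replaced or expanded in a loop.
    def __init__(self, sizes=(3, 5, 3, 1)):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(n_in, n_out) for n_in, n_out in zip(sizes[:-1], sizes[1:])
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x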
I can think of two approaches: either replace the weight tensor directly with something bigger, or create a bigger layer and replace the whole layer. The three snippets below are variants of these.
# Option 1: replace the weight with a randomly generated, larger tensor
fc1_newweight = torch.rand(7, 3)
fc1_newbias = torch.rand(7)
fc1_shape = model.fc1.weight.shape
fc1_newweight[:fc1_shape[0], :fc1_shape[1]] = model.fc1.weight.clone()
fc1_newbias[:fc1_shape[0]] = model.fc1.bias.clone()
# Wrapping in nn.Parameter makes the whole tensor trainable again;
# the frozen block is dealt with in step 2 below.
model.fc1.weight = torch.nn.Parameter(fc1_newweight)
model.fc1.bias = torch.nn.Parameter(fc1_newbias)
# Option 2: create a bigger layer, copy the old weights into its state_dict,
# and reuse its freshly initialized tensors for the existing layer
fc2_shape = model.fc2.weight.shape
fc2 = nn.Linear(7, 4)
fc2_weight = fc2.state_dict()
fc2_weight['weight'][:fc2_shape[0], :fc2_shape[1]] = model.fc2.weight.clone()
fc2_weight['bias'][:fc2_shape[0]] = model.fc2.bias.clone()
fc2.load_state_dict(fc2_weight)
model.fc2.weight = torch.nn.Parameter(fc2_weight['weight'])
model.fc2.bias = torch.nn.Parameter(fc2_weight['bias'])
# Option 3: build a bigger layer the same way and replace the whole layer
fc3_shape = model.fc3.weight.shape
fc3 = nn.Linear(4, 2)
fc3_weight = fc3.state_dict()
fc3_weight['weight'][:fc3_shape[0], :fc3_shape[1]] = model.fc3.weight.clone()
fc3_weight['bias'][:fc3_shape[0]] = model.fc3.bias.clone()
fc3.load_state_dict(fc3_weight)
model.fc3 = fc3
I prefer option 2 or 3 over 1, because the new weights are then generated with nn.init.kaiming_uniform_ (the default nn.Linear initialization) instead of a plain uniform torch.rand.
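As a side note, approach 3 generalizes nicely into a small helper, so every layer is expanded the same way and the new entries pick up the default nn.Linear initialization. expand_linear is my own hypothetical name; this is only a sketch:

def expand_linear(old: nn.Linear, new_in: int, new_out: int) -> nn.Linear:
    # Build a bigger layer and copy the old weights into its top-left block;
    # the remaining entries keep the default nn.Linear initialization.
    new = nn.Linear(new_in, new_out)
    with torch.no_grad():
        new.weight[:old.out_features, :old.in_features] = old.weight
        new.bias[:old.out_features] = old.bias
    return new

# Applied to the original (unexpanded) model it would look like:
# model.fc1 = expand_linear(model.fc1, 3, 7)
# model.fc2 = expand_linear(model.fc2, 7, 4)
# model.fc3 = expand_linear(model.fc3, 4, 2)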
2. Select what is trainable
This is tricky, because you can't set requires_grad on just some elements of a weight tensor; you would get RuntimeError: you can only change requires_grad flags of leaf variables. But something like the following, which zeroes the gradients of the frozen blocks between backward() and optimizer.step(), should be a good enough substitute. Again, using ModuleList would make this code look nicer too.
y = model(x)
loss = criterion(y, target)
loss.backward()
# Zero the gradients of the original (frozen) blocks so only the newly
# added rows/columns are updated
model.fc1.weight.grad[:fc1_shape[0], :fc1_shape[1]] = 0
model.fc1.bias.grad[:fc1_shape[0]] = 0
model.fc2.weight.grad[:fc2_shape[0], :fc2_shape[1]] = 0
model.fc2.bias.grad[:fc2_shape[0]] = 0
model.fc3.weight.grad[:fc3_shape[0], :fc3_shape[1]] = 0
model.fc3.bias.grad[:fc3_shape[0]] = 0
optimizer.step()
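If you don't want to remember to zero those blocks after every backward(), the same masking can be installed once as gradient hooks with Tensor.register_hook. The sketch below is my own variant of the code above, with a hypothetical helper name, and it assumes a plain optimizer such as SGD without weight decay (weight decay would still move the "frozen" entries even with zero gradients):

def freeze_block(param, rows, cols=None):
    # Zero the gradient of the top-left (pre-expansion) block every time a
    # gradient is computed for this parameter. Pass only `rows` for a bias.
    def hook(grad):
        grad = grad.clone()  # don't modify the tensor autograd hands us in place
        if cols is None:
            grad[:rows] = 0
        else:
            grad[:rows, :cols] = 0
        return grad
    param.register_hook(hook)

freeze_block(model.fc1.weight, fc1_shape[0], fc1_shape[1])
freeze_block(model.fc1.bias, fc1_shape[0])
freeze_block(model.fc2.weight, fc2_shape[0], fc2_shape[1])
freeze_block(model.fc2.bias, fc2_shape[0])
freeze_block(model.fc3.weight, fc3_shape[0], fc3_shape[1])
freeze_block(model.fc3.bias, fc3_shape[0])

# After this, a training step needs no manual gradient surgery:
# y = model(x); loss = criterion(y, target); loss.backward(); optimizer.step()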