How to extract feature vector from single image in Pytorch?
I am trying to learn more about computer vision models and am exploring how they work. In an effort to better understand how to interpret feature vectors, I'm trying to use PyTorch to extract a feature vector. Below is code that I've pieced together from various places.
import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from torch.autograd import Variable
from PIL import Image

img = Image.open("Documents/01235.png")

# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def get_vector(image_name):
    # Load the image with Pillow library
    img = Image.open("Documents/Documents/Driven Data Competitions/Hateful Memes Identification/data/01235.png")
    # Create a PyTorch Variable with the transformed image
    t_img = transforms(img)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.data)
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    model(t_img)
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding

pic_vector = get_vector(img)
When I do this, I get the following error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 224, 224] instead
I'm sure this is a basic error, but I can't seem to figure out how to fix it. I was under the impression that the "ToTensor" transform would make my data 4-d, but either it isn't working as intended or I'm misunderstanding it. Any help or resources I can use to learn more about this are appreciated!
All of the default nn.Modules in PyTorch expect an additional batch dimension. If the input to a module has shape (B, ...) then the output will be (B, ...) as well (though the later dimensions may change depending on the layer). This behavior allows efficient inference on batches of B inputs simultaneously. To make your code conform, you can just unsqueeze an additional unitary dimension onto the front of the t_img tensor before sending it into your model, making it a (1, ...) tensor. You will also need to flatten the output of layer before storing it if you want to copy it into your one-dimensional my_embedding tensor. A minimal sketch of this shape bookkeeping follows.
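To illustrate the shapes involved (a standalone sketch, not part of the answer's code): unsqueeze(0) turns a (3, 224, 224) image tensor into a (1, 3, 224, 224) batch, and resnet18's avgpool output of shape (1, 512, 1, 1) flattens to the (512,) vector that my_embedding expects:

import torch

t_img = torch.randn(3, 224, 224)         # a single transformed image, no batch dim
batched = t_img.unsqueeze(0)             # prepend a unitary batch dimension
print(batched.shape)                     # torch.Size([1, 3, 224, 224])

avgpool_out = torch.randn(1, 512, 1, 1)  # shape of resnet18's avgpool output for one image
print(avgpool_out.flatten().shape)       # torch.Size([512])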
A couple of other things:
- You should do your inference within a torch.no_grad() context to avoid computing gradients, since you won't be needing them (note that model.eval() just changes the behavior of certain layers like dropout and batch normalization; it doesn't disable the construction of the computation graph, but torch.no_grad() does). See the short demonstration after this list.
- I assume this is just a copy-paste issue, but transforms is the name of an imported module as well as a global variable.
- o.data just returns a copy of o. In the old Variable interface (circa PyTorch 0.3.1 and earlier) this used to be necessary, but the Variable interface was deprecated way back in PyTorch 0.4.0 and no longer does anything useful; now its use just creates confusion. Unfortunately, many tutorials are still being written using this old, unnecessary interface.
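To make the model.eval() vs. torch.no_grad() distinction concrete, here is a minimal sketch (using a toy nn.Linear, not the answer's model): in eval mode the output still carries a grad_fn because the computation graph was built, while under torch.no_grad() it does not.

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
model.eval()                    # only changes layer behavior (dropout, batch norm, etc.)

x = torch.randn(1, 4)
out = model(x)
print(out.grad_fn is not None)  # True: the computation graph was still constructed

with torch.no_grad():           # disables graph construction entirely
    out = model(x)
print(out.grad_fn is not None)  # False: no graph, no gradient bookkeeping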
The updated code then looks like this:
import torch
import torchvision
import torchvision.models as models
from PIL import Image

img = Image.open("Documents/01235.png")

# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def get_vector(image):
    # Create a PyTorch tensor with the transformed image
    t_img = transforms(image)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m, i, o):
        my_embedding.copy_(o.flatten())      # <-- flatten
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    with torch.no_grad():                    # <-- no_grad context
        model(t_img.unsqueeze(0))            # <-- unsqueeze
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding

pic_vector = get_vector(img)
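Continuing from the code above, a typical use of the returned vector is comparing images by embedding similarity. A hedged usage sketch (the second image path "Documents/other.png" is hypothetical, not from the question):

import torch.nn.functional as F

vec_a = get_vector(Image.open("Documents/01235.png"))
vec_b = get_vector(Image.open("Documents/other.png"))  # hypothetical second image

# Cosine similarity between the two 512-d embeddings, in [-1, 1]
similarity = F.cosine_similarity(vec_a.unsqueeze(0), vec_b.unsqueeze(0)).item()
print(similarity)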
Instead of

model(t_img)

do

model(t_img[None])

here. This will add an extra dimension, so the image will have the shape [1, 3, 224, 224] and it will work.
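For reference, indexing with None is equivalent to unsqueeze(0); a minimal sketch:

import torch

t_img = torch.randn(3, 224, 224)
assert torch.equal(t_img[None], t_img.unsqueeze(0))  # same (1, 3, 224, 224) tensor
print(t_img[None].shape)                             # torch.Size([1, 3, 224, 224])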