How do I extract a feature vector from a single image in PyTorch?

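I am trying to learn more about computer vision models and am exploring how they work. To better understand how to interpret feature vectors, I am trying to use PyTorch to extract one. Below is the code I have pieced together from various places.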

import torch
import torch.nn as nn
import torchvision.models as models
import torchvision.transforms as transforms
from torch.autograd import Variable
from PIL import Image



img=Image.open("Documents/01235.png")

# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
    
def get_vector(image_name):
    # Load the image with Pillow library
    img = Image.open("Documents/Documents/Driven Data Competitions/Hateful Memes Identification/data/01235.png")
    # Create a PyTorch Variable with the transformed image
    t_img = transforms(img)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)
    # Define a function that will copy the output of a layer
    def copy_data(m,i,o):
        my_embedding.copy_(o.data)
    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    model(t_img)
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding

pic_vector = get_vector(img)

When I run this, I get the following error:

RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64,3,7,7],but got 3-dimensional input of size [3,224,224] instead

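I'm sure this is a basic error, but I can't seem to figure out how to fix it. I was under the impression that the ToTensor transform would make my data 4-dimensional, but either it isn't working correctly or I am misunderstanding it. I would appreciate any help, or resources I can use to learn more about this!

Solution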

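All of the default nn.Modules in PyTorch expect an additional batch dimension. If the input to a module has shape (B, ...), then the output will also have shape (B, ...) (though the later dimensions may change depending on the layer). This behavior allows efficient inference on a batch of B inputs at once. To make your code conform, you can unsqueeze an extra unit dimension onto the front of the t_img tensor before sending it into the model, turning it into a (1, ...) tensor. You will also need to flatten the output of layer before storing it if you want to copy it into the one-dimensional my_embedding tensor.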
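To make the shape changes concrete, here is a minimal sketch that follows a tensor through unsqueeze and flatten. It is an illustration only: a random tensor stands in for the transformed image, and a single convolution plus an average pool (chosen to match the [64,3,7,7] weight from the error message) stands in for the full ResNet-18.

```python
import torch
import torch.nn as nn

# Stand-ins (assumptions for illustration, not the real ResNet-18):
# a conv layer whose weight has shape [64, 3, 7, 7], followed by a global average pool.
conv = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
pool = nn.AdaptiveAvgPool2d((1, 1))

t_img = torch.rand(3, 224, 224)   # random data in place of transforms(img)
batch = t_img.unsqueeze(0)        # add the batch dimension: (1, 3, 224, 224)

with torch.no_grad():
    out = pool(conv(batch))       # pooled output keeps the batch dimension: (1, 64, 1, 1)

vec = out.flatten()               # collapse to one dimension: (64,)

print(batch.shape, out.shape, vec.shape)
```

In the real model the pooled output has 512 channels rather than 64, which is why my_embedding is allocated with torch.zeros(512).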

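A few other things:

You should run inference inside a torch.no_grad() context to avoid computing gradients, since you won't need them (note that model.eval() only changes the behavior of certain layers, such as dropout and batch normalization; it does not disable construction of the computation graph, but torch.no_grad() does).
I assume this is just a copy-paste issue, but transforms is the name of an imported module as well as a global variable.
o.data simply returns the values of o as a tensor detached from autograd. In the old Variable interface (circa PyTorch 0.3.1 and earlier) this used to be necessary, but the Variable interface was deprecated back in PyTorch 0.4.0 and no longer does anything useful; using it now only creates confusion. Unfortunately, many tutorials are still written with this old, unnecessary interface.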
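The difference between model.eval() and torch.no_grad() can be checked directly. The sketch below uses a small made-up network (a Linear layer followed by Dropout, not the ResNet from the question) and inspects requires_grad on the outputs.

```python
import torch
import torch.nn as nn

# Toy network, assumed only for illustration.
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.rand(1, 4)

net.eval()                      # dropout becomes a no-op, but autograd still records operations
y_eval = net(x)
print(y_eval.requires_grad)     # True: a computation graph was built

with torch.no_grad():           # autograd recording is switched off inside this block
    y_nograd = net(x)
print(y_nograd.requires_grad)   # False: no graph was built
```

For feature extraction both are wanted: eval() for deterministic layer behavior, and no_grad() to skip the bookkeeping that only backpropagation would need.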

The updated code is as follows:

import torch
import torchvision
import torchvision.models as models
from PIL import Image

img = Image.open("Documents/01235.png")

# Load the pretrained model
model = models.resnet18(pretrained=True)

# Use the model object to select the desired layer
layer = model._modules.get('avgpool')

# Set model to evaluation mode
model.eval()

transforms = torchvision.transforms.Compose([
    torchvision.transforms.Resize(256),
    torchvision.transforms.CenterCrop(224),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])


def get_vector(image):
    # Create a PyTorch tensor with the transformed image
    t_img = transforms(image)
    # Create a vector of zeros that will hold our feature vector
    # The 'avgpool' layer has an output size of 512
    my_embedding = torch.zeros(512)

    # Define a function that will copy the output of a layer
    def copy_data(m,i,o):
        my_embedding.copy_(o.flatten())                 # <-- flatten

    # Attach that function to our selected layer
    h = layer.register_forward_hook(copy_data)
    # Run the model on our transformed image
    with torch.no_grad():                               # <-- no_grad context
        model(t_img.unsqueeze(0))                       # <-- unsqueeze
    # Detach our copy function from the layer
    h.remove()
    # Return the feature vector
    return my_embedding


pic_vector = get_vector(img)
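
As a possible variation on the code above, the hook can store the layer output directly instead of copying it into a preallocated buffer, which avoids hard-coding the size 512 and also works for batches. This is only a sketch: extract_features is a hypothetical helper name, and a tiny toy model stands in for ResNet-18 so that the example runs without downloading weights.

```python
import torch
import torch.nn as nn


def extract_features(model, layer, batch):
    """Hypothetical helper: return the flattened output of `layer` for each item in `batch`."""
    captured = {}

    def hook(module, inputs, output):
        # Keep the layer output, detached from any autograd graph.
        captured["out"] = output.detach()

    handle = layer.register_forward_hook(hook)
    try:
        with torch.no_grad():
            model(batch)
    finally:
        handle.remove()  # always detach the hook, even if the forward pass fails

    # Keep the batch dimension and flatten everything after it.
    return captured["out"].flatten(start_dim=1)


# Toy model standing in for ResNet-18 (an assumption for illustration).
toy = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(8, 2),
)
toy.eval()

features = extract_features(toy, toy[1], torch.rand(1, 3, 32, 32))
print(features.shape)  # torch.Size([1, 8])
```

With the real model, the same call would presumably look like extract_features(model, layer, transforms(img).unsqueeze(0)), giving a tensor of shape (1, 512).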
