How to measure the semantic similarities among image features extracted by pre-trained models (e.g. vgg, resnet...)?

Question

As far as I know, pre-trained models perform well as feature extractors in many tasks, thanks to the large datasets they were trained on.

However, I'm wondering whether a model such as VGG-16 has some ability to extract "semantic" information from an input image.

If the answer is yes, then given an unlabeled dataset, is it possible to "cluster" images by measuring the semantic similarities of the extracted features?

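Actually, I've already made some attempts:

Load the pre-trained VGG-16 through PyTorch.
Load the CIFAR-10 dataset and convert it into a batched tensor X of size (5000, 3, 224, 224).
Fine-tune vgg.classifier, setting its output dimension to 4096.
Extract features: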

 # X: (5000, 3, 224, 224)
 features = vgg.features(X).view(X.shape[0], -1)  # features: (5000, 25088)
 features = vgg.classifier(features)  # features: (5000, 4096)
 return features

I tried cosine similarity, inner product, and torch.cdist, but only ended up with a few poor clusters.

Any suggestions? Thanks in advance.

Answer 1

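You might not want to go all the way to the last layer, since the final layers contain features specific to the classification task the model was trained on. Using features from earlier layers of the classifier might help. Additionally, you should switch to eval mode, since VGG-16 has dropout layers in its classifier.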

>>> import torch, torchvision
>>> vgg16 = torchvision.models.vgg16(pretrained=True).eval()
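To illustrate why eval mode matters for feature extraction, here is a minimal sketch. It assumes a standalone `nn.Dropout` layer as a stand-in for the dropout inside `vgg16.classifier`, so it runs without downloading any weights; the variable names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for a dropout layer inside vgg16.classifier (illustrative only).
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 4096)

# Training mode: a fresh random mask is drawn on every call, so the same
# input produces different outputs each time.
drop.train()
train_a, train_b = drop(x), drop(x)
same_in_train = torch.equal(train_a, train_b)

# Eval mode: dropout acts as the identity, so outputs are deterministic.
drop.eval()
eval_a, eval_b = drop(x), drop(x)
same_in_eval = torch.equal(eval_a, eval_b)

print(same_in_train, same_in_eval)
```

If the model were left in training mode, the same image could map to different feature vectors on each forward pass, which would likely add noise to any similarity measure.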

Truncate the classifier:

>>> vgg16.classifier = vgg16.classifier[:4]

Now the classifier of vgg16 will look like this:

(classifier): Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
)

Then extract the features:

>>> vgg16(torch.rand(1, 3, 124, 124)).shape
torch.Size([1, 4096])
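Once the 4096-dimensional features are available, one common way to compare them is to L2-normalize each row so that the inner product equals the cosine similarity. The sketch below is not part of the original answer: it uses random tensors as a placeholder for real VGG-16 features, and `cosine_similarity_matrix` is a hypothetical helper name.

```python
import torch
import torch.nn.functional as F


def cosine_similarity_matrix(features: torch.Tensor) -> torch.Tensor:
    """Return the (N, N) matrix of pairwise cosine similarities."""
    normed = F.normalize(features, p=2, dim=1)  # scale each row to unit length
    return normed @ normed.T


torch.manual_seed(0)

# Placeholder for features of shape (N, 4096) from the truncated vgg16.
features = torch.randn(8, 4096)
# Make row 1 a near-duplicate of row 0 to mimic two very similar images.
features[1] = features[0] + 0.01 * torch.randn(4096)

sim = cosine_similarity_matrix(features)

# Mask out self-similarity, then pick the most similar other row for each row.
sim_no_self = sim.clone()
sim_no_self.fill_diagonal_(float("-inf"))
nearest = sim_no_self.argmax(dim=1)

print(nearest[0].item(), nearest[1].item())
```

The normalized features (or the similarity matrix itself) could then be handed to a clustering algorithm such as k-means; whether the resulting clusters are semantically meaningful will depend on the dataset and on which layer the features come from.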

