How to modify base ViT architecture from Huggingface in Tensorflow

I am new to Hugging Face and want to use the same Transformer architecture as ViT for image classification on my own domain, so I need to change the input shape and add augmentation.

Snippet from Hugging Face:

from transformers import ViTFeatureExtractor, TFViTForImageClassification
import tensorflow as tf
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = TFViTForImageClassification.from_pretrained("google/vit-base-patch16-224")

inputs = feature_extractor(images=image, return_tensors="tf")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = tf.math.argmax(logits, axis=-1)[0]
print("Predicted class:", model.config.id2label[int(predicted_class_idx)])

When I do model.summary()

I get the following:

Model: "tf_vi_t_for_image_classification_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 vit (TFViTMainLayer)        multiple                  85798656  
                                                                 
 classifier (Dense)          multiple                  769000    
                                                                 
=================================================================
Total params: 86,567,656
Trainable params: 86,567,656
Non-trainable params: 0

As shown above, the layers of the ViT base are wrapped up. Is there a way to unwrap them so that I can modify specific layers?

For your case, I would recommend looking at the source code here and tracing the classes that are called. For example, to get the layers of the Embeddings class, you can run:

print(model.layers[0].embeddings.patch_embeddings.projection)
print(model.layers[0].embeddings.dropout)
<keras.layers.convolutional.Conv2D object at 0x7fea6264c6d0>
<keras.layers.core.dropout.Dropout object at 0x7fea62d65110>
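Once you have a handle on a sub-layer, you can tweak it in place, since these are ordinary Keras layer attributes. A minimal sketch, assuming you just want to raise the embedding dropout probability (the value 0.2 is made up for illustration):

# Hypothetical tweak: raise the patch-embedding dropout rate in place.
# Keras Dropout layers read self.rate when called, so assigning a new value takes effect.
model.layers[0].embeddings.dropout.rate = 0.2
print(model.layers[0].embeddings.dropout.rate)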

Or, if you want to get the layers of the first Attention block, try:

print(model.layers[0].encoder.layer[0].attention.self_attention.query)
print(model.layers[0].encoder.layer[0].attention.self_attention.key)
print(model.layers[0].encoder.layer[0].attention.self_attention.value)
print(model.layers[0].encoder.layer[0].attention.self_attention.dropout)
print(model.layers[0].encoder.layer[0].attention.dense_output.dense)
print(model.layers[0].encoder.layer[0].attention.dense_output.dropout)
<keras.layers.core.dense.Dense object at 0x7fea62ec7f90>
<keras.layers.core.dense.Dense object at 0x7fea62ec7b50>
<keras.layers.core.dense.Dense object at 0x7fea62ec79d0>
<keras.layers.core.dropout.Dropout object at 0x7fea62cf5c90>
<keras.layers.core.dense.Dense object at 0x7fea62cf5250>
<keras.layers.core.dropout.Dropout object at 0x7fea62cf5410>

And so on.
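Once you can reach the individual layers, you can also edit the model in bulk, e.g. freeze the pretrained encoder and swap the 1000-way ImageNet head for your own classifier. A rough sketch, not from the original post; the 10-class head and learning rate are made-up values for illustration:

import tensorflow as tf

# Freeze all 12 Transformer blocks so only the new head is trained (illustrative choice).
model.layers[0].encoder.trainable = False

# Replace the ImageNet classifier with a hypothetical 10-class head.
# It is built with the right input size the first time the model is called.
model.classifier = tf.keras.layers.Dense(10, name="classifier")

# Hugging Face TF models are Keras models, so they can be compiled and fit as usual.
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

If the goal is a different input resolution rather than different layers, recent transformers versions also let you pass image_size=... together with ignore_mismatched_sizes=True to from_pretrained, so the mismatched position embeddings are re-initialized at the new size; check the version you have installed.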