How to modify base ViT architecture from Huggingface in Tensorflow
I'm new to Hugging Face and want to use the same Transformer architecture as in ViT for image classification on my own domain. To do that, I need to change the input shape and add data augmentation.
Snippet from huggingface:
from transformers import ViTFeatureExtractor, TFViTForImageClassification
import tensorflow as tf
from PIL import Image
import requests
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
# load the preprocessing pipeline and the pretrained ViT classifier
feature_extractor = ViTFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = TFViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
inputs = feature_extractor(images=image, return_tensors="tf")
outputs = model(**inputs)
logits = outputs.logits
# model predicts one of the 1000 ImageNet classes
predicted_class_idx = tf.math.argmax(logits, axis=-1)[0]
print("Predicted class:", model.config.id2label[int(predicted_class_idx)])
When I run model.summary()
I get the following:
Model: "tf_vi_t_for_image_classification_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vit (TFViTMainLayer) multiple 85798656
classifier (Dense) multiple 769000
=================================================================
Total params: 86,567,656
Trainable params: 86,567,656
Non-trainable params: 0
As shown, the layers of the ViT base are encapsulated. Is there a way to unwrap them so that I can modify specific layers?
For your case, I'd suggest looking at the source code here and tracing the classes that get called. For example, to get the layers of the Embeddings
class, you can run:
print(model.layers[0].embeddings.patch_embeddings.projection)
print(model.layers[0].embeddings.dropout)
<keras.layers.convolutional.Conv2D object at 0x7fea6264c6d0>
<keras.layers.core.dropout.Dropout object at 0x7fea62d65110>
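Since these are plain Keras layers stored as attributes on a subclassed model, you can also swap one out by assignment. A minimal sketch, assuming (hypothetically) you want a stronger dropout in the embeddings:
import tensorflow as tf
# Hypothetical change: replace the embeddings dropout with rate 0.2.
# Keras tracks attribute assignment on subclassed models, so the new layer
# takes effect on the next forward pass (Dropout has no weights to lose).
model.layers[0].embeddings.dropout = tf.keras.layers.Dropout(rate=0.2)
The same pattern applies to any of the layers traced this way, though replacing a layer that has weights (e.g. a Dense) discards its pretrained parameters.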
Or, if you want to get the layers of the first Attention
block, try:
print(model.layers[0].encoder.layer[0].attention.self_attention.query)
print(model.layers[0].encoder.layer[0].attention.self_attention.key)
print(model.layers[0].encoder.layer[0].attention.self_attention.value)
print(model.layers[0].encoder.layer[0].attention.self_attention.dropout)
print(model.layers[0].encoder.layer[0].attention.dense_output.dense)
print(model.layers[0].encoder.layer[0].attention.dense_output.dropout)
<keras.layers.core.dense.Dense object at 0x7fea62ec7f90>
<keras.layers.core.dense.Dense object at 0x7fea62ec7b50>
<keras.layers.core.dense.Dense object at 0x7fea62ec79d0>
<keras.layers.core.dropout.Dropout object at 0x7fea62cf5c90>
<keras.layers.core.dense.Dense object at 0x7fea62cf5250>
<keras.layers.core.dropout.Dropout object at 0x7fea62cf5410>
And so on.
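If the main goal is just a different input shape or label set, as in your question, it may be simpler to rebuild the model from a modified config instead of patching individual layers. A sketch, assuming hypothetical values of a 384x384 input and 10 target classes:
from transformers import ViTConfig, TFViTForImageClassification
# Hypothetical targets: 384x384 inputs and 10 classes instead of ImageNet's 1000.
config = ViTConfig.from_pretrained("google/vit-base-patch16-224",
                                   image_size=384, num_labels=10)
model = TFViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    config=config,
    # The position embeddings and classifier head no longer match the
    # checkpoint shapes, so they are re-initialized instead of loaded.
    ignore_mismatched_sizes=True,
)
The re-initialized weights start untrained, so the model needs fine-tuning on your data; augmentation can then be handled in your tf.data input pipeline before the tensors reach the model.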