按关键点位置划分的 Vision Transformer 注意图 - TensorFlow
Vision Transformer attention map by keypoint location - TensorFlow
我已经在 TensorFlow 上训练了一个基于 https://github.com/yangsenius/TransPose and I would like to simulate the attention maps of each keypoint like this: https://raw.githubusercontent.com/yangsenius/TransPose/main/attention_map_image_dependency_transposeh_thres_0.00075.jpg
的关键点估计的 ViT 模型
我在 Pytorch 上找到了代码,但我不知道如何在 TensorFlow 上模拟它:
https://github.com/yangsenius/TransPose/blob/dab9007b6f61c9c8dce04d61669a04922bbcd148/visualize.py#L128
我通过获取multihead attention层上一层的输出并通过multihead attention传过去解决了:
atten_maps_hooks = [Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_0') - 1].output),
Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_1') - 1].output),
Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_2') - 1].output),
Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_3') - 1].output),
Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_4') - 1].output),
Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_5') - 1].output)]
for i in range(len(atten_maps_hooks)):
temp = atten_maps_hooks[i].predict(input)
mha, scores = model.get_layer('encoded_' + str(i))(temp, temp, return_attention_scores = True)
enc_atten_maps_hwhw.append(scores.numpy()[0].reshape(shape + shape))
我已经在 TensorFlow 上训练了一个基于 https://github.com/yangsenius/TransPose and I would like to simulate the attention maps of each keypoint like this: https://raw.githubusercontent.com/yangsenius/TransPose/main/attention_map_image_dependency_transposeh_thres_0.00075.jpg
的关键点估计的 ViT 模型我在 Pytorch 上找到了代码,但我不知道如何在 TensorFlow 上模拟它: https://github.com/yangsenius/TransPose/blob/dab9007b6f61c9c8dce04d61669a04922bbcd148/visualize.py#L128
我通过获取multihead attention层上一层的输出并通过multihead attention传过去解决了:
atten_maps_hooks = [Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_0') - 1].output),
Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_1') - 1].output),
Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_2') - 1].output),
Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_3') - 1].output),
Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_4') - 1].output),
Model(inputs = model.input, outputs = model.layers[getLayerIndexByName(model, 'encoded_5') - 1].output)]
for i in range(len(atten_maps_hooks)):
temp = atten_maps_hooks[i].predict(input)
mha, scores = model.get_layer('encoded_' + str(i))(temp, temp, return_attention_scores = True)
enc_atten_maps_hwhw.append(scores.numpy()[0].reshape(shape + shape))