Transfer learning for video classification
How do I train a video-classification model using a pre-trained model? My dataset has shape (4000, 10, 150, 150, 1), and I am trying to classify human actions using TimeDistributed Conv2D layers.
I can train without transfer learning, but the accuracy is poor.
What I have tried:
from keras.applications import VGG16
from keras import models
from keras.layers import (TimeDistributed, Conv2D, Activation,
                          MaxPooling2D, Dropout)

conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))

model = models.Sequential()
model.add(conv_base)
model.add(TimeDistributed(Conv2D(96, (3, 3), padding='same',
                                 input_shape=x_train.shape[1:])))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Conv2D(128, (3, 3))))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Dropout(0.35)))
...
But I get ValueError: strides should be of length 1, 1 or 3 but was 2.
Does anyone have any ideas?
I assume each video has 10 frames. Here is a simple model that extracts VGG16 features for each frame (via GlobalAveragePooling) and classifies the sequence of frames with an LSTM.
You can experiment by adding more layers or changing the hyperparameters.
N.B.: There are many inconsistencies in your model, including passing 5-dimensional data directly to VGG16, which expects 4-dimensional input (batch, height, width, channels); that mismatch is likely what raises the strides ValueError.
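One more inconsistency: your dataset shape (4000, 10, 150, 150, 1) has single-channel frames, while ImageNet-pretrained VGG16 expects 3 channels. A minimal preprocessing sketch, assuming x_train holds your grayscale clips (the variable names and the channel-repeat conversion are just one option, not the only way):

import numpy as np

# Assumed input: x_train with shape (4000, 10, 150, 150, 1), grayscale frames.
# Repeating the channel axis gives pseudo-RGB clips that VGG16 can accept.
x_train_rgb = np.repeat(x_train, 3, axis=-1)  # -> (4000, 10, 150, 150, 3)

Depending on how your pixel values are scaled, applying tensorflow.keras.applications.vgg16.preprocess_input to the frames may also help, since the ImageNet weights were trained on preprocessed inputs. The model itself: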
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
import numpy as np
from tensorflow.keras.applications import VGG16

IMG_SIZE = (150, 150, 3)
num_class = 3

def create_base():
    # VGG16 feature extractor: maps one (150, 150, 3) frame to a 512-d vector.
    conv_base = VGG16(weights='imagenet',
                      include_top=False,
                      input_shape=IMG_SIZE)
    x = GlobalAveragePooling2D()(conv_base.output)
    base_model = Model(conv_base.input, x)
    return base_model

conv_base = create_base()

ip = Input(shape=(10, 150, 150, 3))
t_conv = TimeDistributed(conv_base)(ip)            # VGG16 features per frame -> (10, 512)
t_lstm = LSTM(10, return_sequences=False)(t_conv)  # summarize the frame sequence
f_softmax = Dense(num_class, activation='softmax')(t_lstm)
model = Model(ip, f_softmax)
model.summary()
Model: "model_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_32 (InputLayer) [(None, 10, 150, 150, 3)] 0
_________________________________________________________________
time_distributed_4 (TimeDist (None, 10, 512) 14714688
_________________________________________________________________
lstm_1 (LSTM) (None, 10) 20920
_________________________________________________________________
dense (Dense) (None, 3) 33
=================================================================
Total params: 14,735,641
Trainable params: 14,735,641
Non-trainable params: 0
_________________________________________________________________
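The summary shows all 14.7M parameters as trainable. For transfer learning on a dataset of this size, you would typically freeze the VGG16 base first and train only the LSTM and Dense head. A minimal training sketch, assuming integer class labels in y_train and the x_train_rgb array from the preprocessing sketch above (the learning rate, batch size, and epoch count are placeholder choices to tune):

# Freeze the pretrained extractor; only the LSTM and Dense head stay trainable.
conv_base.trainable = False

model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='sparse_categorical_crossentropy',  # assumes integer labels
              metrics=['accuracy'])

model.fit(x_train_rgb, y_train,
          epochs=10, batch_size=16,
          validation_split=0.2)

Once the head converges, you can optionally unfreeze some of the top VGG16 blocks, recompile with a lower learning rate, and fine-tune.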