如何对多输入模型进行 kfold 交叉验证
How to do kfold cross-validation for multi-input models
型号如下:
inputs_1 = keras.Input(shape=(10081,1))
layer1 = Conv1D(64,14)(inputs_1)
layer2 = layers.MaxPool1D(5)(layer1)
layer3 = Conv1D(64, 14)(layer2)
layer4 = layers.GlobalMaxPooling1D()(layer3)
inputs_2 = keras.Input(shape=(85,))
layer5 = layers.concatenate([layer4, inputs_2])
layer6 = Dense(128, activation='relu')(layer5)
layer7 = Dense(2, activation='softmax')(layer6)
model_2 = keras.models.Model(inputs = [inputs_1, inputs_2], output = [layer7])
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,0:10166], df[['Result_cat','Result_cat1']].values, test_size=0.2)
X_train = X_train.to_numpy()
X_train = X_train.reshape([X_train.shape[0], X_train.shape[1], 1])
X_train_1 = X_train[:,0:10081,:]
X_train_2 = X_train[:,10081:10166,:].reshape(736,85)
X_test = X_test.to_numpy()
X_test = X_test.reshape([X_test.shape[0], X_test.shape[1], 1])
X_test_1 = X_test[:,0:10081,:]
X_test_2 = X_test[:,10081:10166,:].reshape(185,85)
adam = keras.optimizers.Adam(lr = 0.0005)
model_2.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['acc'])
history = model_2.fit([X_train_1,X_train_2], y_train, epochs = 120, batch_size = 256, validation_split = 0.2, callbacks = [keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)])
问题:
1) 数据为921rows x 10166columns
。每行都是一个观察值(前 10080 列是时间序列,其余列是其他统计特征)。根据模型,输入数据是否随机分为inputs_1
和inputs_2
?
2) 我正在考虑进行 kfold 交叉验证并将输入数据拆分为 inputs_1
和 inputs_2
。这样做的好方法是什么?谢谢
只拆分索引。
num_folds = 5
kfold = KFold(n_splits=num_folds, shuffle=False)
ID_Inp = np.array(range(nSamples))
ID_Out = np.array(range(nSamples))
Inputs = [Input1,Input2]
for IDs_Train, IDs_Test in kfold.split(ID_Inp, ID_Out):
Fold_Train_Input1, Fold_Train_Input2 = Input1[IDs_Train], Input2[IDs_Train]
Fold_Train_OutPut = Output[IDs_Train]
Fold_Test_Input1, Fold_Test_Input2 = Input1[IDs_Test], Input2[IDs_Test]
Fold_Test_OutPut = Output[IDs_Test]
####################
型号如下:
inputs_1 = keras.Input(shape=(10081,1))
layer1 = Conv1D(64,14)(inputs_1)
layer2 = layers.MaxPool1D(5)(layer1)
layer3 = Conv1D(64, 14)(layer2)
layer4 = layers.GlobalMaxPooling1D()(layer3)
inputs_2 = keras.Input(shape=(85,))
layer5 = layers.concatenate([layer4, inputs_2])
layer6 = Dense(128, activation='relu')(layer5)
layer7 = Dense(2, activation='softmax')(layer6)
model_2 = keras.models.Model(inputs = [inputs_1, inputs_2], output = [layer7])
X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,0:10166], df[['Result_cat','Result_cat1']].values, test_size=0.2)
X_train = X_train.to_numpy()
X_train = X_train.reshape([X_train.shape[0], X_train.shape[1], 1])
X_train_1 = X_train[:,0:10081,:]
X_train_2 = X_train[:,10081:10166,:].reshape(736,85)
X_test = X_test.to_numpy()
X_test = X_test.reshape([X_test.shape[0], X_test.shape[1], 1])
X_test_1 = X_test[:,0:10081,:]
X_test_2 = X_test[:,10081:10166,:].reshape(185,85)
adam = keras.optimizers.Adam(lr = 0.0005)
model_2.compile(loss = 'categorical_crossentropy', optimizer = adam, metrics = ['acc'])
history = model_2.fit([X_train_1,X_train_2], y_train, epochs = 120, batch_size = 256, validation_split = 0.2, callbacks = [keras.callbacks.EarlyStopping(monitor='val_loss', patience=20)])
问题:
1) 数据为921rows x 10166columns
。每行都是一个观察值(前 10080 列是时间序列,其余列是其他统计特征)。根据模型,输入数据是否随机分为inputs_1
和inputs_2
?
2) 我正在考虑进行 kfold 交叉验证并将输入数据拆分为 inputs_1
和 inputs_2
。这样做的好方法是什么?谢谢
只拆分索引。
num_folds = 5
kfold = KFold(n_splits=num_folds, shuffle=False)
ID_Inp = np.array(range(nSamples))
ID_Out = np.array(range(nSamples))
Inputs = [Input1,Input2]
for IDs_Train, IDs_Test in kfold.split(ID_Inp, ID_Out):
Fold_Train_Input1, Fold_Train_Input2 = Input1[IDs_Train], Input2[IDs_Train]
Fold_Train_OutPut = Output[IDs_Train]
Fold_Test_Input1, Fold_Test_Input2 = Input1[IDs_Test], Input2[IDs_Test]
Fold_Test_OutPut = Output[IDs_Test]
####################