How is the size of each of the k folds defined?

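I'm currently training my regression network with cross-validation. I don't have any labels as such; instead, specific inputs should map to specific outputs, and the network should learn that mapping. I seem to have some problems with how the folds are defined.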

This is how I do the cross-validation:

from __future__ import print_function

import numpy as np
from keras import backend as K
from keras.layers import Dense
from keras.layers.advanced_activations import ELU
from keras.models import Sequential
from sklearn.model_selection import StratifiedKFold

############################### Training setup ##################################

#Define 10 folds:
seed = 7
np.random.seed(seed)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
print("Splits")
cvscores_loss = []

for train, test in kfold.split(train_set_data_vstacked_normalized, train_set_output_vstacked):

    print("Model definition!")
    model = Sequential()

    #act = PReLU(init='normal', weights=None)
    model.add(Dense(output_dim=400, input_dim=400, init="normal", activation=K.tanh))

    #act1 = PReLU(init='normal', weights=None)
    # Only the first layer needs input_dim; later layers infer it.
    model.add(Dense(output_dim=400, init="normal", activation=K.tanh))

    #act2 = PReLU(init='normal', weights=None)
    model.add(Dense(output_dim=400, init="normal", activation=K.tanh))

    # 13 outputs; ELU is an advanced-activation layer, so add it as its own layer
    model.add(Dense(output_dim=13, init="normal"))
    model.add(ELU(alpha=10000))

    print("Compiling")
    model.compile(loss='mean_squared_error', optimizer='RMSprop', metrics=["accuracy"])
    print("Compile done! ")

    print('\n')

    print("Train start")
    model.fit(train_set_data_vstacked_normalized[train], train_set_output_vstacked[train], nb_epoch=10, verbose=1)

    loss, accuracy = model.evaluate(x=train_set_data_vstacked_normalized[test], y=train_set_output_vstacked[test], verbose=1)
    print()
    print('loss: ', loss)
    print('accuracy: ', accuracy)
    print()
    model.summary()  # summary() prints itself and returns None
    print("New Model:")
    cvscores_loss.append(loss)


print("loss: %.4f (+/- %.4f)" % (np.mean(cvscores_loss), np.std(cvscores_loss)))

The problem with this code is that I never enter the for loop. After "Splits" is printed, I get the following warning:

Splits
/home/k/.local/lib/python2.7/site-packages/sklearn/model_selection/_split.py:579: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of groups for any class cannot be less than n_splits=10.

This makes me wonder how kfold knows the input and output dimensions of my neural network.

Should I define them somewhere? If so, how?

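The message tells you what the problem is: one of your target classes has only 1 member. When the data is split into 10 folds, each class needs at least 10 members so that one of them can be placed in each fold.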
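As for how the folds themselves are sized: the splitter never looks at the network's input or output dimensions. `split()` only uses the number of rows (plus, for `StratifiedKFold`, the class labels in `y`) and yields arrays of row indices, which you then use to slice your inputs and targets. A minimal sketch with plain `KFold`, on random stand-in data with the question's shapes (400 input features, 13 outputs; the 23 rows are an arbitrary assumption):

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in data (assumed sizes): 23 samples, 400 input features, 13 outputs.
rng = np.random.RandomState(7)
X = rng.rand(23, 400)
Y = rng.rand(23, 13)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
test_fold_sizes = []
for train, test in kfold.split(X, Y):
    # train and test are 1-D arrays of row indices; column counts never matter.
    X_train, Y_train = X[train], Y[train]
    X_test, Y_test = X[test], Y[test]
    test_fold_sizes.append(len(test))

# The first n_samples % n_splits folds get one extra sample.
print(test_fold_sizes)  # [3, 3, 3, 2, 2, 2, 2, 2, 2, 2]
```

`StratifiedKFold` sizes its folds the same way overall, but it also spreads the samples of each class across the folds, which is where the per-class minimum in the warning comes from.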

You need to check the member count of each target class to find the offending class, and remove it.
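A minimal sketch of that check, assuming the targets are a 1-D array of class labels (`X` and `y` below are hypothetical toy data): count the members of each class with `np.unique(..., return_counts=True)` and keep only the rows whose class has at least `n_splits` members.

```python
import numpy as np

n_splits = 10
# Hypothetical toy data: class 2 has only one member.
X = np.arange(21).reshape(21, 1)
y = np.array([0] * 10 + [1] * 10 + [2])

classes, counts = np.unique(y, return_counts=True)
print(dict(zip(classes.tolist(), counts.tolist())))  # {0: 10, 1: 10, 2: 1}

# Drop every row whose class has fewer members than there are folds.
rare = classes[counts < n_splits]
keep = ~np.isin(y, rare)
X_kept, y_kept = X[keep], y[keep]
print(rare.tolist(), len(y_kept))  # [2] 20
```

If the targets are continuous, as in a regression, nearly every value counts as its own "class", so this filter would throw away most of the data; a non-stratified splitter such as `KFold` is the usual choice there.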

I think you're overcomplicating this. If you need to cross-validate a Keras model, you can use the Keras scikit-learn API. To do that, you need to:

Import a few things:

from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score

Create a function that defines the model:

def model_creation():
    model = Sequential()
    model.add(...)
    ...
    model.compile(...)
    return model

And use the wrapper:

model = KerasClassifier(build_fn=model_creation, nb_epoch=100, batch_size=100, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_val_score(model, X, y, cv=kfold)
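The wrapper above is for classification. For a regression network like the one in the question, the same pattern applies with a regressor and a non-stratified `KFold`; Keras of that era also shipped a `KerasRegressor` in `keras.wrappers.scikit_learn`. The sketch below uses scikit-learn's `LinearRegression` purely as a stand-in estimator so it runs without Keras, on random data with the question's shapes (100 samples is an assumption).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Random stand-in data (assumed sizes): 100 samples, 400 features, 13 continuous outputs.
rng = np.random.RandomState(42)
X = rng.rand(100, 400)
y = rng.rand(100, 13)

# Stand-in estimator; a KerasRegressor(build_fn=...) would be passed in the same place.
model = LinearRegression()
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
# scikit-learn maximizes scores, so MSE is reported as a negative number.
scores = cross_val_score(model, X, y, cv=kfold, scoring="neg_mean_squared_error")
print("MSE: %.4f (+/- %.4f)" % (-scores.mean(), scores.std()))
```

Because `KFold` ignores the target values when splitting, it works for continuous, multi-output targets where `StratifiedKFold` would complain about under-populated classes.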