如何与 python 进行交叉验证?

how made cross-validation with python?

嗨,我制作了一个神经网络,我需要进行交叉验证。 我不知道它是怎么做到的,具体是如何训练或制作出来的。

如果有人知道,请写信或给我一些指示。

这是我的代码:

###Division Train / Test
X = df.drop('Peso secado',axis=1)  #Variables de entrada, menos la variable de salida
y = df['Peso secado']              #Variable de salida

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=101)

###

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train= scaler.fit_transform(X_train)
X_train
X_test = scaler.transform(X_test)
X_test



###Creacion del modelo###
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam
import tensorflow as tf

model = Sequential()
num_neuronas = 50
model.add(tf.keras.layers.Dense(units=6, activation='sigmoid', input_shape=(6, )))
model.add(Dense(num_neuronas,activation='relu'))
model.add(tf.keras.layers.Dense(units=1, activation='linear')) 

#Buscar mejor funcion de activacion para capa de salida sigmoid? o linear?
model.summary()
model.compile(optimizer='adam',loss='mse')

###Entrenamiento###
model.fit(x = X_train, y = y_train.values,
          validation_data=(X_test,y_test.values), batch_size=10, epochs=1000) 

losses = pd.DataFrame(model.history.history)  
losses
losses.plot()
   
###Evaluacion###
from sklearn.metrics import mean_squared_error,mean_absolute_error,explained_variance_score,mean_absolute_percentage_error
X_test
predictions = model.predict(X_test)
mean_absolute_error(y_test,predictions)
mean_absolute_percentage_error(y_test,predictions)

mean_squared_error(y_test,predictions)
explained_variance_score(y_test,predictions)  

mean_absolute_error(y_test,predictions)/df['Peso secado'].mean() 
mean_absolute_error(y_test,predictions)/df['Peso secado'].median()

一些训练或验证的建议会有所帮助

我的第一个观察结果是代码非常丑陋且没有结构。您应该在代码的顶部导入模块

要执行交叉验证,首先从 sklearn 导入模块(以及您需要的所有其他模块)

from sklearn.model_selection import StratifiedKFold

我会把模型定义放在一个单独的函数中:

def get_model():
  model = Sequential()
  model.add(Dense(4, input_dim=8, activation='relu'))
  model.add(Dense(1, activation='sigmoid'))
  model.compile(loss='binary_crossentropy', optimizer='adam')
  return model

定义您的变量,如果您使用的是 tensorflow/Keras,请执行以下操作:

BATCH_SIZE = 64  #  128
EPOCHS = 100

k = 10
# Use stratified k-fold if the data is imbalanced
kf = StratifiedKFold(n_splits=k, shuffle=False, random_state=None)

# here comes the Cross validation
fold_index = 1
for train_index, test_index in kf.split(X, y):
            X_train = X[train_index]
            y_train = y[train_index]

            X_test = X[test_index]
            y_test = y[test_index]

            # fit the model on the training set
            model = get_model()

            model.fit(
                X_train,
                y_train,
                batch_size=BATCH_SIZE,
                epochs=EPOCHS,
                verbose=0,
                validation_data=(X_test, y_test),
            )

            # predict values
            # pred_values = model.predict(X_test)
            pred_values_prob = np.array(model(X_test))

注意:使用 tensorflow 时,您需要在循环中每次都定义一个新模型。 sklearn 不是这种情况,因为 sklearn 在调用时以新的初始化权重开始。这里需要单独做。