How do I do cross-validation with Python?
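Hi, I built a neural network and I need to cross-validate it.
I don't know how that is done, specifically how the model should be trained or set up for it.
If anyone knows, please write or give me some pointers.
Here is my code: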
### Train / test split
X = df.drop('Peso secado',axis=1) # Input variables, without the output variable
y = df['Peso secado'] # Output variable
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=101)
###
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
X_train= scaler.fit_transform(X_train)
X_train
X_test = scaler.transform(X_test)
X_test
### Model creation ###
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
model = Sequential()
num_neuronas = 50
model.add(tf.keras.layers.Dense(units=6, activation='sigmoid', input_shape=(6, )))
model.add(Dense(num_neuronas,activation='relu'))
model.add(tf.keras.layers.Dense(units=1, activation='linear'))
# Find the best activation function for the output layer: sigmoid or linear?
model.summary()
model.compile(optimizer='adam',loss='mse')
### Training ###
model.fit(x = X_train, y = y_train.values,
          validation_data=(X_test,y_test.values), batch_size=10, epochs=1000)
losses = pd.DataFrame(model.history.history)
losses
losses.plot()
### Evaluation ###
from sklearn.metrics import mean_squared_error,mean_absolute_error,explained_variance_score,mean_absolute_percentage_error
X_test
predictions = model.predict(X_test)
mean_absolute_error(y_test,predictions)
mean_absolute_percentage_error(y_test,predictions)
mean_squared_error(y_test,predictions)
explained_variance_score(y_test,predictions)
mean_absolute_error(y_test,predictions)/df['Peso secado'].mean()
mean_absolute_error(y_test,predictions)/df['Peso secado'].median()
Any advice on training or validation would be helpful.
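My first observation is that the code is very ugly and has no structure. You should import your modules at the top of the code.
To perform cross-validation, first import the splitter from sklearn (along with all the other modules you need):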
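import numpy as np
from sklearn.model_selection import StratifiedKFold
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense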
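For orientation, here is a minimal sketch of what a k-fold splitter hands back on each iteration. It uses KFold rather than StratifiedKFold, on the assumption that the target is continuous as in the question (StratifiedKFold expects class labels as y). The toy arrays and the names `test_counts` and `fold_sizes` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 samples, 2 features (illustrative values only).
X = np.arange(20).reshape(10, 2)
y = np.linspace(0.0, 1.0, 10)  # continuous target, as in a regression task

kf = KFold(n_splits=5, shuffle=True, random_state=0)

test_counts = np.zeros(len(X), dtype=int)  # how often each row is held out
fold_sizes = []
for fold, (train_index, test_index) in enumerate(kf.split(X), start=1):
    # train_index and test_index are arrays of integer row positions.
    test_counts[test_index] += 1
    fold_sizes.append((len(train_index), len(test_index)))
    print(f"fold {fold}: train={len(train_index)} test={len(test_index)}")
```

Every row lands in the held-out part exactly once, so each sample is used for evaluation once and for training in the remaining folds.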
I would put the model definition in a separate function:
def get_model():
    model = Sequential()
    model.add(Dense(4, input_dim=8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam')
    return model
Define your variables. If you are using tensorflow/Keras, do the following:
BATCH_SIZE = 64 # 128
EPOCHS = 100
k = 10
# Use stratified k-fold if the data is imbalanced (it requires class labels as y)
kf = StratifiedKFold(n_splits=k, shuffle=False, random_state=None)
# here comes the cross-validation
# X and y are assumed to be NumPy arrays; with pandas objects use .iloc[train_index]
fold_index = 1
for train_index, test_index in kf.split(X, y):
    X_train = X[train_index]
    y_train = y[train_index]
    X_test = X[test_index]
    y_test = y[test_index]
    # fit a freshly built model on the training set
    model = get_model()
    model.fit(
        X_train,
        y_train,
        batch_size=BATCH_SIZE,
        epochs=EPOCHS,
        verbose=0,
        validation_data=(X_test, y_test),
    )
    # predict values
    # pred_values = model.predict(X_test)
    pred_values_prob = np.array(model(X_test))
    fold_index += 1
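Note: with tensorflow you need to define a new model on every iteration of the loop. This is not the case with sklearn, because an sklearn estimator starts from freshly initialized weights each time you call fit. Here you have to do it yourself.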
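Putting the pieces together for a regression target like the one in the question, a minimal sketch might look as follows. The helper `cross_validate_regression` and its parameters `build_model` and `fit_kwargs` are hypothetical names introduced here, not part of sklearn or Keras. It assumes a continuous target (hence KFold), refits the scaler inside each fold so the held-out rows do not influence the scaling, and builds a new model per fold as the note above requires. LinearRegression and the synthetic data are only stand-ins to keep the example light.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold
from sklearn.preprocessing import MinMaxScaler


def cross_validate_regression(build_model, X, y, k=5, fit_kwargs=None, seed=101):
    """Return the per-fold mean absolute error (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    fit_kwargs = fit_kwargs or {}
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    scores = []
    for train_index, test_index in kf.split(X):
        # Fit the scaler on the training fold only, then apply it to the test fold.
        scaler = MinMaxScaler()
        X_train = scaler.fit_transform(X[train_index])
        X_test = scaler.transform(X[test_index])

        model = build_model()  # new model, new weights, on every fold
        model.fit(X_train, y[train_index], **fit_kwargs)

        # np.ravel flattens Keras-style (n, 1) predictions to shape (n,).
        predictions = np.ravel(model.predict(X_test))
        scores.append(mean_absolute_error(y[test_index], predictions))
    return np.array(scores)


# Stand-in usage: synthetic data with 6 features and a linear model.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 6))
y_demo = X_demo @ np.arange(1.0, 7.0) + 0.01 * rng.normal(size=60)

scores = cross_validate_regression(LinearRegression, X_demo, y_demo, k=5)
print("MAE per fold:", scores)
print("mean MAE:", scores.mean(), "std:", scores.std())
```

With Keras, the same helper should accept a factory such as a regression version of `get_model` (linear output, `loss='mse'`) together with something like `fit_kwargs={"batch_size": 10, "epochs": 1000, "verbose": 0}`. Reporting the mean and spread over the folds gives a steadier estimate than a single train/test split.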