我应该如何在 LSTM 中指定批量大小?
how should I specify batch size in LSTM?
是否可以在 LSTM 中查看此代码?我想用我放在这里的形状来训练数据,但我收到了一个关于我认为是的批次大小的错误。我不知道批次的大小。目前,我选择的批次大小是 64。我应该为批次输入另一个大小,还是错误与批次大小无关?
should I choose for this code: the shape of X (7311, 17, 124) and shape of Y(7311, 1)
InvalidArgumentError: Incompatible shapes: [16] vs. [64]
[[node gradient_tape/binary_crossentropy/weighted_loss/Mul (defined at <ipython-input-74-f95f7e276c58>:1) ]] [Op:__inference_train_function_138498]
df = pd.read_csv("train_data.csv")
timestep = 17 #from 1 to 23 (17 with the current NaN strategy)
threshold_for_classification = -8
X_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
fill_X = -0.01
seed = 11
#RNN hiperparameter
epochs = 75
batch = 64
val_split = 0.25
test_split = 0.25
lr = 0.0001
adam = optimizers.Nadam() #(lr)
class_weight = {True: 5.,
False: 1.}
verbose = 1
#Dropping first the empty column and then rows with NaNs
df = df.drop("c_rcs_estimate", axis=1)
df = df.dropna(how='any')
#Filtering events with len=1 or min_tca > 2 or max_tca < 2
def conditions(event):
x = event["time_to_tca"].values
return ((x.min()<2.0) & (x.max()>2.0) & (x.shape[0]>1))
df = df.groupby('event_id').filter(conditions)
#OHE for c_object_type (5 categories) -> 5 new features
df["mission_id"] = df["mission_id"].astype('category')
df["c_object_type"] = df["c_object_type"].astype('category')
df = pd.get_dummies(df)
#Getting y as 1D-array
y = df.groupby(["event_id"])["risk"].apply(lambda x: x.iloc[-1]).values.reshape(-1, 1)
#Scaling y
_ = y_scaler.fit(df["risk"].values.reshape(-1, 1)) #using the whole risk feature to scale the target 'y'
y = y_scaler.transform(y)
#Getting X as df (dropping rows with tca < 2)
df = df.loc[df["time_to_tca"]>2]
#Adding feature 'event_length' for counting how many instances each event has
df["event_length"] = df.groupby('event_id')['event_id'].transform(lambda x: x.value_counts().idxmax())
#Scaling X
df = pd.DataFrame(X_scaler.fit_transform(df), columns=df.columns)
#Transforming X into a 3D-array
events = df["event_id"].nunique() #rows
features = len(df.columns) #columns
X = np.zeros((events,timestep,features))
X.fill(fill_X)
i = 0
def df_to_3darray(event):
global X, i
#Transforming an event to time series (1,timesteps, columns)
row = event.values.reshape(1,event.shape[0],event.shape[1])
#Condition is needed to slice arrays correctly
#Condition -> is timestep greater than the event's time series length?
if(timestep>=row.shape[1]):
X[i:i+1,-row.shape[1]:,:] = row
else:
X[i:i+1,:,:] = row[:,-timestep:,:]
#index to iterate over X array
i = i + 1
#dataframe remains intact, while X array has been filled.
return event
df.groupby("event_id").apply(df_to_3darray)
#Dropping event_id to remove noise
X = X[:,:,1:]
#TODO: Padding with specific values column-wise instead of zeros.
#TODO: Separating time dependent and independent feature in 2 X arrays
print(X.shape, y.shape)
#computing scaled threshold
th = np.array([threshold_for_classification]).reshape(-1,1)
th = y_scaler.transform(th)
threshold_scaled = th[0,0]
#Splitting arrays
y_boolean = (y > threshold_scaled).reshape(-1,1)
X_train, X_test, y_train_numeric, y_test_numeric = train_test_split(X, y,
stratify=y_boolean,
shuffle=True,
random_state=seed,
test_size = test_split
)
y_train_boolean = (y_train_numeric > threshold_scaled).reshape(-1,1)
X_train, X_val, y_train_numeric, y_val_numeric = train_test_split(X_train, y_train_numeric,
stratify=y_train_boolean,
shuffle=True,
random_state=seed,
test_size = val_split
)
#transforming it into a classification task -> y_train, y_test boolean
y_train = (y_train_numeric > threshold_scaled).reshape(-1,1)
y_val = (y_val_numeric > threshold_scaled).reshape(-1,1)
y_test = (y_test_numeric > threshold_scaled).reshape(-1,1)
X_train = tf.convert_to_tensor(X_train,dtype=tf.int64)
X_test = tf.convert_to_tensor( X_test,dtype=tf.int64)
y_train_numeric = tf.convert_to_tensor(y_train_numeric,dtype=tf.int64)
y_test_numeric = tf.convert_to_tensor(y_test_numeric,dtype=tf.int64)
y_train_boolean = tf.convert_to_tensor(y_train_boolean,dtype=tf.int64)
X_val = tf.convert_to_tensor(X_val,dtype=tf.int64)
y_val_numeric = tf.convert_to_tensor(y_val_numeric,dtype=tf.int64)
y_train = tf.convert_to_tensor(y_train,dtype=tf.int64)
y_val = tf.convert_to_tensor(y_val,dtype=tf.int64)
y_test = tf.convert_to_tensor(y_test,dtype=tf.int64)
y_boolean = tf.convert_to_tensor(y_boolean,dtype=tf.int64)
#Percentage of high risks in train
print("TRAIN {:0.1f}, {:0.1f}, {:0.3f}".format(np.sum(y_train), y_train.shape[0], np.sum(y_train)/y_train.shape[0]))
#Percentage of high risks in val
print("VAL {:0.1f}, {:0.1f}, {:0.3f}".format(np.sum(y_val), y_val.shape[0], np.sum(y_val)/y_val.shape[0]))
#Percentage of high risks in test
print("TEST {:0.1f}, {:0.1f}, {:0.3f}".format(np.sum(y_test), y_test.shape[0], np.sum(y_test)/y_test.shape[0]))
# Model activation selu
input_tensor = Input(batch_shape=(batch, timestep, X_train.shape[2]))
rnn_1 = LSTM(32, stateful=False, dropout=0.15, recurrent_dropout=0.3, return_sequences=True, kernel_regularizer=L1L2(l1=0.0, l2=0.01))(input_tensor)
batch_1 = BatchNormalization()(rnn_1)
rnn_2 = LSTM(16, stateful=False, dropout=0.15, recurrent_dropout=0.3, return_sequences=True, kernel_regularizer=L1L2(l1=0.0, l2=0.01))(batch_1)
batch_2 = BatchNormalization()(rnn_2)
rnn_3 = LSTM(8, stateful=False, dropout=0.15, recurrent_dropout=0.3, return_sequences=False, kernel_regularizer=L1L2(l1=0.0, l2=0.01))(batch_2)
batch_3 = BatchNormalization()(rnn_3)
output_tensor = Dense(units = 1, activation='sigmoid')(batch_3)
model = Model(inputs=input_tensor,
outputs= output_tensor)
model.compile(loss='binary_crossentropy',
optimizer=adam,
metrics=['accuracy'])
model.summary()
model_history = model.fit(X_train, y_train,
epochs=epochs,
batch_size=batch,
#shuffle=True, #OJO
validation_data=(X_val, y_val),
verbose=verbose,
class_weight=class_weight
).history
我建议更改此行
input_tensor = Input(batch_shape=(batch, timestep, X_train.shape[2]))
至
input_tensor = tf.keras.layers.Input(shape=(timestep, X_train.shape[2]))
然后在 model.fit
中定义 batch_size
并确保 X_train
和 y_train
具有相同数量的样本。
是否可以在 LSTM 中查看此代码?我想用我放在这里的形状来训练数据,但我收到了一个关于我认为是的批次大小的错误。我不知道批次的大小。目前,我选择的批次大小是 64。我应该为批次输入另一个大小,还是错误与批次大小无关?
should I choose for this code: the shape of X (7311, 17, 124) and shape of Y(7311, 1)
InvalidArgumentError: Incompatible shapes: [16] vs. [64]
[[node gradient_tape/binary_crossentropy/weighted_loss/Mul (defined at <ipython-input-74-f95f7e276c58>:1) ]] [Op:__inference_train_function_138498]
df = pd.read_csv("train_data.csv")
timestep = 17 #from 1 to 23 (17 with the current NaN strategy)
threshold_for_classification = -8
X_scaler = MinMaxScaler()
y_scaler = MinMaxScaler()
fill_X = -0.01
seed = 11
#RNN hiperparameter
epochs = 75
batch = 64
val_split = 0.25
test_split = 0.25
lr = 0.0001
adam = optimizers.Nadam() #(lr)
class_weight = {True: 5.,
False: 1.}
verbose = 1
#Dropping first the empty column and then rows with NaNs
df = df.drop("c_rcs_estimate", axis=1)
df = df.dropna(how='any')
#Filtering events with len=1 or min_tca > 2 or max_tca < 2
def conditions(event):
x = event["time_to_tca"].values
return ((x.min()<2.0) & (x.max()>2.0) & (x.shape[0]>1))
df = df.groupby('event_id').filter(conditions)
#OHE for c_object_type (5 categories) -> 5 new features
df["mission_id"] = df["mission_id"].astype('category')
df["c_object_type"] = df["c_object_type"].astype('category')
df = pd.get_dummies(df)
#Getting y as 1D-array
y = df.groupby(["event_id"])["risk"].apply(lambda x: x.iloc[-1]).values.reshape(-1, 1)
#Scaling y
_ = y_scaler.fit(df["risk"].values.reshape(-1, 1)) #using the whole risk feature to scale the target 'y'
y = y_scaler.transform(y)
#Getting X as df (dropping rows with tca < 2)
df = df.loc[df["time_to_tca"]>2]
#Adding feature 'event_length' for counting how many instances each event has
df["event_length"] = df.groupby('event_id')['event_id'].transform(lambda x: x.value_counts().idxmax())
#Scaling X
df = pd.DataFrame(X_scaler.fit_transform(df), columns=df.columns)
#Transforming X into a 3D-array
events = df["event_id"].nunique() #rows
features = len(df.columns) #columns
X = np.zeros((events,timestep,features))
X.fill(fill_X)
i = 0
def df_to_3darray(event):
global X, i
#Transforming an event to time series (1,timesteps, columns)
row = event.values.reshape(1,event.shape[0],event.shape[1])
#Condition is needed to slice arrays correctly
#Condition -> is timestep greater than the event's time series length?
if(timestep>=row.shape[1]):
X[i:i+1,-row.shape[1]:,:] = row
else:
X[i:i+1,:,:] = row[:,-timestep:,:]
#index to iterate over X array
i = i + 1
#dataframe remains intact, while X array has been filled.
return event
df.groupby("event_id").apply(df_to_3darray)
#Dropping event_id to remove noise
X = X[:,:,1:]
#TODO: Padding with specific values column-wise instead of zeros.
#TODO: Separating time dependent and independent feature in 2 X arrays
print(X.shape, y.shape)
#computing scaled threshold
th = np.array([threshold_for_classification]).reshape(-1,1)
th = y_scaler.transform(th)
threshold_scaled = th[0,0]
#Splitting arrays
y_boolean = (y > threshold_scaled).reshape(-1,1)
X_train, X_test, y_train_numeric, y_test_numeric = train_test_split(X, y,
stratify=y_boolean,
shuffle=True,
random_state=seed,
test_size = test_split
)
y_train_boolean = (y_train_numeric > threshold_scaled).reshape(-1,1)
X_train, X_val, y_train_numeric, y_val_numeric = train_test_split(X_train, y_train_numeric,
stratify=y_train_boolean,
shuffle=True,
random_state=seed,
test_size = val_split
)
#transforming it into a classification task -> y_train, y_test boolean
y_train = (y_train_numeric > threshold_scaled).reshape(-1,1)
y_val = (y_val_numeric > threshold_scaled).reshape(-1,1)
y_test = (y_test_numeric > threshold_scaled).reshape(-1,1)
X_train = tf.convert_to_tensor(X_train,dtype=tf.int64)
X_test = tf.convert_to_tensor( X_test,dtype=tf.int64)
y_train_numeric = tf.convert_to_tensor(y_train_numeric,dtype=tf.int64)
y_test_numeric = tf.convert_to_tensor(y_test_numeric,dtype=tf.int64)
y_train_boolean = tf.convert_to_tensor(y_train_boolean,dtype=tf.int64)
X_val = tf.convert_to_tensor(X_val,dtype=tf.int64)
y_val_numeric = tf.convert_to_tensor(y_val_numeric,dtype=tf.int64)
y_train = tf.convert_to_tensor(y_train,dtype=tf.int64)
y_val = tf.convert_to_tensor(y_val,dtype=tf.int64)
y_test = tf.convert_to_tensor(y_test,dtype=tf.int64)
y_boolean = tf.convert_to_tensor(y_boolean,dtype=tf.int64)
#Percentage of high risks in train
print("TRAIN {:0.1f}, {:0.1f}, {:0.3f}".format(np.sum(y_train), y_train.shape[0], np.sum(y_train)/y_train.shape[0]))
#Percentage of high risks in val
print("VAL {:0.1f}, {:0.1f}, {:0.3f}".format(np.sum(y_val), y_val.shape[0], np.sum(y_val)/y_val.shape[0]))
#Percentage of high risks in test
print("TEST {:0.1f}, {:0.1f}, {:0.3f}".format(np.sum(y_test), y_test.shape[0], np.sum(y_test)/y_test.shape[0]))
# Model activation selu
input_tensor = Input(batch_shape=(batch, timestep, X_train.shape[2]))
rnn_1 = LSTM(32, stateful=False, dropout=0.15, recurrent_dropout=0.3, return_sequences=True, kernel_regularizer=L1L2(l1=0.0, l2=0.01))(input_tensor)
batch_1 = BatchNormalization()(rnn_1)
rnn_2 = LSTM(16, stateful=False, dropout=0.15, recurrent_dropout=0.3, return_sequences=True, kernel_regularizer=L1L2(l1=0.0, l2=0.01))(batch_1)
batch_2 = BatchNormalization()(rnn_2)
rnn_3 = LSTM(8, stateful=False, dropout=0.15, recurrent_dropout=0.3, return_sequences=False, kernel_regularizer=L1L2(l1=0.0, l2=0.01))(batch_2)
batch_3 = BatchNormalization()(rnn_3)
output_tensor = Dense(units = 1, activation='sigmoid')(batch_3)
model = Model(inputs=input_tensor,
outputs= output_tensor)
model.compile(loss='binary_crossentropy',
optimizer=adam,
metrics=['accuracy'])
model.summary()
model_history = model.fit(X_train, y_train,
epochs=epochs,
batch_size=batch,
#shuffle=True, #OJO
validation_data=(X_val, y_val),
verbose=verbose,
class_weight=class_weight
).history
我建议更改此行
input_tensor = Input(batch_shape=(batch, timestep, X_train.shape[2]))
至
input_tensor = tf.keras.layers.Input(shape=(timestep, X_train.shape[2]))
然后在 model.fit
中定义 batch_size
并确保 X_train
和 y_train
具有相同数量的样本。