Error while training Model with categorical and numerical dataset: Failed to convert a NumPy array to a Tensor (Unsupported object type float)
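I am currently completing my final degree project, for which I have to train a neural network to predict the class of an individual. The dataset is about accidents in Barcelona, so it has both categorical and numerical features. To train the neural network, I built a model with an embedding layer for every categorical column. However, when I try to fit the model, the following appears.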
1 m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
----> 2 m.fit(dd_normalized, dummy_y)
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).
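I have looked into it, but nothing I found seems to solve my problem. I am a newbie with neural networks, so please bear with me. My code is the following: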
import numpy as np
import pandas as pd
from tensorflow.keras import layers, Model
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

dd = pd.read_csv("C:/Users/Hussnain Shafqat/Desktop/Uni/Q8/TFG/Bases de dades/Modified/2021_Accidents_Final.csv")
dd_features = dd.copy()
Y = dd_features.pop('TipoAcc') #my target variable

# Normalization of Numerical variable
dd_normalized = dd_features.copy()
normalize_var_names = ["Long", "Lat", "NLesLeves", "NLesGraves", "NVictimas", "NVehiculos", "ACarne"]
for name, column in dd_features.items():
    if name in normalize_var_names:
        print(f"Normalizando {name}")
        dd_normalized[name] = (dd_features[name] - dd_features[name].min()) / (dd_features[name].max() - dd_features[name].min())
dd_normalized = dd_normalized.replace({'VictMortales': {'Si': 1, 'No': 0}})

#Neural network model creation
def get_model(df):
    names = df.columns
    inputs = []
    outputs = []
    for col in names:
        if col in normalize_var_names:
            inp = layers.Input(shape=(1,), name = col)
            inputs.append(inp)
            outputs.append(inp)
        else:
            num_unique_vals = int(df[col].nunique())
            embedding_size = int(min(np.ceil(num_unique_vals/2), 600))
            inp = layers.Input(shape=(1,), name = col)
            out = layers.Embedding(num_unique_vals + 1, embedding_size, name = col+"_emb")(inp)
            out = layers.Reshape(target_shape = (embedding_size,))(out)
            inputs.append(inp)
            outputs.append(out)
    x = layers.Concatenate()(outputs)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation ='relu')(x)
    y = layers.Dense(15, activation = 'softmax')(x)
    model = Model(inputs=inputs, outputs = y)
    return model

m = get_model(dd_normalized)

#I convert the target variable using one hot encoding
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
dummy_y = np_utils.to_categorical(encoded_Y)

#Model training
m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
m.fit(dd_normalized, dummy_y)
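I have tried using tf.convert_to_tensor to convert my dataset to a tensor, but I get the same error. After some research, I found that the same error appears whenever I try to convert the categorical and numerical columns together. If I apply the function to only the categorical columns or only the numerical columns, it works fine. I know that I cannot feed categorical data directly to a neural network; however, I thought the embedding layers would be enough to solve that.
Finally, I have also tried this, but it does not work. Any idea what it could be? Thank you very much for your time, and sorry for my bad English.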
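The mixed-column symptom described above can be illustrated without TensorFlow. The sketch below is only an assumption about the cause, not a confirmed diagnosis of the original data: it uses a made-up two-column frame (the `Distrito` column and all values are hypothetical) to show that a frame mixing numbers and strings collapses to a NumPy `object` array whose numeric cells are boxed Python floats, which would be consistent with the "Unsupported object type float" message.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the accident data: one numeric, one string column.
df = pd.DataFrame({
    "Lat": [41.38, 41.40, 41.41],
    "Distrito": ["Eixample", "Gracia", "Eixample"],
})

# A purely numeric column keeps a concrete numeric dtype.
print(df["Lat"].to_numpy().dtype)

# The whole frame has no common numeric dtype, so NumPy falls back to `object`
# and every numeric cell becomes a Python float object.
mixed = df.to_numpy()
print(mixed.dtype, type(mixed[0, 0]))

# Columns that would still need encoding before feeding a numeric tensor.
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

Running a check like `select_dtypes(exclude="number")` on the real frame would list the columns that still hold strings at the point where `fit` is called.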
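You can train on categorical and numerical data together by converting the strings to numbers or to column numbers (see the decoder); after that it is just ordinary network training.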
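As a minimal sketch of the "strings to numbers" idea applied to the question's setup, the snippet below turns each categorical column into integer codes and builds one array per column, keyed by column name. The helper `build_inputs`, the `Distrito` column, and the sample values are hypothetical; only `Lat` and `VictMortales` come from the question. Shifting the codes by one reserves index 0 for missing values, which fits the `num_unique_vals + 1` size of the question's embedding layers.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's frame.
df = pd.DataFrame({
    "Lat": [41.38, 41.40, 41.41, 41.39],
    "Distrito": ["Eixample", "Gracia", "Eixample", None],
    "VictMortales": ["No", "Si", "No", "No"],
})
numeric_cols = ["Lat"]


def build_inputs(frame, numeric_cols):
    """Return {column name: array of shape (n, 1)} holding numeric dtypes only."""
    inputs = {}
    for col in frame.columns:
        if col in numeric_cols:
            values = frame[col].to_numpy(dtype="float32")
        else:
            # Category codes run 0..n-1, with -1 for missing values;
            # adding 1 reserves index 0 for "missing".
            codes = frame[col].astype("category").cat.codes
            values = codes.to_numpy(dtype="int32") + 1
        inputs[col] = values.reshape(-1, 1)
    return inputs


inputs = build_inputs(df, numeric_cols)
for name, arr in inputs.items():
    print(name, arr.dtype, arr.ravel())
```

Because the question's model names each Input layer after its column, a dict like this could presumably be passed as `m.fit(inputs, dummy_y)` instead of the raw DataFrame; that call has not been run against the original data.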
[Sample]:
import tensorflow as tf
import pandas as pd

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
text = "I love cats"
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=10000, oov_token='<oov>')
tokenizer.fit_on_texts([text])

vocab = [ "a", "b", "c", "d", "e", "f", "g", "h", "I", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "_" ]
data = tf.constant([["_", "_", "_", "I"], ["l", "o", "v", "e"], ["c", "a", "t", "s"]])

# Map every character to its integer index in vocab
layer = tf.keras.layers.StringLookup(vocabulary=vocab)
sequences_mapping_string = layer(data)
sequences_mapping_string = tf.constant( sequences_mapping_string, shape=(1,12) )
print( 'result: ' + str( sequences_mapping_string ) )

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Dataset
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
# Raw string so that the backslashes in the Windows path are not read as escapes
variables = pd.read_excel(r'F:\temp\20220305\Book 2.xlsx', index_col=None, header=None)
print(variables)
print(tf.constant(variables).shape)

list_of_X = [ ]
list_of_Y = [ ]

# Categorical part: a cell marked "X" becomes 1, anything else becomes 0
for i in range(tf.constant(variables).numpy().shape[0]):
    for j in range(tf.constant(variables).numpy().shape[1]):
        if variables[j][i] == "X" :
            print('found: ' + str(i) + ":" + str(j))
            list_of_X.append(i)
            list_of_Y.append(1)
        else :
            list_of_X.append(i)
            list_of_Y.append(0)

# Numerical part: append the integer-encoded characters
for i in range( sequences_mapping_string.numpy()[0].shape[0] ):
    list_of_X.append( sequences_mapping_string.numpy()[0][i] )
    list_of_Y.append( sequences_mapping_string.numpy()[0][i] )

list_of_X = tf.cast( list_of_X, dtype=tf.int32 )
list_of_X = tf.constant( list_of_X, shape=( 1, 48, 1) )
list_of_Y = tf.cast( list_of_Y, dtype=tf.int32 )
list_of_Y = tf.constant( list_of_Y, shape=( 1, 48, 1) )

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(48, 1)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True, return_state=False)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dense(1 , activation='sigmoid' ),
])

model.add(tf.keras.layers.Dense(1))
model.summary()

model.compile(loss = 'mean_squared_error',
              optimizer = 'adam',
              metrics = ['mean_squared_error'])

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Training
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
history = model.fit(list_of_X, list_of_Y, epochs=10, batch_size=4)

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Predict
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
result = model.predict( tf.zeros([1, 48, 1]).numpy() )
print( 'result: ' + str(result) )
[Output]: