Error while training Model with categorical and numerical dataset: Failed to convert a NumPy array to a Tensor (Unsupported object type float)

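I am currently working on my final degree project, and I have to train a neural network to predict the class of an individual. The dataset is about accidents in Barcelona, so it has both categorical and numerical features. To train the neural network, I built a model with an embedding layer for each categorical column. However, when I try to fit the model, the following error appears.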

      1 m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
----> 2 m.fit(dd_normalized, dummy_y)

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float).

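I have already researched this, and nothing I found seems to solve my problem. I am a newbie with neural networks, so please bear with me. My code is the following: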

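# Imports assumed by the code below (they were not shown in the original post).
# np_utils is only available in older Keras releases.
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras import layers, Model
from keras.utils import np_utils

dd = pd.read_csv("C:/Users/Hussnain Shafqat/Desktop/Uni/Q8/TFG/Bases de dades/Modified/2021_Accidents_Final.csv")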
dd_features = dd.copy()

Y = dd_features.pop('TipoAcc') #my target variable

# Normalization of Numerical variable
dd_normalized = dd_features.copy()
normalize_var_names = ["Long", "Lat", "NLesLeves", "NLesGraves", "NVictimas", "NVehiculos", "ACarne"] 
for name, column in dd_features.items():
    if name in normalize_var_names:
        print(f"Normalizando {name}")
        dd_normalized[name] = (dd_features[name] - dd_features[name].min()) / (dd_features[name].max() - dd_features[name].min())

dd_normalized = dd_normalized.replace({'VictMortales': {'Si': 1, 'No': 0}})  

#Neural network model creation
def get_model(df):
    names = df.columns
    inputs = []
    outputs = []
    for col in names:
        if col in normalize_var_names:
            inp = layers.Input(shape=(1,), name = col)
            inputs.append(inp)
            outputs.append(inp)
        else:
            num_unique_vals = int(df[col].nunique())
            embedding_size = int(min(np.ceil(num_unique_vals/2), 600))
            inp = layers.Input(shape=(1,), name = col)
            out = layers.Embedding(num_unique_vals + 1, embedding_size, name = col+"_emb")(inp)
            out = layers.Reshape(target_shape = (embedding_size,))(out)
            inputs.append(inp)
            outputs.append(out)
    x = layers.Concatenate()(outputs)
    x = layers.Flatten()(x)
    x = layers.Dense(64, activation ='relu')(x)
    y = layers.Dense(15, activation = 'softmax')(x)
    model = Model(inputs=inputs, outputs = y)
    return model

m = get_model(dd_normalized)

#I convert the target variable using one hot encoding
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
dummy_y = np_utils.to_categorical(encoded_Y)

#Model training
m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
m.fit(dd_normalized, dummy_y)

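I tried using tf.convert_to_tensor to convert my dataset to a tensor, but I got the same error. After some research, I found that the same error appears whenever I try to convert the categorical and numerical columns together. If I apply the function to only the categorical columns or only the numerical columns, it works fine. I know that I cannot feed categorical data to a neural network directly, but I thought the embedding layers would be enough to solve the problem.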
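The behaviour described in the previous paragraph can be reproduced without TensorFlow. The sketch below is only an illustration on made-up data (the `Distrito` column and all values are hypothetical, not taken from the real CSV): a DataFrame that mixes float and string columns becomes a NumPy array of dtype `object`, which is most likely what the tensor conversion rejects, while a numeric-only selection keeps a plain float dtype.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the accident data; names and values are hypothetical.
toy = pd.DataFrame({
    "Lat": [0.12, 0.87, 0.45],                       # numeric, already scaled
    "Distrito": ["Eixample", "Gracia", "Eixample"],  # categorical strings
})

# Numeric columns alone keep a numeric dtype.
numeric_only = toy[["Lat"]].to_numpy()

# Mixing floats and strings forces a generic object array.
mixed = toy.to_numpy()

print(numeric_only.dtype)
print(mixed.dtype)

# Inspect which Python types ended up inside the object array.
print(sorted({type(v).__name__ for v in mixed.ravel()}))
```

Checking `df.dtypes` or `df.to_numpy().dtype` before calling `fit` is a quick way to see whether any column is still stored as `object`.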

Finally, I have also tried this, but it did not work. Any idea what it could be? Thank you very much for your time, and sorry for my English.

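You can train on categorical and numerical data together by converting the strings to numbers or to column indices (see the decoder); after that, it is just ordinary network training.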
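Applied to a pandas DataFrame like the one in the question, "converting strings to numbers" could look like the sketch below. It is a minimal illustration under assumptions, not the asker's actual pipeline: `to_model_inputs` is a hypothetical helper, the toy data and the `Distrito` column are made up, and the integer codes come from pandas categoricals shifted by one so that index 0 stays free for missing values.

```python
import numpy as np
import pandas as pd

def to_model_inputs(df, numeric_cols):
    """Hypothetical helper: return {column name: array of shape (n, 1)}
    in which every array has a numeric dtype."""
    inputs = {}
    for col in df.columns:
        if col in numeric_cols:
            # Numeric features are passed through as float32.
            inputs[col] = df[col].to_numpy(dtype="float32").reshape(-1, 1)
        else:
            # One integer code per category. pandas uses -1 for missing
            # values, so adding 1 maps missing to 0 and categories to 1..n.
            codes = df[col].astype("category").cat.codes.to_numpy() + 1
            inputs[col] = codes.astype("int32").reshape(-1, 1)
    return inputs

# Toy data; names and values are hypothetical.
toy = pd.DataFrame({
    "Lat": [0.12, 0.87, 0.45],
    "Distrito": ["Eixample", "Gracia", "Eixample"],
})

inputs = to_model_inputs(toy, ["Lat"])
for name, arr in inputs.items():
    print(name, arr.dtype, arr.shape)
```

Because the model in the question names each `Input` layer after its column, a dictionary like this could presumably be passed as `m.fit(to_model_inputs(dd_normalized, normalize_var_names), dummy_y)`. With codes in the range 0..n, an `Embedding` whose first argument is `num_unique_vals + 1` is large enough to cover every index.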

[Sample]:

import tensorflow as tf
import pandas as pd

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Variables
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
text = "I love cats"
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=10000, oov_token='<oov>')
tokenizer.fit_on_texts([text])

vocab = [ "a", "b", "c", "d", "e", "f", "g", "h", "I", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "_" ]
data = tf.constant([["_", "_", "_", "I"], ["l", "o", "v", "e"], ["c", "a", "t", "s"]])

layer = tf.keras.layers.StringLookup(vocabulary=vocab)
sequences_mapping_string = layer(data)
sequences_mapping_string = tf.constant( sequences_mapping_string, shape=(1,12) )
print( 'result: ' + str( sequences_mapping_string ) )

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Dataset
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
# Raw string so that backslashes in the Windows path are not read as escapes such as \t.
variables = pd.read_excel(r'F:\temp\20220305\Book 2.xlsx', index_col=None, header=None)

print(variables)
print(tf.constant(variables).shape)

list_of_X = [ ]
list_of_Y = [ ]

for i in range(tf.constant(variables).numpy().shape[0]):
    for j in range(tf.constant(variables).numpy().shape[1]):
        if variables[j][i] == "X" :
            print('found: ' + str(i) + ":" + str(j))
            list_of_X.append(i)
            list_of_Y.append(1)
        else :
            list_of_X.append(i)
            list_of_Y.append(0)

for i in range( sequences_mapping_string.numpy()[0].shape[0] ):
    list_of_X.append( sequences_mapping_string.numpy()[0][i] )
    list_of_Y.append( sequences_mapping_string.numpy()[0][i] )

list_of_X = tf.cast( list_of_X, dtype=tf.int32 )
list_of_X = tf.constant( list_of_X, shape=( 1, 48, 1) )
list_of_Y = tf.cast( list_of_Y, dtype=tf.int32 )
list_of_Y = tf.constant( list_of_Y, shape=( 1, 48, 1) )

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
model = tf.keras.models.Sequential([
    tf.keras.layers.InputLayer(input_shape=(48, 1)),
    
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128, return_sequences=True, return_state=False)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(128)),
    tf.keras.layers.Dense(1 , activation='sigmoid' ),
])

model.add(tf.keras.layers.Dense(1))
model.summary()
model.compile(loss = 'mean_squared_error',
              optimizer = 'adam',
              metrics = ['mean_squared_error'])
              
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Training
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""             
history = model.fit(list_of_X, list_of_Y, epochs=10, batch_size=4)

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
: Predict
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
result = model.predict( tf.zeros([1, 48, 1]).numpy() )
print( 'result: ' + str(result) )

[Output]: