How can I solve the 'could not convert string to float:' error when I try to embed sequence data in a keras model

Question

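I want to represent my sequence data as single characters or strings and feed it into my keras model. The sequences are converted to strings, padding included, as shown below.

My environment is Python 3.6.7, Tensorflow 1.12.0, Keras 2.2.4.

Data shapes: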

x input: (23714, 160), y input: (23714, 7)

An example sequence:

array(['M', 'A', 'S', 'K', 'R', 'A', 'L', 'V', 'I', 'L', 'A', 'K', 'G',
       'L', 'N', 'G', 'K', 'E', 'V', 'A', 'A', 'Q', 'V', 'K', 'A', 'P',
...
       'L', 'V', 'L', 'K'], dtype='<U1')

When I tried to use the Embedding layer in keras, I got the following error:

ValueError: could not convert string to float: 'I'

The Embedding layer is used as follows.

Model:

from keras.models import Sequential
from keras.layers import InputLayer, Embedding, LSTM, Dense

# x_train still holds string characters here, which is what triggers the ValueError
model = Sequential()
model.add(InputLayer(input_shape=(160,)))
model.add(Embedding(30000, 160))
model.add(LSTM(160, activation='relu'))
model.add(Dense(7, activation="softmax"))
model.summary()
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=100, batch_size=100, validation_split=0.2)
print(model.evaluate(x_test, y_test)[1])

If I change the value 30000 to another value, the only difference is that the 'I' in the error changes to 'M' or some other character.

Instead of the Embedding layer, I also tried feeding the data into a Dense layer, but I still got the same kind of error. Error:

ValueError: could not convert string to float: 'S'

Model:

from keras.models import Sequential
from keras.layers import Dense

# Same string-valued x_train as above, so the same ValueError is raised
model = Sequential()
model.add(Dense(64, input_shape=(160,), activation='relu'))
model.add(Dense(7, activation="softmax"))
model.summary()
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, epochs=100, batch_size=100, validation_split=0.2)
print(model.evaluate(x_test, y_test)[1])

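In summary, the error occurs whenever I try to feed the padded sequence data into an Embedding layer or a Dense layer. The sequence data has been converted to an array, but it has not been reshaped.

If I reshape, I get the following error, so I have not reshaped yet; I would like to solve the embedding problem first. ValueError: cannot reshape array of size 3794240 into shape (23714,1)

Thank you very much for your help.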

Answer 1

I may be completely wrong...

But the error

ValueError: could not convert string to float

probably indicates that numeric data is expected, not string data.
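
The same message can be reproduced without Keras at all, which suggests the problem lies in the data rather than in the layer settings. Below is a minimal sketch; `try_float` is a hypothetical helper introduced only for illustration, and it assumes the failure comes down to a plain string-to-float conversion like the one Python's built-in `float` performs.

```python
def try_float(value):
    """Return float(value), or the error text if the conversion fails."""
    try:
        return float(value)
    except ValueError as err:
        return str(err)


# A numeric string converts without trouble.
print(try_float("3.5"))

# A letter such as 'I' cannot be parsed as a number and raises ValueError.
print(try_float("I"))
```

This would also be consistent with the observation in the question that changing 30000 only changes which character is reported: the conversion fails on a character from the data itself.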

So, encode the sequences in a numeric format (digital signal processing (DSP)).
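
One simple numeric encoding is to map each distinct character to an integer index. The following is a minimal sketch, not necessarily the scheme the paper recommends. The names `build_vocab` and `encode` are hypothetical, and the empty string `""` as the padding symbol is an assumption, since the question does not say which symbol is used for padding.

```python
def build_vocab(sequences, pad_token=""):
    """Map each distinct symbol to an integer; index 0 is reserved for padding."""
    symbols = sorted({s for seq in sequences for s in seq if s != pad_token})
    vocab = {pad_token: 0}
    for index, symbol in enumerate(symbols, start=1):
        vocab[symbol] = index
    return vocab


def encode(sequences, vocab):
    """Replace every symbol with its integer index."""
    return [[vocab[s] for s in seq] for seq in sequences]


# Toy data standing in for the (23714, 160) array of characters.
toy = [["M", "A", "S", "K", ""], ["M", "K", "R", "", ""]]
vocab = build_vocab(toy)
encoded = encode(toy, vocab)
print(vocab)
print(encoded)
```

In a setup like the one in the question, the encoded lists could then be wrapped with something like `np.asarray(encoded, dtype="int32")` before calling `model.fit`, and the first argument of `Embedding` could be set to `len(vocab)` rather than 30000, since it only needs to cover the largest index plus one.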

This paper highlights several ways of doing this...

This table is from that paper:


python

bioinformatics

sequence

keras

tensorflow