How to use sentence vectors from doc2vec in a Keras Sequential model for sentence sentiment analysis?
Creating the doc2vec model.
x: a list of sentences (movie reviews)
length of x = 2000
doc2vec_data = []
for line in x:
    temp = ''.join(str(token) for token in line.lower())
    doc2vec_data.append(temp)
File = open('doc2vec_data.txt', 'w', encoding="utf-8")
for item in doc2vec_data:
    File.write("%s\n" % item)
sentences = gensim.models.doc2vec.TaggedLineDocument("doc2vec_data.txt")
d2v = gensim.models.Doc2Vec(sentences, dm=0, window=5,
                            size=5,
                            iter=100, workers=32, dbow_words=1,
                            alpha=2, min_alpha=0.5)
Creating a numpy array of vectors, because the doc2vec model cannot be fed directly into a Keras Sequential model:
vec=np.array([d2v.infer_vector(item) for item in x])
Keras Sequential model:
model = Sequential()
model.add(Embedding(2000, 128, input_length=vec.shape[1]))
model.add(LSTM(200, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
y: sentence labels (0s and 1s)
model.fit(vec, y,
          batch_size=32, epochs=8,
          verbose=1)
The code above gives me this error -
InvalidArgumentError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in
_do_call(self, fn, *args)
1349 try:
-> 1350 return fn(*args)
1351 except errors.OpError as e:
~\Anaconda3\lib\site-packages\tensorflow\python\client\session.py in
_run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
1328 feed_dict, fetch_list, target_list,
-> 1329 status, run_metadata)
1330
~\Anaconda3\lib\site-packages\tensorflow\python\framework\errors_impl.py in
__exit__(self, type_arg, value_arg, traceback_arg)
472 compat.as_text(c_api.TF_Message(self.status.status)),
--> 473 c_api.TF_GetCode(self.status.status))
    474         # Delete the underlying status object from memory otherwise it stays alive
InvalidArgumentError: indices[0,0] = -19 is not in [0, 2000)
[[Node: embedding_1/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT,
validate_indices=true,
_device="/job:localhost/replica:0/task:0/device:CPU:0"]
(embedding_1/embeddings/read, embedding_1/Cast)]]
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
<ipython-input-34-3d0fc0b22a78> in <module>()
1 model.fit(vec,y,
2 batch_size=32,epochs=8,
----> 3 verbose=1)
InvalidArgumentError: indices[0,0] = -19 is not in [0, 2000)
[[Node: embedding_1/Gather = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT,
validate_indices=true,
_device="/job:localhost/replica:0/task:0/device:CPU:0"]
(embedding_1/embeddings/read, embedding_1/Cast)]]
Can anyone tell me what the error means and how I can fix it?
You are already converting your sentences to vectors, and then asking the Keras model to embed them again. The error complains that your Embedding layer is not receiving valid indices, because its input is already an embedding. Assuming vec.shape == (samples, doc2vec_vector_size), you need to remove the Embedding layer (the input is already embedded) and the LSTM (you now have one vector per sentence rather than one vector per word):
model = Sequential()
model.add(Dense(hidden_size, activation='relu', input_dim=doc2vec_vector_size))
model.add(Dense(1, activation='sigmoid'))
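The traceback itself shows what went wrong: Embedding casts its input to integer indices, and doc2vec vectors are real-valued with negative components, so a value like -19.3 becomes the index -19, which is not in [0, 2000). A minimal numpy sketch (with made-up values and a made-up hidden size) reproduces the bad index and mirrors the forward pass of the two-Dense-layer replacement, to show that the shapes work out:

```python
import numpy as np

# doc2vec output: real-valued vectors whose components can be negative
vec = np.array([[-19.3, 0.2, 4.7, 1.1, -0.8]], dtype=np.float32)

# Embedding casts its input to integer indices -- this is the -19 in the traceback
idx = vec.astype(np.int32)
assert idx[0, 0] == -19  # not in [0, 2000) -> InvalidArgumentError

# The replacement model instead treats each row of vec as a dense feature vector:
# Dense(hidden_size, relu) then Dense(1, sigmoid), written as plain matrix products
rng = np.random.default_rng(0)
hidden_size, vector_size = 8, vec.shape[1]
W1 = rng.standard_normal((vector_size, hidden_size))
W2 = rng.standard_normal((hidden_size, 1))
h = np.maximum(vec @ W1, 0)            # relu activation
p = 1.0 / (1.0 + np.exp(-(h @ W2)))    # sigmoid -> one probability per sentence
print(p.shape)                          # (samples, 1)
```

Since the output is still a single sigmoid unit, you can keep the same compile settings (binary_crossentropy with your optimizer of choice) and call model.fit(vec, y, ...) unchanged.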