texts_to_sequences returns an empty list after loading the tokenizer
I am working on a project where I have trained and saved my model and tokenizer:
import pickle
from tensorflow.keras.preprocessing.text import Tokenizer

tokenizer = Tokenizer()
tokenizer.fit_on_texts(corpus)

# save the fitted tokenizer
with open('english_tokenizer_test.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

total_words = len(tokenizer.word_index) + 1

model2.save('model.h5')
and then load the model and tokenizer back:
with open('../input/shyaridatasettesting/english_tokenizer_test.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)

tokenizer = Tokenizer()

max_sequence_len = 24
model = tf.keras.models.load_model('../input/model.h5')
print(model)
So when I tokenize text with the loaded tokenizer, it returns an empty list:
token_list = tokenizer.texts_to_sequences(["this is something"])[0]
I want to use my model and tokenizer on my website, but whenever I pass text into the tokenizer I get an empty token_list. Please help me figure out what I am doing wrong.
The problem is that after loading the original tokenizer, you create a new Tokenizer with the same name, so the loaded one is overwritten. Here is a working example:
import tensorflow as tf
import pickle

corpus = ['this is something', 'this is something more', 'this is nothing']

tokenizer = tf.keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(corpus)

### Save tokenizer
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

### Load tokenizer -- note that no new Tokenizer() is created afterwards
with open('/content/tokenizer.pickle', 'rb') as handle:
    tokenizer = pickle.load(handle)

token_list = tokenizer.texts_to_sequences(["this is something"])[0]
print(token_list)
[1, 2, 3]
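For completeness, this is why the code in the question returns nothing: a freshly constructed Tokenizer has an empty word_index, and texts_to_sequences silently drops every word it does not know. A minimal demonstration:

import tensorflow as tf

fresh = tf.keras.preprocessing.text.Tokenizer()  # never fitted on a corpus
print(fresh.word_index)                                 # {} -- vocabulary is empty
print(fresh.texts_to_sequences(["this is something"]))  # [[]] -- every word dropped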
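If you prefer not to pickle the object at all, the Keras Tokenizer can also round-trip through JSON via to_json and tokenizer_from_json. A minimal sketch of the same save/load cycle (the file name tokenizer.json is just an example):

import tensorflow as tf
from tensorflow.keras.preprocessing.text import tokenizer_from_json

corpus = ['this is something', 'this is something more', 'this is nothing']
tokenizer = tf.keras.preprocessing.text.Tokenizer()
tokenizer.fit_on_texts(corpus)

# serialize the fitted tokenizer to a JSON string and write it out
with open('tokenizer.json', 'w') as f:
    f.write(tokenizer.to_json())

# restore it -- again, without creating a new Tokenizer() afterwards
with open('tokenizer.json') as f:
    tokenizer = tokenizer_from_json(f.read())

print(tokenizer.texts_to_sequences(["this is something"])[0])  # [1, 2, 3]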