How to store the Phrase trigrams gensim model after training
I'd like to know whether it is possible to store a gensim Phrases model after training it on sentences.
from gensim.models.phrases import Phrases

documents = [
    "the mayor of new york was there",
    "human computer interaction and machine learning has now become a trending research area",
    "human computer interaction is interesting",
    "human computer interaction is a pretty interesting subject",
    "human computer interaction is a great and new subject",
    "machine learning can be useful sometimes",
    "new york mayor was present",
    "I love machine learning because it is a new subject area",
    "human computer interaction helps people to get user friendly applications",
]
sentences = [doc.split(" ") for doc in documents]
bigram_transformer = Phrases(sentences)
bigram_sentences = bigram_transformer[sentences]
print("Bigrams - done")
# Here we use a phrase model that detects the collocation of 3 words (trigrams).
trigram_transformer = Phrases(bigram_sentences)
trigram_sentences = trigram_transformer[bigram_sentences]
print("Trigrams - done")
How can I physically store trigram_transformer so that I can use it again later, e.g. with pickle?
Thanks in advance for your help.
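Convert the list (or whatever format you have) into a numpy array and save it as a .npy file. It is easy to save and read, and numpy lets you load it on almost any platform, e.g. Google Colab, Replit, and so on. See the numpy.save() documentation for more details on saving .npy files.
Using pickle is also a good option, but things can get a bit tricky when encoding standards differ and problems of that kind come up.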
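A minimal sketch of the .npy approach, assuming what you want to keep is a list of token lists (for example, the phrase-transformed sentences) rather than the trained Phrases model object itself. The sample sentences and the file name sentences.npy are hypothetical. Because the sentences have different lengths, they go into a 1-D object array; numpy pickles object arrays inside the .npy file, so np.load must be called with allow_pickle=True.

```python
import os
import tempfile

import numpy as np

# Hypothetical tokenized sentences, e.g. the output of a trained phrase model.
sentences = [
    ["new_york", "mayor", "was", "present"],
    ["human_computer_interaction", "is", "interesting"],
]

# Ragged lists: build a 1-D object array explicitly, one list per element.
arr = np.empty(len(sentences), dtype=object)
for i, tokens in enumerate(sentences):
    arr[i] = tokens

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sentences.npy")  # hypothetical file name
    # Object arrays are serialized with pickle inside the .npy file.
    np.save(path, arr, allow_pickle=True)
    # np.load refuses object arrays unless allow_pickle=True is passed.
    loaded = np.load(path, allow_pickle=True)
    restored = [list(tokens) for tokens in loaded]

print(restored == sentences)  # prints True
```

This persists the transformed text only; it does not store trigram_transformer, so new sentences cannot be transformed from the .npy file alone.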
You can use Gensim's native .save() method:
trigram_transformer.save(TRIPHRASER_PATH)
...and then reload it in the same way:
reloaded_trigram_transformer = Phrases.load(TRIPHRASER_PATH)
(Gensim's save/load methods generally use Python pickling, but may handle certain attributes specially for some models and across version transitions.)
You could also use Python's own pickle, which should work fine unless/until you try to load a too-old model into a newer version of Gensim that may have changed something about the Phrases model.