spacy nightly (3.0.0rc) load without vocab how to add word2vec vectorspace?
In spaCy 2, I used this to add a vocab to an empty spaCy model with a vector space (created with spacy init):
nlp3=spacy.load('nl_core_news_sm') #standard model without vectors
spacy.load("spacyinitnlmodelwithvectorspace",vocab=nlp3.vocab)
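In the spaCy nightly version 3.0.0rc, the vocab parameter is no longer available in spacy.load. Can anyone suggest how to add a vocab to a spaCy model?
This works, taken from another source. It adds a vec file to a spaCy model. Only tested on a small dataset.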
from __future__ import unicode_literals
import numpy
import spacy

def spacy_load_vec(spacy_model, vec_file, spacy_vec_model, print_words=False):
    """
    spaCy model without vectors + vec file becomes a spaCy model with a vector space.

    Parameters
    ----------
    spacy_model : str
        spaCy model without a vector space.
    vec_file : str
        vec file with vectors trained with fastText or word2vec.
    spacy_vec_model : str
        Output path for the spaCy model with a vector space.
    print_words : bool, optional
        Whether to print the words. The default is False.

    Returns
    -------
    None.
    """
    nlp = spacy.load(spacy_model)
    with open(vec_file, 'rb') as file_:
        # The first line holds the number of rows and the vector width
        header = file_.readline()
        nr_row, nr_dim = header.split()
        nlp.vocab.reset_vectors(width=int(nr_dim))
        count = 0
        for line in file_:
            count += 1
            line = line.rstrip().decode('utf8')
            # Split from the right so a word containing spaces stays intact
            pieces = line.rsplit(' ', int(nr_dim))
            word = pieces[0]
            if print_words:
                print("{} - {}".format(count, word))
            vector = numpy.asarray([float(v) for v in pieces[1:]], dtype='f')
            nlp.vocab.set_vector(word, vector)  # add the vectors to the vocab
    nlp.to_disk(spacy_vec_model)
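The parsing step in the function above can be checked on its own, without spaCy or numpy. The following is a minimal sketch, assuming the vec file is in the usual word2vec/fastText text format (a header line with row count and dimension, then one word plus its values per line). The helper name `read_vec_lines` and the sample data are made up for illustration and are not part of the original answer.

```python
import io

def read_vec_lines(stream):
    """Yield (word, vector) pairs from a text-format vec stream opened in binary mode.

    Hypothetical helper: mirrors the header/rsplit logic of spacy_load_vec.
    """
    header = stream.readline()
    nr_row, nr_dim = (int(x) for x in header.split())
    for line in stream:
        line = line.rstrip().decode("utf8")
        if not line:
            continue  # skip blank lines
        # Split from the right: the last nr_dim fields are the vector values,
        # everything before them is the word (which may itself contain spaces).
        pieces = line.rsplit(" ", nr_dim)
        word = pieces[0]
        vector = [float(v) for v in pieces[1:]]
        if len(vector) != nr_dim:
            raise ValueError("expected {} values for {!r}".format(nr_dim, word))
        yield word, vector

# Made-up sample: 2 rows, 3 dimensions, one entry containing a space
sample = b"2 3\nhuis 0.1 0.2 0.3\nNew York 0.4 0.5 0.6\n"
pairs = list(read_vec_lines(io.BytesIO(sample)))
for word, vector in pairs:
    print(word, vector)
```

Running this on a few lines of the real vec file before calling spacy_load_vec is a cheap way to confirm that the header dimension matches the number of values per line.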