gensim word2vec print log loss

When using a gensim word2vec model, how can I print or log (to a file or stdout) the loss for each epoch of the training phase?

I tried:

import logging
logging.basicConfig(format='%(asctime)s: %(levelname)s: %(message)s')
logging.root.setLevel(level=logging.INFO)

but I don't see any loss being printed.

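You can get the latest training loss of a word2vec model with the method get_latest_training_loss(). The loss is only tracked if the model is created with compute_loss=True. If you want to print the loss after every epoch, you can add a callback that does this. For example: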

from gensim.test.utils import common_texts
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec

class LossCallback(CallbackAny2Vec):
    '''Callback to print loss after each epoch.'''

    def __init__(self):
        self.epoch = 0

    def on_epoch_end(self, model):
        # Only non-zero when compute_loss=True; the value is a running total.
        loss = model.get_latest_training_loss()
        print('Loss after epoch {}: {}'.format(self.epoch, loss))
        self.epoch += 1

# gensim >= 4.0 uses vector_size= (this parameter was called size= in gensim 3.x).
model = Word2Vec(common_texts, vector_size=100, window=5, min_count=1,
                 compute_loss=True, callbacks=[LossCallback()])

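Note, however, that the loss is computed cumulatively: the value printed after each epoch is the total loss over all epochs so far, not the loss of that epoch alone. For more explanation, see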
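Because the reported value is a running total, the loss of a single epoch is the difference between two consecutive readings. A minimal sketch, assuming you have collected the value of get_latest_training_loss() after every epoch into a list (the helper name per_epoch_losses is hypothetical, not part of gensim):

```python
from typing import List


def per_epoch_losses(cumulative: List[float]) -> List[float]:
    """Turn cumulative loss readings (one per epoch) into per-epoch losses.

    Hypothetical helper: assumes each reading is the running total since
    training started, as described above.
    """
    deltas = []
    previous = 0.0
    for total in cumulative:
        deltas.append(total - previous)  # loss contributed by this epoch alone
        previous = total
    return deltas


# Running totals 10, 18, 24 mean the three epochs contributed 10, 8 and 6.
print(per_epoch_losses([10.0, 18.0, 24.0]))  # prints [10.0, 8.0, 6.0]
```

Doing the subtraction after training keeps the callback trivial (it only appends readings); the next answer does the same subtraction inside the callback instead.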

This prints the loss of each individual epoch (thanks to @Anna Krogager):

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence
from gensim.models.callbacks import CallbackAny2Vec

class LossCallback(CallbackAny2Vec):
    '''Callback to print the loss of each individual epoch.'''

    def __init__(self):
        self.epoch = 0
        self.loss_previous_step = 0

    def on_epoch_end(self, model):
        loss = model.get_latest_training_loss()  # running total so far
        loss_now = loss - self.loss_previous_step  # this epoch's share
        self.loss_previous_step = loss
        print('Loss after epoch {}: {}'.format(self.epoch, loss_now))
        self.epoch += 1

# gensim >= 4.0 parameter names; in gensim 3.x use size= and iter= instead.
model = Word2Vec(LineSentence('./data/house_list'), vector_size=100, workers=20,
                 min_count=1, epochs=30, window=5, compute_loss=True,
                 callbacks=[LossCallback()])
model.save('./model/v2.model')
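The question also asks about writing the loss to a file. A minimal sketch, assuming you prefer the standard logging module over print: it sends each per-epoch loss to both stdout and a log file. The names make_loss_logger, EpochLossTracker and the logger name "w2v_loss" are hypothetical. The bookkeeping is kept free of gensim so it can be tested on its own.

```python
import logging
import sys


def make_loss_logger(log_path: str) -> logging.Logger:
    """Return a logger (hypothetical name "w2v_loss") writing to stdout and log_path."""
    logger = logging.getLogger("w2v_loss")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # don't also emit through the root logger's handlers
    if not logger.handlers:  # attach handlers on the first call only; later calls reuse them
        fmt = logging.Formatter("%(asctime)s: %(levelname)s: %(message)s")
        for handler in (logging.StreamHandler(sys.stdout),
                        logging.FileHandler(log_path, encoding="utf-8")):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger


class EpochLossTracker:
    """Turn cumulative loss readings into per-epoch losses and log them."""

    def __init__(self, logger: logging.Logger) -> None:
        self.logger = logger
        self.epoch = 0
        self.previous_loss = 0.0

    def update(self, cumulative_loss: float) -> float:
        """Record one end-of-epoch reading and return that epoch's own loss."""
        loss_now = cumulative_loss - self.previous_loss
        self.previous_loss = cumulative_loss
        self.logger.info("Loss after epoch %d: %s", self.epoch, loss_now)
        self.epoch += 1
        return loss_now
```

To use it with gensim, create the tracker once (for example with make_loss_logger('loss.log')) and call tracker.update(model.get_latest_training_loss()) from the on_epoch_end method of a CallbackAny2Vec subclass, in place of the print call in the callbacks above.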