TypeError: a bytes-like object is required, not 'str' when converting gensim to tensorboard


I'm using the following code to convert a gensim w2v file into TensorBoard TSV files:

with open(outfiletsv, 'w+b') as file_vector:
    with open(outfiletsvmeta, 'w+b') as file_metadata:
        for word in model.index2word:
            file_metadata.write(gensim.utils.to_utf8(word) + gensim.utils.to_utf8('\n'))
            vector_row = '\t'.join(str(x) for x in model[word])
            file_vector.write(vector_row + '\n')

This results in the following error:

TypeError                                 Traceback (most recent call last)
~\_repos\special\word2vec2tensor.py in <module>()
     79 
     80     logger.info("running %s", ' '.join(sys.argv))
---> 81     word2vec2tensor(args.input, args.output, args.binary)
     82     logger.info("finished running %s", os.path.basename(sys.argv[0]))

~\_repos\special\word2vec2tensor.py in word2vec2tensor(word2vec_model_path, tensor_filename, binary)
     61                 file_metadata.write(gensim.utils.to_utf8(word) + gensim.utils.to_utf8('\n'))
     62                 vector_row = '\t'.join(str(x) for x in model[word])
---> 63                 file_vector.write(vector_row + '\n')
     64 
     65     logger.info("2D tensor file saved to %s", outfiletsv)

TypeError: a bytes-like object is required, not 'str'

I changed the original `w+` mode to `w+b` in the `open()` calls to get around the opposite problem (TypeError: write() argument must be str, not bytes).
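The two errors are mirror images of each other: a file opened in binary mode only accepts bytes, and a file opened in text mode only accepts str. The sketch below reproduces both with a throwaway temporary file; `try_write` is a hypothetical helper written only for this illustration.

```python
import os
import tempfile

def try_write(mode, data):
    """Write `data` to a temp file opened with `mode`; return the TypeError text, or None."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, mode) as f:
            f.write(data)
        return None
    except TypeError as exc:
        return str(exc)
    finally:
        os.remove(path)

# Binary mode rejects str: the error from the traceback in the question.
binary_error = try_write('w+b', 'a\tb\n')
# Text mode rejects bytes: the "opposite" error the question mentions.
text_error = try_write('w+', b'word\n')
print(binary_error)
print(text_error)
```

Switching the mode only moves the problem to whichever `write()` call passes the other type.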

I tried adding `vector_row = vector_row.encode('UTF-8')`, but that didn't work.
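Assuming the write line was left as `file_vector.write(vector_row + '\n')`, encoding alone cannot work: after `.encode()` the row is bytes, but the newline is still a str, and Python refuses to concatenate the two. A minimal sketch with hypothetical vector values:

```python
row = '\t'.join(str(x) for x in [0.1, -0.2])
row = row.encode('UTF-8')          # row is now bytes

concat_error = None
try:
    row + '\n'                     # bytes + str raises TypeError
except TypeError as exc:
    concat_error = str(exc)        # message complains about concatenating str to bytes

fixed = row + b'\n'                # both operands are bytes, so this works
print(concat_error)
print(fixed)
```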

How can I fix this TypeError?

You can convert the string back to bytes, making sure the newline is a bytes literal as well:

file_vector.write(vector_row.encode() + b'\n')
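To check the fix without gensim, the sketch below uses `io.BytesIO` as a stand-in for the file opened with `'w+b'` and a hypothetical list of floats in place of `model[word]`. Note that `str.encode()` with no arguments uses UTF-8.

```python
import io

file_vector = io.BytesIO()          # stands in for open(outfiletsv, 'w+b')
vector = [0.25, -1.5, 3.0]          # hypothetical stand-in for model[word]

vector_row = '\t'.join(str(x) for x in vector)
file_vector.write(vector_row.encode() + b'\n')   # bytes + bytes, accepted by a binary file
print(file_vector.getvalue())
```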

But your code already opens the files in binary mode, and then (I assume deliberately) you explicitly convert to str here: `'\t'.join(str(x) for x in model[word])`

So you probably want to clean this up and use bytes consistently everywhere instead of converting back and forth :)
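One way to apply that cleanup is to build each complete line as str and encode it exactly once, right before it is written, so only bytes ever reach the binary files. In this sketch `write_tensorboard_tsv` is a hypothetical helper, and the `vectors` dict is a hypothetical stand-in for iterating `model.index2word` and looking up `model[word]`; `io.BytesIO` stands in for the two files.

```python
import io

def write_tensorboard_tsv(vectors, file_vector, file_metadata):
    """Write one metadata line and one TSV vector row per word.

    Both file objects must be opened in binary mode. `vectors` is a
    hypothetical mapping of word -> sequence of floats.
    """
    for word, vec in vectors.items():
        # Encode each finished line once, so str and bytes are never mixed.
        file_metadata.write((word + '\n').encode('utf-8'))
        file_vector.write(('\t'.join(str(x) for x in vec) + '\n').encode('utf-8'))

vectors = {'king': [0.5, 1.0], 'café': [-0.25, 2.0]}
vec_out, meta_out = io.BytesIO(), io.BytesIO()
write_tensorboard_tsv(vectors, vec_out, meta_out)
print(vec_out.getvalue())
print(meta_out.getvalue())
```

With real files you would pass objects from `open(outfiletsv, 'wb')` and `open(outfiletsvmeta, 'wb')`. The equivalent alternative is to open both files in text mode with `open(path, 'w', encoding='utf-8')` and write str everywhere, dropping `gensim.utils.to_utf8` entirely; either choice works as long as every `write()` call receives the one type the file expects.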