Colab Kernel Restarts Whenever Loading a Model From Tensorflow-hub

I wanted to try out the embeddings available in tensorflow-hub, specifically 'universal-sentence-encoder'. I tried the provided example (https://colab.research.google.com/github/tensorflow/hub/blob/master/examples/colab/semantic_similarity_with_tf_hub_universal_encoder.ipynb) and it works fine. So I tried to do the same with the 'multilingual' model, but every time I load the multilingual model, the Colab kernel fails and restarts. What is the problem, and how can I fix it?

import tensorflow as tf
import tensorflow_hub as hub
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import re
import seaborn as sns
import tf_sentencepiece
import sentencepiece

# Import the Universal Sentence Encoder's TF Hub module
embed = hub.Module("https://tfhub.dev/google/universal-sentence-encoder-multilingual/1")  # This is where the kernel dies.
print("imported model")
# Compute a representation for each message, showing various lengths supported.
word = "코끼리"
sentence = "나는 한국어로 쓰여진 문장이야."
paragraph = (
    "동해물과 백두산이 마르고 닳도록. "
    "하느님이 보우하사 우리나라 만세~")
messages = [word, sentence, paragraph]

# Reduce logging output.
tf.logging.set_verbosity(tf.logging.ERROR)

with tf.Session() as session:
  session.run([tf.global_variables_initializer(), tf.tables_initializer()])
  message_embeddings = session.run(embed(messages))

  for i, message_embedding in enumerate(np.array(message_embeddings).tolist()):
    print("Message: {}".format(messages[i]))
    print("Embedding size: {}".format(len(message_embedding)))
    message_embedding_snippet = ", ".join(
        (str(x) for x in message_embedding[:3]))
    print("Embedding: [{}, ...]\n".format(message_embedding_snippet))

I ran into a similar problem with the multilingual sentence encoder. I solved it by pinning the tensorflow version to 1.14.0 and tf-sentencepiece to 0.1.83, so try this before running your code in Colab:

!pip3 install tensorflow==1.14.0
!pip3 install tensorflow-hub
!pip3 install sentencepiece
!pip3 install tf-sentencepiece==0.1.83
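Note that after pinning packages with pip you usually need to restart the Colab runtime for the new versions to take effect. A minimal sketch (using only the standard library; the `check_pins` helper and the `PINS` dict are illustrative, not part of any package) to confirm the pinned versions are actually the ones loaded:

```python
# Sketch: verify that installed package versions match the pins above.
# Uses importlib.metadata (Python 3.8+); keys are the pip package names.
from importlib.metadata import version, PackageNotFoundError

PINS = {"tensorflow": "1.14.0", "tf-sentencepiece": "0.1.83"}

def check_pins(pins):
    """Return a dict of {package: installed_version} for every mismatch.

    An installed_version of None means the package is not installed at all.
    An empty result means every pin is satisfied.
    """
    mismatches = {}
    for pkg, wanted in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[pkg] = installed
    return mismatches

# If this prints a non-empty dict, restart the runtime and re-run the installs.
print(check_pins(PINS))
```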

I was able to reproduce your issue in Colab, and this fix loads the model correctly.

It seems to be a compatibility issue between sentencepiece and tensorflow; check for updates on that issue here. Let us know how it goes. Good luck, and I hope this helps.

Edit: If tensorflow version 1.14.0 does not work, change it to 1.13.1. This problem should go away once the compatibility between tensorflow and sentencepiece is sorted out.