Confused about the parameter 'tokens_length' of the ELMo model in TensorFlow Hub

Question

I am looking at the ELMo model in TensorFlow Hub, and I am not clear on what tokens_length = [6, 5] means in the example usage on the module page (https://tfhub.dev/google/elmo/2):

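import tensorflow_hub as hub

elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
# Each row is one tokenized sentence; the shorter one is padded with "".
tokens_input = [["the", "cat", "is", "on", "the", "mat"],
                ["dogs", "are", "in", "the", "fog", ""]]
# Number of real (non-padding) tokens in each sentence.
tokens_length = [6, 5]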
embeddings = elmo(
    inputs={
        "tokens": tokens_input,
        "sequence_len": tokens_length
    },
    signature="tokens",
    as_dict=True)["elmo"]

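It does not look like the maximum length of the input token sentences, and it does not look like [max number of words per sentence, number of sentences] either, which confuses me. Could someone explain this? Thanks!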

Answer 1

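The first example has length 6 and the second example has length 5, i.e.

"the cat is on the mat" is 6 words long, but "dogs are in the fog" is only 5 words long. The extra empty string in the input does add a bit of confusion :-/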

If you read the documentation on that page, it explains why this is needed (bold emphasis is mine):

With the tokens signature, the module takes tokenized sentences as input. The input tensor is a string tensor with shape [batch_size, max_length] and an int32 tensor with shape [batch_size] corresponding to the sentence length. The length input is necessary to exclude padding in the case of sentences with varying length.
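As a rough illustration of the padding point in that quote, the following Python sketch pads a batch of tokenized sentences with empty strings and records each sentence's true length. The helper names pad_tokens and length_mask are hypothetical and not part of TensorFlow Hub. The mask only shows which positions the lengths mark as real tokens; how the ELMo module uses the lengths internally is not covered here.

```python
def pad_tokens(sentences, pad=""):
    """Pad tokenized sentences to a common length.

    Returns (padded, lengths), where lengths[i] is the number of
    real tokens in sentence i before padding was added.
    """
    lengths = [len(s) for s in sentences]
    max_length = max(lengths, default=0)
    padded = [s + [pad] * (max_length - len(s)) for s in sentences]
    return padded, lengths


def length_mask(lengths, max_length):
    """Boolean mask of shape [batch_size, max_length]: True for real tokens."""
    return [[i < n for i in range(max_length)] for n in lengths]


sentences = [["the", "cat", "is", "on", "the", "mat"],
             ["dogs", "are", "in", "the", "fog"]]

tokens_input, tokens_length = pad_tokens(sentences)
mask = length_mask(tokens_length, len(tokens_input[0]))

print(tokens_length)  # [6, 5]
print(mask[1])        # [True, True, True, True, True, False]
```

Here tokens_input has shape [2, 6] (batch_size by max_length) and tokens_length has shape [2] (one entry per sentence), which matches the shapes described in the quoted documentation.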
