Confused about the 'tokens_length' parameter of the ELMo model in TensorFlow Hub
I am looking at the ELMo model in TensorFlow Hub (https://tfhub.dev/google/elmo/2), and I am not clear on what tokens_length = [6, 5] means in the example usage:
import tensorflow_hub as hub  # hub.Module is the TF1-style Hub API

elmo = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
tokens_input = [["the", "cat", "is", "on", "the", "mat"],
                ["dogs", "are", "in", "the", "fog", ""]]
tokens_length = [6, 5]
embeddings = elmo(
    inputs={
        "tokens": tokens_input,
        "sequence_len": tokens_length
    },
    signature="tokens",
    as_dict=True)["elmo"]
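It doesn't look like the maximum length of the tokenized input sentences, and it doesn't look like [max number of words per sentence, number of sentences] either, which confuses me.
Could someone explain this?
Thanks!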
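The first example has length 6 and the second example has length 5. That is, "the cat is on the mat" is 6 words long, but "dogs are in the fog" is only 5 words long. The extra empty string in the input does add a bit of confusion :-/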
If you read the documentation on that page, it explains why this is needed; the last sentence is the relevant part:
With the tokens signature, the module takes tokenized sentences as input. The input tensor is a string tensor with shape [batch_size, max_length] and an int32 tensor with shape [batch_size] corresponding to the sentence length. The length input is necessary to exclude padding in the case of sentences with varying length.
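To make the relationship between the padded tokens and their lengths concrete, here is a minimal sketch in plain Python (no TensorFlow needed). The helper name pad_token_batch and its pad_token argument are hypothetical, made up for illustration; they are not part of the ELMo module. It pads each sentence with empty strings to the longest sentence in the batch and records the true length of each sentence before padding.

```python
def pad_token_batch(sentences, pad_token=""):
    """Pad tokenized sentences to a common length and record their true lengths.

    Hypothetical helper for illustration only; not part of TensorFlow Hub.
    Assumes `sentences` is a non-empty list of lists of string tokens.
    """
    # True length of each sentence, measured before any padding is added.
    tokens_length = [len(s) for s in sentences]
    max_length = max(tokens_length)
    # Right-pad every sentence so the batch is rectangular:
    # [batch_size, max_length].
    tokens_input = [s + [pad_token] * (max_length - len(s)) for s in sentences]
    return tokens_input, tokens_length


sentences = ["the cat is on the mat".split(), "dogs are in the fog".split()]
tokens_input, tokens_length = pad_token_batch(sentences)
print(tokens_input)   # second row ends with one "" padding token
print(tokens_length)  # [6, 5]
```

The two returned values line up with the "tokens" and "sequence_len" inputs in the question: one row per sentence, and one length per row telling the module how many leading tokens are real.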