NLP ELMo model pruning input

Question

I am trying to retrieve word embeddings from the pretrained ELMo model available on TensorFlow Hub. The code I am using is adapted from here: https://www.geeksforgeeks.org/overview-of-word-embedding-using-embeddings-from-language-models-elmo/

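My input sentence is
bod = "is coming up and every project is expected to do a video because we look forward to discussing this with you at this meeting this time they have laid out the selection criteria for the video award this year is going to be top spot time"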

These are the keywords I want embeddings for:
words=["do", "a", "video"]

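# elmo is assumed to be the TF Hub ELMo module loaded as in the linked tutorial
embeddings = elmo([bod],
                  signature="default",
                  as_dict=True)["elmo"]
# tf.initialize_all_variables() is deprecated in favour of tf.global_variables_initializer()
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)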

This sentence is 236 characters long. [Screenshot showing the length]

But when I pass this sentence to the ELMo model, the returned tensor only has a sequence length of 48.

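This becomes a problem when I try to extract embeddings for keywords that lie beyond the length-48 limit, because the indices of those keywords fall outside this length: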

This is the code I used to get the indices of the words in 'bod' (shown above):

num_list = []
for item in words:
  print(item)
  # str.index returns the character offset of the first match in bod
  index = bod.index(item)
  num_list.append(index)
num_list

But I keep running into this error:

I looked through the ELMo documentation for an explanation of why this happens, but I could not find anything about the input being pruned.

Any advice is much appreciated!

Thanks

Answer 1

This is not really an AllenNLP question, since you are using a TensorFlow-based implementation of ELMo.

That said, I think the problem is that ELMo embeds tokens, not characters. You are getting 48 embeddings because the string has 48 tokens.
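To illustrate the difference, here is a minimal sketch that compares character offsets with token positions. It assumes the "default" signature tokenizes by splitting on whitespace, so `str.split()` is used as a stand-in; the helper `token_indices` and the short sentence are hypothetical and only serve the example.

```python
def token_indices(sentence, keywords):
    """Map each keyword to the position of its first occurrence as a token."""
    tokens = sentence.split()  # whitespace tokenization (an assumption)
    return {kw: tokens.index(kw) for kw in keywords if kw in tokens}


# Hypothetical short sentence, not the input from the question.
sentence = "every project is expected to do a video this year"
words = ["do", "a", "video"]

# What str.index gives: offsets counted in characters.
char_offsets = [sentence.index(w) for w in words]

# What the embedding tensor is indexed by: positions counted in tokens.
token_positions = token_indices(sentence, words)

print(len(sentence), len(sentence.split()))  # characters vs. tokens
print(char_offsets)
print(token_positions)
```

With a token position in hand, the vector for a keyword would be something like `sess.run(embeddings)[0][token_position]`, assuming the output has shape (batch, tokens, embedding size). Note also that `str.index("a")` can match a letter inside a longer word, which is another reason to look keywords up in the token list rather than in the raw string.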
