"Token second\team not found and default index is not set" error in torchtext function

Question

Here is my code. The function runs fine on the training set, but on the test set it raises this error: RuntimeError: Token second\team not found and default index is not set

train_data, train_labels = text_classification._create_data_from_iterator(
    vocab, text_classification._csv_iterator(train_csv_path, ngrams, yield_cls=True), False)
test_data, test_labels = text_classification._create_data_from_iterator(
    vocab, text_classification._csv_iterator(test_csv_path, ngrams, yield_cls=True), False)

Does anyone know what is going wrong?

Answer 1

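A vocabulary serves as a lookup table that converts a str to an int. When a given string (in this case "second\team") does not appear in the vocabulary, there are two strategies for handling it: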

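1. Raise an error, because you don't know how to handle it. Think of this in Python:

{}[1]

which raises KeyError.

2. Assign a default "unknown" token to missing tokens. Think of this in Python:

{}.get(1, "I don't know!")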

Your code is currently doing #1. It sounds like you want #2, which you can get with vocab.set_default_index. When you build your vocabulary, add the specials=["<unk>"] kwarg and then call vocab.set_default_index(vocab['<unk>']).
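To make the two strategies concrete, here is a minimal pure-Python sketch. ToyVocab is a hypothetical stand-in written only for illustration. It is not torchtext's Vocab class; it only mimics the lookup behavior described above, so the token list and the index values are assumptions.

```python
class ToyVocab:
    """Hypothetical stand-in for a token-to-index vocabulary (not torchtext's class)."""

    def __init__(self, tokens, specials=()):
        # Specials come first, then each distinct token in order of first appearance.
        self.stoi = {}
        for tok in list(specials) + list(tokens):
            self.stoi.setdefault(tok, len(self.stoi))
        self.default_index = None  # strategy 1 applies until a default is set

    def set_default_index(self, index):
        self.default_index = index

    def __getitem__(self, token):
        if token in self.stoi:
            return self.stoi[token]
        if self.default_index is None:
            # Strategy 1: refuse to guess and raise.
            raise RuntimeError(f"Token {token} not found and default index is not set")
        # Strategy 2: fall back to the "unknown" index.
        return self.default_index


# The vocabulary is built from training tokens only, so test-only tokens are missing.
train_tokens = ["first", "team", "first", "win"]
vocab = ToyVocab(train_tokens, specials=["<unk>"])

# Strategy 1: looking up an unseen token fails.
try:
    vocab["second\\team"]
except RuntimeError as err:
    error_message = str(err)

# Strategy 2: after setting a default, the unseen token maps to "<unk>".
vocab.set_default_index(vocab["<unk>"])
unseen_index = vocab["second\\team"]
```

This also suggests why the training set works while the test set fails: if the vocabulary was built from the training data, every training token is present in it, whereas the test set can contain tokens it has never seen.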


python

nlp

torchtext