AttributeError: 'Field' object has no attribute 'vocab' preventing me from running the code

Question

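I found this code and I want to see what the object I print on the last line is. I'm new to NLP, so please help me fix this code: it raises AttributeError: 'Field' object has no attribute 'vocab'. By the way, I noticed that torchtext has changed, so the error may be related to those changes, and the code may have worked before.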

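import spacy
from torchtext.legacy.data import Field

# spaCy 3 removed the "en" shortcut; load the small English pipeline by name
# (install it first with: python -m spacy download en_core_web_sm)
spacy_eng = spacy.load("en_core_web_sm")

def tokenize_eng(text):
    # Split a sentence into token strings with spaCy's tokenizer
    return [tok.text for tok in spacy_eng.tokenizer(text)]

english = Field(
    tokenize=tokenize_eng, lower=True, init_token="<sos>", eos_token="<eos>"
)
# Raises the AttributeError: the Field has no vocab until build_vocab() is called
print([english.vocab.stoi["<sos>"]])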

Answer 1

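You have to build the vocabulary for the english Field before you can access it. Building the vocabulary requires a dataset, namely the dataset you want to train your model on. You can do this with english.build_vocab(...). Here are the docs for build_vocab.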
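To make it clearer why vocab does not exist yet, here is a minimal pure-Python sketch of what building a vocabulary amounts to: count the tokens in a corpus, put the special tokens first, then list words by descending frequency. This is an illustration, not torchtext's code; the build_vocab function, the whitespace tokenizer and the toy corpus below are all hypothetical. With torchtext.legacy the real call would be something like english.build_vocab(train_data), where train_data is a dataset whose examples use this Field. As far as I know the legacy Field puts <unk> and <pad> (its defaults) before init_token and eos_token, so "<sos>" should end up at index 2, but check this against your installed version.

```python
from collections import Counter

# Hypothetical stand-in for tokenize_eng; the question uses spaCy instead.
def tokenize(text):
    return text.lower().split()

def build_vocab(sentences, specials=("<unk>", "<pad>", "<sos>", "<eos>")):
    """Return (itos, stoi): specials first, then words by descending frequency."""
    counter = Counter()
    for sentence in sentences:
        counter.update(tokenize(sentence))
    # Special tokens take the first indices, in the order given.
    itos = list(specials)
    # Most frequent words first; ties broken alphabetically.
    for word, _ in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if word not in specials:
            itos.append(word)
    # stoi maps each token string to its integer index.
    stoi = {word: index for index, word in enumerate(itos)}
    return itos, stoi

corpus = ["A dog runs .", "A cat sleeps ."]
itos, stoi = build_vocab(corpus)
print(stoi["<sos>"])  # 2
print(itos[:6])       # ['<unk>', '<pad>', '<sos>', '<eos>', '.', 'a']
```

Until a step like this runs, there is nothing to look up, which is why the Field has no vocab attribute before build_vocab is called.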

Also, if you want to see how to migrate what you are doing to the new version of torchtext, here is a good resource.