textEmbed error about sentencepiece for Deberta

Running DeBERTa through the R package text produces an error. It occurs when running:

textEmbed("hello", model = "microsoft/deberta-v3-base")

Error:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: This tokenizer cannot be instantiated. Please make sure you have `sentencepiece` installed in order to use this tokenizer.

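To get this working, you need to install sentencepiece in your conda environment. When I did this I ran into some problems, and RStudio kept getting stuck for me. So after updating RStudio and R, I created a dedicated conda environment with scipy 1.6 and sentencepiece, and then it worked: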

# Install the Python packages, including sentencepiece, into a new conda environment
text::textrpp_install(rpp_version=c("torch==1.8", "transformers==4.12.5",
                                    "numpy", "nltk",
                                    "scipy==1.6", "sentencepiece"),
                      envname = "textrpp_condaenv_sentencepiece")

# Point the text package at that environment
text::textrpp_initialize(condaenv = "textrpp_condaenv_sentencepiece",
                         refresh_settings = TRUE)
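
After initializing the environment, it may help to confirm that sentencepiece can actually be imported before calling textEmbed again. The following is a minimal sketch, not part of the text package: it assumes the reticulate package is installed (text uses it to talk to Python), and the helper names `sentencepiece_ready` and `embed_if_ready` are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: TRUE if the active Python environment can import sentencepiece.
# reticulate::py_module_available() checks whether a Python module is importable.
sentencepiece_ready <- function() {
  reticulate::py_module_available("sentencepiece")
}

# Hypothetical wrapper: only call textEmbed() once sentencepiece is importable,
# otherwise stop with a message that points at the likely cause.
embed_if_ready <- function(texts, model = "microsoft/deberta-v3-base") {
  if (!sentencepiece_ready()) {
    stop("sentencepiece is not importable in the active Python environment; ",
         "check which conda environment was initialized.")
  }
  text::textEmbed(texts, model = model)
}
```

Calling `embed_if_ready("hello")` after `text::textrpp_initialize()` should then either return the embeddings or fail early with a clearer message than the tokenizer error above.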