textEmbed error about sentencepiece for Deberta

Question

I get an error when using DeBERTa with the R package text. Running:

textEmbed("hello", model = "microsoft/deberta-v3-base")

gives the error:

Error in py_call_impl(callable, dots$args, dots$keywords) : 
  ValueError: This tokenizer cannot be instantiated. Please make sure you have `sentencepiece` installed in order to use this tokenizer.

Answer 1

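To get this working, you need to install sentencepiece in your conda environment. (When I did this I ran into some problems and RStudio froze on me, so after updating RStudio and R, I created a dedicated conda environment with scipy 1.6 and sentencepiece, and then it worked):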

# Create a dedicated conda environment that includes sentencepiece
text::textrpp_install(rpp_version = c("torch==1.8", "transformers==4.12.5",
                                      "numpy", "nltk",
                                      "scipy==1.6", "sentencepiece"),
                      envname = "textrpp_condaenv_sentencepiece")

# Point the text package at the new environment
text::textrpp_initialize(condaenv = "textrpp_condaenv_sentencepiece",
                         refresh_settings = TRUE)
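
If it is unclear whether the new environment was actually picked up, a quick check along these lines may help. This is only a sketch: it assumes the reticulate package is installed (text relies on it) and that the environment has already been initialized as described in this answer. The variable name has_sentencepiece is just illustrative.

```r
# Sketch: check whether the active Python environment can import sentencepiece.
# Assumes reticulate is installed and the conda environment is already initialized.
library(reticulate)

# TRUE if the module can be imported from the Python that reticulate is using
has_sentencepiece <- py_module_available("sentencepiece")

if (has_sentencepiece) {
  # Retry the call from the question
  embeddings <- text::textEmbed("hello", model = "microsoft/deberta-v3-base")
} else {
  # py_config() shows which Python and conda environment reticulate is bound to
  message("sentencepiece is not visible to reticulate; inspect py_config()")
}
```

If the check returns FALSE, restarting the R session before calling textrpp_initialize() again is worth trying, since reticulate binds to one Python per session.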

r

r-text