Why do I get "ValueError: need more than 1 value to unpack" when using spaCy convert on my conllu data?

I'm trying to use spaCy convert to turn my training data into the format needed by spaCy train. My data looks like this (after some preprocessing with pandas):

1   Hii hii PRON    _   NounClass=9|Num=Sing    _   _   _   _
2   si  si  VERB    _   _   _   _   _   _
3   mara    mara    NOUN    _   NounClass=10|Num=Plur   _   _   _   _
4   ya_kwanza   ya_kwanza   NUM _   _   _   _   _   _
5   kwa kwa ADP _   _   _   _   _   _
6   uongozi uongozi NOUN    _   NounClass=11|Num=Sing   _   _   _   _

I ran the following command in the terminal:

PS C:\Users\...\pythonProject1> python -m spacy convert C:\Users\...\pythonProject1\my_dataframe_ready.conllu C:\Users\...\pythonProject1\train

and got the following output:

ℹ Grouping every 1 sentences into a document.
⚠ To generate better training data, you may want to group sentences into
documents with `-n 10`.
Traceback (most recent call last):
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\__main__.py", line 4, in <module>
    setup_cli()
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\_util.py", line 71, in setup_cli
    command(prog_name=COMMAND)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 782, in main
    rv = self.invoke(ctx)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\typer\main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\convert.py", line 89, in convert_cli
    msg=msg,
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\convert.py", line 140, in convert
    db = DocBin(docs=docs, store_user_data=True)
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\tokens\_serialize.py", line 86, in __init__
    for doc in docs:
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 38, in conllu_to_docs
    for sent_doc in sent_docs:
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 85, in read_conllx
    ner_map=ner_map,
  File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 209, in conllu_sentence_to_doc
    heads=heads,
  File "spacy\tokens\doc.pyx", line 366, in spacy.tokens.doc.Doc.__init__
  File "spacy\morphology.pyx", line 49, in spacy.morphology.Morphology.add
  File "spacy\morphology.pyx", line 153, in spacy.morphology.Morphology.feats_to_dict
ValueError: need more than 1 value to unpack

Is something still wrong with my data? I really don't understand what this error is trying to tell me.

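Judging by the line where the error occurs (Morphology.feats_to_dict), you have a malformed feature list somewhere in the FEATS column. A valid feature list looks like alpha=yes|beta=no. You probably have something like alpha=yes|beta, where one entry has no =value part, and that is invalid.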

I think an underscore by itself is a special case that should work, but maybe you have some other kind of filler value?
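
To illustrate why an entry without a value triggers an unpacking error, here is a minimal sketch in plain Python. The parse_feats function is a hypothetical stand-in written for this answer, not spaCy's actual code; it only assumes that each name=value piece is split on "=" and unpacked into two variables.

```python
def parse_feats(feats):
    """Simplified, hypothetical FEATS parser (not spaCy's implementation)."""
    # A lone underscore means "no features".
    if feats == "_":
        return {}
    # Each piece must split into exactly two parts: a name and a value.
    return {
        name: value
        for name, value in (piece.split("=") for piece in feats.split("|"))
    }


ok = parse_feats("NounClass=9|Num=Sing")
empty = parse_feats("_")

try:
    parse_feats("alpha=yes|beta")  # "beta" has no "=value" part
    failed = False
except ValueError:
    # Splitting "beta" yields one item, so unpacking into two names fails.
    failed = True
```

In this sketch the well-formed list and the bare underscore both parse, while alpha=yes|beta raises a ValueError, which is the same kind of failure as in the traceback.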

You can debug this by modifying the conllu_sentence_to_doc function in conllu_to_docs.py so that it prints morphs just before the call to doc = Doc(...).
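
If you would rather not edit the installed spaCy files, you could also scan the .conllu file yourself. The following is a rough sketch, assuming the file is tab-separated with FEATS in the sixth column, as in the CoNLL-U format. The find_bad_feats helper and the sample lines are made up for illustration; the third and fourth sample lines are deliberately broken.

```python
def find_bad_feats(lines):
    """Return (line_number, feats, bad_piece) for each malformed FEATS entry."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Blank lines separate sentences; lines starting with "#" are comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 6:
            # Too few tab-separated columns to even reach FEATS.
            problems.append((lineno, line, "<fewer than 6 columns>"))
            continue
        feats = cols[5]
        if feats == "_":
            continue  # a lone underscore means "no features"
        for piece in feats.split("|"):
            name, _, value = piece.partition("=")
            # Require exactly one "=" with text on both sides of it.
            if piece.count("=") != 1 or not name or not value:
                problems.append((lineno, feats, piece))
    return problems


# Illustrative data only: line 3 has "Num" without a value, line 4 has an
# empty FEATS field instead of "_".
sample = [
    "1\tHii\thii\tPRON\t_\tNounClass=9|Num=Sing\t_\t_\t_\t_",
    "2\tsi\tsi\tVERB\t_\t_\t_\t_\t_\t_",
    "3\tmara\tmara\tNOUN\t_\tNounClass=10|Num\t_\t_\t_\t_",
    "4\tya_kwanza\tya_kwanza\tNUM\t_\t\t_\t_\t_\t_",
]

problems = find_bad_feats(sample)
for lineno, feats, piece in problems:
    print(f"line {lineno}: bad piece {piece!r} in FEATS {feats!r}")
```

To check your real data, open my_dataframe_ready.conllu with encoding="utf-8" and pass the file object to find_bad_feats in place of sample. Any line it reports is a candidate for the entry that spaCy is choking on.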