Why do I get "ValueError: need more than 1 value to unpack" when using spaCy convert on my conllu data?
I am trying to use spaCy convert to convert my training data into the format that spaCy train expects. My data looks like this (after some preprocessing with pandas):
1 Hii hii PRON _ NounClass=9|Num=Sing _ _ _ _
2 si si VERB _ _ _ _ _ _
3 mara mara NOUN _ NounClass=10|Num=Plur _ _ _ _
4 ya_kwanza ya_kwanza NUM _ _ _ _ _ _
5 kwa kwa ADP _ _ _ _ _ _
6 uongozi uongozi NOUN _ NounClass=11|Num=Sing _ _ _ _
I ran the following command in the terminal:
PS C:\Users\...\pythonProject1> python -m spacy convert C:\Users\...\pythonProject1\my_dataframe_ready.conllu C:\Users\...\pythonProject1\train
and got the following output:
ℹ Grouping every 1 sentences into a document.
⚠ To generate better training data, you may want to group sentences into
documents with `-n 10`.
Traceback (most recent call last):
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\__main__.py", line 4, in <module>
setup_cli()
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\_util.py", line 71, in setup_cli
command(prog_name=COMMAND)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 782, in main
rv = self.invoke(ctx)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\click\core.py", line 610, in invoke
return callback(*args, **kwargs)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\typer\main.py", line 500, in wrapper
return callback(**use_params) # type: ignore
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\convert.py", line 89, in convert_cli
msg=msg,
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\cli\convert.py", line 140, in convert
db = DocBin(docs=docs, store_user_data=True)
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\tokens\_serialize.py", line 86, in __init__
for doc in docs:
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 38, in conllu_to_docs
for sent_doc in sent_docs:
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 85, in read_conllx
ner_map=ner_map,
File "C:\Users\...\miniconda3\envs\pythonProject1\lib\site-packages\spacy\training\converters\conllu_to_docs.py", line 209, in conllu_sentence_to_doc
heads=heads,
File "spacy\tokens\doc.pyx", line 366, in spacy.tokens.doc.Doc.__init__
File "spacy\morphology.pyx", line 49, in spacy.morphology.Morphology.add
File "spacy\morphology.pyx", line 153, in spacy.morphology.Morphology.feats_to_dict
ValueError: need more than 1 value to unpack
Is there still something wrong with my data? I don't actually know what this error is supposed to tell me.
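Based on the line where the error occurs (Morphology.feats_to_dict), it looks like a feature list somewhere in your data is malformed. A valid feature list looks like alpha=yes|beta=no. You probably have something that looks like alpha=yes|beta, where one entry has no =value part, and that is invalid.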
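To make the error message itself less mysterious, here is a minimal sketch of how this kind of failure arises. The function parse_feats is a hypothetical stand-in written for illustration, not spaCy's actual implementation; it only assumes that each entry is split on "=" and unpacked into a key and a value. Plain Python words the message slightly differently from the compiled code in the traceback, but the cause is the same: an entry without "=" yields one value where two are expected.

```python
def parse_feats(feats):
    """Hypothetical stand-in for a FEATS parser (not spaCy's real code)."""
    result = {}
    for part in feats.split("|"):
        # Unpacking fails if the entry does not contain exactly one "=".
        key, value = part.split("=")
        result[key] = value
    return result


# A well-formed feature list parses without problems.
print(parse_feats("alpha=yes|beta=no"))

# An entry without "=" produces a single value, so unpacking raises ValueError.
try:
    parse_feats("alpha=yes|beta")
except ValueError as err:
    print("ValueError:", err)
```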
I believe a lone underscore is treated as a special case and should work, but maybe you have some other kind of filler value?
You can debug this by modifying the conllu_sentence_to_doc function in conllu_to_docs.py so that it prints the value of morphs just before the call to doc = Doc(...).
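If you would rather not edit the installed spaCy source, a rough alternative is to scan the input file yourself. The sketch below is only an assumption-laden helper: find_bad_feats is a made-up name, it assumes the features sit in the sixth column (as in the sample rows above), and it splits on tabs when present and on any whitespace otherwise. It flags rows with too few columns and feature lists containing an entry that is not of the form key=value.

```python
def find_bad_feats(lines):
    """Return (line_number, text) pairs for rows with a suspicious FEATS column.

    Hypothetical helper for illustration; assumes FEATS is the 6th column.
    """
    bad = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank sentence separators and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # Prefer tab-separated columns, fall back to any whitespace.
        cols = line.split("\t") if "\t" in line else line.split()
        if len(cols) < 6:
            # Too few columns to even contain a FEATS field.
            bad.append((lineno, line))
            continue
        feats = cols[5]
        if feats == "_":
            # A lone underscore means "no features" and is fine.
            continue
        for part in feats.split("|"):
            key, sep, value = part.partition("=")
            if not (key and sep and value):
                bad.append((lineno, feats))
                break
    return bad


# Small in-memory demo: the second row has an entry without "=".
sample = [
    "1\tHii\thii\tPRON\t_\tNounClass=9|Num=Sing\t_\t_\t_\t_",
    "2\tmara\tmara\tNOUN\t_\tNounClass=10|Plur\t_\t_\t_\t_",
]
for lineno, feats in find_bad_feats(sample):
    print(f"line {lineno}: {feats}")
```

To check the real data, pass an open file handle instead of the sample list, for example the file my_dataframe_ready.conllu from the command above opened with encoding="utf-8".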