Subprocess call error while calling generate_lm.py of DeepSpeech
I am trying to build a custom scorer (language model) for speech-to-text with DeepSpeech in Colab. I get this error when calling generate_lm.py:
```
    main()
  File "generate_lm.py", line 201, in main
    build_lm(args, data_lower, vocab_str)
  File "generate_lm.py", line 126, in build_lm
    binary_path,
  File "/usr/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/content/DeepSpeech/native_client/kenlm/build/bin/build_binary', '-a', '255', '-q', '8', '-v', 'trie', '/content/DeepSpeech/data/lm/lm_filtered.arpa', '/content/DeepSpeech/data/lm/lm.binary']' died with <Signals.SIGSEGV: 11>.
```
I am calling the script generate_lm.py like this:
```
! python3 generate_lm.py --input_txt hindi_tokens.txt --output_dir /content/DeepSpeech/data/lm --top_k 500000 --kenlm_bins /content/DeepSpeech/native_client/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|1" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie
```
I was able to find a solution to the problem above. The language model was created successfully after reducing the value of top_k to 15000. My phrases file had only around 42,000 entries, so the top_k value has to be adjusted according to the number of phrases in the collection. The top_k parameter means that phrases less frequent than the top_k most common ones are removed before processing.
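Conceptually, that kind of top-k filtering keeps only the K most frequent words of the corpus and drops everything rarer before the ARPA model is built. A minimal Python sketch of the idea (the function `keep_top_k` is a hypothetical illustration, not the actual code in generate_lm.py):

```python
from collections import Counter

def keep_top_k(lines, top_k):
    """Keep only the top_k most frequent words of a corpus.

    Returns the retained vocabulary and the fraction of all word
    tokens that the retained vocabulary covers.
    """
    counts = Counter(word for line in lines for word in line.split())
    vocab = {word for word, _ in counts.most_common(top_k)}
    covered = sum(counts[w] for w in vocab)
    total = sum(counts.values())
    return vocab, covered / total

# Toy corpus: "the" appears 3 times, "cat" twice, "dog" once.
lines = ["the the cat", "the cat dog"]
vocab, coverage = keep_top_k(lines, top_k=2)
# vocab is {"the", "cat"}; 5 of the 6 tokens are covered.
```

If top_k is much larger than the actual vocabulary (here, 500,000 against a 42,000-entry phrase file), effectively nothing is pruned, which can make the later KenLM build_binary step run out of memory and crash with SIGSEGV.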