Loading data from Common Voice with Hugging Face load_dataset gives an error

I am working with a speech dataset using the Hugging Face transformers library, but I cannot load the data from Common Voice.

from datasets import load_dataset, load_metric
common_voice_train = load_dataset("common_voice", "id", split="train+validation")
common_voice_test = load_dataset("common_voice", "id", split="test")

It gives the following error:

  Couldn't find file locally at common_voice/common_voice.py, or remotely at https://raw.githubusercontent.com/huggingface/datasets/1.4.1/datasets/common_voice/common_voice.py.
The file was picked from the master branch on github instead at https://raw.githubusercontent.com/huggingface/datasets/master/datasets/common_voice/common_voice.py.

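You are using the lightweight Hugging Face datasets library to load the Common Voice dataset. The second argument (id in your call) is the builder configuration name, so replace it with the configuration for the language you want. For example, to load the English dataset from the Common Voice corpus, the builder configuration name is en.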
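
A minimal sketch of the same call with the configuration name passed explicitly. The helper names combine_splits and load_common_voice are hypothetical and only used for illustration. It assumes an installed datasets release that can resolve the common_voice loading script and that network access is available.

```python
def combine_splits(*names):
    """Join split names with '+', the syntax load_dataset accepts
    for concatenating several splits into one dataset."""
    return "+".join(names)


def load_common_voice(config_name, split):
    """Load one Common Voice configuration (hypothetical wrapper).

    config_name is the builder configuration, e.g. "en" for English.
    """
    # Imported lazily so combine_splits can be used without the library.
    from datasets import load_dataset

    return load_dataset("common_voice", config_name, split=split)


# Example usage (downloads data, so it is left commented out):
# common_voice_train = load_common_voice("en", combine_splits("train", "validation"))
# common_voice_test = load_common_voice("en", "test")
```

Passing the configuration name as a plain argument keeps the language choice in one place if you later switch to another language.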

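You can look up the available configuration names in the Common Voice repository. Each one appears as a prefix where the version is mentioned.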
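
As an alternative to browsing the repository, the configuration names can probably be listed from Python. This is a sketch, assuming your installed datasets version provides get_dataset_config_names and can reach the common_voice loading script. The helpers list_common_voice_configs and check_config are hypothetical names, not part of the library.

```python
def list_common_voice_configs():
    """Return the builder configuration names for common_voice.

    Assumes datasets exposes get_dataset_config_names and that the
    loading script can be fetched (network access required).
    """
    from datasets import get_dataset_config_names

    return get_dataset_config_names("common_voice")


def check_config(name, available):
    """Return name if it is a known configuration, else raise ValueError."""
    if name not in available:
        raise ValueError(
            f"Unknown configuration {name!r}; choose one of {sorted(available)}"
        )
    return name


# Example usage (needs network access, so it is left commented out):
# configs = list_common_voice_configs()
# lang = check_config("en", configs)
```

Checking the name before calling load_dataset surfaces a typo in the language code early, before any download starts.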