Training spaCy NER with a custom dataset
I followed this spaCy tutorial to train NER on a custom dataset. My dataset is a gazetteer, so I built the training data as follows.
TRAIN_DATA = [
    ("Where is Abbess", {"entities": [(9, 15, "GPE")]}),
    ("Where is Abbey Pass", {"entities": [(9, 19, "LOC")]}),
    ("Where is Abbot", {"entities": [(9, 14, "GPE")]}),
    ("Where is Abners Head", {"entities": [(9, 29, "LOC")]}),
    ("Where is Acheron Flat", {"entities": [(9, 21, "LOC")]}),
    ("Where is Acheron River", {"entities": [(9, 22, "LOC")]}),
]
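Since each annotation is a (start, end, label) triple of character offsets into its sentence, a quick bounds check can catch typos before training. The following is a minimal sketch in plain Python; `find_bad_spans` is a hypothetical helper name, not part of spaCy, and it only checks that each span lies inside its text, not that it lines up with token boundaries.

```python
# Training data copied from the post, offsets unchanged.
TRAIN_DATA = [
    ("Where is Abbess", {"entities": [(9, 15, "GPE")]}),
    ("Where is Abbey Pass", {"entities": [(9, 19, "LOC")]}),
    ("Where is Abbot", {"entities": [(9, 14, "GPE")]}),
    ("Where is Abners Head", {"entities": [(9, 29, "LOC")]}),
    ("Where is Acheron Flat", {"entities": [(9, 21, "LOC")]}),
    ("Where is Acheron River", {"entities": [(9, 22, "LOC")]}),
]


def find_bad_spans(train_data):
    """Return (text, start, end, label) for every span that falls outside its text.

    Hypothetical helper for illustration; it is not a spaCy function.
    """
    bad = []
    for text, annotations in train_data:
        for start, end, label in annotations["entities"]:
            # A usable span is non-empty and ends no later than the text does.
            if not (0 <= start < end <= len(text)):
                bad.append((text, start, end, label))
    return bad


for text, start, end, label in find_bad_spans(TRAIN_DATA):
    print(f"{text!r}: span ({start}, {end}, {label!r}) exceeds text length {len(text)}")
```

With the data as posted, this reports the "Where is Abners Head" entry: its end offset is 29, but the sentence is only 20 characters long.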
I trained on top of 'en_core_web_sm' rather than a blank model.
model = 'en_core_web_sm'
output_dir=Path(path)
n_iter=20
After training for 20 epochs, I tried predicting with the trained model. This is the output I got.
test_text = "Seven people, including teenagers, have been taken to hospital after their car crashed in the mid-Canterbury town of Rakaia."
Seven people, including teenagers 0 33 GPE
the mid-Canterbury town of Rakaia.. 90 125 GPE
I then ran the unmodified 'en_core_web_sm' on the same test_text. The output is below.
Seven 0 5 CARDINAL
mid-Canterbury 94 108 DATE
Rakaia 117 123 GPE
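The two numbers on each output line appear to be character offsets into test_text (start and end of the entity). A minimal plain-Python sketch to confirm that reading by slicing the string; `slice_spans` is a hypothetical helper name, and the spans are copied from the 'en_core_web_sm' output above.

```python
test_text = (
    "Seven people, including teenagers, have been taken to hospital "
    "after their car crashed in the mid-Canterbury town of Rakaia."
)

# (start, end, label) triples copied from the stock model's output in the post.
reported = [(0, 5, "CARDINAL"), (94, 108, "DATE"), (117, 123, "GPE")]


def slice_spans(text, spans):
    """Map each (start, end, label) span to the substring it covers."""
    return [(text[start:end], label) for start, end, label in spans]


for surface, label in slice_spans(test_text, reported):
    print(surface, label)
```

Each slice reproduces the entity text printed next to it, so the offsets can be compared directly between the two models' outputs.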
Can someone point out what I did wrong when training spaCy?