How do I update my trained spaCy NER model with a new training dataset?
I am new to NLP and have started learning how to train a custom NER model in spaCy.
TRAIN_DATA = [
('what is the price of polo?', {'entities': [(21, 25, 'Product')]}),
('what is the price of ball?', {'entities': [(21, 25, 'Product')]}),
('what is the price of jegging?', {'entities': [(21, 28, 'Product')]}),
('what is the price of t-shirt?', {'entities': [(21, 28, 'Product')]}),
('what is the price of jeans?', {'entities': [(21, 26, 'Product')]}),
('what is the price of bat?', {'entities': [(21, 24, 'Product')]}),
('what is the price of shirt?', {'entities': [(21, 26, 'Product')]}),
('what is the price of bag?', {'entities': [(21, 24, 'Product')]}),
('what is the price of cup?', {'entities': [(21, 24, 'Product')]}),
('what is the price of jug?', {'entities': [(21, 24, 'Product')]}),
('what is the price of plate?', {'entities': [(21, 26, 'Product')]}),
('what is the price of glass?', {'entities': [(21, 26, 'Product')]}),
('what is the price of monitor?', {'entities': [(21, 28, 'Product')]}),
('what is the price of desktop?', {'entities': [(21, 28, 'Product')]}),
('what is the price of bottle?', {'entities': [(21, 27, 'Product')]}),
('what is the price of mouse?', {'entities': [(21, 26, 'Product')]}),
('what is the price of keyboard?', {'entities': [(21, 29, 'Product')]}),
('what is the price of chair?', {'entities': [(21, 26, 'Product')]}),
('what is the price of table?', {'entities': [(21, 26, 'Product')]}),
('what is the price of watch?', {'entities': [(21, 26, 'Product')]})
]
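Since each entity is given as (start, end, label) character offsets, with the end offset exclusive, a quick sanity check can print the text each span actually covers before training. The sketch below is plain Python and does not call spaCy; `check_offsets` is a hypothetical helper name used only for illustration.

```python
def check_offsets(examples):
    """Return (span_text, label) for every annotated entity.

    Raises ValueError if an offset pair falls outside its text.
    """
    spans = []
    for text, annotations in examples:
        for start, end, label in annotations.get("entities", []):
            # offsets are character positions; end is exclusive
            if not (0 <= start < end <= len(text)):
                raise ValueError(
                    "bad offsets (%d, %d) for text %r" % (start, end, text)
                )
            spans.append((text[start:end], label))
    return spans


# two examples in the same format as the training data in this post
sample = [
    ("what is the price of polo?", {"entities": [(21, 25, "Product")]}),
    ("Who is Chaka Khan?", {"entities": [(7, 17, "PERSON")]}),
]

for span_text, label in check_offsets(sample):
    print(label, "->", repr(span_text))
```

If a printed span is cut off or includes extra characters, the offsets for that example need adjusting.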
Training a blank spaCy model for the first time:
import random
import spacy

def train_spacy(data, iterations):
    TRAIN_DATA = data
    nlp = spacy.blank('en')  # create blank Language class
    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if 'ner' not in nlp.pipe_names:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)

    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        optimizer = nlp.begin_training()
        for itn in range(iterations):
            print("Starting iteration " + str(itn))
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.2,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)
    return nlp

start_training = train_spacy(TRAIN_DATA, 20)
Saving my trained spaCy model:
# Saving the trained model
start_training.to_disk("spacy_start_model")
My question is: how do I update the saved model with new training data?
New training data:
TRAIN_DATA_2 = [('Who is Chaka Khan?', {"entities": [(7, 17, 'PERSON')]}),
('I like London and Berlin.', {"entities": [(7, 13, 'LOC')]})]
Can anyone help me with this?
Thanks in advance!
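As far as I know, you can retrain your model with the new examples, but starting from your existing model instead of from a blank one.
To do that, first remove the following line from your train_spacy method and have the method receive the model as a parameter instead: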
nlp = spacy.blank('en') # create blank Language class
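Then, to retrain, instead of creating a blank spaCy model and passing it to your training method, load your existing model with spacy.load and pass that in (the spaCy documentation on saving and loading has more details).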
start_training = spacy.load("spacy_start_model")
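One last suggestion: in my experience, I got better results by retraining a spaCy NER model from an existing model (for example en_core_web_md or en_core_web_lg) and adding my custom entities than by training from scratch from a blank spaCy model.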
Putting it all together:
- Updated method
import random
import spacy

def train_spacy(data, iterations, nlp):  # <-- Add model as nlp parameter
    TRAIN_DATA = data
    # a pipeline without 'ner' is new; a loaded model already has a trained one
    is_new_model = 'ner' not in nlp.pipe_names
    # create the built-in pipeline components and add them to the pipeline
    # nlp.create_pipe works for built-ins that are registered with spaCy
    if is_new_model:
        ner = nlp.create_pipe('ner')
        nlp.add_pipe(ner, last=True)
    else:
        ner = nlp.get_pipe('ner')  # reuse the existing component

    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.disable_pipes(*other_pipes):  # only train NER
        # begin_training() re-initialises the weights, so use it only for a new
        # model; resume_training() (spaCy v2.1+) keeps the loaded weights
        optimizer = nlp.begin_training() if is_new_model else nlp.resume_training()
        for itn in range(iterations):
            print("Starting iteration " + str(itn))
            random.shuffle(TRAIN_DATA)
            losses = {}
            for text, annotations in TRAIN_DATA:
                nlp.update(
                    [text],  # batch of texts
                    [annotations],  # batch of annotations
                    drop=0.2,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            print(losses)
    return nlp

nlp = spacy.blank('en')  # create blank Language class
start_training = train_spacy(TRAIN_DATA, 20, nlp)
- Retraining your model
TRAIN_DATA_2 = [('Who is Chaka Khan?', {"entities": [(7, 17, 'PERSON')]}),
('I like London and Berlin.', {"entities": [(7, 13, 'LOC')]})]
nlp = spacy.load("spacy_start_model") # <-- Now your base model is your custom model
start_training = train_spacy(TRAIN_DATA_2, 20, nlp)
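One caveat when retraining on the new examples alone: the model may get worse at the labels it learned earlier (often called catastrophic forgetting). A common mitigation is to mix some of the original examples back into the new training set. The sketch below is plain Python and only builds the mixed list; `mix_examples` and its `old_per_new` ratio are hypothetical names chosen for illustration, and a good ratio for your data is something you would have to find by experiment.

```python
import random


def mix_examples(new_examples, old_examples, old_per_new=1.0, seed=0):
    """Return a shuffled list of all new examples plus a random sample of old ones."""
    rng = random.Random(seed)  # local RNG, leaves the global random state alone
    wanted = int(round(len(new_examples) * old_per_new))
    n_old = min(len(old_examples), wanted)  # cannot sample more than we have
    mixed = list(new_examples) + rng.sample(list(old_examples), n_old)
    rng.shuffle(mixed)
    return mixed


# small stand-ins for TRAIN_DATA and TRAIN_DATA_2 from this post
old = [
    ("what is the price of polo?", {"entities": [(21, 25, "Product")]}),
    ("what is the price of bat?", {"entities": [(21, 24, "Product")]}),
    ("what is the price of cup?", {"entities": [(21, 24, "Product")]}),
]
new = [
    ("Who is Chaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    ("I like London and Berlin.", {"entities": [(7, 13, "LOC")]}),
]

mixed = mix_examples(new, old)
print(len(mixed), "examples in the mixed training set")
```

With the real data, the mixed list would be passed to the training method in place of TRAIN_DATA_2, for example train_spacy(mix_examples(TRAIN_DATA_2, TRAIN_DATA), 20, nlp).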
Hope this helps!