Programmatically training an NER model using a .prop file
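I have been training my NER model with a properties file, as shown in the tutorial here (LINK). I am using the same prop file, but I don't understand how to do this programmatically.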
props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, sentiment, regexner");
props.setProperty("ner.model", "resources/NER.prop");
The prop file is as follows:
# location of the training file
trainFile = nerTEST.tsv
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = resources/ner-model.ser.gz
# structure of your training file; this tells the classifier that
# the word is in column 0 and the correct answer is in column 1
map = word=0,answer=1
# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.
maxLeft=1
# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
# the last 4 properties deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
Error:
java.io.StreamCorruptedException: invalid stream header: 23206C6F
....
..
Caused by: java.io.IOException: Couldn't load classifier from resources/NER.prop
From another question, I understand that the model file is supplied directly. But how can we do this with the help of a properties file?
You should run this command from the command line:
java -cp "*" edu.stanford.nlp.ie.crf.CRFClassifier -prop NER.prop
If you want to run this from Java code, you can do something like this:
String[] args = new String[]{"-props", "NER.prop"};
CRFClassifier.main(args);
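The .prop file only specifies the settings used to train a model. Your code tries to load the .prop file as if it were the model itself, which causes the error.
Either of the two approaches will produce the final model at resources/ner-model.ser.gz. That serialized file, not the .prop file, is what ner.model should point to.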
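The hex digits in the exception message can confirm this diagnosis. A Java serialization stream begins with the bytes AC ED, while the header reported in the question is 23206C6F. The following is a minimal sketch (the class and method names are made up for illustration) that decodes those bytes as ASCII:

import java.nio.charset.StandardCharsets;

class StreamHeaderDecoder {

    // Converts a hex string such as "23206C6F" into the ASCII text those bytes spell.
    static String decodeHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Header taken from the StreamCorruptedException in the question.
        System.out.println("[" + decodeHex("23206C6F") + "]"); // prints "[# lo]"
    }
}

The decoded text "# lo" is the start of the first line of the prop file, "# location of the training file", so the loader was reading the plain-text settings file where it expected a serialized classifier.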
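import java.util.Properties;

import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.sequences.SeqClassifierFlags;
import edu.stanford.nlp.util.StringUtils;

public class TrainModel {

    // Trains a CRF using the settings in the .prop file and writes the model to serializeFile.
    private void trainCrf(String serializeFile, String prop) {
        Properties props = StringUtils.propFileToProperties(prop);
        // Override the output path given in the .prop file.
        props.setProperty("serializeTo", serializeFile);
        SeqClassifierFlags flags = new SeqClassifierFlags(props);
        CRFClassifier<CoreLabel> crf = new CRFClassifier<>(flags);
        crf.train(); // trains on the trainFile named in the properties
        crf.serializeClassifier(serializeFile);
    }

    public static void main(String[] args) {
        String serializeFile = "skill/ner-model.ser.gz";
        String prop = "ner.props";
        TrainModel trainModel = new TrainModel();
        trainModel.trainCrf(serializeFile, prop);
    }
}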
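Once training has finished, the pipeline's ner.model property should name the serialized model rather than the .prop file. The following is a rough sketch using only java.util.Properties. It assumes the model was written to the serializeTo path from the training properties; the class name, method name, and shortened annotator list are illustrative and not part of CoreNLP:

import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

class PipelinePropsSketch {

    // Reads the training settings and returns pipeline properties whose
    // ner.model points at the serialized model (serializeTo), not at the .prop file.
    static Properties pipelineProps(Reader trainingProps) throws IOException {
        Properties training = new Properties();
        training.load(trainingProps);
        String modelPath = training.getProperty("serializeTo");
        if (modelPath == null) {
            throw new IllegalArgumentException("training properties have no serializeTo entry");
        }
        Properties pipeline = new Properties();
        // Shortened annotator list, for illustration only.
        pipeline.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner");
        pipeline.setProperty("ner.model", modelPath);
        return pipeline;
    }

    public static void main(String[] args) throws IOException {
        // Two lines copied from the prop file in the question.
        String propText = "trainFile = nerTEST.tsv\nserializeTo = resources/ner-model.ser.gz\n";
        Properties props = pipelineProps(new StringReader(propText));
        System.out.println(props.getProperty("ner.model")); // prints "resources/ner-model.ser.gz"
    }
}

The returned Properties object is what would then be passed to the StanfordCoreNLP constructor. If the output path was overridden in code, as in the TrainModel example, use that path instead of the serializeTo value from the file.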