How to train a textual entailment model with my own training set?

Question

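I want to train the decomposable attention + ELMo; SNLI model from the demo on my own dataset. I'm new to NLP. After going through the guide, I still don't know how to start with my own training set, which consists of plain-text premises, hypotheses, and labels. The data format is shown below.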

Judging from the training command in the demo, its training set is https://allennlp.s3.amazonaws.com/datasets/snli/snli_1.0_train.jsonl. How can I generate a training set like this from my own data?

FYI, my dataset looks like this:

{ "premise":"sentences", "hypothesis":"sentences", "label":"x"}
{ "premise":"sentences", "hypothesis":"sentences", "label":"y"}
...

An entry in snli_1.0_train.jsonl looks like this:

{"annotator_labels": ["neutral"], "captionID": "3416050480.jpg#4", "gold_label": "neutral", "pairID": "3416050480.jpg#4r1n", "sentence1": "A person on a horse jumps over a broken down airplane.", "sentence1_binary_parse": "( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )", "sentence1_parse": "(ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))", "sentence2": "A person is training his horse for a competition.", "sentence2_binary_parse": "( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )", "sentence2_parse": "(ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (VP (VBG training) (NP (PRP$ his) (NN horse)) (PP (IN for) (NP (DT a) (NN competition))))) (. .)))"}

I'd really appreciate any help. Thanks.

Answer 1

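When applying AllenNLP to a new dataset, you usually need to implement a new DatasetReader. In this case, you can simply adapt the existing SnliReader to the format of your dataset, or adjust the format of your dataset to work with the existing SnliReader. As you can see in the SnliReader source, this reader only looks for 3 fields: "gold_label" (your "label"), "sentence1" (your "premise"), and "sentence2" (your "hypothesis"). The parse trees, captionID, pairID, and annotator_labels in the SNLI file are ignored, so you don't need to generate them.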
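If you take the second route (reshaping your data), a plain renaming of keys is enough. Below is a minimal sketch, assuming each non-blank line of your file is one JSON object with exactly the keys "premise", "hypothesis", and "label" as shown in the question. The function names `convert_line`, `convert`, and `convert_file` and the file names in the usage note are hypothetical, chosen for illustration only.

```python
import json
from typing import Iterable, TextIO

# Hypothetical mapping from the question's field names to the keys SnliReader reads.
FIELD_MAP = {"premise": "sentence1", "hypothesis": "sentence2", "label": "gold_label"}


def convert_line(line: str) -> dict:
    """Parse one JSON record and rename its fields to SNLI-style keys."""
    record = json.loads(line)
    return {snli_key: record[own_key] for own_key, snli_key in FIELD_MAP.items()}


def convert(lines: Iterable[str], out: TextIO) -> int:
    """Write one converted JSON object per line to `out`; return how many were written."""
    count = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # ensure_ascii=False keeps non-English text readable in the output file
        out.write(json.dumps(convert_line(line), ensure_ascii=False) + "\n")
        count += 1
    return count


def convert_file(src_path: str, dst_path: str) -> int:
    """Convert a whole .jsonl file from the question's format to SNLI-style keys."""
    with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
        return convert(src, dst)
```

For example, `convert_file("my_train.jsonl", "my_train_snli.jsonl")` writes a file the existing SnliReader should be able to read; you would then point `train_data_path` (and `validation_data_path`) in the training config at your converted files instead of the SNLI URL. Renaming keys avoids writing and registering a custom DatasetReader, at the cost of an extra preprocessing step.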
