AllenNLP 2.0: Using `allennlp predict` with MultiTaskDatasetReader leads to RuntimeError
I trained a multitask model with AllenNLP 2.0 and now want to run predictions on new examples with the `allennlp predict` command.
Problem/Error:
I am using the following command: `allennlp predict results/model.tar.gz new_instances.jsonl --include-package mtl_sd --predictor mtlsd_predictor --use-dataset-reader --dataset-reader-choice validation`
This gives me the following error:
Traceback (most recent call last):
  File ".../mtl_sd_venv/bin/allennlp", line 10, in <module>
    sys.exit(run())
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/__init__.py", line 119, in main
    args.func(args)
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/predict.py", line 220, in _predict
    manager.run()
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/predict.py", line 186, in run
    for batch in lazy_groups_of(self._get_instance_data(), self._batch_size):
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/common/util.py", line 139, in lazy_groups_of
    s = list(islice(iterator, group_size))
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/commands/predict.py", line 180, in _get_instance_data
    yield from self._dataset_reader.read(self._input_file)
  File ".../mtl_sd_venv/lib/python3.7/site-packages/allennlp/data/dataset_readers/multitask.py", line 31, in read
    raise RuntimeError("This class is not designed to be called like this")
RuntimeError: This class is not designed to be called like this
As far as I understand, this is what happens:
The RuntimeError is raised by the MultiTaskDatasetReader because its `read()` method is not meant to be called directly. `read()` should only be called on one of the specific DatasetReaders in `MultiTaskDatasetReader.readers`.
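The behaviour described above can be illustrated with a small stand-alone sketch. It uses only the standard library; `ToyReader` and `ToyMultiTaskReader` are hypothetical stand-ins written for this illustration, not AllenNLP classes, and only the error message is taken from the traceback.

```python
from typing import Dict, Iterator, List


class ToyReader:
    """Hypothetical single-task reader: yields one instance per non-empty line."""

    def __init__(self, task: str) -> None:
        self.task = task

    def read(self, lines: List[str]) -> Iterator[dict]:
        for line in lines:
            if line.strip():
                yield {"task": self.task, "text": line.strip()}


class ToyMultiTaskReader:
    """Hypothetical container: holds per-task readers but cannot read by itself."""

    def __init__(self, readers: Dict[str, ToyReader]) -> None:
        self.readers = readers

    def read(self, lines: List[str]) -> Iterator[dict]:
        # Same message as in the traceback above.
        raise RuntimeError("This class is not designed to be called like this")


multi = ToyMultiTaskReader({"SemEval2016": ToyReader("SemEval2016")})

# Reading through one specific sub-reader works.
instances = list(
    multi.readers["SemEval2016"].read(["first example", "", "second example"])
)

# Reading through the container itself fails, which is what a generic
# "reader.read(input_file)" call runs into.
try:
    list(multi.read(["first example"]))
    container_error = None
except RuntimeError as err:
    container_error = str(err)
```

In this toy version the caller has to choose a task name explicitly, which is exactly the piece of information the generic call does not have.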
The `read()` method of the MultiTaskDatasetReader gets called because I specified the dataset readers in the jsonnet config as follows:
"dataset_reader": {
    "type": "multitask",
    "readers": {
        "SemEval2016": {
            "type": "SemEval2016",
            "max_sequence_length": 509,
            "token_indexers": {
                "bert": {
                    "type": "pretrained_transformer",
                    "model_name": "bert-base-cased"
                }
            },
            "tokenizer": {
                "type": "pretrained_transformer",
                "model_name": "bert-base-cased"
            }
        }, ...
    }
}
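Normally, the `type` of the `dataset_reader` determines which dataset reader class is instantiated for prediction. In this case, however, `type` only points to the MultiTaskDatasetReader, which does not implement `read()` and instead contains several DatasetReaders.
As far as I understand, I need to specify somehow which of these DatasetReaders should be used when running `allennlp predict`.
The question is:
How can I specify which particular DatasetReader (out of those in `MultiTaskDatasetReader.readers`) should be used when executing `allennlp predict`? Or, more generally: how can I get `allennlp predict` to run with a MultiTaskDatasetReader?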
Additional code, for the sake of completeness:
Predictor:
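from allennlp.common.util import JsonDict
from allennlp.data import Instance
from allennlp.predictors.predictor import Predictor
from overrides import overrides

@Predictor.register('mtlsd_predictor')
class MTLSDPredictor(Predictor):
    def predict(self, sentence: str) -> JsonDict:
        # Sends a 'sentence' key, while _json_to_instance reads 'text1' and 'text2'.
        return self.predict_json({'sentence': sentence})

    @overrides
    def _json_to_instance(self, json_dict: JsonDict) -> Instance:
        target = json_dict['text1']
        claim = json_dict['text2']
        return self._dataset_reader.text_to_instance(target, claim)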
There are two issues here. One is a bug in AllenNLP that is fixed in version 2.1.0. The other is that @sinaj was missing the `default_predictor` in his model head.
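To make the second point more concrete, here is a minimal stand-alone sketch of how a per-head `default_predictor` name can be used to pick a predictor for each task. It is a simplified model written for illustration, not AllenNLP's source: `PREDICTOR_REGISTRY`, `ToyHead`, `ToyPredictor`, `StancePredictor` and `build_predictors` are hypothetical names, and the fallback to a bare base predictor when the name is missing is an assumption.

```python
from typing import Dict, Optional, Type


class ToyPredictor:
    """Hypothetical base predictor: does not know how to build instances."""

    def json_to_instance(self, json_dict: dict) -> dict:
        raise NotImplementedError("no task-specific predictor configured")


class StancePredictor(ToyPredictor):
    """Hypothetical task predictor reading the same keys as the question's predictor."""

    def json_to_instance(self, json_dict: dict) -> dict:
        return {"target": json_dict["text1"], "claim": json_dict["text2"]}


# Hypothetical registry mapping predictor names to classes.
PREDICTOR_REGISTRY: Dict[str, Type[ToyPredictor]] = {
    "mtlsd_predictor": StancePredictor,
}


class ToyHead:
    """Hypothetical model head; default_predictor names the predictor to use."""

    def __init__(self, default_predictor: Optional[str] = None) -> None:
        self.default_predictor = default_predictor


def build_predictors(heads: Dict[str, ToyHead]) -> Dict[str, ToyPredictor]:
    """Create one predictor per head, falling back to the base class (assumption)."""
    predictors: Dict[str, ToyPredictor] = {}
    for name, head in heads.items():
        if head.default_predictor is None:
            cls: Type[ToyPredictor] = ToyPredictor
        else:
            cls = PREDICTOR_REGISTRY[head.default_predictor]
        predictors[name] = cls()
    return predictors


example = {"text1": "some target", "text2": "some claim"}

# Head that declares its predictor: instances can be built.
with_default = build_predictors({"SemEval2016": ToyHead("mtlsd_predictor")})
instance = with_default["SemEval2016"].json_to_instance(example)

# Head without default_predictor: only the base predictor is available.
without_default = build_predictors({"SemEval2016": ToyHead()})
try:
    without_default["SemEval2016"].json_to_instance(example)
    missing_error = None
except NotImplementedError as err:
    missing_error = str(err)
```

Under these assumptions, a head that does not name its predictor ends up with one that cannot turn JSON input into an instance, so declaring the name on each head is what lets per-task prediction work.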