Error training ELMo - RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000) at non-singleton dimension 1
I am trying to train my own custom ELMo model on AllenNLP.
The following error occurs when training the model: RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000) at non-singleton dimension 1. In some instances the size of tensor a is reported with other values (e.g. 5300). When I tested on a small subset of the files, I was able to train the model successfully.
My gut feeling is that this has to do with the number of tokens in my data, more specifically with particular files that contain more than 5000 tokens. However, there is no parameter in the AllenNLP package that lets me adjust this to get around the error.
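One quick way to sanity-check that intuition is to count tokens per file before training. A rough sketch (whitespace splitting only approximates what my DatasetReader produces, and the data path is a placeholder):

import glob

# Flag any training file whose whitespace token count exceeds the 5000-position limit.
# Whether one file corresponds to one training instance depends on the custom reader.
for path in glob.glob("data/*.txt"):  # placeholder path to the training files
    with open(path) as f:
        n_tokens = sum(len(line.split()) for line in f)
    if n_tokens > 5000:
        print(path, n_tokens)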
Any suggestions on how to overcome this? Would adjusting the PyTorch code to cap the size at 5000 work (and if so, how would I go about it)? Any insight would be much appreciated.
FYI, I am currently using a custom DatasetReader for tokenization. I generated my own vocabulary list before training the model (to save some time), and that vocabulary is used to train the ELMo model through AllenNLP.
Update: I found that AllenNLP has a variable max_len=5000, which is what triggers the error. See the code here. I have tried adjusting the parameter to a larger value, but in many cases that ends in CUDA out-of-memory errors, which leads me to believe this parameter should not be touched.
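For context, the failing line in the traceback below is an addition of the form x + positional_encoding[:, : x.size(1)]: the sinusoidal table is built once with max_len positions, so the slice can never be longer than 5000 and the broadcast fails as soon as a sequence has more timesteps than that. A minimal PyTorch sketch of the mismatch (not the AllenNLP source; d_model=512 and the standard sinusoidal table are assumed):

import math
import torch

def sinusoidal_table(d_model: int, max_len: int = 5000) -> torch.Tensor:
    # Standard sinusoidal positional encodings, shape (1, max_len, d_model).
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float) * -(math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe.unsqueeze(0)

positional_encoding = sinusoidal_table(d_model=512, max_len=5000)

short = torch.randn(1, 4800, 512)                        # 4800 timesteps: fits in the table
_ = short + positional_encoding[:, : short.size(1)]      # slice is (1, 4800, 512), works

too_long = torch.randn(1, 5158, 512)                     # 5158 timesteps: exceeds max_len
_ = too_long + positional_encoding[:, : too_long.size(1)]  # slice is capped at (1, 5000, 512)
# RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000)
# at non-singleton dimension 1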
Environment: Python 3.6.9, Linux Ubuntu, allennlp=2.9.1, allennlp-models=2.9.0
Traceback:
Traceback (most recent call last):
  File "/home/jiayi/.local/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 121, in main
    args.func(args)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 120, in train_model_from_args
    file_friendly_logging=args.file_friendly_logging,
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 179, in train_model_from_file
    file_friendly_logging=file_friendly_logging,
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 246, in train_model
    file_friendly_logging=file_friendly_logging,
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 470, in _train_worker
    metrics = train_loop.run()
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 543, in run
    return self.trainer.train()
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 720, in train
    metrics, epoch = self._try_train()
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 741, in _try_train
    train_metrics = self._train_epoch(epoch)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 459, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 352, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/models/language_model.py", line 257, in forward
    embeddings, mask
  File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/modules/seq2seq_encoders/bidirectional_lm_transformer.py", line 282, in forward
    token_embeddings = self._position(token_embeddings)
  File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/modules/seq2seq_encoders/bidirectional_lm_transformer.py", line 68, in forward
    return x + self.positional_encoding[:, : x.size(1)]
RuntimeError: The size of tensor a (5385) must match the size of tensor b (5000) at non-singleton dimension 1
AllenNLP training config file:
// For more info on config files generally, see https://guide.allennlp.org/using-config-files
local NUM_GRAD_ACC = 4;
local BATCH_SIZE = 1;

local BASE_LOADER = {
  "max_instances_in_memory": 8,
  "batch_sampler": {
    "type": "bucket",
    "batch_size": BATCH_SIZE,
    "sorting_keys": ["source"]
  }
};

{
  "dataset_reader" : {
    "type": "mimic_reader",
    "token_indexers": {
      "tokens": {
        "type": "single_id"
      },
      "token_characters": {
        "type": "elmo_characters"
      }
    },
    "start_tokens": ["<S>"],
    "end_tokens": ["</S>"],
  },
  "train_data_path": std.extVar("MIMIC3_NOTEEVENTS_DISCHARGE_PATH"),
  // Note: We don't set a validation_data_path because the softmax is only
  // sampled during training. Not sampling on GPUs results in a certain OOM
  // given our large vocabulary. We'll need to evaluate against the test set
  // (when we'll want a full softmax) with the CPU.
  "vocabulary": {
    // Use a prespecified vocabulary for efficiency.
    "type": "from_files",
    "directory": std.extVar("ELMO_VOCAB_PATH"),
    // Plausible config for generating the vocabulary.
    // "tokens_to_add": {
    //   "tokens": ["<S>", "</S>"],
    //   "token_characters": ["<>/S"]
    // },
    // "min_count": {"tokens": 3}
  },
  "model": {
    "type": "language_model",
    "bidirectional": true,
    "num_samples": 8192,
    # Sparse embeddings don't work with DistributedDataParallel.
    "sparse_embeddings": false,
    "text_field_embedder": {
      "token_embedders": {
        "tokens": {
          "type": "empty"
        },
        "token_characters": {
          "type": "character_encoding",
          "embedding": {
            "num_embeddings": 262,
            // Same as the Transformer ELMo in Calypso. Matt reports that
            // this matches the original LSTM ELMo as well.
            "embedding_dim": 16
          },
          "encoder": {
            "type": "cnn-highway",
            "activation": "relu",
            "embedding_dim": 16,
            "filters": [
              [1, 32],
              [2, 32],
              [3, 64],
              [4, 128],
              [5, 256],
              [6, 512],
              [7, 1024]],
            "num_highway": 2,
            "projection_dim": 512,
            "projection_location": "after_highway",
            "do_layer_norm": true
          }
        }
      }
    },
    // Consider the following.
    // remove_bos_eos: true,
    // Applies to the contextualized embeddings.
    "dropout": 0.1,
    "contextualizer": {
      "type": "bidirectional_language_model_transformer",
      "input_dim": 512,
      "hidden_dim": 4096,
      "num_layers": 2,
      "dropout": 0.1,
      "input_dropout": 0.1
    }
  },
  "data_loader": BASE_LOADER,
  // "distributed": {
  //   "cuda_devices": [0, 1],
  // },
  "trainer": {
    "num_epochs": 10,
    "cuda_devices": [0, 1, 2, 3],
    "optimizer": {
      // The gradient accumulators in Adam for the running stdev and mean for
      // words not used in the sampled softmax would be decayed to zero with the
      // standard "adam" optimizer.
      "type": "dense_sparse_adam"
    },
    // "grad_norm": 10.0,
    "learning_rate_scheduler": {
      "type": "noam",
      // See https://github.com/allenai/calypso/blob/master/calypso/train.py#L401
      "model_size": 512,
      // See https://github.com/allenai/calypso/blob/master/bin/train_transformer_lm1b.py#L51.
      // Adjusted based on our sample size relative to Calypso's.
      "warmup_steps": 6000
    },
    "num_gradient_accumulation_steps": NUM_GRAD_ACC,
    "use_amp": true
  }
}
This error no longer occurs after setting the max_tokens variable of the custom DatasetReader I built to a value below 5000. A contributor to AllenNLP suggested the same thing, to make sure the tokenizer truncates the input to 5000 tokens.
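For anyone hitting the same error, this is roughly what the truncation looks like inside the reader. My actual mimic_reader is custom code, so the whitespace tokenizer, field layout and file format below are illustrative only; the relevant part is the max_tokens cap applied in text_to_instance:

from typing import Dict, Iterable, List

from allennlp.data import DatasetReader, Instance, Token, TokenIndexer
from allennlp.data.fields import TextField
from allennlp.data.tokenizers import WhitespaceTokenizer

@DatasetReader.register("mimic_reader")
class MimicReader(DatasetReader):
    def __init__(
        self,
        token_indexers: Dict[str, TokenIndexer],
        start_tokens: List[str] = None,
        end_tokens: List[str] = None,
        max_tokens: int = 5000,  # hard cap so no instance exceeds the positional table
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self._tokenizer = WhitespaceTokenizer()
        self._token_indexers = token_indexers
        self._start_tokens = [Token(t) for t in (start_tokens or [])]
        self._end_tokens = [Token(t) for t in (end_tokens or [])]
        self._max_tokens = max_tokens

    def text_to_instance(self, text: str) -> Instance:
        tokens = self._tokenizer.tokenize(text)
        # Truncate before adding <S>/</S> so the final length stays <= max_tokens.
        budget = self._max_tokens - len(self._start_tokens) - len(self._end_tokens)
        tokens = self._start_tokens + tokens[:budget] + self._end_tokens
        return Instance({"source": TextField(tokens, self._token_indexers)})

    def _read(self, file_path: str) -> Iterable[Instance]:
        with open(file_path) as f:
            for line in f:
                yield self.text_to_instance(line.strip())

With that in place, max_tokens becomes one more entry under "dataset_reader" in the config above, and setting it at or slightly below 5000 keeps every instance within the contextualizer's 5000-position positional encoding.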
The same question was posted on the AllenNLP repository: https://github.com/allenai/allennlp/discussions/5601