Error training ELMo - RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000) at non-singleton dimension 1
I am trying to train my own custom ELMo model on AllenNLP.
The following error occurs when training the model: RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000) at non-singleton dimension 1. In some instances the size of tensor a is reported with other values (e.g. 5300). When I tested on a small subset of the files, I was able to train the model successfully.
My gut feeling is that this has to do with the number of tokens in my data, more specifically with particular files that contain more than 5000 tokens. However, there is no parameter in the AllenNLP package that lets me adjust this to get around the error.
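One quick way to sanity-check that intuition is to count tokens per file before training. A rough sketch (whitespace splitting only approximates what my DatasetReader produces, and the data path is a placeholder):

import glob

# Flag any training file whose whitespace token count exceeds the 5000-position limit.
# Whether one file corresponds to one training instance depends on the custom reader.
for path in glob.glob("data/*.txt"):  # placeholder path to the training files
    with open(path) as f:
        n_tokens = sum(len(line.split()) for line in f)
    if n_tokens > 5000:
        print(path, n_tokens)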
Any suggestions on how to overcome this? Would adjusting the PyTorch code to cap the size at 5000 work (and if so, how would I go about it)? Any insight would be much appreciated.
FYI, I am currently using a custom DatasetReader for tokenization. I generated my own vocabulary list before training the model (to save some time), and that vocabulary is used to train the ELMo model through AllenNLP.
Update: I found that AllenNLP has a variable max_len=5000, which is what triggers the error. See the code here. I have tried adjusting the parameter to a larger value, but in many cases that ends in CUDA out-of-memory errors, which leads me to believe this parameter should not be touched.
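For context, the failing line in the traceback below is an addition of the form x + positional_encoding[:, : x.size(1)]: the sinusoidal table is built once with max_len positions, so the slice can never be longer than 5000 and the broadcast fails as soon as a sequence has more timesteps than that. A minimal PyTorch sketch of the mismatch (not the AllenNLP source; d_model=512 and the standard sinusoidal table are assumed):

import math
import torch

def sinusoidal_table(d_model: int, max_len: int = 5000) -> torch.Tensor:
    # Standard sinusoidal positional encodings, shape (1, max_len, d_model).
    position = torch.arange(max_len, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float) * -(math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe.unsqueeze(0)

positional_encoding = sinusoidal_table(d_model=512, max_len=5000)

short = torch.randn(1, 4800, 512)                        # 4800 timesteps: fits in the table
_ = short + positional_encoding[:, : short.size(1)]      # slice is (1, 4800, 512), works

too_long = torch.randn(1, 5158, 512)                     # 5158 timesteps: exceeds max_len
_ = too_long + positional_encoding[:, : too_long.size(1)]  # slice is capped at (1, 5000, 512)
# RuntimeError: The size of tensor a (5158) must match the size of tensor b (5000)
# at non-singleton dimension 1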
Environment: Python 3.6.9, Linux Ubuntu, allennlp=2.9.1, allennlp-models=2.9.0
Traceback:
Traceback (most recent call last):
  File "/home/jiayi/.local/bin/allennlp", line 8, in <module>
    sys.exit(run())
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/__main__.py", line 34, in run
    main(prog="allennlp")
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/__init__.py", line 121, in main
    args.func(args)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 120, in train_model_from_args
    file_friendly_logging=args.file_friendly_logging,
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 179, in train_model_from_file
    file_friendly_logging=file_friendly_logging,
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 246, in train_model
    file_friendly_logging=file_friendly_logging,
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 470, in _train_worker
    metrics = train_loop.run()
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/commands/train.py", line 543, in run
    return self.trainer.train()
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 720, in train
    metrics, epoch = self._try_train()
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 741, in _try_train
    train_metrics = self._train_epoch(epoch)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 459, in _train_epoch
    batch_outputs = self.batch_outputs(batch, for_training=True)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp/training/gradient_descent_trainer.py", line 352, in batch_outputs
    output_dict = self._pytorch_model(**batch)
  File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/models/language_model.py", line 257, in forward
    embeddings, mask
  File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/modules/seq2seq_encoders/bidirectional_lm_transformer.py", line 282, in forward
    token_embeddings = self._position(token_embeddings)
  File "/home/jiayi/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/jiayi/.local/lib/python3.6/site-packages/allennlp_models/lm/modules/seq2seq_encoders/bidirectional_lm_transformer.py", line 68, in forward
    return x + self.positional_encoding[:, : x.size(1)]
RuntimeError: The size of tensor a (5385) must match the size of tensor b (5000) at non-singleton dimension 1
AllenNLP training config file:
// For more info on config files generally, see https://guide.allennlp.org/using-config-files
local NUM_GRAD_ACC = 4;
local BATCH_SIZE = 1;

local BASE_LOADER = {
  "max_instances_in_memory": 8,
  "batch_sampler": {
    "type": "bucket",
    "batch_size": BATCH_SIZE,
    "sorting_keys": ["source"]
  }
};

{
  "dataset_reader" : {
    "type": "mimic_reader",
    "token_indexers": {
      "tokens": {
        "type": "single_id"
      },
      "token_characters": {
        "type": "elmo_characters"
      }
    },
    "start_tokens": ["<S>"],
    "end_tokens": ["</S>"],
  },
  "train_data_path": std.extVar("MIMIC3_NOTEEVENTS_DISCHARGE_PATH"),
  // Note: We don't set a validation_data_path because the softmax is only
  // sampled during training. Not sampling on GPUs results in a certain OOM
  // given our large vocabulary. We'll need to evaluate against the test set
  // (when we'll want a full softmax) with the CPU.
  "vocabulary": {
    // Use a prespecified vocabulary for efficiency.
    "type": "from_files",
    "directory": std.extVar("ELMO_VOCAB_PATH"),
    // Plausible config for generating the vocabulary.
    // "tokens_to_add": {
    //   "tokens": ["<S>", "</S>"],
    //   "token_characters": ["<>/S"]
    // },
    // "min_count": {"tokens": 3}
  },
  "model": {
    "type": "language_model",
    "bidirectional": true,
    "num_samples": 8192,
    # Sparse embeddings don't work with DistributedDataParallel.
    "sparse_embeddings": false,
    "text_field_embedder": {
      "token_embedders": {
        "tokens": {
          "type": "empty"
        },
        "token_characters": {
          "type": "character_encoding",
          "embedding": {
            "num_embeddings": 262,
            // Same as the Transformer ELMo in Calypso. Matt reports that
            // this matches the original LSTM ELMo as well.
            "embedding_dim": 16
          },
          "encoder": {
            "type": "cnn-highway",
            "activation": "relu",
            "embedding_dim": 16,
            "filters": [
              [1, 32],
              [2, 32],
              [3, 64],
              [4, 128],
              [5, 256],
              [6, 512],
              [7, 1024]],
            "num_highway": 2,
            "projection_dim": 512,
            "projection_location": "after_highway",
            "do_layer_norm": true
          }
        }
      }
    },
    // Consider the following.
    // remove_bos_eos: true,
    // Applies to the contextualized embeddings.
    "dropout": 0.1,
    "contextualizer": {
      "type": "bidirectional_language_model_transformer",
      "input_dim": 512,
      "hidden_dim": 4096,
      "num_layers": 2,
      "dropout": 0.1,
      "input_dropout": 0.1
    }
  },
  "data_loader": BASE_LOADER,
  // "distributed": {
  //   "cuda_devices": [0, 1],
  // },
  "trainer": {
    "num_epochs": 10,
    "cuda_devices": [0, 1, 2, 3],
    "optimizer": {
      // The gradient accumulators in Adam for the running stdev and mean for
      // words not used in the sampled softmax would be decayed to zero with the
      // standard "adam" optimizer.
      "type": "dense_sparse_adam"
    },
    // "grad_norm": 10.0,
    "learning_rate_scheduler": {
      "type": "noam",
      // See https://github.com/allenai/calypso/blob/master/calypso/train.py#L401
      "model_size": 512,
      // See https://github.com/allenai/calypso/blob/master/bin/train_transformer_lm1b.py#L51.
      // Adjusted based on our sample size relative to Calypso's.
      "warmup_steps": 6000
    },
    "num_gradient_accumulation_steps": NUM_GRAD_ACC,
    "use_amp": true
  }
}
This error no longer occurs after setting the max_tokens variable of the custom DatasetReader I built to a value below 5000. A contributor to AllenNLP suggested the same thing, to make sure the tokenizer truncates the input to 5000 tokens.
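For anyone hitting the same error, this is roughly what the truncation looks like inside the reader. My actual mimic_reader is custom code, so the whitespace tokenizer, field layout and file format below are illustrative only; the relevant part is the max_tokens cap applied in text_to_instance:

from typing import Dict, Iterable, List

from allennlp.data import DatasetReader, Instance, Token, TokenIndexer
from allennlp.data.fields import TextField
from allennlp.data.tokenizers import WhitespaceTokenizer

@DatasetReader.register("mimic_reader")
class MimicReader(DatasetReader):
    def __init__(
        self,
        token_indexers: Dict[str, TokenIndexer],
        start_tokens: List[str] = None,
        end_tokens: List[str] = None,
        max_tokens: int = 5000,  # hard cap so no instance exceeds the positional table
        **kwargs,
    ) -> None:
        super().__init__(**kwargs)
        self._tokenizer = WhitespaceTokenizer()
        self._token_indexers = token_indexers
        self._start_tokens = [Token(t) for t in (start_tokens or [])]
        self._end_tokens = [Token(t) for t in (end_tokens or [])]
        self._max_tokens = max_tokens

    def text_to_instance(self, text: str) -> Instance:
        tokens = self._tokenizer.tokenize(text)
        # Truncate before adding <S>/</S> so the final length stays <= max_tokens.
        budget = self._max_tokens - len(self._start_tokens) - len(self._end_tokens)
        tokens = self._start_tokens + tokens[:budget] + self._end_tokens
        return Instance({"source": TextField(tokens, self._token_indexers)})

    def _read(self, file_path: str) -> Iterable[Instance]:
        with open(file_path) as f:
            for line in f:
                yield self.text_to_instance(line.strip())

With that in place, max_tokens becomes one more entry under "dataset_reader" in the config above, and setting it at or slightly below 5000 keeps every instance within the contextualizer's 5000-position positional encoding.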
The same question was posted on the AllenNLP repository: https://github.com/allenai/allennlp/discussions/5601