MultiprocessIterator 在更改 batch_size 时抛出错误

MultiprocessIterator throws error when changing batch_size

我想用 ChainerCV 训练 Faster R-CNN。作为第一个测试,我主要复制了提供的 example, I only changed the lines corresponding the dataset to use my custom dataset. I checked if my dataset is fully functional with all operations discribed in this tutorial

如果我 运行 没有更改的脚本一切正常,但如果我将 batch_size I get an error. I tried increasing the shared_mem 从 100 MB 更改为 1000 MB,但错误并没有消失。

设置 batch_size=2 时出错:

Exception in main training loop: all the input array dimensions except for the concatenation axis must match exactly
Traceback (most recent call last):
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/trainer.py", line 315, in run
    update()
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 171, in update_core
    in_arrays = self.converter(batch, self.device)
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/dataset/convert.py", line 134, in concat_examples
    [example[i] for example in batch], padding[i])))
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/dataset/convert.py", line 164, in _concat_arrays
    return xp.concatenate([array[None] for array in arrays])
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "/home/cv/ChainerCV/faster_rcnn/train.py", line 131, in <module>
    main()
  File "/home/cv/ChainerCV/faster_rcnn/train.py", line 126, in main
    trainer.run()
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/trainer.py", line 329, in run
    six.reraise(*sys.exc_info())
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/trainer.py", line 315, in run
    update()
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 165, in update
    self.update_core()
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/training/updaters/standard_updater.py", line 171, in update_core
    in_arrays = self.converter(batch, self.device)
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/dataset/convert.py", line 134, in concat_examples
    [example[i] for example in batch], padding[i])))
  File "/home/cv/anaconda3/envs/chainer/lib/python3.6/site-packages/chainer/dataset/convert.py", line 164, in _concat_arrays
    return xp.concatenate([array[None] for array in arrays])
ValueError: all the input array dimensions except for the concatenation axis must match exactly

系统信息:

__Hardware Information__
Machine                                       : x86_64
CPU Name                                      : skylake
Number of accessible CPU cores                : 8

__OS Information__
Platform                                      : Linux-4.15.0-45-generic-x86_64-with-debian-stretch-sid
Release                                       : 4.15.0-45-generic
System Name                                   : Linux
Version                                       : #48~16.04.1-Ubuntu SMP Tue Jan 29 18:03:48 UTC 2019
OS specific info                              : debianstretch/sid
glibc info                                    : glibc 2.10

__CUDA Information__
Found 1 CUDA devices
id 0     b'GeForce GTX 1080'                              [SUPPORTED]
                      compute capability: 6.1
                           pci device id: 0
                              pci bus id: 1
Summary:
    1/1 devices are supported
CUDA driver version                           : 10000

__Conda Information__
conda_build_version                           : 3.17.6
conda_env_version                             : 4.6.3
platform                                      : linux-64
python_version                                : 3.7.1.final.0

编辑:当 运行 将 example 与 batch_size=2 连接时,也会发生错误。

self.converter 假设 batch 的第一个参数由具有相同形状的输入组成。例如,如果您使用图像数据集,则所有图像都应该具有 (C, H, W) 的形状。

那么,你能检查你的数据集 returns 相同形状的图像吗? 如果你的数据集有各种形状的图像,那么使用 TransformDataset 怎么样 https://github.com/chainer/chainercv/blob/df63b74ef20f9d8c830e266881e577dd05c17442/examples/faster_rcnn/train.py#L86?

在尝试修复错误时,我得到了另一个 error

ValueError: Currently only batch size 1 is supported.

等待似乎是解决办法。

当前的 Faster-RCNN 实现不支持多批次训练,但您可以像下面的代码一样重写它以支持它。 https://github.com/knorth55/chainer-light-head-rcnn/blob/master/light_head_rcnn/links/model/light_head_rcnn_train_chain.py

另一种选择是在 ChainerCV 中使用 Faster-RCNN 和 FPN。 最新版本的 ChainerCV 具有 Faster-RCNN with FPN,支持多批次训练。 https://github.com/chainer/chainercv/blob/master/examples/fpn/train_multi.py