Tensorflow not using GPU for one dataset, where it does for a very similar dataset
I'm training a model with TensorFlow using data from two sources. For both sources the training and validation data have almost identical shapes, and the dtype is np.float32 throughout.
Strangely, when I train on the first dataset the GPU on my machine is used, but when training on the second dataset it is not.
Does anyone have suggestions on how to investigate this?
print(s1_train_data.shape)
print(s1_train_data.values)
(1165032, 941)
[[ 0.45031181 -0.99680316 0.63686389 ..., 0.22323072 -0.37929842 0. ]
[-0.40660214 0.34022757 -0.00710014 ..., -1.43051076 -0.14785887 1. ]
[ 0.03955967 -0.91227823 0.37887612 ..., 0.16451506 -1.02560401 0. ]
...,
[ 0.11746094 -0.18229018 0.43319091 ..., 0.36532226 -0.48208624 0. ]
[ 0.110379 -1.07364404 0.42837444 ..., 0.74732345 0.92880726 0. ]
[-0.81027234 -1.04290771 -0.56407243 ..., 0.25084609 -0.1797282 1. ]]
print(s2_train_data.shape)
print(s2_train_data.values)
(559873, 941)
[[ 0. 0. 0. ..., -1.02008295 0.27371082 0. ]
[ 0. 0. 0. ..., -0.74775815 0.18743835 0. ]
[ 0. 0. 0. ..., 0.6469788 0.67864949 1. ]
...,
[ 0. 0. 0. ..., -0.88198501 -0.02421325 1. ]
[ 0. 0. 0. ..., 0.28361112 -1.08478808 1. ]
[ 0. 0. 0. ..., 0.22360609 0.50698668 0. ]]
Edit: here is a snippet of the log with log_device_placement=True.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:03.0
Total memory: 4.00GiB
Free memory: 3.95GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x7578380
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 1 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:04.0
Total memory: 4.00GiB
Free memory: 3.95GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x7c54b10
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 2 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:05.0
Total memory: 4.00GiB
Free memory: 3.95GiB
W tensorflow/stream_executor/cuda/cuda_driver.cc:590] creating context when one is currently active; existing: 0x65bb1d0
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 3 with properties:
name: GRID K520
major: 3 minor: 0 memoryClockRate (GHz) 0.797
pciBusID 0000:00:06.0
Total memory: 4.00GiB
Free memory: 3.95GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 0 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 1 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 2 and 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 1
I tensorflow/core/common_runtime/gpu/gpu_device.cc:777] Peer access not supported between device ordinals 3 and 2
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 1 2 3
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y N N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 1: N Y N N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 2: N N Y N
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 3: N N N Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GRID K520, pci bus id: 0000:00:03.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GRID K520, pci bus id: 0000:00:04.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:2) -> (device: 2, name: GRID K520, pci bus id: 0000:00:05.0)
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:3) -> (device: 3, name: GRID K520, pci bus id: 0000:00:06.0)
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus id: 0000:00:03.0
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GRID K520, pci bus id: 0000:00:04.0
/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: GRID K520, pci bus id: 0000:00:05.0
/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: GRID K520, pci bus id: 0000:00:06.0
I tensorflow/core/common_runtime/direct_session.cc:255] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GRID K520, pci bus id: 0000:00:03.0
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: GRID K520, pci bus id: 0000:00:04.0
/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: GRID K520, pci bus id: 0000:00:05.0
/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: GRID K520, pci bus id: 0000:00:06.0
WARNING:tensorflow:From tf.py:183 in get_session.: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
gradients_3/add_grad/Shape_1: (Const): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:821] gradients_3/add_grad/Shape_1: (Const)/job:localhost/replica:0/task:0/gpu:0
gradients_3/add_2_grad/Shape_1: (Const): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:821] gradients_3/add_2_grad/Shape_1: (Const)/job:localhost/replica:0/task:0/gpu:0
gradients_3/gradients_2/Mean_1_grad/Tile_grad/range: (Range): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:821] gradients_3/gradients_2/Mean_1_grad/Tile_grad/range: (Range)/job:localhost/replica:0/task:0/gpu:0
gradients_3/gradients_2/Mean_1_grad/truediv_grad/Shape_1: (Const): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:821] gradients_3/gradients_2/Mean_1_grad/truediv_grad/Shape_1: (Const)/job:localhost/replica:0/task:0/gpu:0
gradients_3/gradients_2/logistic_loss_1_grad/Sum_grad/Size: (Const): /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:821] gradients_3/gradients_2/logistic_loss_1_grad/Sum_grad/Size: (Const)/job:localhost/replica:0/task:0/gpu:0
gradients_3/gradients_2/logistic_loss_1_grad/Sum_grad/range: (Range): /job:localhost/replica:0/task:0/gpu:0
It does seem to be placing the ops on the GPU, yet I still see almost constantly 0% GPU-Util in the nvidia-smi monitor.
The pandas DataFrames are of course in memory. Is there any other IO that could be affecting the process?
Edit 2: I captured the log_device_placement logs for both the fast and the slow dataset. They are identical, even though GPU utilisation is 25% in one case and 0% in the other. Scratching my head now...
The cause of the slowness was the memory layout of the ndarray backing the DataFrame. The s2 data was column-major, meaning the features and target of each sample were not contiguous in memory, so feeding each row required a strided gather rather than a straight copy.
This operation changed the memory layout:
s2_train_data = s2_train_data.values.copy(order='C')
The GPU now runs at 26% utilisation. Happy days :)
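The layout issue and the fix can be illustrated with a small NumPy-only sketch. The Fortran-ordered array here is a hypothetical stand-in for the DataFrame's backing ndarray; `ndarray.flags` is the quick way to check which case you have:

```python
import numpy as np

# Stand-in for the DataFrame's backing array: Fortran-ordered
# (column-major), so the features of one row are not contiguous.
f_arr = np.asfortranarray(np.arange(12, dtype=np.float32).reshape(3, 4))
print(f_arr.flags['C_CONTIGUOUS'])  # False: row elements are strided

# Row-major copy: each row's features now sit in one contiguous block.
c_arr = f_arr.copy(order='C')
print(c_arr.flags['C_CONTIGUOUS'])  # True

# Same values, different physical layout.
print(np.array_equal(f_arr, c_arr))  # True
```

`np.ascontiguousarray(f_arr)` is an equivalent one-liner; it copies only when the input is not already C-ordered, so it is a cheap no-op on data that is already row-major.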