ML Engine Batch Prediction running on wrong Python version
I registered a TensorFlow model under Python 3.5 and want to run a batch prediction job with it. My API request body looks like this:
{
  "versionName": "XXXXX/v8_0QSZ",
  "dataFormat": "JSON",
  "inputPaths": [
    "XXXXX"
  ],
  "outputPath": "XXXXXX",
  "region": "us-east1",
  "runtimeVersion": "1.12",
  "accelerator": {
    "count": "1",
    "type": "NVIDIA_TESLA_P100"
  }
}
The batch prediction job then runs and returns "Job completed successfully." However, it was not successful at all: it consistently threw the following error for every input:
Exception during running the graph: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[node convolution_layer/conv1d/conv1d/Conv2D (defined at /usr/local/lib/python2.7/dist-packages/google/cloud/ml/prediction/frameworks/tf_prediction_lib.py:210) = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](convolution_layer/conv1d/conv1d/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, convolution_layer/conv1d/conv1d/ExpandDims_1)]] [[{{node Cast_6/_495}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_789_Cast_6", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
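My questions are:
- Why does the batch job report success when it actually failed completely?
- The exception above mentions Python 2.7, but the model is registered as Python 3.5, and I cannot find a way to specify the Python version for the job. Why is batch prediction using 2.7?
- In general, what can I do to get this working?
- Does this have anything to do with my accelerator options?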
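Regarding the first question, the job-level status alone does not show whether individual inputs were predicted. Below is a minimal sketch for catching that case. It assumes the files under `outputPath` have been copied to a local directory and that per-input errors are written to files whose names start with `prediction.errors_stats`; check that naming against your own output before relying on it. The function names `find_error_files` and `job_really_succeeded` are hypothetical.

```python
from pathlib import Path


def find_error_files(output_dir):
    """Return the non-empty error files in a local copy of the job output."""
    # Assumption: error summaries are named prediction.errors_stats-*.
    candidates = sorted(Path(output_dir).glob("prediction.errors_stats-*"))
    # An empty error file means no errors were recorded for that shard.
    return [p for p in candidates if p.stat().st_size > 0]


def job_really_succeeded(output_dir):
    """True only if no non-empty error file is present."""
    return not find_error_files(output_dir)
```

Running `job_really_succeeded` on the downloaded output after every job gives a check that does not depend on the "Job completed successfully." message.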
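Response from the batch prediction developers: "We do not officially support Python 3 yet. However, the issue you are running into is a known bug that affects the GPU runtime for TF 1.11 and 1.12."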
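Going only by the reply above, a request falls in the affected set when it asks for an accelerator and uses runtime version 1.11 or 1.12. Here is a small sketch of that check against a request body shaped like the one in the question. The helper name `is_affected_by_gpu_bug` is hypothetical, and the reply does not say which other versions, if any, are affected.

```python
# Runtime versions named in the developers' reply as affected on GPU.
AFFECTED_GPU_RUNTIMES = {"1.11", "1.12"}


def is_affected_by_gpu_bug(request_body):
    """Return True if the request uses a GPU with an affected runtime version."""
    uses_accelerator = bool(request_body.get("accelerator"))
    runtime = str(request_body.get("runtimeVersion", ""))
    return uses_accelerator and runtime in AFFECTED_GPU_RUNTIMES


request_body = {
    "versionName": "XXXXX/v8_0QSZ",
    "dataFormat": "JSON",
    "inputPaths": ["XXXXX"],
    "outputPath": "XXXXXX",
    "region": "us-east1",
    "runtimeVersion": "1.12",
    "accelerator": {"count": "1", "type": "NVIDIA_TESLA_P100"},
}

print(is_affected_by_gpu_bug(request_body))  # True
```

The request from the question matches both conditions, which is consistent with the accelerator option being involved in the failure.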