AWS tensorboard Segmentation fault (core dumped)
I'm trying to use tensorboardX to debug a PyTorch NN running on an AWS p2.xlarge instance.
I opened port 6006 by following this tutorial.
The model is running, and tensorboardX is producing its writer files. While it runs, I get the following warning. I'm not sure how important it is.
WARNING:root:tuple appears in op that does not forward tuples (VisitNode at /pytorch/torch/csrc/jit/passes/lower_tuples.cpp:117)
frame #0: std::function::operator()() const + 0x11 (0x7fbe3dd04441 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fbe3dd03d7a in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0xaf61f5 (0x7fbe3cdc41f5 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #3: + 0xaf6464 (0x7fbe3cdc4464 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #4: torch::jit::LowerAllTuples(std::shared_ptr&) + 0x13 (0x7fbe3cdc44a3 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: + 0x3f84b4 (0x7fbe7d2cb4b4 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x130cfc (0x7fbe7d003cfc in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #40: __libc_start_main + 0xf0 (0x7fbe8d69c830 in /lib/x86_64-linux-gnu/libc.so.6)
WARNING:root:tuple appears in op that does not forward tuples (VisitNode at /pytorch/torch/csrc/jit/passes/lower_tuples.cpp:117)
frame #0: std::function::operator()() const + 0x11 (0x7fbe3dd04441 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7fbe3dd03d7a in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libc10.so)
frame #2: + 0xaf61f5 (0x7fbe3cdc41f5 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #3: + 0xaf6464 (0x7fbe3cdc4464 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #4: torch::jit::LowerAllTuples(std::shared_ptr&) + 0x13 (0x7fbe3cdc44a3 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch.so.1)
frame #5: + 0x3f84b4 (0x7fbe7d2cb4b4 in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x130cfc (0x7fbe7d003cfc in /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
frame #40: __libc_start_main + 0xf0 (0x7fbe8d69c830 in /lib/x86_64-linux-gnu/libc.so.6)
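A warning like this does not by itself mean that logging failed. One quick sanity check is to confirm that tensorboardX really is writing event files under runs/. The following is a minimal, stdlib-only sketch. It assumes the default tensorboardX layout (one subdirectory per run under runs/, with files named events.out.tfevents.*). The function name find_event_files is hypothetical.

```python
from pathlib import Path
from typing import List


def find_event_files(logdir: str = "runs") -> List[Path]:
    """Return all TensorBoard event files under logdir, oldest first.

    Assumption: event files follow the events.out.tfevents.<...> naming
    used by TensorFlow and tensorboardX, so a recursive glob on that
    prefix is enough to see whether the writer is producing output.
    """
    root = Path(logdir)
    if not root.is_dir():
        # No log directory at all: nothing has been written yet.
        return []
    files = [p for p in root.rglob("events.out.tfevents.*") if p.is_file()]
    # Sort by modification time so the most recent run comes last.
    return sorted(files, key=lambda p: p.stat().st_mtime)
```

Calling find_event_files("runs") from the folder that contains runs/ should return at least one path with a non-zero size while training is in progress. If it does, the writer side is working and the remaining problem is on the TensorBoard side.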
The problem is that I can't access the TensorBoard UI in the browser. These are the steps I take:
$ cd PATH_TO_FOLDER_CONTAINING_runs
$ source activate pytorch_p36
$ tensorboard --logdir=runs
At that point I get the error message:
Segmentation fault (core dumped)
When I check the system log (/var/log/syslog), I see the following:
Jun 26 09:06:40 ip-172-xx-xx-xxx kernel: [515315.598917] tensorboard[1446]: segfault at 0 ip (null) sp 00007ffd64c5f178 error 14 in python2.7[55d8673d1000+1000]
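The kernel line reports the crash "in python2.7", even though pytorch_p36 (a Python 3.6 environment) was activated. This suggests that the tensorboard command found on PATH may be launched by a different interpreter. The following is a minimal diagnostic sketch that reads the shebang line of that command. It assumes tensorboard is installed as a script with a "#!" interpreter line, which is typical for pip/conda console scripts but not guaranteed. The helper name tensorboard_interpreter is hypothetical.

```python
import shutil
from typing import Optional


def tensorboard_interpreter(executable: str = "tensorboard") -> Optional[str]:
    """Return the interpreter named in the shebang line of `executable`.

    A bare name is resolved on PATH (like `which tensorboard` in the shell);
    a path containing a directory is checked directly. Returns None if the
    file is not found, is not executable, or has no "#!" first line.
    """
    path = shutil.which(executable)
    if path is None:
        return None
    with open(path, "rb") as f:
        # Decode leniently: a compiled binary would not start with "#!".
        first_line = f.readline().decode("utf-8", errors="replace").strip()
    if not first_line.startswith("#!"):
        return None
    return first_line[2:].strip()
```

Run it with the environment activated, for example print(tensorboard_interpreter()). If the result points at a Python 2.7 interpreter outside pytorch_p36, the tensorboard being started does not belong to the environment you activated.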
My googling skills have fallen far short. How can I get TensorBoard running on the AWS instance and access it through the browser?
Please let me know if anything is unclear or if some information is missing.
Although the code has to run in the pytorch_p36 environment, tensorboard actually has to run in a different environment.
The sequence of commands in the terminal should be:
$ cd PATH_TO_FOLDER_CONTAINING_runs
$ source activate tensorflow_p27
$ tensorboard --logdir=runs
Then open the specified port.
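If the page still does not load, it helps to tell apart two cases: "TensorBoard is not listening" and "the port is not reachable from outside". You can do this by testing the TCP port directly. The following is a minimal sketch. It assumes TensorBoard is on its default port 6006 (the port opened in the question). The helper name port_is_open is hypothetical.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 6006, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # if no server accepts the connection within `timeout` seconds.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Call port_is_open() on the instance itself and then from your own machine, passing the instance's public address as host. If the check succeeds locally but fails remotely, the port is most likely still blocked by the instance's inbound rules rather than by TensorBoard.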