Out of disk space
When running an AML pipeline on AML Compute, I get this kind of error:
I could try restarting the cluster, but that may not solve the problem (if the nodes accumulate storage, it should be getting cleaned up).
Session ID: 933fc468-7a22-425d-aa1b-94eba5784faa
{"error":{"code":"ServiceError","message":"Job preparation failed: [Errno 28] No space left on device","detailsUri":null,"target":null,"details":[],"innerError":null,"debugInfo":{"type":"OSError","message":"[Errno 28] No space left on device","stackTrace":" File \"/mnt/batch/tasks/shared/LS_root/jobs/jj2/azureml/piperun-20190911_1568231788841835_1/mounts/workspacefilestore/azureml/PipeRun-20190911_1568231788841835_1-setup/job_prep.py\", line 126, in <module>\n invoke()\n File \"/mnt/batch/tasks/shared/LS_root/jobs/jj2/azureml/piperun-20190911_1568231788841835_1/mounts/workspacefilestore/azureml/PipeRun-20190911_1568231788841835_1-setup/job_prep.py\", line 97, in invoke\n extract_project(project_dir, options.project_zip, options.snapshots)\n File \"/mnt/batch/tasks/shared/LS_root/jobs/jj2/azureml/piperun-20190911_1568231788841835_1/mounts/workspacefilestore/azureml/PipeRun-20190911_1568231788841835_1-setup/job_prep.py\", line 60, in extract_project\n project_fetcher.fetch_project_snapshot(snapshot[\"Id\"], snapshot[\"PathStack\"])\n File \"/mnt/batch/tasks/shared/LS_root/jobs/jj2/azureml/piperun-20190911_1568231788841835_1/mounts/workspacefilestore/azureml/PipeRun-20190911_1568231788841835_1/azureml-setup/project_fetcher.py\", line 72, in fetch_project_snapshot\n _download_tree(sas_tree, path_stack)\n File \"/mnt/batch/tasks/shared/LS_root/jobs/jj2/azureml/piperun-20190911_1568231788841835_1/mounts/workspacefilestore/azureml/PipeRun-20190911_1568231788841835_1/azureml-setup/project_fetcher.py\", line 106, in _download_tree\n _download_tree(child, path_stack)\n File \"/mnt/batch/tasks/shared/LS_root/jobs/jj2/azureml/piperun-20190911_1568231788841835_1/mounts/workspacefilestore/azureml/PipeRun-20190911_1568231788841835_1/azureml-setup/project_fetcher.py\", line 106, in _download_tree\n _download_tree(child, path_stack)\n File \"/mnt/batch/tasks/shared/LS_root/jobs/jj2/azureml/piperun-20190911_1568231788841835_1/mounts/workspacefilestore/azureml/PipeRun-20190911_1568231788841835_1/azureml-setup/project_fetcher.py\", line 98, in _download_tree\n fh.write(response.read())\n","innerException":null,"data":null,"errorResponse":null}},"correlation":null,"environment":null,"location":null,"time":"0001-01-01T00:00:00+00:00"}
I would expect the job to just run. In fact, I checked the node, and it does have plenty of free disk space:
root@4f57957ac829466a86bad4d4dc51fadd000001:~# df -kh
Filesystem Size Used Avail Use% Mounted on
udev 28G 0 28G 0% /dev
tmpfs 5.6G 9.0M 5.5G 1% /run
/dev/sda1 125G 2.8G 122G 3% /
tmpfs 28G 0 28G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 28G 0 28G 0% /sys/fs/cgroup
/dev/sdb1 335G 6.7G 311G 3% /mnt
tmpfs 5.6G 0 5.6G 0% /run/user/1002
Any suggestions on what I should check?
It looks like you are running into the Azure File share limits. You can change your runs to use blob storage instead, which scales to a large number of jobs running in parallel, using the sample code below:
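The sample code referenced in the answer is not included in the post. Below is a minimal sketch of the approach, assuming the Azure ML Python SDK v1 (azureml-core): point the step's `source_directory_data_store` at the workspace's default blob datastore ("workspaceblobstore") instead of the default file share ("workspacefilestore"). The step name, script, source directory, compute target, and experiment name are placeholders.

```python
# Minimal sketch, assuming Azure ML Python SDK v1 (azureml-core),
# an existing workspace config.json, and an AML compute target named "cpu-cluster".
from azureml.core import Workspace
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()

# Route the project snapshot / working directory through blob storage
# ("workspaceblobstore") instead of the Azure File share ("workspacefilestore").
run_config = RunConfiguration()
run_config.source_directory_data_store = "workspaceblobstore"

step = PythonScriptStep(
    name="train",                   # placeholder step name
    script_name="train.py",         # placeholder script
    source_directory="./scripts",   # placeholder source folder
    compute_target="cpu-cluster",   # placeholder compute target
    runconfig=run_config,
)

pipeline = Pipeline(workspace=ws, steps=[step])
pipeline.submit("pipeline-blobstore-test")  # hypothetical experiment name
```

If you want every run in the workspace to default to blob storage rather than setting it per step, changing the workspace's default datastore (e.g. `ws.set_default_datastore("workspaceblobstore")`) is another option.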
We are also working on a feature to clean up the disk before or after a job runs. No ETA yet.