Error "no free space in /var/cache/apt/archives" in singularity container, but disk not full

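I am trying to reproduce the results of an older research paper and need to run a Singularity container with NVIDIA CUDA 9.0 and torch 1.2.0. Locally, I have Ubuntu 20.04 running as a VM, in which I run singularity build. I followed this guide to install the older CUDA version. This is the recipe file: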

#header
Bootstrap: docker
From: nvidia/cuda:9.0-runtime-ubuntu16.04

#Sections

%files
/home/timaie/rkn_tcml/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
/home/timaie/rkn_tcml/RKN/*

%post

# necessary dependencies
pip install numpy scipy scikit-learn biopython pandas

dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb

apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
apt-get autoclean
apt-get autoremove
apt-get update

export CUDA_HOME="/usr/local/cuda-9.0"
export TORCH_EXTENSIONS_DIR="$PWD/tmp"
export PYTHONPATH=$PWD:$PYTHONPATH

%runscript
cd experiments
python train_scop.py --pooling max --embedding blosum62 --kmer-size 14 --alternating --sigma 0.4 --tfid 0

This runs fine and gives me an image.simg file. I then tried to install CUDA with sudo singularity exec image.simg apt-get install cuda, which produces the following error:

0 upgraded, 823 newly installed, 0 to remove and 1 not upgraded.
Need to get 2661 MB of archives.
After this operation, 6822 MB of additional disk space will be used.
W: Not using locking for read only lock file /var/lib/dpkg/lock-frontend
W: Not using locking for read only lock file /var/lib/dpkg/lock
W: chown to _apt:root of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (30: Read-only file system)
W: chmod 0700 of directory /var/cache/apt/archives/partial failed - SetupAPTPartialDirectory (30: Read-only file system)
W: Not using locking for read only lock file /var/cache/apt/archives/lock
E: You don't have enough free space in /var/cache/apt/archives/.

I have read about similar issues with Docker, but I don't know of a Singularity equivalent of docker system prune.

I also tried to free up space with apt autoremove and apt autoclean, but without success.

There should be enough space left on the disk, since running df -H gives:

Filesystem      Size  Used Avail Use% Mounted on
udev            2,1G     0  2,1G   0% /dev
tmpfs           412M  1,4M  411M   1% /run
/dev/sda5        54G   19G   33G  36% /
tmpfs           2,1G     0  2,1G   0% /dev/shm
tmpfs           5,3M  4,1k  5,3M   1% /run/lock
tmpfs           2,1G     0  2,1G   0% /sys/fs/cgroup
/dev/loop0      132k  132k     0 100% /snap/bare/5
/dev/loop1       66M   66M     0 100% /snap/core20/1328
/dev/loop2      261M  261M     0 100% /snap/gnome-3-38-2004/99
/dev/loop3       66M   66M     0 100% /snap/core20/1405
/dev/loop4       69M   69M     0 100% /snap/gtk-common-themes/1519
/dev/loop5       46M   46M     0 100% /snap/snapd/15177
/dev/loop6       57M   57M     0 100% /snap/snap-store/558
/dev/loop7       46M   46M     0 100% /snap/snapd/14978
/dev/sda1       536M  4,1k  536M   1% /boot/efi
tmpfs           412M   25k  412M   1% /run/user/1000

Does anyone know whether the problem lies with my local Ubuntu or with the NVIDIA Docker image?

Thanks for any clarification.

singularity build 文档的 overview 部分所述

build can produce containers in two different formats that can be specified as follows.

  • compressed read-only Singularity Image File (SIF) format suitable for production (default)
  • writable (ch)root directory called a sandbox for interactive development (--sandbox option)

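Building with --sandbox, and then running the container with --writable, should make the system files writable, which should solve your problem. The df -H output from the host is not what matters here: the "Read-only file system" warnings in the apt-get output show that /var/cache/apt/archives inside the SIF image cannot be written to, and apt appears to report that as a lack of free space.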
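
A minimal sketch of that workflow, assuming the recipe is saved as recipe.def and using image_sandbox as a hypothetical name for the sandbox directory. The small run helper is also an addition for illustration: it prints each command and only executes it when singularity is on the PATH. Note that the sandbox must be opened with --writable as well; without that flag Singularity mounts even a sandbox read-only.

```shell
#!/bin/sh
# Sketch only. Assumptions: the recipe file is named recipe.def and
# image_sandbox is a hypothetical directory name for the sandbox.
RECIPE=recipe.def
SANDBOX=image_sandbox

# Hypothetical helper: echo each command, and run it only if
# Singularity is actually installed on this machine.
run() {
    echo "+ $*"
    if command -v singularity >/dev/null 2>&1; then
        "$@"
    fi
}

# 1. Build a writable directory tree (sandbox) instead of a read-only SIF file.
run sudo singularity build --sandbox "$SANDBOX" "$RECIPE"

# 2. Open the sandbox read-write so apt-get can write to /var/cache/apt/archives.
#    -y answers apt-get's confirmation prompt automatically.
run sudo singularity exec --writable "$SANDBOX" apt-get install -y cuda
```

Once CUDA is installed, the sandbox can be turned back into a read-only SIF image for production use, for example with sudo singularity build image.sif image_sandbox/.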

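Ideally, though, I would recommend adding any apt-get install commands to the %post section of the recipe file, so that the packages are installed at build time, while the image is still writable.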
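
For example, the %post section from the question could install CUDA directly. This is a sketch rather than a tested recipe: the -y flag (so the non-interactive build does not stop at apt-get's confirmation prompt) and the final apt-get clean (to keep the downloaded .deb files out of the image) are additions, and the other sections of the recipe stay unchanged.

```
%post

# necessary dependencies
pip install numpy scipy scikit-learn biopython pandas

# register the local CUDA 9.0 repository copied in via %files
dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
apt-get update

# install CUDA at build time, while the image is still writable
apt-get install -y cuda

# optional: remove the downloaded package archives to shrink the image
apt-get clean

export CUDA_HOME="/usr/local/cuda-9.0"
export TORCH_EXTENSIONS_DIR="$PWD/tmp"
export PYTHONPATH=$PWD:$PYTHONPATH
```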