'[Errno 13] Permission denied' - Jupyter Labs on AWS SageMaker

Question

I am using a Jupyter Lab instance on AWS SageMaker.

Kernel: conda_mxnet_latest_p37.

url_list contains some bogus URLs, which I handle with exception handling.

['15', '259', '26', '58', 'https://imagepool.1und1-drillisch.de/v2/download/nachhaltigkeitsbericht/1&1Drillisch_Sustainability_Report_EN_2018.pdf', 'https://imagepool.1und1-drillisch.de//v2/download/nachhaltigkeitsbericht/2018-04-06_1und1-Drillisch_Sustainability_Report_eng.pdf', '6', 'http://youxin.37.com/uploads/file/1556248045.pdf', '80', 'https://multimedia.3m.com/mws/media/1691941O/2019-sustainability-report.PDF', 'https://s3-us-west-2.amazonaws.com/ungc-production/attachments/cop_2020/483648/original/GPIC_Sustainability_Report_2020__-_40_Years_of_Sustainable_Success.pdf?1583154650', 'https://drive.google.com/open?id=1_dnBcfXWjexy9QoWRhOk_3gnOkWfYRCw', 'http://aepsustainability.com/performance/docs/2020AEPGRIReport.pdf']  # sample

However, the URLs that are valid throw this error:

[Errno 13] Permission denied: '/data'

I don't have the directory open, nor any of the files, since they haven't been downloaded.

I tried this in the terminal, with no luck:

sh-4.2$ chmod 777 data
sh-4.2$ chmod 777 data/
sh-4.2$ chmod 777 data/gri
sh-4.2$ chmod 777 data/gri/

Code:

import pandas as pd
import opendatasets as od
import urllib
import zipfile
import os

csr_df = pd.read_excel('data/Company Sustainability Reports.xlsx', index_col=None)
url_list = csr_df['Report PDF Address'].tolist()

for url in url_list:
    try:
        download = od.download(url, '/data/gri/')
        filename = url.rsplit('/', 1)[1]

        path_extract = 'data/gri/' + filename
        with zipfile.ZipFile('data/gri/' + filename + '.zip', 'r') as zip_ref:
            zip_ref.extractall(path_extract)

        os.remove(path_extract + 'readme.txt')

        filenames = os.listdir(path_extract)
        scans = []
        for f in filenames:
            with Image.open(path_extract + f) as img:
                matrix = np.array(img)
                scans.append(matrix)

        # shutil.rmtree(path_extract)
        os.remove(path_extract[:-1] + '.zip')

    except (urllib.error.URLError, IOError, RuntimeError) as e:
        print('Download PDFs', e)

Output:

Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'imagepool.1und1-drillisch.de'. (_ssl.c:1091)>
Download PDFs <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'imagepool.1und1-drillisch.de'. (_ssl.c:1091)>
Download PDFs list index out of range
Download PDFs [Errno 13] Permission denied: '/data'
...

Let me know if there is anything else I should clarify.

Answer 1

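The save directory passed to od.download (the second argument) starts with a forward slash /. That makes it an absolute path, so the download targets /data at the filesystem root instead of the data folder in the working directory, which is also why running chmod on the relative data folder had no effect. I removed the leading slash: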

download = od.download(url, 'data/gri/')

Output:

...
Downloading http://youxin.37.com/uploads/file/1556248045.pdf to data/gri/1556248045.pdf
450560it [00:02, 207848.59it/s]
...
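
To see why the leading slash matters, here is a minimal sketch using only the standard library. The helper name describe_target is hypothetical and not part of opendatasets; it simply reports whether a directory string is absolute and where it resolves to from the current working directory. It assumes a POSIX-style filesystem, as on a SageMaker notebook instance.

```python
import os


def describe_target(path):
    """Return (resolved_absolute_path, is_absolute) for a directory string.

    Hypothetical helper for illustration only.
    """
    return os.path.abspath(path), os.path.isabs(path)


# Compare the path from the question with the path from the fix.
for candidate in ("/data/gri/", "data/gri/"):
    resolved, is_abs = describe_target(candidate)
    print(f"{candidate!r}: absolute={is_abs}, resolves to {resolved}")
```

The first path resolves to /data/gri no matter where the notebook runs, while the second resolves to a folder under the current working directory, which is the one the chmod commands in the question were applied to.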

permission-denied

python-3.x

jupyter-lab

amazon-sagemaker