'[Errno 13] Permission denied' - JupyterLab on AWS SageMaker

I am using a JupyterLab instance on AWS SageMaker.

Kernel: conda_mxnet_latest_p37.

url_list contains some bogus URLs, which I handle with exception handling.

['15', '259', '26', '58', 'https://imagepool.1und1-drillisch.de/v2/download/nachhaltigkeitsbericht/1&1Drillisch_Sustainability_Report_EN_2018.pdf', 'https://imagepool.1und1-drillisch.de//v2/download/nachhaltigkeitsbericht/2018-04-06_1und1-Drillisch_Sustainability_Report_eng.pdf', '6', 'http://youxin.37.com/uploads/file/1556248045.pdf', '80', 'https://multimedia.3m.com/mws/media/1691941O/2019-sustainability-report.PDF', 'https://s3-us-west-2.amazonaws.com/ungc-production/attachments/cop_2020/483648/original/GPIC_Sustainability_Report_2020__-_40_Years_of_Sustainable_Success.pdf?1583154650', 'https://drive.google.com/open?id=1_dnBcfXWjexy9QoWRhOk_3gnOkWfYRCw', 'http://aepsustainability.com/performance/docs/2020AEPGRIReport.pdf']  # sample

However, the valid URLs throw this error:

[Errno 13] Permission denied: '/data'

I don't have the directory or any of the files open, since none of them have been downloaded.

I ran the following in a terminal, with no luck:

sh-4.2$ chmod 777 data
sh-4.2$ chmod 777 data/
sh-4.2$ chmod 777 data/gri
sh-4.2$ chmod 777 data/gri/

Code:

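import os
import urllib.error  # needed for urllib.error.URLError in the except clause
import zipfile

import numpy as np
import opendatasets as od
import pandas as pd
from PIL import Image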

csr_df = pd.read_excel('data/Company Sustainability Reports.xlsx', index_col=None)
url_list = csr_df['Report PDF Address'].tolist()

for url in url_list:
    try:
        download = od.download(url, '/data/gri/')
        filename = url.rsplit('/', 1)[1]

        path_extract = 'data/gri/' + filename
        with zipfile.ZipFile('data/gri/' + filename + '.zip', 'r') as zip_ref:
            zip_ref.extractall(path_extract)

        os.remove(path_extract + 'readme.txt')

        filenames = os.listdir(path_extract)
        scans = []
        for f in filenames:
            with Image.open(path_extract + f) as img:
                matrix = np.array(img)
                scans.append(matrix)

        # shutil.rmtree(path_extract)
        os.remove(path_extract[:-1] + '.zip')

    except (urllib.error.URLError, IOError, RuntimeError) as e:
        print('Download PDFs', e)

Output:

Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'imagepool.1und1-drillisch.de'. (_ssl.c:1091)>
Download PDFs <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'imagepool.1und1-drillisch.de'. (_ssl.c:1091)>
Download PDFs list index out of range
Download PDFs [Errno 13] Permission denied: '/data'
...

Please let me know if anything needs clarification.

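The save directory passed to od.download (the second argument) started with a forward slash /. That makes it an absolute path, /data at the filesystem root, rather than the data directory next to the notebook, which is the one I had run chmod on. I removed the leading slash: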

download = od.download(url, 'data/gri/')

Output:

...
Downloading http://youxin.37.com/uploads/file/1556248045.pdf to data/gri/1556248045.pdf
450560it [00:02, 207848.59it/s]
...
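
To make the difference between the two paths concrete, here is a minimal sketch using only the standard library. The helper resolve_download_dir is a hypothetical name made up for illustration, not part of opendatasets, and the results in the comments assume a POSIX system such as a SageMaker notebook instance.

```python
import os


def resolve_download_dir(path, base=None):
    """Resolve `path` against `base` (default: the current working directory).

    Hypothetical helper for illustration only; not part of opendatasets.
    """
    if base is None:
        base = os.getcwd()
    # os.path.join discards `base` entirely when `path` is absolute,
    # so '/data/gri/' stays at the filesystem root.
    return os.path.abspath(os.path.join(base, path))


# On a POSIX system:
print(os.path.isabs('/data/gri/'))  # True: /data at the filesystem root
print(os.path.isabs('data/gri/'))   # False: relative to the working directory

# Show where each spelling actually points.
print(resolve_download_dir('data/gri/'))
print(resolve_download_dir('/data/gri/'))
```

Printing the resolved path before the download loop is a quick way to check that the target is the directory whose permissions were changed, rather than a root-level directory that a notebook user typically cannot create.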