'[Errno 13] Permission denied' - Jupyter Labs on AWS SageMaker
I am using a Jupyter Lab instance on AWS SageMaker.
Kernel: conda_mxnet_latest_p37.
url_list contains some bogus URLs, which I handle with a try/except block.
['15', '259', '26', '58', 'https://imagepool.1und1-drillisch.de/v2/download/nachhaltigkeitsbericht/1&1Drillisch_Sustainability_Report_EN_2018.pdf', 'https://imagepool.1und1-drillisch.de//v2/download/nachhaltigkeitsbericht/2018-04-06_1und1-Drillisch_Sustainability_Report_eng.pdf', '6', 'http://youxin.37.com/uploads/file/1556248045.pdf', '80', 'https://multimedia.3m.com/mws/media/1691941O/2019-sustainability-report.PDF', 'https://s3-us-west-2.amazonaws.com/ungc-production/attachments/cop_2020/483648/original/GPIC_Sustainability_Report_2020__-_40_Years_of_Sustainable_Success.pdf?1583154650', 'https://drive.google.com/open?id=1_dnBcfXWjexy9QoWRhOk_3gnOkWfYRCw', 'http://aepsustainability.com/performance/docs/2020AEPGRIReport.pdf'] # sample
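As an aside, the bogus entries could also be filtered out before any download is attempted. The following is a minimal sketch, assuming that a usable entry is one with an http or https scheme and a host name; looks_like_url is a hypothetical helper and not part of opendatasets.

```python
from urllib.parse import urlparse


def looks_like_url(value):
    """Return True if value parses as an http(s) URL with a host name."""
    parts = urlparse(str(value))  # str() also copes with non-string cells
    return parts.scheme in ('http', 'https') and bool(parts.netloc)


# A few entries taken from the sample list above.
sample = [
    '15',
    '259',
    'http://youxin.37.com/uploads/file/1556248045.pdf',
    'http://aepsustainability.com/performance/docs/2020AEPGRIReport.pdf',
]

# Keep only the entries that look like real URLs.
valid_urls = [entry for entry in sample if looks_like_url(entry)]
print(valid_urls)
```

This only checks the shape of the string; a URL that passes can still fail at download time (for example with the certificate error shown further down), so the try/except is still needed.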
However, the URLs that are valid throw this error:
[Errno 13] Permission denied: '/data'
I don't have the directory or any of the files open, since I haven't downloaded them yet.
I ran the following in the terminal, with no luck:
sh-4.2$ chmod 777 data
sh-4.2$ chmod 777 data/
sh-4.2$ chmod 777 data/gri
sh-4.2$ chmod 777 data/gri/
Code:
import pandas as pd
import opendatasets as od
import urllib
import zipfile
import os
import numpy as np
from PIL import Image

# Read the spreadsheet and collect the report URLs
csr_df = pd.read_excel('data/Company Sustainability Reports.xlsx', index_col=None)
url_list = csr_df['Report PDF Address'].tolist()

for url in url_list:
    try:
        # Download the file into the target directory
        download = od.download(url, '/data/gri/')
        filename = url.rsplit('/', 1)[1]
        path_extract = 'data/gri/' + filename
        with zipfile.ZipFile('data/gri/' + filename + '.zip', 'r') as zip_ref:
            zip_ref.extractall(path_extract)
        os.remove(path_extract + 'readme.txt')
        filenames = os.listdir(path_extract)
        scans = []
        # Load each extracted image as a NumPy array
        for f in filenames:
            with Image.open(path_extract + f) as img:
                matrix = np.array(img)
                scans.append(matrix)
        # shutil.rmtree(path_extract)
        os.remove(path_extract[:-1] + '.zip')
    except (urllib.error.URLError, IOError, RuntimeError) as e:
        print('Download PDFs', e)
Output:
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs list index out of range
Download PDFs <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'imagepool.1und1-drillisch.de'. (_ssl.c:1091)>
Download PDFs <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'imagepool.1und1-drillisch.de'. (_ssl.c:1091)>
Download PDFs list index out of range
Download PDFs [Errno 13] Permission denied: '/data'
...
Please let me know if anything else needs clarifying.
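The save directory passed to od.download (the second argument) had a forward slash / as its first character. A leading slash makes the path absolute, so it pointed at /data in the filesystem root rather than at the data folder in the working directory, which is also why running chmod on data/ made no difference. I removed the slash: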
download = od.download(url, 'data/gri/')
Output:
...
Downloading http://youxin.37.com/uploads/file/1556248045.pdf to data/gri/1556248045.pdf
450560it [00:02, 207848.59it/s]
...
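The difference between the two directory strings can be checked without downloading anything. This is a small sketch using only the standard library, assuming a POSIX system such as the SageMaker notebook instance; describe_path is a hypothetical helper introduced for illustration.

```python
import os


def describe_path(path):
    """Return (is_absolute, resolved_path) for a directory string."""
    return os.path.isabs(path), os.path.abspath(path)


# Compare the original argument with the corrected one.
for candidate in ('/data/gri/', 'data/gri/'):
    is_abs, resolved = describe_path(candidate)
    print(f'{candidate!r}: absolute={is_abs}, resolves to {resolved}')
```

The first string resolves to a location under the filesystem root, while the second resolves to a folder under the current working directory, which is the one the chmod commands were applied to.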