Specify output path when downloading files from a URL
I am downloading some files from a URL.
I can currently access my files like this:
import os
import shutil
import requests
from bs4 import BeautifulSoup
prefix = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'
download_url = "https:/path_to_website"
s = requests.session()
soup = BeautifulSoup(s.get(download_url).text, "lxml")
for a in soup.find_all('a', href=True):
    final_link = os.path.join(prefix, a['href'])
    result = s.get(final_link, stream=True)
    # a['href'] is a relative name, so the file lands in the current working directory
    with open(a['href'], 'wb') as out_file:
        shutil.copyfileobj(result.raw, out_file)
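This downloads the files fine and puts them in the default directory C:/User.
However, I would like to choose where the files are downloaded. wget lets you choose the output path, but my attempt with it downloads empty files, as if they were never accessed.
I tried this with wget: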
out_path = "C:/my_path"
prefix = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'
s = requests.session()
soup = BeautifulSoup(s.get(download_url).text, "lxml")
for a in soup.find_all('a', href=True):
    final_link = os.path.join(prefix, a['href'])
    # wget opens its own connection here and does not reuse the session s
    download = wget.download(final_link, out=out_path)
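I think wget is not working because I access the website with authentication (not shown), and once I build the final link I am no longer accessing it with that authentication. Is there a way to specify the output path with shutil?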
How about keeping the first method and changing the path you open the file with to os.path.join(out_path, a['href'])?
import os
import shutil
import requests
from bs4 import BeautifulSoup
out_path = r"C:\my_path"  # raw string so the backslash is taken literally
prefix = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'
download_url = "https:/path_to_website"
s = requests.session()
soup = BeautifulSoup(s.get(download_url).text, "lxml")
for a in soup.find_all('a', href=True):
    final_link = os.path.join(prefix, a['href'])
    result = s.get(final_link, stream=True)
    new_file_path = os.path.join(out_path, a['href'])
    with open(new_file_path, 'wb') as out_file:  # this will create the new file at new_file_path
        shutil.copyfileobj(result.raw, out_file)
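One thing that can trip this up: open() will fail if out_path does not exist yet, or if a['href'] contains sub-directories or a query string. Below is a minimal sketch of building the local path more defensively. The helper name local_path_for is made up for illustration and is not part of requests or BeautifulSoup, and it assumes you only want the last segment of the link as the file name.

```python
import os
from urllib.parse import urlparse, unquote


def local_path_for(href, out_dir):
    """Return a path inside out_dir named after the last segment of href.

    Hypothetical helper for illustration only.
    """
    # Keep only the path part of the link, dropping any ?query or #fragment.
    path = urlparse(href).path
    # Use the final segment so links like "sub/dir/file.hdf" do not
    # require matching sub-directories to exist locally.
    name = os.path.basename(unquote(path))
    if not name:
        raise ValueError(f"href has no file name: {href!r}")
    # Create the output directory if it is not there yet.
    os.makedirs(out_dir, exist_ok=True)
    return os.path.join(out_dir, name)
```

In the loop above you would then write new_file_path = local_path_for(a['href'], out_path). Links that point at directories (ending in "/") raise ValueError, so you may want to skip those with try/except.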
You can build the target path like this:
target_path = r'c:\windows\temp'
with open(os.path.join(target_path, a['href']), 'wb') as out_file:
    shutil.copyfileobj(result.raw, out_file)
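If you would rather not touch result.raw directly, here is a rough sketch of the same idea using iter_content, which also decodes gzip/deflate transfer encodings for you. The function name download_to and the chunk size are my own choices, not anything from the question. It expects the same requests session you authenticated with, so the credentials go along with every file request.

```python
def download_to(session, url, dest_path, chunk_size=8192):
    """Stream url through session into dest_path; return the bytes written.

    Hypothetical helper: session is expected to behave like requests.Session.
    """
    written = 0
    # Reusing the session sends whatever auth or cookies are set on it.
    with session.get(url, stream=True) as response:
        # Fail loudly rather than saving an error page as the file.
        response.raise_for_status()
        with open(dest_path, "wb") as out_file:
            for chunk in response.iter_content(chunk_size=chunk_size):
                out_file.write(chunk)
                written += len(chunk)
    return written
```

Inside the loop this would be called as download_to(s, final_link, os.path.join(target_path, a['href'])). A return value of 0 is a quick hint that the server sent an empty body, which is the symptom described in the question.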