How to change names of scraped images with Python?
I need to download the image of every coin in the list on CoinGecko, so I wrote the following code:
import requests
from bs4 import BeautifulSoup
from os.path import basename

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata("https://www.coingecko.com/en")
soup = BeautifulSoup(htmldata, 'html.parser')
for item1 in soup.select('.coin-icon img'):
    link = item1.get('data-src').replace('thumb', 'thumb_2x')
    with open(basename(link), "wb") as f:
        f.write(requests.get(link).content)
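However, I need to save the images under names that match each coin's ticker symbol in the CoinGecko list (rename bitcoin.png?1547033579 to BTC.png, ethereum.png?1595348880 to ETH.png, and so on). There are more than 7,000 images to rename, and many of them have fairly unique names, so slicing the file name does not work here.
Is there any way to do this?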
I believe you can achieve this easily with string slicing:
import requests
from bs4 import BeautifulSoup
from os.path import basename

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata("https://www.coingecko.com/en")
soup = BeautifulSoup(htmldata, 'html.parser')
for item1 in soup.select('.coin-icon img'):
    link = item1.get('data-src').replace('thumb', 'thumb_2x')
    # Slice off everything from the '?' onward (the query string).
    # Note: this assumes the link contains a '?'; find() returns -1 otherwise.
    with open(basename(link[:link.find('?')]), "wb") as f:
        f.write(requests.get(link).content)
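I am using slicing ([:]) to take the part of the link string that comes before the question mark, which marks the start of the query string.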
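If you would rather not depend on the '?' being present, here is a minimal sketch that lets the standard library split the URL instead of slicing by hand. The helper name filename_from_url is my own, not something from the question, and it only removes the query string; it does not produce the ticker name on its own.

```python
from posixpath import basename
from urllib.parse import urlsplit


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string."""
    # urlsplit separates the path from the query, so a URL without "?"
    # is handled the same way as one with it.
    return basename(urlsplit(url).path)


print(filename_from_url(
    "https://assets.coingecko.com/coins/images/1/thumb_2x/bitcoin.png?1547033579"
))  # prints "bitcoin.png"
```

In the loop above, this would take the place of basename(link[:link.find('?')]).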
I was looking through the HTML and found that the tag you are selecting has an alt attribute whose value ends with the ticker symbol in parentheses:
<div class="coin-icon mr-2 center flex-column">
<img class="" alt="bitcoin (BTC)" data-src="https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579" data-srcset="https://assets.coingecko.com/coins/images/1/thumb_2x/bitcoin.png?1547033579 2x" src="https://assets.coingecko.com/coins/images/1/thumb/bitcoin.png?1547033579" srcset="https://assets.coingecko.com/coins/images/1/thumb_2x/bitcoin.png?1547033579 2x">
</div>
So we can use it to get the right name, like this:
import requests
from bs4 import BeautifulSoup

def getdata(url):
    r = requests.get(url)
    return r.text

htmldata = getdata("https://www.coingecko.com/en")
soup = BeautifulSoup(htmldata, 'html.parser')
for item1 in soup.select('.coin-icon img'):
    link = item1.get('data-src').replace('thumb', 'thumb_2x')
    raw_name = item1.get('alt')  # e.g. "bitcoin (BTC)"
    # Take everything after '(' and drop the trailing ')'.
    name = raw_name[raw_name.find('(') + 1:-1]
    # Append the extension so the file is saved as e.g. BTC.png.
    with open(f"{name}.png", "wb") as f:
        f.write(requests.get(link).content)
We are basically using string slicing to extract the value between the parentheses.
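The slice assumes every alt value ends with ")" and contains a "(". As a hedged alternative, here is a small sketch that does the same extraction with a regular expression and returns None when there is no trailing parenthesised part, so you can skip such images instead of saving them under a wrong name. The function name ticker_from_alt and the second sample string are mine, purely for illustration.

```python
import re

# Matches the last parenthesised group at the end of the string, e.g. "(BTC)".
_TICKER_RE = re.compile(r"\(([^()]+)\)\s*$")


def ticker_from_alt(alt: str):
    """Return the text inside the trailing parentheses, or None if absent."""
    match = _TICKER_RE.search(alt)
    return match.group(1).strip() if match else None


print(ticker_from_alt("bitcoin (BTC)"))   # prints "BTC"
print(ticker_from_alt("no ticker here"))  # prints "None"
```

In the loop you would call name = ticker_from_alt(item1.get('alt')) and continue to the next image when name is None.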
You can also do it this way:
import requests
from bs4 import BeautifulSoup
from os.path import basename

url = "https://www.coingecko.com/en"
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
for item1 in soup.select('td.coin-name[data-text]'):
    # The ticker (e.g. BTC) is shown in a span inside the coin-name cell.
    ticker_name = item1.select_one(".center > span").get_text(strip=True)
    image_link = item1.select_one(".coin-icon > img").get('data-src').replace('thumb', 'thumb_2x')
    # Append .png so the file is saved as e.g. BTC.png.
    with open(f"{basename(ticker_name)}.png", "wb") as f:
        f.write(requests.get(image_link).content)