Scraping Image from Instagram not working
I'm trying to scrape an image from Instagram using Python, so I wrote a small piece of code. Here it is:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as req
url = "https://www.instagram.com/p/CE9CZmsghan/"
# Download the raw HTML of the post page
website = req(url)
pg = website.read()
website.close()
pgsoup = soup(pg, "html.parser")
# Look for the div that wraps the post image
print(pgsoup.find_all('div', {'class': 'KL4Bh'}))
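So far the code only goes as far as turning the HTML into a soup object. Strangely, the last line doesn't find anything: it just prints an empty list. Do you know why this happens, and how to fix it?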
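One way to see why the list can come back empty is to compare a page as the server sends it with the same page after a browser has run its JavaScript. The sketch below uses only the standard library's html.parser; STATIC_HTML and RENDERED_HTML are hypothetical stand-ins (not Instagram's real markup), built on the assumption that the KL4Bh container is inserted by a script at runtime. ClassFinder and find_divs are illustrative names.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for what urlopen receives: an empty root element plus a
# script. The post markup only exists after a browser executes that script.
STATIC_HTML = """
<html><body>
  <div id="react-root"></div>
  <script>/* JavaScript later inserts the KL4Bh post container here */</script>
</body></html>
"""

# Hypothetical stand-in for the same page after the JavaScript has run.
RENDERED_HTML = """
<html><body>
  <div id="react-root"><div class="KL4Bh"><img src="https://example.com/a.jpg"></div></div>
</body></html>
"""


class ClassFinder(HTMLParser):
    """Collect the attributes of every <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "div" and self.target_class in classes:
            self.matches.append(attributes)


def find_divs(html, target_class):
    """Return the attribute dicts of all matching <div> elements in html."""
    finder = ClassFinder(target_class)
    finder.feed(html)
    return finder.matches


print(find_divs(STATIC_HTML, "KL4Bh"))    # [] : same symptom as in the question
print(find_divs(RENDERED_HTML, "KL4Bh"))  # [{'class': 'KL4Bh'}]
```

Any static parser (BeautifulSoup included) only sees what the server sent, so if the div is added later by a script, the search has nothing to match.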
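Use Selenium. The KL4Bh div is most likely built by Instagram's JavaScript after the page loads, so it isn't in the HTML that urlopen downloads, which is why the search returns an empty list. Selenium drives a real browser that runs those scripts, so the element exists by the time you look for it. It's simple: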
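import os
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# The chromedriver binary is expected in the same folder as this script
chrome_driver = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'chromedriver')
browser = webdriver.Chrome(service=Service(chrome_driver))
browser.implicitly_wait(10)  # give the page's JavaScript time to render the post
url = 'https://www.instagram.com/p/CE9CZmsghan/'
browser.get(url)
# KL4Bh is the class of the rendered image container; the <img> inside holds the URL
image_url = browser.find_element(By.CLASS_NAME, 'KL4Bh').find_element(By.TAG_NAME, 'img').get_attribute('src')
print(image_url)
browser.quit()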
Output:
https://scontent-frx5-1.cdninstagram.com/v/t51.2885-15/e35/s1080x1080/119122193_326279868428134_4046851753042951785_n.jpg?_nc_ht=scontent-frx5-1.cdninstagram.com&_nc_cat=1&_nc_ohc=MU53uPwIAzoAX-OPPES&_nc_tp=15&oh=7e874789a4624589c92c9c4f5e030387&oe=5F8F3371
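Once Selenium has returned image_url, the file itself can be saved with the standard library. This is a minimal sketch, assuming the CDN serves that URL without extra cookies or headers (not verified against Instagram); filename_from_url and download_image are hypothetical helper names, not part of any library.

```python
import os
from urllib.parse import urlsplit
from urllib.request import urlretrieve


def filename_from_url(image_url):
    """Use the last path segment as the file name, dropping the query string."""
    return os.path.basename(urlsplit(image_url).path)


def download_image(image_url, folder="."):
    """Save the image into folder and return the local path (needs network access)."""
    path = os.path.join(folder, filename_from_url(image_url))
    urlretrieve(image_url, path)
    return path


image_url = ("https://scontent-frx5-1.cdninstagram.com/v/t51.2885-15/e35/s1080x1080/"
             "119122193_326279868428134_4046851753042951785_n.jpg"
             "?_nc_ht=scontent-frx5-1.cdninstagram.com&_nc_cat=1")
print(filename_from_url(image_url))  # 119122193_326279868428134_4046851753042951785_n.jpg
# download_image(image_url)  # would save the file into the current folder
```

Keeping the query string out of the file name matters because the parameters after "?" contain characters that make awkward file names, while the path segment already ends in a usable .jpg name.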