遵循 IG 抓取教程并卡在 XPath/other 问题上
Followed IG scraping Tutorial and stuck on XPath/other issue
我一直在研究这个教程:https://medium.com/swlh/tutorial-web-scraping-instagrams-most-precious-resource-corgis-235bf0389b0c
当我尝试创建函数“insta_details”的更简单版本时,它会获得 Instagram 照片的点赞和评论 post,我似乎无法判断出了什么问题与代码。我想我错误地使用了 xpaths(第一次),但错误消息要求“NoSuchElementException”。
from selenium.webdriver import Chrome
def insta_details(urls):
browser = Chrome()
post_details = []
for link in urls:
browser.get(link)
likes = browser.find_element_by_partial_link_text('likes').text
age = browser.find_element_by_css_selector('a time').text
xpath_comment = '//*[@id="react-root"]/section/main/div/div/article/div[2]/div[1]/ul/li[1]/div/div/div'
comment = browser.find_element_by_xpath(xpath_comment).text
insta_link = link.replace('https://www.instagram.com/p', '')
post_details.append({'link': insta_link,'likes/views': likes,'age': age, 'comment': comment})
return post_details
urls = ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']
insta_details(urls)
错误信息:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"partial link text","selector":"likes"}
复制和粘贴教程中的代码对我来说还行不通。是我调用错了函数还是代码中有其他东西?
看教程好像你的代码不完整。
在这里,试试这个:
import time
import re
from selenium.webdriver.chrome.options import Options
from selenium.webdriver import Chrome
def find_mentions_or_hashtags(comment, pattern):
mentions = re.findall(pattern, comment)
if (len(mentions) > 1) & (len(mentions) != 1):
return mentions
elif len(mentions) == 1:
return mentions[0]
else:
return ""
def insta_link_details(url):
chrome_options = Options()
chrome_options.add_argument("--headless")
browser = Chrome(options=chrome_options)
browser.get(url)
try:
# This captures the standard like count.
likes = browser.find_element_by_xpath(
"""/html/body/div[1]/section/main/div/div/article/
div[3]/section[2]/div/div/button/span""").text.split()[0]
post_type = 'photo'
except:
# This captures the like count for videos which is stored
likes = browser.find_element_by_xpath(
"""/html/body/div[1]/section/main/div/div/article/
div[3]/section[2]/div/span/span""").text.split()[0]
post_type = 'video'
age = browser.find_element_by_css_selector('a time').text
comment = browser.find_element_by_xpath(
"""/html/body/div[1]/section/main/div/div[1]/article/
div[3]/div[1]/ul/div/li/div/div/div[2]/span""").text
hashtags = find_mentions_or_hashtags(comment, '#[A-Za-z]+')
mentions = find_mentions_or_hashtags(comment, '@[A-Za-z]+')
post_details = {'link': url, 'type': post_type, 'likes/views': likes,
'age': age, 'comment': comment, 'hashtags': hashtags,
'mentions': mentions}
time.sleep(10)
return post_details
for url in ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']:
print(insta_link_details(url))
输出:
{'link': 'https://www.instagram.com/p/CFdNu1lnCmm/', 'type': 'photo', 'likes/views': '4', 'age': '6h', 'comment': 'Natural ingredients for natural skincare is the best way to go, with:\n\nThe Body Shop @thebodyshopaust\n☘️The Beauty Chef @thebeautychef\n\nWalk your body to a happier, healthier you with The Body Shop’s fair trade, high quality products. Be a powerhouse of digestive health with The Beauty Chef’s ingenious food supplements. Even at our busiest, there’s always a way to take care of our health. \n\n5% rebate on all online purchases with #sosure. T&Cs apply. All rates for limited time only.', 'hashtags': '#sosure', 'mentions': ['@thebodyshopaust', '@thebeautychef']}
{'link': 'https://www.instagram.com/p/CFYR2OtHDbD/', 'type': 'photo', 'likes/views': '10', 'age': '2 DAYS AGO', 'comment': 'The weather can dry out your skin and hair this season, and there’s no reason to suffer through more when there’s so much going on! Look better, feel better and brush better with these great offers for haircare, skin rejuvenation and beauty Find 5% rewards for purchases at:\n\n Shaver Shop\n Fresh Fragrances\n Happy Hair Brush\n & many more online at our website bio !\n\nSoSure T&Cs apply. All rates for limited time only.\n.\n.\n.\n#sosure #sosureapp #haircare #skincare #perfume #beauty #healthylifestyle #shavershop #freshfragrances #happyhairbrush #onlineshopping #deals #melbournelifestyle #australia #onlinedeals', 'hashtags': ['#sosure', '#sosureapp', '#haircare', '#skincare', '#perfume', '#beauty', '#healthylifestyle', '#shavershop', '#freshfragrances', '#happyhairbrush', '#onlineshopping', '#deals', '#melbournelifestyle', '#australia', '#onlinedeals'], 'mentions': ''}
我一直在研究这个教程:https://medium.com/swlh/tutorial-web-scraping-instagrams-most-precious-resource-corgis-235bf0389b0c
当我尝试创建函数“insta_details”的更简单版本时,它会获得 Instagram 照片的点赞和评论 post,我似乎无法判断出了什么问题与代码。我想我错误地使用了 xpaths(第一次),但错误消息要求“NoSuchElementException”。
from selenium.webdriver import Chrome
def insta_details(urls):
browser = Chrome()
post_details = []
for link in urls:
browser.get(link)
likes = browser.find_element_by_partial_link_text('likes').text
age = browser.find_element_by_css_selector('a time').text
xpath_comment = '//*[@id="react-root"]/section/main/div/div/article/div[2]/div[1]/ul/li[1]/div/div/div'
comment = browser.find_element_by_xpath(xpath_comment).text
insta_link = link.replace('https://www.instagram.com/p', '')
post_details.append({'link': insta_link,'likes/views': likes,'age': age, 'comment': comment})
return post_details
urls = ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']
insta_details(urls)
错误信息:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"partial link text","selector":"likes"}
复制和粘贴教程中的代码对我来说还行不通。是我调用错了函数还是代码中有其他东西?
看教程好像你的代码不完整。
在这里,试试这个:
import time
import re
from selenium.webdriver.chrome.options import Options
from selenium.webdriver import Chrome
def find_mentions_or_hashtags(comment, pattern):
mentions = re.findall(pattern, comment)
if (len(mentions) > 1) & (len(mentions) != 1):
return mentions
elif len(mentions) == 1:
return mentions[0]
else:
return ""
def insta_link_details(url):
chrome_options = Options()
chrome_options.add_argument("--headless")
browser = Chrome(options=chrome_options)
browser.get(url)
try:
# This captures the standard like count.
likes = browser.find_element_by_xpath(
"""/html/body/div[1]/section/main/div/div/article/
div[3]/section[2]/div/div/button/span""").text.split()[0]
post_type = 'photo'
except:
# This captures the like count for videos which is stored
likes = browser.find_element_by_xpath(
"""/html/body/div[1]/section/main/div/div/article/
div[3]/section[2]/div/span/span""").text.split()[0]
post_type = 'video'
age = browser.find_element_by_css_selector('a time').text
comment = browser.find_element_by_xpath(
"""/html/body/div[1]/section/main/div/div[1]/article/
div[3]/div[1]/ul/div/li/div/div/div[2]/span""").text
hashtags = find_mentions_or_hashtags(comment, '#[A-Za-z]+')
mentions = find_mentions_or_hashtags(comment, '@[A-Za-z]+')
post_details = {'link': url, 'type': post_type, 'likes/views': likes,
'age': age, 'comment': comment, 'hashtags': hashtags,
'mentions': mentions}
time.sleep(10)
return post_details
for url in ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']:
print(insta_link_details(url))
输出:
{'link': 'https://www.instagram.com/p/CFdNu1lnCmm/', 'type': 'photo', 'likes/views': '4', 'age': '6h', 'comment': 'Natural ingredients for natural skincare is the best way to go, with:\n\nThe Body Shop @thebodyshopaust\n☘️The Beauty Chef @thebeautychef\n\nWalk your body to a happier, healthier you with The Body Shop’s fair trade, high quality products. Be a powerhouse of digestive health with The Beauty Chef’s ingenious food supplements. Even at our busiest, there’s always a way to take care of our health. \n\n5% rebate on all online purchases with #sosure. T&Cs apply. All rates for limited time only.', 'hashtags': '#sosure', 'mentions': ['@thebodyshopaust', '@thebeautychef']}
{'link': 'https://www.instagram.com/p/CFYR2OtHDbD/', 'type': 'photo', 'likes/views': '10', 'age': '2 DAYS AGO', 'comment': 'The weather can dry out your skin and hair this season, and there’s no reason to suffer through more when there’s so much going on! Look better, feel better and brush better with these great offers for haircare, skin rejuvenation and beauty Find 5% rewards for purchases at:\n\n Shaver Shop\n Fresh Fragrances\n Happy Hair Brush\n & many more online at our website bio !\n\nSoSure T&Cs apply. All rates for limited time only.\n.\n.\n.\n#sosure #sosureapp #haircare #skincare #perfume #beauty #healthylifestyle #shavershop #freshfragrances #happyhairbrush #onlineshopping #deals #melbournelifestyle #australia #onlinedeals', 'hashtags': ['#sosure', '#sosureapp', '#haircare', '#skincare', '#perfume', '#beauty', '#healthylifestyle', '#shavershop', '#freshfragrances', '#happyhairbrush', '#onlineshopping', '#deals', '#melbournelifestyle', '#australia', '#onlinedeals'], 'mentions': ''}