Followed IG scraping Tutorial and stuck on XPath/other issue

I have been working through this tutorial: https://medium.com/swlh/tutorial-web-scraping-instagrams-most-precious-resource-corgis-235bf0389b0c

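I tried to write a simpler version of the tutorial's "insta_details" function, which gets the likes and the comment of an Instagram photo post, but I can't tell what is wrong with my code. I think I'm using XPaths incorrectly (it's my first time), but the error message I get is a "NoSuchElementException".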

from selenium.webdriver import Chrome


def insta_details(urls):
    browser = Chrome()
    post_details = []
    for link in urls:
        browser.get(link)
        likes = browser.find_element_by_partial_link_text('likes').text
        age = browser.find_element_by_css_selector('a time').text
        xpath_comment = '//*[@id="react-root"]/section/main/div/div/article/div[2]/div[1]/ul/li[1]/div/div/div'
        comment = browser.find_element_by_xpath(xpath_comment).text
        insta_link = link.replace('https://www.instagram.com/p', '')
        post_details.append({'link': insta_link,'likes/views': likes,'age': age, 'comment': comment})
    return post_details


urls = ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']
insta_details(urls)

Error message:

selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"partial link text","selector":"likes"}

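Copying and pasting the code from the tutorial doesn't work for me either. Am I calling the function incorrectly, or is something else wrong in the code?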

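Comparing it with the tutorial, your code looks incomplete. Note also that a partial link text locator only matches the text of <a> (link) elements. The like count does not appear to be a link (the XPaths below assume it is a span inside a button on photo posts), which is most likely why find_element_by_partial_link_text('likes') raises NoSuchElementException. The tutorial's version also handles video posts, whose count sits in a different element.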
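The answer code handles the photo/video difference with a try/except: it tries the photo XPath first and falls back to the video XPath. As a minimal sketch, that fallback idea can be factored into a small reusable helper, assuming only that the driver exposes find_element(by, value), which Selenium's WebDriver does. The names find_first and missing are hypothetical, chosen for illustration; they are not part of Selenium.

```python
from typing import Any, Sequence, Tuple, Type

# A locator is a (strategy, selector) pair. Selenium's By constants are plain
# strings, e.g. By.XPATH == "xpath" and By.CSS_SELECTOR == "css selector".
Locator = Tuple[str, str]


def find_first(driver: Any, locators: Sequence[Locator], *,
               missing: Type[Exception]) -> Tuple[int, Any]:
    """Return (index, element) for the first locator that matches.

    `driver` only needs a find_element(by, value) method. `missing` is the
    exception type that means "not found" (for a real Selenium browser this
    would be selenium.common.exceptions.NoSuchElementException).
    """
    for index, (by, value) in enumerate(locators):
        try:
            return index, driver.find_element(by, value)
        except missing:
            # This layout variant is not on the page; try the next locator.
            continue
    # Nothing matched: raise the same "not found" type the caller expects.
    raise missing(f"none of {len(locators)} locators matched")
```

With a real browser you would call something like find_first(browser, [(By.XPATH, photo_xpath), (By.XPATH, video_xpath)], missing=NoSuchElementException) and map the returned index to 'photo' or 'video'. Here photo_xpath and video_xpath are placeholders for the two like-count XPaths in the answer code. Returning the index, not just the element, is what lets the caller tell which layout matched.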

Here, try this:

import re
import time

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By


def find_mentions_or_hashtags(comment, pattern):
    # Returns a list for several matches, a single string for one match,
    # and "" for no match (the shapes you can see in the output below).
    mentions = re.findall(pattern, comment)
    if len(mentions) > 1:
        return mentions
    elif len(mentions) == 1:
        return mentions[0]
    else:
        return ""


def insta_link_details(url):
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    browser = Chrome(options=chrome_options)
    try:
        browser.get(url)
        try:
            # This captures the standard like count shown on photo posts.
            likes = browser.find_element(
                By.XPATH,
                "/html/body/div[1]/section/main/div/div/article/"
                "div[3]/section[2]/div/div/button/span").text.split()[0]
            post_type = 'photo'
        except NoSuchElementException:
            # Videos show their count in a different element, so fall back to it.
            likes = browser.find_element(
                By.XPATH,
                "/html/body/div[1]/section/main/div/div/article/"
                "div[3]/section[2]/div/span/span").text.split()[0]
            post_type = 'video'
        age = browser.find_element(By.CSS_SELECTOR, 'a time').text
        # The post's caption (first comment).
        comment = browser.find_element(
            By.XPATH,
            "/html/body/div[1]/section/main/div/div[1]/article/"
            "div[3]/div[1]/ul/div/li/div/div/div[2]/span").text
    finally:
        # Close the browser even if one of the lookups fails.
        browser.quit()

    hashtags = find_mentions_or_hashtags(comment, '#[A-Za-z]+')
    mentions = find_mentions_or_hashtags(comment, '@[A-Za-z]+')
    post_details = {'link': url, 'type': post_type, 'likes/views': likes,
                    'age': age, 'comment': comment, 'hashtags': hashtags,
                    'mentions': mentions}
    # Wait between posts so the requests are spaced out.
    time.sleep(10)
    return post_details


for url in ['https://www.instagram.com/p/CFdNu1lnCmm/', 'https://www.instagram.com/p/CFYR2OtHDbD/']:
    print(insta_link_details(url))

Output:

{'link': 'https://www.instagram.com/p/CFdNu1lnCmm/', 'type': 'photo', 'likes/views': '4', 'age': '6h', 'comment': 'Natural ingredients for natural skincare is the best way to go, with:\n\nThe Body Shop @thebodyshopaust\n☘️The Beauty Chef @thebeautychef\n\nWalk your body to a happier, healthier you with The Body Shop’s fair trade, high quality products. Be a powerhouse of digestive health with The Beauty Chef’s ingenious food supplements.  Even at our busiest, there’s always a way to take care of our health. \n\n5% rebate on all online purchases with #sosure. T&Cs apply. All rates for limited time only.', 'hashtags': '#sosure', 'mentions': ['@thebodyshopaust', '@thebeautychef']}
{'link': 'https://www.instagram.com/p/CFYR2OtHDbD/', 'type': 'photo', 'likes/views': '10', 'age': '2 DAYS AGO', 'comment': 'The weather can dry out your skin and hair this season, and there’s no reason to suffer through more when there’s so much going on!  Look better, feel better and brush better with these great offers for haircare, skin rejuvenation and beauty  Find 5% rewards for purchases at:\n\n Shaver Shop\n Fresh Fragrances\n Happy Hair Brush\n & many more online at our website bio !\n\nSoSure T&Cs apply. All rates for limited time only.\n.\n.\n.\n#sosure #sosureapp #haircare #skincare #perfume #beauty #healthylifestyle #shavershop #freshfragrances #happyhairbrush #onlineshopping #deals #melbournelifestyle #australia #onlinedeals', 'hashtags': ['#sosure', '#sosureapp', '#haircare', '#skincare', '#perfume', '#beauty', '#healthylifestyle', '#shavershop', '#freshfragrances', '#happyhairbrush', '#onlineshopping', '#deals', '#melbournelifestyle', '#australia', '#onlinedeals'], 'mentions': ''}
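The output shows one quirk of find_mentions_or_hashtags: 'hashtags' is a plain string ('#sosure') for the first post but a list for the second, and 'mentions' is "" when there are none. The pattern '#[A-Za-z]+' also stops at the first digit or underscore, so find_mentions_or_hashtags("Deals #sosure2020", '#[A-Za-z]+') returns '#sosure'. A minimal sketch of a variant that always returns a list, assuming a tag is just the prefix followed by regex word characters (letters, digits, underscore; periods in usernames are not handled). The name extract_tags is hypothetical.

```python
import re
from typing import List


def extract_tags(text: str, prefix: str) -> List[str]:
    """Return every hashtag (prefix "#") or mention (prefix "@") in text.

    Always returns a list (possibly empty), so callers never need to check
    whether they got a string, a list, or "".
    """
    # re.escape keeps the prefix literal; \w+ matches letters (including
    # non-ASCII ones), digits and underscores for str patterns in Python 3.
    return re.findall(re.escape(prefix) + r"\w+", text)
```

For example, extract_tags(comment, '#') and extract_tags(comment, '@') could replace the two find_mentions_or_hashtags calls if a consistent list type is more convenient downstream.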