Why can't my LinkedIn scraper load the requested URL? (Python)

Question

I'm trying to scrape LinkedIn. The script ran fine for 3 months, but yesterday it broke.

I use Selenium WebDriver with Firefox and a fake User-Agent.

The URL is https://www.linkedin.com/company/my_company/

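import logging

from fake_useragent import UserAgent
from selenium import webdriver
from selenium.webdriver import FirefoxOptions

def init_driver():
    """Initiates selenium webdriver.
    :return: Firefox browser instance, or None if startup fails
    """
    try:
        # use a random User-Agent to avoid captcha
        fp = webdriver.FirefoxProfile()
        fp.set_preference("general.useragent.override", UserAgent().random)
        fp.update_preferences()
        # initiate driver
        options = FirefoxOptions()
        # options.add_argument("--headless")
        # attach the profile; without this the User-Agent override is never applied
        options.profile = fp
        return webdriver.Firefox(options=options)
    except Exception:
        logging.error('Exception occurred initiating webdriver', exc_info=True)
        return None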

Then I simply open the page with driver.get(url).

The browser window opens, but the page never loads.
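A quick way to tell whether the page really failed to load or was redirected somewhere else is to compare `driver.current_url` with the URL passed to `driver.get()`. Below is a minimal pure-Python sketch of that comparison; the function names are hypothetical, and it assumes that host plus path (ignoring query string, fragment and a trailing slash) is enough to identify the page.

```python
from typing import Tuple
from urllib.parse import urlsplit


def normalize(url: str) -> Tuple[str, str]:
    """Reduce a URL to (host, path) with any trailing slash stripped, so that
    '.../company/my_company/' and '.../company/my_company' compare equal."""
    parts = urlsplit(url)
    return parts.netloc.lower(), parts.path.rstrip("/")


def landed_on_requested_page(requested: str, current: str) -> bool:
    """True if `current` points at the same host and path as `requested`
    (query string and fragment are ignored)."""
    return normalize(requested) == normalize(current)


if __name__ == "__main__":
    requested = "https://www.linkedin.com/company/my_company/"
    # In the scraper, `current` would be driver.current_url after driver.get(requested).
    print(landed_on_requested_page(requested, "https://www.linkedin.com/company/my_company"))  # True
    print(landed_on_requested_page(requested, "https://www.linkedin.com/authwall?trk=gf"))  # False
```

Comparing only host and path keeps tracking parameters from causing false mismatches, at the cost of not noticing redirects that change only the query string.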

The same thing happens without the fake User-Agent, and also with Chrome.

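Has anyone run into something like this? When I open the link myself, everything is OK. Through the driver, I end up on this URL instead: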

https://www.linkedin.com/authwall?trk=gf&trkInfo=AQFvPeNP8NQIxwAAAXLqc-uI5rnQe1ZIysPcZOgjZCzbrBHZj7q6gd68fPG9NzbX00Rlre_yC0tITChjMDEXSNnD8tZRaMXqcRG-z_3QUMlCvQPR4uVGBQYoSOl3ycoO2E6Jl9w=&originalReferer=&sessionRedirect=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2my_company%2F

The function opens other URLs without any problem.
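The redirect URL above is LinkedIn's auth wall (path `/authwall`), and the page that was originally requested is carried in its `sessionRedirect` query parameter. A minimal sketch for detecting that case from `driver.current_url`; the helper names are hypothetical, and the URL shape is assumed from this single example, not from any documented LinkedIn behavior.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def is_authwall(url: str) -> bool:
    """True if `url` is on linkedin.com (or a subdomain) and its path starts with /authwall."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    on_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
    return on_linkedin and parts.path.startswith("/authwall")


def session_redirect(url: str) -> Optional[str]:
    """Return the decoded `sessionRedirect` parameter of an auth-wall URL, or None.

    parse_qs percent-decodes the value, so 'https%3A%2F%2F...' becomes 'https://...'.
    """
    if not is_authwall(url):
        return None
    values = parse_qs(urlsplit(url).query).get("sessionRedirect")
    return values[0] if values else None


if __name__ == "__main__":
    # Shortened form of a redirect URL like the one above (trkInfo omitted).
    current = (
        "https://www.linkedin.com/authwall?trk=gf&originalReferer="
        "&sessionRedirect=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2Fmy_company%2F"
    )
    print(is_authwall(current))       # True
    print(session_redirect(current))  # https://www.linkedin.com/company/my_company/
```

In the scraper this check would run right after `driver.get(url)`, e.g. `if is_authwall(driver.current_url): ...`, so the script can log the redirect instead of silently scraping the auth-wall page.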

Answer 1

Here is how I modified your code.

With these changes, your code runs correctly.

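from selenium import webdriver
from fake_useragent import UserAgent
import logging

def init_driver():
    """Initiates selenium webdriver.
    :return: Firefox browser instance, or None if startup fails
    """

    # placeholder: replace with the full path to your geckodriver executable
    path = r"your firefox driver path"

    try:
        # use a random User-Agent to avoid captcha
        fp = webdriver.FirefoxProfile()
        fp.set_preference("general.useragent.override", UserAgent().random)
        fp.update_preferences()
        # initiate driver
        options = webdriver.FirefoxOptions()
        # options.add_argument("--headless")
        # attach the profile so the User-Agent override is actually used
        options.profile = fp
        # executable_path is the Selenium 3 way to point at geckodriver
        # (deprecated in Selenium 4, which uses a Service object instead)
        return webdriver.Firefox(options=options, executable_path=path)
    except Exception:
        logging.error('Exception occurred initiating webdriver', exc_info=True)
        return None


url = "your url"  # placeholder: the page to open

driver = init_driver()
driver.get(url)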


Tags: python, selenium, linkedin, web-scraping