Why can't the requested URL be loaded when scraping LinkedIn? (Python)
I am trying to scrape LinkedIn. The script ran for 3 months, but yesterday it broke.
I use selenium webdriver: Firefox with a fake user agent.
The URL is https://www.linkedin.com/company/my_company/
    def init_driver():
        """Initiates selenium webdriver.
        :return: Firefox browser instance
        """
        try:
            # use random UserAgent to avoid captcha
            fp = webdriver.FirefoxProfile()
            fp.set_preference("general.useragent.override", UserAgent().random)
            fp.update_preferences()
            # initiate driver
            options = FirefoxOptions()
            #options.add_argument("--headless")
            return webdriver.Firefox(firefox_options=options)
        except Exception as e:
            logging.error('Exception occurred initiating webdriver', exc_info=True)
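Then I just open a page with driver.get(url).
At this point the browser opens, but the page fails to load.
The same thing happens without the fake user agent and when using Chrome.
Has anyone run into something like this? When I open the link myself, everything is OK.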
https://www.linkedin.com/authwall?trk=gf&trkInfo=AQFvPeNP8NQIxwAAAXLqc-uI5rnQe1ZIysPcZOgjZCzbrBHZj7q6gd68fPG9NzbX00Rlre_yC0tITChjMDEXSNnD8tZRaMXqcRG-z_3QUMlCvQPR4uVGBQYoSOl3ycoO2E6Jl9w=&originalReferer=&sessionRedirect=https%3A%2F%2Fwww.linkedin.com%2Fcompany%2my_company%2F
The function opens other URLs without any problem.
Here is how you can modify your code.
I modified your code, and the modified version executes correctly.
    from selenium import webdriver
    from fake_useragent import UserAgent
    import logging

    def init_driver():
        """Initiates selenium webdriver.
        :return: Firefox browser instance
        """
        path = r"your firefox driver path"
        try:
            # use random UserAgent to avoid captcha
            fp = webdriver.FirefoxProfile()
            fp.set_preference("general.useragent.override", UserAgent().random)
            fp.update_preferences()
            # initiate driver
            options = webdriver.FirefoxOptions()
            # options.add_argument("--headless")
            # Selenium 3 keyword arguments; the profile must be passed in,
            # otherwise the user agent override above is never applied
            return webdriver.Firefox(firefox_profile=fp,
                                     firefox_options=options,
                                     executable_path=path)
        except Exception:
            logging.error('Exception occurred initiating webdriver', exc_info=True)

    url = "your url"
    driver = init_driver()
    driver.get(url)
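The redirect URL in the question points at LinkedIn's /authwall page, which suggests the request is being sent to a login wall rather than the company page. The following is only a minimal sketch for detecting that case from the URL the browser ends up on; it does not get past the wall. The helper names `is_authwall` and `requested_page` are hypothetical and not part of Selenium or LinkedIn, and the only assumption about the redirect format is what is visible in the URL above (an `/authwall` path and a `sessionRedirect` query parameter).

```python
from urllib.parse import urlparse, parse_qs


def is_authwall(current_url):
    """Return True if the URL looks like a LinkedIn authwall redirect."""
    parsed = urlparse(current_url)
    host = parsed.hostname or ""
    return host.endswith("linkedin.com") and parsed.path.startswith("/authwall")


def requested_page(current_url):
    """Return the page that was originally requested, or None.

    The authwall URL carries the original target, percent-encoded,
    in the sessionRedirect query parameter.
    """
    if not is_authwall(current_url):
        return None
    targets = parse_qs(urlparse(current_url).query).get("sessionRedirect", [])
    return targets[0] if targets else None
```

After `driver.get(url)`, the check could be applied to `driver.current_url`, for example logging a warning when `is_authwall(driver.current_url)` is true, so the script reports the redirect instead of silently scraping the wrong page.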