Why am I receiving an Invalid URL error when using Python, Flask, ChromeDriver, and Chrome?
I have been following this tutorial: https://kb.objectrocket.com/postgresql/scrape-a-website-to-postgres-with-python-938
My app.py file looks like this (taken from the tutorial above):
from flask import Flask # needed for flask-dependent libraries below
from flask import render_template # to render the error page
from selenium import webdriver # to grab source from URL
from bs4 import BeautifulSoup # for searching through HTML
import psycopg2 # for database access

# set up Postgres database connection and cursor.
t_host = "localhost" # either "localhost", a domain name, or an IP address.
t_port = "5432" # default postgres port
t_dbname = "scrape"
t_user = "postgres"
t_pw = "********"
db_conn = psycopg2.connect(host=t_host, port=t_port, dbname=t_dbname, user=t_user, password=t_pw)
db_cursor = db_conn.cursor()

app = Flask(__name__)

@app.route("/")
@app.route('/import_temp')
def import_temp():
    # set up your webdriver to use Chrome web browser
    my_web_driver = webdriver.Chrome("/usr/local/bin/chromedriver")
    # designate the URL we want to scrape
    # NOTE: the long string of characters at the end of this URL below is a clue that
    # maybe this page is so dynamic, like maybe refers to a specific web session and/or day/time,
    # that we can't necessarily count on it to be the same more than one time.
    # Which means... we may want to find another source for our data; one that is more
    # dependable. That said, whatever URL you use, the methodology in this lesson stands.
    t_url = "https://weather.com/weather/today/l/7ebb344012f0c5ff88820d763da89ed94306a86c770fda50c983bf01a0f55c0d"
    # initiate scrape of website page data
    my_web_driver.get("<a href='" + t_url + "'>" + t_url + "</a>")
    # return entire page into "t_content"
    t_content = my_web_driver.page_source
    # use soup to make page content easily searchable
    soup_in_bowl = BeautifulSoup(t_content)
    # search for the UNIQUE span and class for the data we are looking for:
    o_temp = soup_in_bowl.find('span', attrs={'class': 'deg-feels'})
    # from the resulting object, "o_temp", get the text parameter and assign it to "n_temp"
    n_temp = o_temp.text
    # Build SQL for purpose of:
    # saving the temperature data to a new row
    s = ""
    s += "INSERT INTO tbl_temperatures"
    s += "("
    s += "n_temp"
    s += ") VALUES ("
    s += "(%n_temp)"
    s += ")"
    # Trap errors for opening the file
    try:
        db_cursor.execute(s, [n_temp, n_temp])
        db_conn.commit()
    except psycopg2.Error as e:
        t_msg = "Database error: " + e + "/n open() SQL: " + s
        return render_template("error_page.html", t_msg = t_msg)
    # Success!
    # Show a message to user.
    t_msg = "Successful scrape!"
    return render_template("progress.html", t_msg = t_msg)
    # Clean up the cursor and connection objects
    db_cursor.close()
    db_conn.close()
Looking at the Python shell error log, the URL appears to be invalid:
FLASK_APP = app.py
FLASK_ENV = development
FLASK_DEBUG = 0
In folder /home/lloyd/PycharmProjects/flaskProject
/home/lloyd/PycharmProjects/flaskProject/venv/bin/python -m flask run
* Serving Flask app 'app.py' (lazy loading)
* Environment: development
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
[2021-12-28 18:13:05,988] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 2073, in wsgi_app
response = self.full_dispatch_request()
File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 1518, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 1516, in full_dispatch_request
rv = self.dispatch_request()
File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/flask/app.py", line 1502, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
File "/home/lloyd/PycharmProjects/flaskProject/app.py", line 33, in import_temp
my_web_driver.get("<a href='" + t_url + "'>" + t_url + "</a>")
File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
self.execute(Command.GET, {'url': url})
File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "/home/lloyd/PycharmProjects/flaskProject/venv/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32000,"message":"Cannot navigate to invalid URL"}
(Session info: chrome=96.0.4664.110)
(Driver info: chromedriver=2.35.528139 (47ead77cb35ad2a9a83248b292151462a66cd881),platform=Linux 4.18.0-259.el8.x86_64 x86_64)
127.0.0.1 - - [28/Dec/2021 18:13:05] "GET / HTTP/1.1" 500 -
However, I can reach the URL when I enter the address manually.
When I run the application, a web console appears with an error message:
Internal Server Error
The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.
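Then a second web console appears, with the text data:, in the address bar of the web browser.
Any insight into why this is happening would be greatly appreciated. I previously posted a similar question about a 404 error, which was answered.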
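All of these errors:
- [2021-12-28 18:13:05,988] ERROR in app: Exception on / [GET]
- selenium.common.exceptions.WebDriverException: Message: unknown error: unhandled inspector error: {"code":-32000,"message":"Cannot navigate to invalid URL"}
- The web console showing the text data:,
are due to an incompatibility between the versions of the binaries you are using.
- You are using chrome=96.0.4664.110
- The release notes for ChromeDriver v96.0 clearly mention the following:
Supports Chrome version 96
- But you are using chromedriver=2.35.528139
- The release notes for chromedriver=2.35.528139 clearly mention the following:
Supports Chrome v62-64
So there is a clear mismatch between chromedriver=2.35 and chrome=96.0.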
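The comparison above can be sketched in code. This is a minimal illustration rather than anything Selenium provides: the helper names `versions_from_error`, `major_version`, and `majors_match` are hypothetical, and the rule "the major versions should be equal" is an assumption that only fits drivers numbered like Chrome itself (such as ChromeDriver v96.0, whose notes say "Supports Chrome version 96"). Older 2.x drivers list their supported Chrome range separately, so for them a failed match simply flags that the driver predates that numbering.

```python
import re


def versions_from_error(message):
    """Extract the chrome= and chromedriver= versions from a Selenium error message.

    Returns a (chrome_version, driver_version) tuple of strings.
    """
    chrome = re.search(r"chrome=([\d.]+)", message)
    driver = re.search(r"chromedriver=([\d.]+)", message)
    if chrome is None or driver is None:
        raise ValueError("message does not contain both version fields")
    return chrome.group(1), driver.group(1)


def major_version(version):
    """Return the leading integer of a dotted version string."""
    match = re.match(r"(\d+)", version)
    if match is None:
        raise ValueError("not a version string: %r" % version)
    return int(match.group(1))


def majors_match(chrome_version, driver_version):
    """Assumed heuristic: browser and driver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# The two info lines from the traceback in the question.
error_text = (
    "(Session info: chrome=96.0.4664.110)\n"
    "(Driver info: chromedriver=2.35.528139 "
    "(47ead77cb35ad2a9a83248b292151462a66cd881),"
    "platform=Linux 4.18.0-259.el8.x86_64 x86_64)"
)

chrome_version, driver_version = versions_from_error(error_text)
mismatch = not majors_match(chrome_version, driver_version)
```

For the log in the question, `mismatch` is true because the majors are 96 and 2.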
Solution
Ensure that:
- ChromeDriver is updated to the current ChromeDriver v96.0 level.
- Chrome is kept at a current v96 release (the session info in the log reports chrome=96.0.4664.110).
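Separately from the driver version, the traceback shows that the string passed to my_web_driver.get() is an HTML anchor ("<a href='...'>...</a>") rather than a bare URL, which on its own would plausibly produce "Cannot navigate to invalid URL". The sketch below is one way to check a string before navigating, using only the standard library. The helper name `is_navigable_url` is hypothetical, and accepting only http and https is an assumption made for this example.

```python
from urllib.parse import urlparse


def is_navigable_url(candidate):
    """Return True if the string parses as an absolute http(s) URL with a host."""
    parsed = urlparse(candidate)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# The URL from the question, and the anchor-wrapped string the code builds from it.
t_url = "https://weather.com/weather/today/l/7ebb344012f0c5ff88820d763da89ed94306a86c770fda50c983bf01a0f55c0d"
wrapped = "<a href='" + t_url + "'>" + t_url + "</a>"

plain_ok = is_navigable_url(t_url)
wrapped_ok = is_navigable_url(wrapped)
```

Here `plain_ok` is true and `wrapped_ok` is false, since the wrapped string has no scheme or host once parsed. The corresponding change in app.py would be to pass the plain string, as in my_web_driver.get(t_url).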