How to get all URLs from browser Python

Question

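I need to get all the URLs that are currently open in my browser (not by opening a browser with Selenium and getting the links). Is this possible? All the information available on Stack Overflow is about getting links from a Selenium-driven browser, but I need the links from the browser I currently have open.

I tried: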

import sqlite3

con = sqlite3.connect('C:/Users/name/AppData/Local/BraveSoftware/Brave-Browser/User Data/Default/History')
cur = con.cursor()

cur.execute('select url from urls where id > 390')
print(cur.fetchall())

But I get this error:

cur.execute('select url from urls where id > 390')
sqlite3.OperationalError: database is locked

• Windows 10

• Brave browser (version 1.28.105, Chromium: 92.0.4515.131)

• Python 3.9 (64-bit)

Note: I want the links from the browser, not the links from a website.

Answer 1

Your question is: how to get all the URLs of a web page from a website. Based on this:

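from urllib.request import Request, urlopen

from bs4 import BeautifulSoup

# Fetch the page whose links should be extracted.
req = Request("http://slashdot.org")
html_page = urlopen(req)

# The "lxml" parser must be installed separately.
soup = BeautifulSoup(html_page, "lxml")

# Collect the href of every <a> tag on the page.
links = []
for link in soup.find_all('a'):
    links.append(link.get('href'))

print(links)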

This extracts all the URLs from the website.
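
One caveat about the loop above: `link.get('href')` returns None for anchors without an href, and relative paths stay relative. Below is a minimal sketch of one way to handle both cases. It uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages. The names `LinkCollector` and `extract_links` are made up for this illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from <a href="..."> tags (illustrative helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors that have no usable href
            # Resolve relative paths against the page URL.
            self.links.append(urljoin(self.base_url, href))


def extract_links(html_text, base_url):
    """Return every resolved link found in html_text, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links
```

Calling `extract_links(page_source, "http://slashdot.org")` on the decoded page source would then give a list of absolute URLs with the empty anchors left out.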

Answer 2

All Chromium-based browsers (Brave, Vivaldi, Chrome, ...) store their history in an SQLite database, so you can connect to that database from Python:

import sqlite3

con = sqlite3.connect(
    # Path on my Mac; there must be an equivalent on Windows.
    '/Users/foobar/Library/Application Support/BraveSoftware/Brave-Browser/Default/History'
)
cur = con.cursor()

cur.execute('select url from urls where id > 390')
print(cur.fetchall())

Output:

[('https://cybernews.com/how-to-use-vpn/change-google-play-country/',), ('https://duckduckgo.com/?q=android+change+coutnry&t=bravened',), ('https://duckduckgo.com/?q=android+change+coutnry&t=bravened&ia=web',) ...
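
The `database is locked` error from the question most likely appears because the running browser holds a lock on the live History file. A common workaround is to query a copy instead. The sketch below assumes the file can be copied while the browser is running. Visits that the browser has not yet written to disk may be missing from the copy. `read_history_urls` and `min_id` are names made up for this illustration.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path


def read_history_urls(history_path, min_id=0):
    """Return URLs with id > min_id from a copy of a Chromium-style History file."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Query a snapshot so the browser's lock on the live file does not matter.
        snapshot = Path(tmp_dir) / "History_copy"
        shutil.copy2(history_path, snapshot)

        con = sqlite3.connect(snapshot)
        try:
            rows = con.execute(
                "SELECT url FROM urls WHERE id > ? ORDER BY id", (min_id,)
            ).fetchall()
        finally:
            # Close before the temporary directory is removed (required on Windows).
            con.close()
    return [row[0] for row in rows]
```

With the path from the question, the call would look like `read_history_urls('C:/Users/name/AppData/Local/BraveSoftware/Brave-Browser/User Data/Default/History', min_id=390)`. Note that this returns browsing history, not necessarily the set of tabs that are open right now.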

Or:

Write your own Chrome extension that listens for every URL change and saves the URLs somewhere on your local machine.
