Getting a parameter from a list or an HTML parser using bs4
I'm trying to get Google's suggested spelling by searching for a misspelled phrase, for example:
Input: Johnny walker rd lbl
Expected output: Johnny walker red label
Here is my code:
import requests
import pandas as pd
from bs4 import BeautifulSoup
from pprint import pprint
key = "Johnny walker rd lbl"
query = "https://www.google.com/search?q=" + key
r = requests.get(query)
html_doc = r.text
soup = BeautifulSoup(html_doc, 'html.parser')
#for s in soup.find_all(id="rhs_block"):
# pprint(s.text)
find = soup.find_all('script', attrs={'type': 'text/javascript'})
mylist = []
for x in find:
    # x.string is None for an empty <script> tag, so str() turns it into 'None'
    mylist.append(str(x.string))
print(mylist)
Output:
['None', "(function(){var eventid='QO7rW5TtM5OYvQT516NY';google.kEI =
eventid;})();", 'google.ac&&google.ac.c({"agen":true,"cgen":true,
"client":"heirloom-serp","dh":true,"dhqt":true,"ds":"","ffql":"en","fl":true,"host":"google.com","isbh":28,"jsonp":true,
"msgs":{"cibl":"Clear Search","dym":"Did you mean:","lcky":"I\u0026#39;m Feeling Lucky","lml":"Learn more",
"oskt":"Input tools","psrc":"This search was removed from your \u003Ca href=\"/history\"\u003EWeb History\u003C/a\u003E","psrl":"Remove",
"sbit":"Search by image","srch":"Google Search"},"ovr":{},"pq":"Johnny walker red label","refpd":true,"rfs":[],"sbpl":24,"sbpr":24,"scd":10,"sce":5,"stok":"7UqfdDr4nbKtZNfvytsBW8kPB9E","uhde":false})']
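A side note on how the query URL is built in the code above: concatenating the raw key works for a plain phrase like this one, but characters such as & or # in the key would change the meaning of the URL (a new parameter or a fragment). A minimal sketch using the standard library's urllib.parse.urlencode; build_search_url is a hypothetical helper name. With requests you can get the same escaping by passing params={"q": key} to requests.get instead of building the string yourself.

```python
from urllib.parse import urlencode


def build_search_url(key):
    # urlencode() escapes the value: spaces become "+", "&" becomes "%26",
    # "#" becomes "%23", so the whole key stays inside the q parameter.
    return "https://www.google.com/search?" + urlencode({"q": key})


print(build_search_url("Johnny walker rd lbl"))
# prints: https://www.google.com/search?q=Johnny+walker+rd+lbl
```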
How can I extract the value of the "pq" key from this output list? Please help.
Use a regular expression:
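import re
# ... (rest of the code from the question)
html_doc = r.text
# Capture the characters after "pq":" up to the next double quote
match = re.search(r'"pq":"([^"]+)', html_doc)
output = match.group(1) if match else None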
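The regex above stops at the first double quote, so a suggestion containing an escaped quote (\") would be cut short, and escapes such as \u0026 are left undecoded. A sketch of an alternative, assuming the argument passed to google.ac.c(...) is valid JSON as it is in the output shown in the question: capture that object literal and parse it with json.loads. extract_pq is a hypothetical helper name, and the list below is a trimmed copy of the question's output.

```python
import json
import re


def extract_pq(script_text):
    """Return the "pq" value from a google.ac.c({...}) script string, or None."""
    # Capture the object literal passed to google.ac.c(...), from the first "{"
    # up to the last "})" in the string.
    match = re.search(r"google\.ac\.c\((\{.*\})\)", script_text, re.DOTALL)
    if match is None:
        return None
    # Assumption: the object literal is valid JSON, as in the question's output.
    config = json.loads(match.group(1))
    return config.get("pq")


# Trimmed copy of the list printed in the question.
mylist = [
    "None",
    'google.ac&&google.ac.c({"client":"heirloom-serp",'
    '"msgs":{"dym":"Did you mean:"},"ovr":{},'
    '"pq":"Johnny walker red label","uhde":false})',
]

for s in mylist:
    pq = extract_pq(s)
    if pq is not None:
        print(pq)  # prints: Johnny walker red label
```

If Google changes the wrapper or the object stops being strict JSON, json.loads will raise, which at least fails loudly instead of returning a truncated string.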