How can I get the contents of the "feedback" box from Google searches?
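When you ask a question or request the definition of a word in a Google search, Google gives you a summary of the answer in the "feedback" box.
For example, when you search for define apple, you get this result:
To be clear, I don't need the whole page or the other results; I only need this box:
How can I use the Requests and Beautiful Soup modules to get the contents of that "feedback" box in Python 3?
If that isn't possible, can I use the Google Search API to get the contents of the "feedback" box?
I found a similar question on SO, but the OP didn't specify a language, there are no answers, and I fear the two comments are outdated, since the question was asked almost 9 months ago.
Thank you in advance for taking the time to help.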
The question is a good idea.
The program can be started with:
python3 defineterm.py apple
#!/usr/bin/env python3
# defineterm.py
import html
import sys

import requests
from bs4 import BeautifulSoup

searchterm = ' '.join(sys.argv[1:])
# Let requests URL-encode the query instead of concatenating it by hand.
res = requests.get('https://www.google.com/search', params={'q': 'define ' + searchterm})
try:
    res.raise_for_status()
except Exception as exc:
    print('error while loading page occurred: ' + str(exc))
    sys.exit(1)

text = html.unescape(res.text)
soup = BeautifulSoup(text, 'lxml')
prettytext = soup.prettify()

# The next lines save the raw page for analysis; you can comment them out.
with open('rawpage.txt', 'w', encoding='utf-8') as frawpage:
    frawpage.write(prettytext)

firsttag = soup.find('h3', class_="r")
if firsttag is not None:
    print(firsttag.get_text())
    print()

# The second tag may change, so check it if the result is not correct.
# The same may be true of all the tags searched for here.
secondtag = soup.find('div', {'style': 'color:#666;padding:5px 0'})
if secondtag is not None:
    print(secondtag.get_text())
    print()

termtags = soup.find_all("li", {"style": "list-style-type:decimal"})
for count, tag in enumerate(termtags, start=1):
    print(str(count) + '. ' + tag.get_text())
    print()
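Make the script executable (chmod +x defineterm.py), then add this line to ~/.bashrc:
alias defterm="/data/Scrape/google/defineterm.py "
Use the correct path to the script on your system.
Then run
source ~/.bashrc
and the program can be started with:
defterm apple (or other term)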
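Multi-word or non-ASCII search terms need to be percent-encoded before they go into the query string. Below is a minimal sketch of doing that with the standard library; build_define_url is a hypothetical helper name, not part of the script in this answer.

```python
from urllib.parse import urlencode


def build_define_url(term, base="https://www.google.com/search"):
    """Return a search URL for 'define <term>' with the query percent-encoded."""
    # urlencode escapes spaces as '+' and non-ASCII characters as %XX bytes.
    return base + "?" + urlencode({"q": "define " + term})


print(build_define_url("apple"))
print(build_define_url("custard apple"))
```

Passing the same dictionary as params= to requests.get should have the same effect without building the URL by hand.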
This is easy to do with requests and bs4; you just need to pull the text from the div with the class lr_dct_ent:
import requests
from bs4 import BeautifulSoup
h = {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36"}
r = requests.get("https://www.google.ie/search?q=define+apple", headers=h).text
soup = BeautifulSoup(r, "lxml")  # name the parser explicitly to avoid bs4's "no parser specified" warning
print("\n".join(soup.select_one("div.lr_dct_ent").text.split(";")))
The definitions are in an ordered list, and the part of speech (here, noun) is in the div with the class lr_dct_sf_h:
In [11]: r = requests.get("https://www.google.ie/search?q=define+apple", headers=h).text
In [12]: soup = BeautifulSoup(r,"lxml")
In [13]: div = soup.select_one("div.lr_dct_ent")
In [14]: n_v = div.select_one("div.lr_dct_sf_h").text
In [15]: expl = [li.text for li in div.select("ol.lr_dct_sf_sens li")]
In [16]: print(n_v)
noun
In [17]: print("\n".join(expl))
1. the round fruit of a tree of the rose family, which typically has thin green or red skin and crisp flesh.used in names of unrelated fruits or other plant growths that resemble apples in some way, e.g. custard apple, oak apple.
used in names of unrelated fruits or other plant growths that resemble apples in some way, e.g. custard apple, oak apple.
2. the tree bearing apples, with hard pale timber that is used in carpentry and to smoke food.
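Google's class names change over time, so select_one may well return None on a live page. The sketch below runs the same selectors against a small hand-written HTML fragment (an assumption for illustration, not real Google markup apart from the class names quoted above) and returns None when the box is missing. parse_definition is a hypothetical helper name, and html.parser is used only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the relevant part of a results page (assumed markup).
SAMPLE = """
<div class="lr_dct_ent">
  <div class="lr_dct_sf_h">noun</div>
  <ol class="lr_dct_sf_sens">
    <li>the round fruit of a tree of the rose family.</li>
    <li>the tree bearing apples.</li>
  </ol>
</div>
"""


def parse_definition(markup):
    """Return (part_of_speech, [definitions]), or None if the box is absent."""
    soup = BeautifulSoup(markup, "html.parser")
    box = soup.select_one("div.lr_dct_ent")
    if box is None:
        return None
    heading = box.select_one("div.lr_dct_sf_h")
    part = heading.get_text(strip=True) if heading else ""
    senses = [li.get_text(strip=True) for li in box.select("ol.lr_dct_sf_sens li")]
    return part, senses


print(parse_definition(SAMPLE))
print(parse_definition("<html><body>no box here</body></html>"))
```

Keeping the parsing in a function that takes a string also makes it possible to test against a saved page instead of hitting Google on every run.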
The simplest approach is to use SelectorGadget to get the CSS selectors for this text:
from bs4 import BeautifulSoup
import requests, lxml
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}
html = requests.get('https://www.google.de/search?q=define apple', headers=headers)
soup = BeautifulSoup(html.text, 'lxml')
syllables = soup.select_one('.frCXef span').text
phonetic = soup.select_one('.g30o5d span span').text
noun = soup.select_one('.h3TRxf span').text
print(f'{syllables}\n{phonetic}\n{noun}')
# Output:
'''
ap·ple
ˈapəl
the round fruit of a tree of the rose family, which typically has thin red or green skin and crisp flesh. Many varieties have been developed as dessert or cooking fruit or for making cider.
'''
Alternatively, you can use the Google Direct Answer Box API from SerpApi to do the same thing. It is a paid API with a free trial of 5,000 searches.
Code to integrate:
from serpapi import GoogleSearch
params = {
    "api_key": "YOUR_API_KEY",
    "engine": "google",
    "q": "define apple",
    "google_domain": "google.com",
}
search = GoogleSearch(params)
results = search.get_dict()
syllables = results['answer_box']['syllables']
phonetic = results['answer_box']['phonetic']
noun = results['answer_box']['definitions'][0] # array output
print(f'{syllables}\n{phonetic}\n{noun}')
# Output:
'''
ap·ple
ˈapəl
the round fruit of a tree of the rose family, which typically has thin red or green skin and crisp flesh. Many varieties have been developed as dessert or cooking fruit or for making cider.
'''
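Not every query produces an answer box, so indexing results['answer_box'] directly can raise KeyError. Here is a minimal sketch of a more defensive lookup; extract_answer is a hypothetical helper, and the sample dictionary is hand-written to mirror only the fields used above, not a real API response.

```python
def extract_answer(results):
    """Return syllables, phonetic and first definition, or None if there is no answer box."""
    box = results.get("answer_box")
    if not box:
        return None
    definitions = box.get("definitions") or []
    return {
        "syllables": box.get("syllables"),
        "phonetic": box.get("phonetic"),
        "definition": definitions[0] if definitions else None,
    }


# Hypothetical response fragment, shaped like the fields used in this answer.
sample = {
    "answer_box": {
        "syllables": "ap·ple",
        "phonetic": "ˈapəl",
        "definitions": ["the round fruit of a tree of the rose family."],
    }
}

print(extract_answer(sample))
print(extract_answer({}))
```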
Disclaimer: I work for SerpApi.