Using python requests with google search
I'm new to python.
In PyCharm I wrote this code:
import requests
from bs4 import BeautifulSoup
response = requests.get(f"https://www.google.com/search?q=fitness+wear")
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)
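Instead of the HTML of the search results, I get the HTML of a different page.
I used the same code in a script on pythonanywhere.com and it worked fine there. I've tried many of the solutions I found, but the result is always the same, so now I'm stuck.
I think this should work: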
import requests
from bs4 import BeautifulSoup
with requests.Session() as s:
    url = "https://www.google.com/search?q=fitness+wear"
    headers = {
        # The header value is just the URL; it must not repeat the "referer:" name
        "referer": "https://www.google.com/",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
    }
    # First request: lets the session store whatever cookies the server sets
    s.post(url, headers=headers)
    # Second request: the session sends those cookies back automatically
    response = s.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    print(soup)
It uses a requests session and a POST request to create any initial cookies (I'm not entirely sure about this part), which then lets you scrape.
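To see the mechanism this answer relies on without contacting Google, here is a sketch that runs a tiny local HTTP server. `FakeSearchHandler` is a hypothetical stand-in, not Google: it sets a cookie on POST and echoes back the `Cookie` header it receives on GET. A `requests.Session` stores the cookie from the first response and sends it on the second request; a plain `requests.get` that never made the POST sends nothing.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class FakeSearchHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for a site that sets a cookie on the first request."""

    def do_POST(self):
        # Set a cookie, the way the POST in the answer is hoped to
        self.send_response(200)
        self.send_header("Set-Cookie", "CONSENT=YES+demo; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Echo the Cookie header so we can see what the client sent
        body = (self.headers.get("Cookie") or "no cookie").encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


# Port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), FakeSearchHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/search?q=fitness+wear"

with requests.Session() as s:
    s.post(url)                     # the server sets the cookie here
    with_session = s.get(url).text  # the session sends it back

without_session = requests.get(url).text  # a one-off request has no cookie jar

server.shutdown()
server.server_close()

print(with_session)
print(without_session)
```

Whether Google actually sets a usable consent cookie in response to that POST is the part the answer itself is unsure about; the sketch only shows how the session carries cookies forward once they are set.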
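If you open a private window in your browser and go to google.com, you should see the same popup asking for your consent. This happens because you are not sending a session cookie.
You have a few options to get around this.
One is to directly send the cookies you can observe on the website, like this: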
import requests
# CONSENT value as observed in the browser
cookies = {"CONSENT": "YES+shp.gws-20210330-0-RC1.de+FX+412"}  # ..., plus any other cookies you observe
resp = requests.get("https://www.google.com/search?q=fitness+wear", cookies=cookies)
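If you want to confirm what that one-off request carries before sending it, `requests.Request(...).prepare()` builds the same URL and headers that `requests.get` would send, without any network I/O. This is a sketch; passing the query through `params` instead of writing `fitness+wear` by hand is a convenience swapped in here, not part of the original answer.

```python
import requests

# CONSENT value shown above; add any other cookies you observe in the browser
cookies = {"CONSENT": "YES+shp.gws-20210330-0-RC1.de+FX+412"}

# Build the request as requests.get would, but only prepare it (nothing is sent)
prepared = requests.Request(
    "GET",
    "https://www.google.com/search",
    params={"q": "fitness wear"},  # requests encodes the space as "+"
    cookies=cookies,
).prepare()

print(prepared.url)
print(prepared.headers["Cookie"])
```

The downside, as the next paragraph notes, is that you have to copy these values by hand, and they may stop being accepted at some point.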
The solution @Dimitriy Kruglikov uses is cleaner, though; a session is a good way to keep state across requests to a website.
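Google doesn't block you; you can still extract data from the HTML.
Using cookies isn't very convenient, and using a session with both a POST and a GET request generates more traffic.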
You can remove this popup with the BS4 decompose() or extract() methods:
annoying_popup.decompose()
will completely destroy the tag and its contents. Documentation.
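A minimal sketch of `decompose()` on made-up markup. The `consent-popup` id and the surrounding HTML are hypothetical; the real Google page uses different, frequently changing ids and classes, so inspect the HTML you actually receive.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page with a consent popup
html = """
<div id="consent-popup"><p>Consent popup text</p></div>
<div id="search"><h3>Fitness wear result</h3></div>
"""

soup = BeautifulSoup(html, "html.parser")
annoying_popup = soup.find("div", id="consent-popup")
if annoying_popup is not None:
    annoying_popup.decompose()  # removes the tag and destroys everything inside it

print(soup.find("div", id="consent-popup"))  # the popup is no longer in the tree
print(soup.find("h3").get_text())            # the rest of the page is untouched
```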
annoying_popup.extract()
will give you two separate HTML trees: one rooted at the BeautifulSoup object you used to parse the document, and another rooted at the extracted tag. Documentation.
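The same hypothetical markup with `extract()` instead: the popup leaves the main tree but stays usable as the root of its own tree, which is handy if you still want to read it. The `consent-popup` id is an assumption for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page with a consent popup
html = """
<div id="consent-popup"><p>Consent popup text</p></div>
<div id="search"><h3>Fitness wear result</h3></div>
"""

soup = BeautifulSoup(html, "html.parser")
annoying_popup = soup.find("div", id="consent-popup").extract()

print(annoying_popup.parent)                 # None: it is now the root of its own tree
print(annoying_popup.p.get_text())           # its contents are still available
print(soup.find("div", id="consent-popup"))  # gone from the original tree
print(soup.h3.get_text())
```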
After that you can scrape everything you need. You can also skip removing it and scrape directly.
See this Organic Results extraction I did recently. It scrapes the title, snippet, and link from Google search results.
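The linked extraction isn't reproduced here, but the general shape of such a scraper can be sketched. The markup and selectors below (`div.g`, `h3`, `a[href]`, `div.snippet`) are hypothetical placeholders, not Google's current class names, which differ and change often; adjust them to the HTML you actually receive.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup for two results (the second has no snippet)
html = """
<div class="g">
  <a href="https://example.com/a"><h3>First title</h3></a>
  <div class="snippet">First snippet</div>
</div>
<div class="g">
  <a href="https://example.com/b"><h3>Second title</h3></a>
</div>
"""


def extract_organic_results(html):
    """Collect title, link and (optional) snippet from each result block."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for block in soup.select("div.g"):
        title = block.select_one("h3")
        link = block.select_one("a[href]")
        snippet = block.select_one("div.snippet")
        if title is None or link is None:
            continue  # skip blocks that are not regular results
        results.append({
            "title": title.get_text(strip=True),
            "link": link["href"],
            "snippet": snippet.get_text(strip=True) if snippet else None,
        })
    return results


for result in extract_organic_results(html):
    print(result)
```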
Alternatively, you can use the Google Search Engine Results API from SerpApi. Check out the Playground.
from serpapi import GoogleSearch
import os
params = {
"engine": "google",
"q": "fus ro dah",
"api_key": os.getenv("API_KEY"),
}
search = GoogleSearch(params)
results = search.get_dict()
for result in results['organic_results']:
    print(f"Title: {result['title']}\nSnippet: {result['snippet']}\nLink: {result['link']}\n")
Output:
Title: Skyrim - FUS RO DAH (Dovahkiin) HD - YouTube
Snippet: I looked around for a fan made track that included Fus Ro Dah, but the ones that I found were pretty bad - some ...
Link: https://www.youtube.com/watch?v=JblD-FN3tgs
Title: Unrelenting Force (Skyrim) | Elder Scrolls | Fandom
Snippet: If the general subtitles are turned on, it can be seen that the text for the Draugr's Unrelenting Force is misspelled: "Fus Rah Do" instead of the proper "Fus Ro Dah." ...
Link: https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)
Title: Fus Ro Dah | Know Your Meme
Snippet: Origin. "Fus Ro Dah" are the words for the "unrelenting force" thu'um shout in the game Elder Scrolls V: Skyrim. After reaching the first town of ...
Link: https://knowyourmeme.com/memes/fus-ro-dah
Title: Fus ro dah - Urban Dictionary
Snippet: 1. A dragon shout used in The Elder Scrolls V: Skyrim. 2.An international term for oral sex given by a female. ex.1. The Dragonborn yelled "Fus ...
Link: https://www.urbandictionary.com/define.php?term=Fus%20ro%20dah
Part of the JSON:
"organic_results": [
{
"position": 1,
"title": "Unrelenting Force (Skyrim) | Elder Scrolls | Fandom",
"link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)",
"displayed_link": "https://elderscrolls.fandom.com › wiki › Unrelenting_F...",
"snippet": "If the general subtitles are turned on, it can be seen that the text for the Draugr's Unrelenting Force is misspelled: \"Fus Rah Do\" instead of the proper \"Fus Ro Dah.\" ...",
"sitelinks": {
"inline": [
{
"title": "Location",
"link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Location"
},
{
"title": "Effect",
"link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Effect"
},
{
"title": "Usage",
"link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Usage"
},
{
"title": "Word Wall",
"link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Word_Wall"
}
]
},
"cached_page_link": "https://webcache.googleusercontent.com/search?q=cache:K3LEBjvPps0J:https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)+&cd=17&hl=en&ct=clnk&gl=us"
}
]
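Not every entry in `organic_results` carries every key: in the JSON above, `sitelinks` is a nested object that only some results have, and the print loop earlier indexes `result['snippet']` directly, which would raise `KeyError` if a result happened to lack a snippet. A defensive sketch using `.get()` over a trimmed copy of the structure shown above (the `summarize` helper is hypothetical):

```python
# Trimmed copy of the "organic_results" structure shown above
results = {
    "organic_results": [
        {
            "position": 1,
            "title": "Unrelenting Force (Skyrim) | Elder Scrolls | Fandom",
            "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)",
            "sitelinks": {
                "inline": [
                    {"title": "Location", "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Location"},
                    {"title": "Effect", "link": "https://elderscrolls.fandom.com/wiki/Unrelenting_Force_(Skyrim)#Effect"},
                ]
            },
        }
    ]
}


def summarize(results):
    """Flatten organic results and their inline sitelinks into printable lines."""
    lines = []
    for result in results.get("organic_results", []):
        lines.append(f"{result.get('position')}. {result.get('title')} - {result.get('link')}")
        # Sitelinks are optional, so fall back to empty containers
        for sitelink in result.get("sitelinks", {}).get("inline", []):
            lines.append(f"   {sitelink['title']}: {sitelink['link']}")
    return lines


for line in summarize(results):
    print(line)
```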
Disclaimer: I work for SerpApi.