Dynamic Google Search with Python/C#

I want to retrieve the number of Google search results (for example, "106,000,000 results (0.58 seconds)"). I wrote this script in Python:

import requests, webbrowser
from bs4 import BeautifulSoup

user_input = input("Type in query: ")
print("Googling..")
link = "http://www.google.com/search?q=" + user_input
google_search = requests.get(link)
print(google_search.headers)

#print it out as file

with open("Output.html", "w") as text_file:
    print("{}".format(google_search.text), file=text_file)

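But when I look at the file, the result statistics are missing. Is there any way to do this other than the Google Search API? That API is a poor fit because it is limited and does not even return the correct results. I tagged both Python and C# because I know both.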

To get the correct results from Google, you have to set a proper User-Agent HTTP header:

import requests
from bs4 import BeautifulSoup


user_input = input("Type in query: ")
print("Googling for keyword={}..".format(user_input))

params = {
    'q': user_input,
    'hl': 'en'   # <-- hl=en sets the interface language, so the stats text comes back in English.
}
headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'
}

google_search = requests.get("https://www.google.com/search", params=params, headers=headers)
soup = BeautifulSoup(google_search.content, 'html.parser')
print(soup.select_one('#result-stats').text)

Prints (for example):

Type in query: moon
Googling for keyword=moon..
About 1,720,000,000 results (0.99 seconds) 
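
If you need the count as a number rather than as text, the string can be parsed with a regular expression. This is only a minimal sketch: `parse_result_stats` is a hypothetical helper, not part of the answer above, and it assumes the English wording that `hl=en` produces.

```python
import re

# Hypothetical helper: turn the text of #result-stats into numbers.
# Assumes the English format, e.g. "About 1,720,000,000 results (0.99 seconds)".
STATS_RE = re.compile(r"([\d,]+)\s+results?\s+\(([\d.]+)\s+seconds?\)")


def parse_result_stats(text):
    """Return (result_count, seconds), or None if the text does not match."""
    match = STATS_RE.search(text)
    if match is None:
        return None
    count = int(match.group(1).replace(",", ""))  # drop thousands separators
    seconds = float(match.group(2))
    return count, seconds


print(parse_result_stats("About 1,720,000,000 results (0.99 seconds) "))
# (1720000000, 0.99)
```

Returning `None` on a mismatch lets the caller notice when the page layout or language differs from what the pattern expects, instead of failing with an exception.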

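Check out the SelectorGadget Chrome extension to grab CSS selectors by clicking on the desired element in your browser. You can also use it to test CSS selectors if you don't like doing that in the dev tools console via the $$('SELECTOR') command.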

CSS selectors are more flexible and more readable, so prefer the select_one() or select() bs4 methods over find()/findAll(). See a CSS selectors reference for the syntax.

You can also pass the URL query parameters as a dict, like this:

params = {
  'q': 'the most amazing query in 2021',
  'hl': 'en',  # interface language
}

requests.get(YOUR_URL, params=params)
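
To see why a params dict is safer than concatenating the query onto the URL (as the question does), here is a small sketch using only the standard library. `build_search_url` is a hypothetical helper for illustration; requests applies comparable escaping when you hand it `params`.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.google.com/search"


def build_search_url(query, lang="en"):
    """Build a search URL with the query string properly escaped."""
    params = {"q": query, "hl": lang}
    return BASE_URL + "?" + urlencode(params)


print(build_search_url("the most amazing query in 2021"))
# https://www.google.com/search?q=the+most+amazing+query+in+2021&hl=en

# Without escaping, '#' would start a URL fragment and '&' would start a new parameter.
print(build_search_url("C# & Python"))
# https://www.google.com/search?q=C%23+%26+Python&hl=en
```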

Code:

from bs4 import BeautifulSoup
import requests, lxml

headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

user_input = input("Type in query: ")
print(f"Googling... {user_input}")

params = {
  'q': user_input,
  'hl': 'en',  # interface language; the replace() calls below expect English text
}

soup = BeautifulSoup(requests.get('https://www.google.com/search', headers=headers, params=params).text, 'lxml')

print(f"Found {soup.select_one('#result-stats').text}"
      .replace("About", "about")
      .replace(" (", " in ")
      .replace(")", ""))

---------
'''
Type in query: fus ro dah
Googling... fus ro dah
Found about 628,000 results in 0.36 seconds 
'''

Alternatively, you can achieve the same thing with the Google Organic Results API from SerpApi. It's a paid API with a free plan.

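The main difference in your case is that you don't have to figure out why something doesn't work as expected, because that has already been done for the end user. All that is left is to pick the data you need out of the structured JSON.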

Code to integrate:

from serpapi import GoogleSearch
import os

user_input = input("Type in query: ")
print(f"Googling... {user_input}")

params = {
  "api_key": os.getenv("API_KEY"),
  "engine": "google",
  "q": user_input,
  "hl": "en"
}

search = GoogleSearch(params)
results = search.get_dict()

print(f"Total results: {results['search_information']['total_results']}\n"
      f"Time taken: {results['search_information']['time_taken_displayed']}")

-------
'''
Type in query: fus ro dah
Googling... fus ro dah
Total results: 663000
Time taken: 0.59 sec
'''
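
If you would rather not hit a KeyError when the statistics are absent, the dictionary can be read defensively. This is a rough sketch: `summarize` is a hypothetical helper, the sample dictionaries are made up to mirror the output above, and the top-level "error" key is an assumption about how failures are reported.

```python
def summarize(results):
    """Return a one-line summary, tolerating missing keys."""
    # Assumption: a failed request carries a top-level "error" message.
    if "error" in results:
        return f"Search failed: {results['error']}"
    info = results.get("search_information", {})
    total = info.get("total_results")
    taken = info.get("time_taken_displayed")
    if total is None:
        return "No result statistics in the response"
    return f"Total results: {total} (time taken: {taken})"


# Made-up dictionaries shaped like the result of search.get_dict() above.
print(summarize({"search_information": {"total_results": 663000,
                                        "time_taken_displayed": 0.59}}))
print(summarize({"error": "example error message"}))
print(summarize({}))
```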

Disclaimer, I work for SerpApi.