How do I use Selenium to get the menu items and prices for different restaurants by State?

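This is my first time using Selenium and web scraping. I have been trying to get the menu items and prices for a restaurant in California from the following website (https://www.fastfoodmenuprices.com/baskin-robbins-prices/). I have been able to use Selenium to select California from the dropdown menu, but I keep running into the problem that the menu items and prices are not scraped and I end up with an empty dataframe. How can I scrape the menu items and prices from this website and store them in a dataframe? The code is below: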

from selenium import webdriver
import time 
from selenium.webdriver.support.ui import Select
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
import pandas as pd
from bs4 import BeautifulSoup

path = "/path/to/chromedriver"

driver = webdriver.Chrome(executable_path = path)
url = "https://www.fastfoodmenuprices.com/baskin-robbins-prices/"
driver.get(url)
Select(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH, "//select[@class='tp-variation']")))).select_by_value("MS4yOA==")

print(driver.page_source)
driver.quit

menu = []
prices = []

content = driver.page_source
soup = BeautifulSoup (content, features = "html.parser")

for element in soup.findAll('div', attrs = {'tbody': 'row-hover'}):
    menu = element.find ('td', attrs = {'class': "column-1"})
    prices = element.find('td', attrs = {'class':'column-3'})
    menu.append(menu.text)
    prices.append(prices.text)

df = pd.DataFrame({'Menu Item':menu, 'Prices':prices})
df

Try:

import requests
import base64
import pandas as pd
from bs4 import BeautifulSoup


url = "https://www.fastfoodmenuprices.com/baskin-robbins-prices/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

data = []
for td in soup.select(
    "tr:has(.column-1):has(.column-2):has(.column-3):has(input)"
):
    data.append(
        {
            "Type": td.find_previous(colspan="3").get_text(strip=True),
            "Food": td.select_one(".column-1").get_text(strip=True),
            "Size": td.select_one(".column-2").get_text(strip=True),
            "Price": float(
                td.select_one(".column-3").get_text(strip=True).strip("$")
            ),
        }
    )


adjust = soup.select_one('.tp-variation option:-soup-contains("California")')
adjust = float(base64.b64decode(adjust["value"]))

df = pd.DataFrame(data)
df["Price"] = (df["Price"] * adjust).round(2)

print(df)
df.to_csv("data.csv", index=False)

Prints:

|    | Type | Food | Size | Price |
|----|------|------|------|-------|
| 0 | Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough | Soft Serve Below | Mini | 2.8 |
| 1 | Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough | Soft Serve Below | Small | 4.84 |
| 2 | Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough | Soft Serve Below | Medium | 5.61 |
| 3 | Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough | Soft Serve Below | Large | 7.65 |
| 4 | Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough | Cups & Cones | Kids | 2.02 |
| 5 | Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough | Cups & Cones | Regular | 2.53 |
| 6 | Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough | Cups & Cones | Large | 3.81 |
| 7 | Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough | Parfaits | Mini | 2.8 |
| 8 | Soft ServeFlavors: Reese’s, Heath, Snickers, M&M’s, Oreo, Butterfinger, andChocolate Chip Cookie Dough | Parfaits | Regular | 6.39 |
| 9 | Sundaes | Banana Royale | | 7.03 |
| 10 | Sundaes | Brownie | | 7.03 |
| 11 | Sundaes | Banana Split | | 8.56 |
| 12 | Sundaes | Reese’s Peanut Butter Cup Sundae | | 7.67 |
| 13 | Sundaes | Chocolate Chip Cookie Dough Sundae | | 7.67 |
| 14 | Sundaes | Oreo® Layered Sundae | | 7.67 |
| 15 | Sundaes | Made with Snickers Sundae | | 7.67 |
| 16 | Sundaes | One Scoop Sundae | | 4.47 |
| 17 | Sundaes | Two Scoops Sundae | | 5.75 |
| 18 | Sundaes | Three Scoops Sundae | | 6.64 |
| 19 | Sundaes | Candy Topping | | 1.01 |
| 20 | Sundaes | Waffle Bowl | | 1.27 |
| 21 | Ice Cream | Kid’s Scoop | | 2.8 |
| 22 | Ice Cream | Single Scoop | | 3.57 |
| 23 | Ice Cream | Double Scoop | | 5.11 |
| 24 | Ice Cream | Regular Waffle Cone | | 1.27 |
| 25 | Ice Cream | Chocolate Waffle Cone | | 1.91 |
| 26 | Ice Cream | Fancy Waffle Cone | | 1.91 |
| 27 | Beverages | Cappuccino Blast | Mini | 4.72 |
| 28 | Beverages | Cappuccino Blast | Small | 6 |
| 29 | Beverages | Cappuccino Blast | Medium | 7.28 |
| 30 | Beverages | Cappuccino Blast | Large | 8.56 |
| 31 | Beverages | Iced Cappy Blast | Mini | 4.72 |
| 32 | Beverages | Iced Cappy Blast | Small | 6 |
| 33 | Beverages | Iced Cappy Blast | Medium | 7.28 |
| 34 | Beverages | Iced Cappy Blast | Large | 8.56 |
| 35 | Beverages | Add a Boost (Cappuccino or Iced Cappy Blast) | | 0.64 |
| 36 | Beverages | Smoothie | Mini | 4.72 |
| 37 | Beverages | Smoothie | Small | 6 |
| 38 | Beverages | Smoothie | Medium | 7.28 |
| 39 | Beverages | Smoothie | Large | 8.56 |
| 40 | Beverages | Shake | Mini | 4.72 |
| 41 | Beverages | Shake | Small | 6 |
| 42 | Beverages | Shake | Medium | 7.28 |
| 43 | Beverages | Shake | Large | 8.56 |
| 44 | Ice Cream To Go | Pre-Packed | Quart | 7.67 |
| 45 | Ice Cream To Go | Hand-Packed | Pint | 6.39 |
| 46 | Ice Cream To Go | Hand-Packed | Quart | 10.23 |
| 47 | Ice Cream To Go | Clown Cones | | 3.7 |

and creates data.csv (screenshot from LibreOffice).
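
The California adjustment in the requests-based code above works because each option of the state dropdown appears to carry its price multiplier as a Base64 string in its value attribute. The value MS4yOA==, which the Selenium snippets in this thread pass to select_by_value for California, decodes to 1.28. A minimal sketch of that decoding step follows; the helper name decode_multiplier and the base price 2.19 are made up for illustration and do not come from the site:

```python
import base64


def decode_multiplier(option_value: str) -> float:
    """Decode a Base64-encoded <option> value into a numeric multiplier."""
    return float(base64.b64decode(option_value).decode("ascii"))


# "MS4yOA==" is the option value selected for California in this thread
california = decode_multiplier("MS4yOA==")

# 2.19 is a made-up base price, used only to show the adjustment
adjusted = round(2.19 * california, 2)
print(california, adjusted)
```

Scaling the base prices this way avoids driving a browser at all, at the cost of depending on how the page happens to encode its multipliers.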

* The website is using Cloudflare protection

https://www.fastfoodmenuprices.com/baskin-robbins-prices/ is using Cloudflare CDN/Proxy!

https://www.fastfoodmenuprices.com/baskin-robbins-prices/ is using Cloudflare SSL!

** So I had to use the following options to evade detection

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Pass both switches in one call; a second call with the same key overwrites the first
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')

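*** To select the table's tr and td elements, I used CSS selectors, which are more robust and flexible

**** I had to use the list and zip functions when building the pandas DataFrame, because the two lists otherwise had different shapes.

***** I had to use try/except; otherwise you will see that some menu items are missing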

Script:

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd
from bs4 import BeautifulSoup


options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
# Pass both switches in one call; a second call with the same key overwrites the first
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')

# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

url = "https://www.fastfoodmenuprices.com/baskin-robbins-prices/"
driver.get(url)

Select(WebDriverWait(driver,20).until(EC.visibility_of_element_located((By.XPATH, "//select[@class='tp-variation']")))).select_by_value("MS4yOA==")



price=[]
menu=[]
 
soup = BeautifulSoup (driver.page_source,"lxml")
driver.close()

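for element in soup.select('#tablepress-34 tbody tr'):
    # Rows without these cells (e.g. section headings) make select_one() return
    # None, so .text raises AttributeError and the row is skipped.
    try:
        menus = element.select_one('td:nth-child(2)').text
        menu.append(menus)
    except AttributeError:
        pass
    try:
        prices = element.select_one('td:nth-child(3) span').text
        price.append(prices)
    except AttributeError:
        pass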
   
 

df = pd.DataFrame(data=list(zip(price,menu)),columns=['price','menu'])
print(df)

webdriver-manager

Output:

    price      menu
0    $2.80     Mini
1    $4.84    Small
2    $5.61   Medium
3    $7.65    Large
4    $2.02     Kids
5    $2.53  Regular
6    $3.81    Large
7    $2.80     Mini
8    $6.39  Regular
9    $7.03
10   $7.03
11   $8.56
12   $7.67
13   $7.67
14   $7.67
15   $7.67
16   $4.47
17   $5.75
18   $6.64
19   $1.01
20   $1.27
21   $2.80
22   $3.57
23   $5.11
24   $1.27
25   $1.91
26   $1.91
27   $4.72     Mini
28   $6.00    Small
29   $7.28   Medium
30   $8.56    Large
31   $4.72     Mini
32   $6.00    Small
33   $7.28   Medium
34   $8.56    Large
35   $0.64
36   $4.72     Mini
37   $6.00    Small
38   $7.28   Medium
39   $8.56    Large
40   $4.72     Mini
41   $6.00    Small
42   $7.28   Medium
43   $8.56    Large
44   $7.67    Quart
45   $6.39     Pint
46  $10.23    Quart
47   $3.70
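
In the script above, sizes and prices are appended to two separate lists inside independent try blocks and paired afterwards with zip. That stays aligned only as long as both cells are missing in exactly the same rows. If a row ever had a price but no size cell, every later pair would shift by one position and zip would silently stop at the shorter list. A hedged alternative is to build one record per row. The sketch below uses made-up cell texts in plain tuples (None stands for a missing cell) instead of a live page, so the row data is an assumption for illustration only:

```python
# Hypothetical (size, price) cell texts per table row; None marks a missing cell
rows = [("Mini", "$2.80"), (None, "$7.03"), ("Small", "$4.84")]

# Column-wise collection with independent skips, then zip:
# the second price ends up paired with the third row's size.
sizes = [size for size, _ in rows if size is not None]
prices = [price for _, price in rows if price is not None]
misaligned = list(zip(prices, sizes))

# Row-wise collection: each record keeps its own size and price together.
aligned = [{"price": price, "menu": size or ""} for size, price in rows]

print(misaligned)
print(aligned)
```

A list of dicts like aligned can be passed directly to pd.DataFrame, which keeps one table row per scraped row.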

Once you select California, to extract the table contents from the website you need to induce WebDriverWait for visibility_of_element_located() and, using read_html() from Pandas, you can use the following:

  • Code block:

    from io import StringIO

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.support.ui import Select, WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    import pandas as pd
    
    options = Options()
    options.add_argument("start-maximized")
    # Pass both switches in one call; a second call with the same key overwrites the first
    options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
    options.add_experimental_option('useAutomationExtension', False)
    options.add_argument('--disable-blink-features=AutomationControlled')
    # Raw string so the backslashes in the Windows path are not treated as escapes
    s = Service(r'C:\BrowserDrivers\chromedriver.exe')
    driver = webdriver.Chrome(service=s, options=options)
    driver.get("https://www.fastfoodmenuprices.com/baskin-robbins-prices")
    # Select California (option value "MS4yOA==") once the dropdown is visible
    Select(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//select[@class='tp-variation']")))).select_by_value("MS4yOA==")
    # Grab the table's HTML after the state has been selected
    tabledata = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@id='tablepress-34']"))).get_attribute("outerHTML")
    # read_html returns a list of DataFrames; StringIO avoids the literal-HTML deprecation warning in recent pandas
    tabledf = pd.read_html(StringIO(tabledata))
    print(tabledf)
    
  • Console output:

    [                                                 Food  ...                                              Price
    0   Soft Serve Flavors: Reese’s, Heath, Snickers, ...  ...  Soft Serve Flavors: Reese’s, Heath, Snickers, ...
    1                                    Soft Serve Below  ...                                              $2.80
    2                                    Soft Serve Below  ...                                              $4.84
    3                                    Soft Serve Below  ...                                              $5.61
    4                                    Soft Serve Below  ...                                              $7.65
    5                                        Cups & Cones  ...                                              $2.02
    6                                        Cups & Cones  ...                                              $2.53
    7                                        Cups & Cones  ...                                              $3.81
    8                                            Parfaits  ...                                              $2.80
    9                                            Parfaits  ...                                              $6.39
    10                                            Sundaes  ...                                            Sundaes
    11                                      Banana Royale  ...                                              $7.03
    12                                            Brownie  ...                                              $7.03
    13                                       Banana Split  ...                                              $8.56
    14                   Reese’s Peanut Butter Cup Sundae  ...                                              $7.67
    15                 Chocolate Chip Cookie Dough Sundae  ...                                              $7.67
    16                               Oreo® Layered Sundae  ...                                              $7.67
    17                          Made with Snickers Sundae  ...                                              $7.67
    18                                   One Scoop Sundae  ...                                              $4.47
    19                                  Two Scoops Sundae  ...                                              $5.75
    20                                Three Scoops Sundae  ...                                              $6.64
    21                                      Candy Topping  ...                                              $1.01
    22                                        Waffle Bowl  ...                                              $1.27
    23                                          Ice Cream  ...                                          Ice Cream
    24                                        Kid’s Scoop  ...                                              $2.80
    25                                       Single Scoop  ...                                              $3.57
    26                                       Double Scoop  ...                                              $5.11
    27                                Regular Waffle Cone  ...                                              $1.27
    28                              Chocolate Waffle Cone  ...                                              $1.91
    29                                  Fancy Waffle Cone  ...                                              $1.91
    30                                          Beverages  ...                                          Beverages
    31                                   Cappuccino Blast  ...                                              $4.72
    32                                   Cappuccino Blast  ...                                              $6.00
    33                                   Cappuccino Blast  ...                                              $7.28
    34                                   Cappuccino Blast  ...                                              $8.56
    35                                   Iced Cappy Blast  ...                                              $4.72
    36                                   Iced Cappy Blast  ...                                              $6.00
    37                                   Iced Cappy Blast  ...                                              $7.28
    38                                   Iced Cappy Blast  ...                                              $8.56
    39       Add a Boost (Cappuccino or Iced Cappy Blast)  ...                                              $0.64
    40                                           Smoothie  ...                                              $4.72
    41                                           Smoothie  ...                                              $6.00
    42                                           Smoothie  ...                                              $7.28
    43                                           Smoothie  ...                                              $8.56
    44                                              Shake  ...                                              $4.72
    45                                              Shake  ...                                              $6.00
    46                                              Shake  ...                                              $7.28
    47                                              Shake  ...                                              $8.56
    48                                    Ice Cream To Go  ...                                    Ice Cream To Go
    49                                         Pre-Packed  ...                                              $7.67
    50                                        Hand-Packed  ...                                              $6.39
    51                                        Hand-Packed  ...                                             $10.23
    52                                        Clown Cones  ...                                              $3.70
    
    [53 rows x 3 columns]]
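
In the console output above, section headings such as Sundaes and Beverages appear as ordinary rows with the same text repeated in every column, and the prices are strings rather than numbers. A minimal cleanup sketch follows. It works on plain (food, size, price) tuples so that it runs without a browser or pandas; the function name clean_rows and the sample rows are assumptions for illustration, and missing sizes are represented here as empty strings, whereas a real DataFrame from read_html would likely hold NaN in those cells:

```python
def clean_rows(rows):
    """Drop section-heading rows and convert price strings to floats."""
    cleaned = []
    section = None
    for food, size, price in rows:
        # A heading row spans all columns, so every cell carries the same text
        if food == size == price:
            section = food
            continue
        cleaned.append({
            "Type": section,
            "Food": food,
            "Size": size,
            "Price": float(price.lstrip("$")),
        })
    return cleaned


# Made-up sample modeled on the table layout
sample = [
    ("Sundaes", "Sundaes", "Sundaes"),
    ("Banana Royale", "", "$7.03"),
    ("Beverages", "Beverages", "Beverages"),
    ("Shake", "Mini", "$4.72"),
]
result = clean_rows(sample)
print(result)
```

Carrying the most recent heading forward as a Type field gives roughly the same four-column shape as the requests-based answer.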