How to extract total values in a dynamic HTML page with Python?

Question

I can extract some of the values I need from the HTML, but not all of them. How can I get the complete set of values with Python?

# Install dependencies first (in a notebook: !pip install beautifulsoup4 lxml)
import requests
import bs4
import pandas as pd

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'}

output =[]

url = "https://m.pcone.com.tw/store/0670386?ref=d_item_store"
       
r = requests.get(url, headers = headers)
soup = bs4.BeautifulSoup(r.text,"lxml")



for product in soup.find_all("a",class_='product-list-item'):
    
    productname = product.find("div",class_='name limit-2-line').get_text(strip=True)
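    # select_one takes a CSS selector, so the class goes in the selector string
    productprice = product.select_one("span.symbol-price").string

    # order_count may be missing; strip its 3-character suffix when present
    order_tag = product.find('span', class_="order_count")
    ordercount = order_tag.string[:-3] if order_tag else None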
    print(f'{productname}:{productprice}:{ordercount}')
    output.append([productname, productprice, ordercount])

  
df = pd.DataFrame(output, columns=['商品名稱', '價格', '購買人數'])
df.to_excel('松果-瑞昌.xlsx', index=False)

Answer 1

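The data is loaded dynamically by JavaScript from an API call that returns JSON. The HTML that requests downloads therefore does not contain every product, which is why BeautifulSoup cannot find them all. A minimal working solution that calls the API directly, using only requests: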

import requests
import pandas as pd

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'}

params={
    'items_per_page': '20',
    'null':'' ,
    'page': '1',
    'sortBy': 'default',
    'sortDir': 'desc',
    'store_id': '0670386'
}
output =[]

#url = "https://m.pcone.com.tw/store/0670386?ref=d_item_store"

api_url='https://www.pcone.com.tw/api/filterSearchTP'

# Pages 1 to 13 are hardcoded here
for i in range(1, 14):
    params['page'] = i  # request page i of the listing
    resp = requests.get(api_url, headers=headers, params=params).json()
    for item in resp['products']:
        productname = item['name']
        productprice = item['msrp']
        ordercount = item['order_count']

        output.append([productname, productprice, ordercount])

  
df = pd.DataFrame(output, columns=['商品名稱', '價格', '購買人數'])
df.to_excel('松果-瑞昌.xlsx', index=False)
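
The page count in the loop above is hardcoded, so it will miss products if the store adds pages. As a rough sketch, the loop could instead stop when a page comes back empty. This assumes the API returns an empty or missing `products` list past the last page, which is not confirmed here. The names `fetch_page`, `collect_products`, and `max_pages` are hypothetical helpers introduced only for illustration.

```python
from typing import Callable, Dict, List

API_URL = "https://www.pcone.com.tw/api/filterSearchTP"
HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_page(page: int, store_id: str = "0670386") -> Dict:
    """Request one page of products from the API used in the answer."""
    import requests  # imported here so collect_products can be used without it

    params = {
        "items_per_page": "20",
        "null": "",
        "page": str(page),
        "sortBy": "default",
        "sortDir": "desc",
        "store_id": store_id,
    }
    resp = requests.get(API_URL, headers=HEADERS, params=params, timeout=10)
    resp.raise_for_status()  # fail loudly on HTTP errors
    return resp.json()


def collect_products(fetch: Callable[[int], Dict], max_pages: int = 100) -> List[list]:
    """Collect [name, msrp, order_count] rows until a page has no products.

    Assumption: the API signals the end of the listing with an empty
    (or missing) "products" list. max_pages is a safety cap in case it does not.
    """
    rows = []
    for page in range(1, max_pages + 1):
        products = fetch(page).get("products") or []
        if not products:
            break  # no more products, stop paging
        for item in products:
            rows.append([item.get("name"), item.get("msrp"), item.get("order_count")])
    return rows
```

Calling `collect_products(fetch_page)` would then give rows that can be passed to `pd.DataFrame` with the same three columns as above. Passing the fetch function as an argument also makes it possible to try the paging logic against canned data before touching the real API.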

python

dynamic

beautifulsoup

pandas

python-requests