Issues with Python BeautifulSoup parsing

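I am trying to parse an HTML page with BeautifulSoup. The task is to get the data underlined in red for all the lots on this page. I can get the data from the left and right blocks (the lot, auction name, country, etc.), but I am having trouble getting the data from the central block. Here is what I have done so far.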

import re

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

URL_TEMPLATE = "https://www.artprice.com/artist/15079/wassily-kandinsky/lots/pasts?ipp=100"
FILE_NAME = "test"

def parse(url=URL_TEMPLATE):
    result_list = {'lot': [], 'name': [], 'date': [], 'type1': [], 'type2': [], 'width': [], 'height': [], 'estimate': [], 'hammer_price': [], 'auction_date': [], 'auction': [], 'country': []}
    r = requests.get(url)  # use the argument rather than the module-level constant
    soup = bs(r.text, "html.parser")
    lot_info = soup.find_all('p', class_='hidden-xs')
    date_info = soup.find_all('date')
    names_info = soup.find_all('a', class_='sln_lot_show')
    auction_info = soup.find_all('p', class_='visible-xs')
    # Raw string so that \d, \s and \w are passed to the regex engine unchanged
    auction_date_info = soup.find_all(string=re.compile(r'\d\d\s\w\w\w\s\d\d\d\d'))[1::2]
    for tag in lot_info:
        result_list['lot'].append(tag.text)
    for tag in date_info:
        result_list['date'].append(tag.text)
    for tag in names_info:
        result_list['name'].append(tag.text)
    # The visible-xs paragraphs alternate between auction name and country
    for tag in auction_info[0::2]:
        result_list['auction'].append(tag.strong.string)
    for tag in auction_info[1::2]:
        result_list['country'].append(tag.string)
    for text in auction_date_info:
        result_list['auction_date'].append(text)
    return result_list

# The central-block columns are still empty, so wrap each list in a Series
# to let pandas pad columns of unequal length with NaN
df = pd.DataFrame({key: pd.Series(values) for key, values in parse().items()})
df.to_excel(f"{FILE_NAME}.xlsx")

So the task is to get the data from the central block separately for each lot on this page.

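You need nth-of-type to reach each of those <p> elements.

This handles only the first lot, to prove that it works.
I'll leave it to you to clean up the output.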

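for div in soup.find_all('div', class_='col-xs-8 col-sm-6'):
    print(div.select_one('a').text.strip())                  # lot title
    print(div.select_one('p:nth-of-type(2)').text.strip())   # type, medium and dimensions
    print(div.select_one('p:nth-of-type(3)').text.strip())   # estimate
    print(div.select_one('p:nth-of-type(4)').text.strip())   # hammer price
    break  # stop after the first lot; remove this line to process every lot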

Result:

Abstract
Print-Multiple, Print in colors, 29 1/2 x 31 1/2 in75 x 80 cm
Estimate:

              € 560 - € 784


              $ 605 - $ 848


              £ 500 - £ 700


              ¥ 4,303 - ¥ 6,025
Hammer price:
              not communicated
not communicated
not communicated
not communicated
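
The estimate and hammer-price paragraphs come back as a label on the first line followed by one value per line, padded with whitespace. Below is a minimal sketch of one way to clean that up using only the standard library. The helper names split_lines and labelled_values are hypothetical, and the sample strings are copied from the result above; other lots may be laid out differently.

```python
def split_lines(text):
    """Return the non-empty lines of text, each with inner whitespace collapsed."""
    return [" ".join(line.split()) for line in text.splitlines() if line.strip()]


def labelled_values(text):
    """Split 'Label:' followed by one value per line into (label, values)."""
    lines = split_lines(text)
    if not lines:
        return "", []
    # The first non-empty line is the label; strip its trailing colon
    return lines[0].rstrip(":"), lines[1:]


# Sample text as returned by .text for the third and fourth <p> elements
estimate_text = (
    "Estimate:\n\n"
    "              € 560 - € 784\n\n\n"
    "              $ 605 - $ 848\n\n\n"
    "              £ 500 - £ 700\n\n\n"
    "              ¥ 4,303 - ¥ 6,025"
)
hammer_text = (
    "Hammer price:\n"
    "              not communicated\n"
    "not communicated\n"
    "not communicated\n"
    "not communicated"
)

estimate_label, estimate_values = labelled_values(estimate_text)
hammer_label, hammer_values = labelled_values(hammer_text)
print(estimate_label, estimate_values)
print(hammer_label, hammer_values)
```

In the loop above, this could be applied as labelled_values(div.select_one('p:nth-of-type(3)').text), and removing the break would then process every lot rather than only the first.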