How do you extract an HTML table and add a new column with constant values from an earlier <strong> tag?

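I'm trying to extract a series of tables from an HTML document and append a new column holding a constant value taken from the tag that serves as each table's header. The idea is then to turn this new three-column table into a dataframe. Below is the code I have come up with so far. That is, each table would get a third column in which every row equals AGO, DPK, ATK or PMS, depending on which header precedes that table. I'm new to Python and HTML, and I'd appreciate any help. Thanks a million!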

import pandas as pd
from bs4 import BeautifulSoup
from robobrowser import RoboBrowser

br = RoboBrowser()
br.open("https://oilpriceng.net/03-09-2019")

table = br.find_all('td', class_='vc_table_cell')

for element in table:
    data = element.find('span', class_='vc_table_content')
    prod_name = br.find_all('strong')
    ago = prod_name[0].text
    dpk = prod_name[1].text
    atk = prod_name[2].text
    pms = prod_name[3].text
    if br.find('strong').text == ago:
        data.append(ago.text)
    elif br.find('strong').text == dpk:
        data.append(dpk.text)
    elif br.find('strong').text == atk:
        data.append(atk.text)
    elif br.find('strong').text == pms:
        data.append(pms.text)
    print(data.text)

df = pd.DataFrame(data)

The result I'm hoping for is to go from this

                AGO

Enterprise     Price
Coy A          $3.5/L
Coy B          $3.6/L
Coy C          $3.7/L

to the new table below as a dataframe in Pandas

Enterprise     Price            Product
Coy A          $3.5/L           AGO
Coy B          $3.6/L           AGO
Coy C          $3.7/L           AGO

and to repeat the same thing for other tables with DPK, ATK and PMS information

I hope I've understood your question. This script scrapes all the tables found on the page into a single dataframe and saves it to a CSV file:

import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://oilpriceng.net/03-09-2019/'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')

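# `last` holds the text of the most recent product header (AGO, DPK, ...)
data, last = {'Enterprise':[], 'Price':[], 'Product':[]}, ''
# select the headers and the table rows together, in document order
for tag in soup.select('h1 strong, tr:has(td.vc_table_cell)'):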
    if tag.name == 'strong':
        last = tag.get_text(strip=True)
    else:
        a, b = tag.select('td')
        a, b = a.get_text(strip=True), b.get_text(strip=True)
        # skip rows with an empty first cell and the 'DEPOT PRICE' title rows
        if a and b != 'DEPOT PRICE':
            data['Enterprise'].append(a)
            data['Price'].append(b)
            data['Product'].append(last)

df = pd.DataFrame(data)
print(df)
df.to_csv('data.csv')

Prints:

            Enterprise         Price Product
0            AVIDOR PH        ₦190.0     AGO
1            SHORELINK                   AGO
2    BULK STRATEGIC PH        ₦190.0     AGO
3                  TSL                   AGO
4              MASTERS                   AGO
..                 ...           ...     ...
165             CHIPET        ₦132.0     PMS
166               BOND                   PMS
167           RAIN OIL                   PMS
168               MENJ        ₦133.0     PMS
169              NIPCO  ₦ 2,9000,000     LPG

[170 rows x 3 columns]
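
If you want to try the "remember the last header, tag every following row with it" idea without fetching the live page, here is a minimal offline sketch. It uses only the standard library's html.parser rather than BeautifulSoup, and it runs on a made-up HTML snippet: the markup layout, the SAMPLE_HTML name, the ProductTableParser class, the extract_rows helper and the prices are all assumptions for illustration, not the real structure of the oilpriceng.net page.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the layout and the prices are placeholders.
SAMPLE_HTML = """
<h1><strong>AGO</strong></h1>
<table>
  <tr><td>Enterprise</td><td>Price</td></tr>
  <tr><td>Coy A</td><td>$3.5/L</td></tr>
  <tr><td>Coy B</td><td>$3.6/L</td></tr>
</table>
<h1><strong>DPK</strong></h1>
<table>
  <tr><td>Enterprise</td><td>Price</td></tr>
  <tr><td>Coy C</td><td>$3.7/L</td></tr>
</table>
"""


class ProductTableParser(HTMLParser):
    """Collect (enterprise, price, product) rows, where product is the
    text of the last <strong> header seen before the row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished (enterprise, price, product) tuples
        self.product = ''     # text of the most recent <strong> header
        self._cells = None    # cells of the <tr> being read, or None
        self._capture = None  # 'strong' or 'td' while inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._cells = []
        elif tag == 'td' and self._cells is not None:
            self._cells.append('')
            self._capture = 'td'
        elif tag == 'strong' and self._cells is None:
            # a <strong> outside any row is treated as a product header
            self.product = ''
            self._capture = 'strong'

    def handle_data(self, data):
        if self._capture == 'td':
            self._cells[-1] += data
        elif self._capture == 'strong':
            self.product += data

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None
        elif tag == 'tr' and self._cells is not None:
            cells = [c.strip() for c in self._cells]
            self._cells = None
            # keep two-cell data rows; skip the column-title row
            if len(cells) == 2 and cells[0] and cells != ['Enterprise', 'Price']:
                self.rows.append((cells[0], cells[1], self.product.strip()))


def extract_rows(html):
    """Parse the HTML and return the tagged rows in document order."""
    parser = ProductTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

The resulting list of tuples can be handed to pandas with pd.DataFrame(rows, columns=['Enterprise', 'Price', 'Product']) to get the three-column dataframe asked for in the question.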

data.csv (screenshot from LibreOffice):