Extracting tds from a table using BeautifulSoup and arranging them in a Pandas DataFrame together with the table id

I extracted the following HTML code:

<table id=table1>

  <thead>
    <tr class="table_columns">
      <th id="header1">
        "Column 1 Title"
      </th>
      <th id="header2">
        "Column 2 Title"
      </th>
    </tr>
  </thead>
  
  <tbody>
    <tr class="evenRow">
      <td headers="header1">firstrowcolumn1data</td>
      <td headers="header2">firstrowcolumn2data</td>
    </tr>
    <tr class="oddRow">
      <td headers="header1">secondrowcolumn1data</td>
      <td headers="header2">secondrowcolumn2data</td>
    </tr>
  </tbody>
</table>

I need to extract the table data and the id of the table (table1), then arrange them in a Pandas DataFrame, like this:

id table data
table1 firstrowcolumn1data
table1 firstrowcolumn2data
table1 secondrowcolumn1data
table1 secondrowcolumn2data

Try this:

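import pandas as pd
from bs4 import BeautifulSoup

# s is assumed to be the parsed document, e.g. s = BeautifulSoup(html, 'html.parser')
data = []
for table in s.find_all('table'):
    for td in table.find_all('td'):
        # pair each cell's text with the id of its enclosing table
        data.append((table.get('id'), td.text))
df = pd.DataFrame(data, columns=['id', 'table data'])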

Output:

>>> df
       id            table data
0  table1   firstrowcolumn1data
1  table1   firstrowcolumn2data
2  table1  secondrowcolumn1data
3  table1  secondrowcolumn2data
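
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The `html` string is an abbreviated copy of the markup from the question, and the helper name `tds_to_frame` is made up for illustration. It also uses `get_text(strip=True)` instead of `.text`, on the assumption that surrounding whitespace in the cells is not wanted.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Sample markup, abbreviated from the question (thead omitted).
html = """
<table id=table1>
  <tbody>
    <tr class="evenRow">
      <td headers="header1">firstrowcolumn1data</td>
      <td headers="header2">firstrowcolumn2data</td>
    </tr>
    <tr class="oddRow">
      <td headers="header1">secondrowcolumn1data</td>
      <td headers="header2">secondrowcolumn2data</td>
    </tr>
  </tbody>
</table>
"""


def tds_to_frame(html):
    """Hypothetical helper: one row per <td>, tagged with its table's id."""
    soup = BeautifulSoup(html, "html.parser")
    rows = [
        # table.get("id") returns None when a table has no id attribute
        (table.get("id"), td.get_text(strip=True))
        for table in soup.find_all("table")
        for td in table.find_all("td")
    ]
    return pd.DataFrame(rows, columns=["id", "table data"])


df = tds_to_frame(html)
print(df)
```

One caveat: `find_all` searches recursively, so if a page nests one table inside another, the inner cells would be listed once under each enclosing table. That does not apply to the markup in the question, but it may matter on other pages.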