Extracting tds from a table using BeautifulSoup and arranging them in a Pandas dataframe together with the table id
I extracted the following html code:
<table id=table1>
  <thead>
    <tr class="table_columns">
      <th id="header1">
        "Column 1 Title"
      </th>
      <th id="header2">
        "Column 2 Title"
      </th>
    </tr>
  </thead>
  <tbody>
    <tr class="evenRow">
      <td headers="header1">firstrowcolumn1data</td>
      <td headers="header2">firstrowcolumn2data</td>
    </tr>
    <tr class="oddRow">
      <td headers="header1">secondrowcolumn1data</td>
      <td headers="header2">secondrowcolumn2data</td>
    </tr>
  </tbody>
</table>
I need to extract the table data and the id of the table (table1), then arrange them in a Pandas dataframe, something like:
id | table data |
---|---|
table1 | firstrowcolumn1data |
table1 | firstrowcolumn2data |
table1 | secondrowcolumn1data |
table1 | secondrowcolumn2data |
Try this:
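import pandas as pd

# s is assumed to be the BeautifulSoup object parsed from the html above
data = []
for table in s.find_all('table'):
    for td in table.find_all('td'):
        # pair each cell's text with the id of its enclosing table
        data.append((table.get('id'), td.text))
df = pd.DataFrame(data, columns=['id', 'table data'])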
Output:
>>> df
       id            table data
0  table1   firstrowcolumn1data
1  table1   firstrowcolumn2data
2  table1  secondrowcolumn1data
3  table1  secondrowcolumn2data
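
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the markup from the question is available as a string and is parsed with Python's built-in "html.parser". The names `html` and `tables_to_frame` are made up for illustration. Using `get_text(strip=True)` instead of `td.text` is my own choice, to trim whitespace around the cell text.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Sample markup copied from the question (hypothetical variable name)
html = """
<table id=table1>
<thead>
<tr class="table_columns">
<th id="header1">"Column 1 Title"</th>
<th id="header2">"Column 2 Title"</th>
</tr>
</thead>
<tbody>
<tr class="evenRow">
<td headers="header1">firstrowcolumn1data</td>
<td headers="header2">firstrowcolumn2data</td>
</tr>
<tr class="oddRow">
<td headers="header1">secondrowcolumn1data</td>
<td headers="header2">secondrowcolumn2data</td>
</tr>
</tbody>
</table>
"""


def tables_to_frame(markup):
    """Return one row per td, paired with the id of its enclosing table."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = [
        # table.get("id") gives None when a table has no id attribute
        (table.get("id"), td.get_text(strip=True))
        for table in soup.find_all("table")
        for td in table.find_all("td")
    ]
    return pd.DataFrame(rows, columns=["id", "table data"])


df = tables_to_frame(html)
print(df)
```

Since the loop goes over every table on the page, the same function should also cope with several tables, each row carrying its own table id. One caveat: with nested tables, the inner cells would show up under the outer table as well, because `find_all` searches all descendants.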