Extract table rows with XPath
I have the following HTML:
<TR ALIGN="LEFT">
<TD headers="usdot_number" ><center><font size="-1" face="Arial, Helvetica">1259247</FONT></center></TD>
<TD headers="prefix"><center><font size="-1" face="Arial, Helvetica">MC</FONT></center></TD>
<TD headers="docket_number" ><center><font size="-1" face="Arial, Helvetica">493001</FONT></center></TD>
<TD headers="legal_name" ><center><font size="-1" face="Arial, Helvetica"> E L ZAPATA TRANS INC</FONT></center></TD>
<TD headers="dba_name"> </TD>
<TD headers="city" ><center><font size="-1" face="Arial, Helvetica">SPRING VALLEY</FONT></center></TD>
<TD headers="state" ><center><font size="-1" face="Arial, Helvetica">CA</FONT></center></TD>
<td headers="view_details"><center><font size="-1" face="Arial, Helvetica">
<BR>
<FORM ACTION="pkg_carrquery.prc_getdetail" METHOD="POST">
<INPUT TYPE="hidden" NAME="pv_apcant_id" VALUE="406294">
<INPUT TYPE="hidden" NAME="pv_vpath" VALUE="LIVIEW">
<input type="submit" value="HTML" onClick="">
</FORM>
</font></center></td>
<td headers="view_details"><center><font size="-1" face="Arial, Helvetica">
<BR>
<FORM ACTION="http://li-public.fmcsa.dot.gov/reports/rwservlet" METHOD="POST" name="reportForm" onSubmit="submitReportRequest(this.rptSummit,this)">
<INPUT TYPE="hidden" NAME="hidden_run_parameters" VALUE="lirpt">
<INPUT TYPE="hidden" NAME="report" VALUE="/u01/oracle/lirpts/li_carrier.rdf">
<INPUT TYPE="hidden" NAME="p_apcant" VALUE="406294">
<INPUT TYPE="hidden" NAME="p_user" VALUE="WEBLIVIEW">
<INPUT TYPE="submit" VALUE="Report" name="rptSummit">
</FORM>
</td>
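From this example I want to extract some of the TD values (usdot_number, docket_number, dba_name, and legal_name) and the pv_apcant_id value (406294). I started with:
('//TABLE/TD headers/')
but it did not work. I don't know how to write the TD[space]headers=value/ part of the expression. Can anyone offer a suggestion?
Thanks!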
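To access an attribute in XPath, use the @ character and put the test in square brackets after the element name.
Here is how to get the usdot_number text: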
response.xpath("//td[@headers = 'usdot_number']/center/font/text()").extract()
And here is a sample expression that extracts the pv_apcant_id value:
response.xpath("//input[@name = 'pv_apcant_id']/@value").extract()
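If you want all of the fields for each row rather than one page-wide list per field, the usual approach is to select the rows first and then run relative expressions against each row. Below is a rough sketch of that idea. To keep it runnable without Scrapy, it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath (attribute predicates work, text() and /@value do not), so it is a stand-in for response.xpath rather than the same API. SAMPLE is a simplified, well-formed copy of your row, and FIELDS, parse_row and parse_table are names made up for this example.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the row in the question.
# Assumption: the real page goes through an HTML parser first, which
# lowercases tag names and repairs the unclosed <INPUT> tags.
SAMPLE = """
<table>
  <tr align="LEFT">
    <td headers="usdot_number"><center><font>1259247</font></center></td>
    <td headers="prefix"><center><font>MC</font></center></td>
    <td headers="docket_number"><center><font>493001</font></center></td>
    <td headers="legal_name"><center><font> E L ZAPATA TRANS INC</font></center></td>
    <td headers="dba_name"> </td>
    <td headers="view_details"><center><font>
      <form action="pkg_carrquery.prc_getdetail" method="POST">
        <input type="hidden" name="pv_apcant_id" value="406294"/>
        <input type="hidden" name="pv_vpath" value="LIVIEW"/>
      </form>
    </font></center></td>
  </tr>
</table>
"""

# Hypothetical list of the cells the question asks for.
FIELDS = ("usdot_number", "docket_number", "legal_name", "dba_name")


def parse_row(row):
    """Return the wanted cells of one <tr> element as a dict."""
    record = {}
    for name in FIELDS:
        # Attribute predicate: the <td> child whose headers attribute matches.
        cell = row.find(f"./td[@headers='{name}']")
        # itertext() collects text from the nested <center>/<font> elements;
        # a missing or blank cell (like dba_name here) becomes "".
        record[name] = "".join(cell.itertext()).strip() if cell is not None else ""
    # The hidden input sits deeper in the row, so search all descendants.
    hidden = row.find(".//input[@name='pv_apcant_id']")
    record["pv_apcant_id"] = hidden.get("value") if hidden is not None else None
    return record


def parse_table(markup):
    """Parse the markup and return one dict per <tr>."""
    root = ET.fromstring(markup)
    return [parse_row(row) for row in root.iter("tr")]


rows = parse_table(SAMPLE)
print(rows)
```

In a Scrapy callback the same pattern would presumably be a loop over response.xpath("//tr"), with relative expressions such as row.xpath("./td[@headers='usdot_number']//text()") inside the loop. I have not run that against the real page, so treat it as a starting point.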