Mechanical soup - 填表,工作有技巧HTMLtable
Mechanical soup - form filling, working with a tricky HTML table
我正在尝试使用机械汤自动填充并为我提交时间sheet。
表格如下所示:
timesheet_form
这是页面该部分的相关源代码:
<FORM ACTION="bwpkteci.P_UpdateTimeInOut" METHOD="post">
<INPUT TYPE="hidden" NAME="JobsSeqNo" VALUE="208138">
<INPUT TYPE="hidden" NAME="LastDate" VALUE="0">
<INPUT TYPE="hidden" NAME="par_restart" VALUE="Y">
<INPUT TYPE="hidden" NAME="par_update" VALUE="Y">
<INPUT TYPE="hidden" NAME="par_submit" VALUE="Y">
<INPUT TYPE="hidden" NAME="par_recall" VALUE="N">
<INPUT TYPE="hidden" NAME="EarnCode" VALUE="RSA">
<INPUT TYPE="hidden" NAME="DateSelected" VALUE="20-FEB-2018">
<TABLE CLASS="dataentrytable" SUMMARY="This user enters the time of day for the hours worked into this table in order for the system to calculate the hours.">
<TR>
<TD CLASS="delabel" scope="row" >Date:</TD>
<TD CLASS="dedefault">Tuesday, Feb 20,2018</TD>
</TR>
<TR>
<TD CLASS="delabel" scope="row" >Earnings Code:</TD>
<TD CLASS="dedefault">Regular Student Aide</TD>
</TR>
</TABLE>
<TABLE CLASS="dataentrytable" SUMMARY="This is the detail table where the user enters the time of the day for the hours worked in order for the system to calculate the hours.">
<TR>
<TD CLASS="deheader" scope="col" ><LABEL for=shift_input_id><SPAN class="fieldlabeltext">Shift</SPAN></LABEL></TD>
<TD COLSPAN="2" CLASS="deheader" scope="col" ><LABEL for=timein_input_id><SPAN class="fieldlabeltext">Time In</SPAN></LABEL></TD>
<TD COLSPAN="2" CLASS="deheader" scope="col" ><LABEL for=timeout_input_id><SPAN class="fieldlabeltext">Time Out</SPAN></LABEL></TD>
<TD CLASS="deheader" scope="col" >Total Hours</TD>
</TR>
<INPUT TYPE="hidden" NAME="LineNumber" VALUE="1">
<TR>
<TD CLASS="dedefault"><INPUT TYPE="text" NAME="Shift" SIZE="2" MAXLENGTH="1" VALUE="1" ID="shift_input_id"></TD>
<TD CLASS="dedefault"><INPUT TYPE="text" NAME="TimeIn" SIZE="6" MAXLENGTH="5" ID="timein_input_id"></TD>
<TD CLASS="dedefault">
<SELECT NAME="TimeInAm" SIZE="1">
<OPTION VALUE="AM" SELECTED>AM
<OPTION VALUE="PM">PM
</SELECT>
</TD>
<TD CLASS="dedefault"><INPUT TYPE="text" NAME="TS_TimeOut" SIZE="6" MAXLENGTH="5" ID="timeout_input_id"></TD>
<TD CLASS="dedefault">
<SELECT NAME="TimeOutAm" SIZE="1">
<OPTION VALUE="AM" SELECTED>AM
<OPTION VALUE="PM">PM
</SELECT>
</TD>
<TD CLASS="dedefault"><p class="rightaligntext">0</p></TD>
</TR>
<TR>
<TD COLSPAN="5" CLASS="dedead"> </TD>
<TD CLASS="dedefault"><p class="rightaligntext">0</p></TD>
</TR>
</TABLE>
<P>
<TABLE CLASS="plaintable" SUMMARY="This layout table is used to align buttons.">
<TR>
<TD CLASS="pldefault">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Time Sheet">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Previous Day">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Next Day">
</TD>
</TR>
<TR>
<TD CLASS="pldefault">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Add New Line">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Save">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Copy">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Delete">
</TD>
</TR>
</TABLE>
</FORM>
我正在使用机械汤填写表格并提交。
#navigate to the timesheet page
browser.open(timesheetURL)
browser.get_current_page()
#select the form
browser.select_form('form[action="bwpkteci.P_UpdateTimeInOut"]')
form = browser.get_current_form()
#add required controls and set values for testing
form.new_control('text', 'shift', '1')
form.new_control('text', 'TimeIn', '08:30')
form.new_control('select', 'TimeInAm', 1) #1 stands for PM
form.new_control('text', 'TS_TimeOut', '09:30')
form.new_control('select', 'TimeOutAm', 1) #1 stands for PM
form.new_control('submit', 'ButtonSelected', 'Save')
form.choose_submit('ButtonSelected')
browser.launch_browser()
browser.submit_selected()
browser.launch_browser()
我必须使用 new_control 函数添加新控件,因为当我通过启动浏览器进行检查时,TimeIn、TimeOut 框没有显示。
这段代码不起作用,我不明白为什么。我想也许我弄乱了我添加的控件的名称或类型(也许它与表单的实际输入类型和名称不匹配 - 我检查了 chrome 扩展名,那也是不是这种情况)因为保存按钮会被点击但测试值并没有真正注册。
check
这是点击保存按钮前浏览器的样子:
before
after
没有注册!
使用 MechancialSoup 版本 0.10.0,我能够正确解析您的 HTML 片段(据我所知)。对于旧版本的 MechanicalSoup,我没有看到任何具体会导致此问题失败的内容,但如果没有其他效果,也许可以尝试更新。
我使用了以下简单代码:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open_fake_page(text) #Here 'text' is the HTML snippet
form = browser.select_form('form[action="bwpkteci.P_UpdateTimeInOut"]')
form.print_summary()
browser.launch_browser()
print_summary()
方法输出所有表单元素,其中包括您缺少的 TimeIn
和 TimeOut
框:
<input name="JobsSeqNo" type="hidden" value="208138"/>
<input name="LastDate" type="hidden" value="0"/>
<input name="par_restart" type="hidden" value="Y"/>
<input name="par_update" type="hidden" value="Y"/>
<input name="par_submit" type="hidden" value="Y"/>
<input name="par_recall" type="hidden" value="N"/>
<input name="EarnCode" type="hidden" value="RSA"/>
<input name="DateSelected" type="hidden" value="20-FEB-2018"/>
<input name="LineNumber" type="hidden" value="1"/>
<input id="shift_input_id" maxlength="1" name="Shift" size="2" type="text" value="1"/>
<input id="timein_input_id" maxlength="5" name="TimeIn" size="6" type="text"/>
<select name="TimeInAm" size="1">
<option selected="" value="AM">AM</option><option value="PM">PM</option></select>
<input id="timeout_input_id" maxlength="5" name="TS_TimeOut" size="6" type="text"/>
<select name="TimeOutAm" size="1">
<option selected="" value="AM">AM</option><option value="PM">PM</option></select>
<input name="ButtonSelected" type="submit" value="Time Sheet"/>
<input name="ButtonSelected" type="submit" value="Previous Day"/>
<input name="ButtonSelected" type="submit" value="Next Day"/>
<input name="ButtonSelected" type="submit" value="Add New Line"/>
<input name="ButtonSelected" type="submit" value="Save"/>
<input name="ButtonSelected" type="submit" value="Copy"/>
<input name="ButtonSelected" type="submit" value="Delete"/>
当我使用 launch_browser()
:
时,您缺少的元素也正确显示
我正在尝试使用机械汤自动填充并为我提交时间sheet。
表格如下所示:
timesheet_form
这是页面该部分的相关源代码:
<FORM ACTION="bwpkteci.P_UpdateTimeInOut" METHOD="post">
<INPUT TYPE="hidden" NAME="JobsSeqNo" VALUE="208138">
<INPUT TYPE="hidden" NAME="LastDate" VALUE="0">
<INPUT TYPE="hidden" NAME="par_restart" VALUE="Y">
<INPUT TYPE="hidden" NAME="par_update" VALUE="Y">
<INPUT TYPE="hidden" NAME="par_submit" VALUE="Y">
<INPUT TYPE="hidden" NAME="par_recall" VALUE="N">
<INPUT TYPE="hidden" NAME="EarnCode" VALUE="RSA">
<INPUT TYPE="hidden" NAME="DateSelected" VALUE="20-FEB-2018">
<TABLE CLASS="dataentrytable" SUMMARY="This user enters the time of day for the hours worked into this table in order for the system to calculate the hours.">
<TR>
<TD CLASS="delabel" scope="row" >Date:</TD>
<TD CLASS="dedefault">Tuesday, Feb 20,2018</TD>
</TR>
<TR>
<TD CLASS="delabel" scope="row" >Earnings Code:</TD>
<TD CLASS="dedefault">Regular Student Aide</TD>
</TR>
</TABLE>
<TABLE CLASS="dataentrytable" SUMMARY="This is the detail table where the user enters the time of the day for the hours worked in order for the system to calculate the hours.">
<TR>
<TD CLASS="deheader" scope="col" ><LABEL for=shift_input_id><SPAN class="fieldlabeltext">Shift</SPAN></LABEL></TD>
<TD COLSPAN="2" CLASS="deheader" scope="col" ><LABEL for=timein_input_id><SPAN class="fieldlabeltext">Time In</SPAN></LABEL></TD>
<TD COLSPAN="2" CLASS="deheader" scope="col" ><LABEL for=timeout_input_id><SPAN class="fieldlabeltext">Time Out</SPAN></LABEL></TD>
<TD CLASS="deheader" scope="col" >Total Hours</TD>
</TR>
<INPUT TYPE="hidden" NAME="LineNumber" VALUE="1">
<TR>
<TD CLASS="dedefault"><INPUT TYPE="text" NAME="Shift" SIZE="2" MAXLENGTH="1" VALUE="1" ID="shift_input_id"></TD>
<TD CLASS="dedefault"><INPUT TYPE="text" NAME="TimeIn" SIZE="6" MAXLENGTH="5" ID="timein_input_id"></TD>
<TD CLASS="dedefault">
<SELECT NAME="TimeInAm" SIZE="1">
<OPTION VALUE="AM" SELECTED>AM
<OPTION VALUE="PM">PM
</SELECT>
</TD>
<TD CLASS="dedefault"><INPUT TYPE="text" NAME="TS_TimeOut" SIZE="6" MAXLENGTH="5" ID="timeout_input_id"></TD>
<TD CLASS="dedefault">
<SELECT NAME="TimeOutAm" SIZE="1">
<OPTION VALUE="AM" SELECTED>AM
<OPTION VALUE="PM">PM
</SELECT>
</TD>
<TD CLASS="dedefault"><p class="rightaligntext">0</p></TD>
</TR>
<TR>
<TD COLSPAN="5" CLASS="dedead"> </TD>
<TD CLASS="dedefault"><p class="rightaligntext">0</p></TD>
</TR>
</TABLE>
<P>
<TABLE CLASS="plaintable" SUMMARY="This layout table is used to align buttons.">
<TR>
<TD CLASS="pldefault">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Time Sheet">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Previous Day">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Next Day">
</TD>
</TR>
<TR>
<TD CLASS="pldefault">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Add New Line">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Save">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Copy">
<INPUT TYPE="submit" NAME="ButtonSelected" VALUE="Delete">
</TD>
</TR>
</TABLE>
</FORM>
我正在使用机械汤填写表格并提交。
#navigate to the timesheet page
browser.open(timesheetURL)
browser.get_current_page()
#select the form
browser.select_form('form[action="bwpkteci.P_UpdateTimeInOut"]')
form = browser.get_current_form()
#add required controls and set values for testing
form.new_control('text', 'shift', '1')
form.new_control('text', 'TimeIn', '08:30')
form.new_control('select', 'TimeInAm', 1) #1 stands for PM
form.new_control('text', 'TS_TimeOut', '09:30')
form.new_control('select', 'TimeOutAm', 1) #1 stands for PM
form.new_control('submit', 'ButtonSelected', 'Save')
form.choose_submit('ButtonSelected')
browser.launch_browser()
browser.submit_selected()
browser.launch_browser()
我必须使用 new_control 函数添加新控件,因为当我通过启动浏览器进行检查时,TimeIn、TimeOut 框没有显示。
这段代码不起作用,我不明白为什么。我想也许我弄乱了我添加的控件的名称或类型(也许它与表单的实际输入类型和名称不匹配 - 我检查了 chrome 扩展名,那也是不是这种情况)因为保存按钮会被点击但测试值并没有真正注册。
check
这是点击保存按钮前浏览器的样子:
before
after
没有注册!
使用 MechancialSoup 版本 0.10.0,我能够正确解析您的 HTML 片段(据我所知)。对于旧版本的 MechanicalSoup,我没有看到任何具体会导致此问题失败的内容,但如果没有其他效果,也许可以尝试更新。
我使用了以下简单代码:
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open_fake_page(text) #Here 'text' is the HTML snippet
form = browser.select_form('form[action="bwpkteci.P_UpdateTimeInOut"]')
form.print_summary()
browser.launch_browser()
print_summary()
方法输出所有表单元素,其中包括您缺少的 TimeIn
和 TimeOut
框:
<input name="JobsSeqNo" type="hidden" value="208138"/>
<input name="LastDate" type="hidden" value="0"/>
<input name="par_restart" type="hidden" value="Y"/>
<input name="par_update" type="hidden" value="Y"/>
<input name="par_submit" type="hidden" value="Y"/>
<input name="par_recall" type="hidden" value="N"/>
<input name="EarnCode" type="hidden" value="RSA"/>
<input name="DateSelected" type="hidden" value="20-FEB-2018"/>
<input name="LineNumber" type="hidden" value="1"/>
<input id="shift_input_id" maxlength="1" name="Shift" size="2" type="text" value="1"/>
<input id="timein_input_id" maxlength="5" name="TimeIn" size="6" type="text"/>
<select name="TimeInAm" size="1">
<option selected="" value="AM">AM</option><option value="PM">PM</option></select>
<input id="timeout_input_id" maxlength="5" name="TS_TimeOut" size="6" type="text"/>
<select name="TimeOutAm" size="1">
<option selected="" value="AM">AM</option><option value="PM">PM</option></select>
<input name="ButtonSelected" type="submit" value="Time Sheet"/>
<input name="ButtonSelected" type="submit" value="Previous Day"/>
<input name="ButtonSelected" type="submit" value="Next Day"/>
<input name="ButtonSelected" type="submit" value="Add New Line"/>
<input name="ButtonSelected" type="submit" value="Save"/>
<input name="ButtonSelected" type="submit" value="Copy"/>
<input name="ButtonSelected" type="submit" value="Delete"/>
当我使用 launch_browser()
: