Looping over items to extract contents of html table
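I'm completely new to Scrapy. I've installed it, and I can do some very simple things, like building the tutorial project. I can also launch the Scrapy shell with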
scrapy shell "website"
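The table below is wrapped in other, higher-level elements such as divs. Using Scrapy, how do I extract this table? Do I need to read through the divs, or can I jump straight to the table and extract the information?
What I'm looking for is code like the following that returns all the items in the table as dictionaries, ideally complete, self-contained code that I can run and learn from: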
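def parse(self, response):
    # select every row of interest ("someclass" is a placeholder class name)
    rows = response.xpath('//tr[@class="someclass"]')
    for row in rows:
        item = {}
        # "./" keeps the query relative to the current row; a leading "//"
        # would search the whole document again on every iteration
        item['var1'] = row.xpath('./td[2]/p/span[2]/text()').extract_first()
        yield item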
Note: I removed the repeated content at the end.
<div class="full arrangeable" data-id="calendar"> <div class="full row" data-row="0"> <div class="full column" data-column="0"> <div class="cell" data-cell="0" data-compid="Calendar"> <a name="Calendar" class="anchor"></a> <div class="flexShell"> <div class="flexBox calendar" id="flexBox_flex_calendar_mainCal" data-more="0" data-checkstate="0" data-initcallback="calendar" data-updatecallback="calendar" data-visiblejs="[]" data-disablejs="[]"> <form action="flex.php" method="post" onsubmit="return Flex.prepareSubmit(this);" data-submit="options"> <input name="s" value="" type="hidden"> <input name="securitytoken" value="guest" type="hidden"> <input name="do" value="saveoptions" type="hidden"> <input name="setdefault" value="no" type="hidden"> <input name="ignoreinput" value="no" type="hidden"> <input name="flex[Calendar_mainCal][idSuffix]" value="" type="hidden"> <input name="flex[Calendar_mainCal][_flexForm_]" value="flexForm" type="hidden"> <input name="flex[Calendar_mainCal][modelData]" value="YToxMDp7czoxMToicGFfY29udHJvbHMiO3M6MTc6ImNhbGVuZGFyfENhbGVuZGFyIjtzOjE2OiJwYV9pbmplY3RyZXZlcnNlIjtiOjA7czoxNDoidmlld2luZ0RlZmF1bHQiO3M6OToiVGhpcyBXZWVrIjtzOjExOiJwcmV2Q2FsTGluayI7czoxNDoiZGF5PW5vdjMwLjIwMTEiO3M6MTE6Im5leHRDYWxMaW5rIjtzOjEzOiJkYXk9ZGVjMi4yMDExIjtzOjc6InByZXZBbHQiO3M6MjY6Ik5vdiAzMCwgMjAxMSAtIERlYyAxLCAyMDExIjtzOjc6Im5leHRBbHQiO3M6MjU6IkRlYyAyLCAyMDExIC0gRGVjIDMsIDIwMTEiO3M6MTA6Im5leHRIaWRkZW4iO2I6MDtzOjEwOiJwcmV2SGlkZGVuIjtiOjA7czo5OiJyaWdodExpbmsiO047fQ==" type="hidden"> <div class="head"> <ul> <li class="left pagination"><a title="Nov 30, 2011 - Dec 1, 2011" class="prev" href="calendar.php?day=nov30.2011"><span><</span></a></li> <li class="left"><a class="highlight light options flexTitle"><span><strong>Dec 1, 2011</strong></span></a></li> <li class="left pagination shadow"><a title="Dec 2, 2011 - Dec 3, 2011" class="next" href="calendar.php?day=dec2.2011"><span><</span></a></li> <li class="loader"></li> <li class="right imagefade noborder"><a 
class="highlight noborder filters flexFilter"><div class="fade"></div><span>Filter</span></a></li> <li class="right"> <a class="highlight noborder menu"> <span>This Week</span> <span class="dropdown"></span> <div> <div class="title">Default View:</div> <div data-value="yesterday">Yesterday</div> <div data-value="today">Today</div> <div data-value="tomorrow">Tomorrow</div> <div data-value="thisweek">This Week</div> </div> </a> </li> <li class="right shadow"><a class="highlight noborder upnext"><span>Up Next</span></a></li> <li class="layoutcontrols"><div class="pagearrange_homepage_controls"> </div> <div class="pagearrange_controls"> <span data-registered="1" class="onHomepage" title="Copy Block to Your Homepage"></span> </div></li></ul> </div> <div class="options sidebyside"> <div class="half"> <div class="shell flexoptions"> <div class="frame"> <input name="flex[Calendar_mainCal][calendardefault]" id="flex[Calendar_mainCal][calendardefault]" value="thisweek" type="hidden"> <div class="half"> <div class="pad"> <p class="title"><strong>Begin Date</strong></p> <input data-enterhandled="1" data-pickerid="flexDatePicker_1" name="flex[Calendar_mainCal][begindate]" data-container="Calendar_mainCal_begindate" class="bginput flexDatePicker" value="December 1, 2011" data-range="2007,2015" type="text"> <div class="minicalendar" id="flexDatePicker_Calendar_mainCal_begindate"><div class="pickerheader"><table class="menu"><tbody><tr><td><a class="calJump year back">«</a></td><td><a class="calJump month back">‹</a></td><td class="current">December 2011</td><td><a class="calJump month forward">›</a></td><td><a class="calJump year forward">»</a></td></tr></tbody></table></div><div class="pickercontainer"><div class="table"><div class="row header"><div class="day header">Sun</div><div class="day header">Mon</div><div class="day header">Tue</div><div class="day header">Wed</div><div class="day header">Thu</div><div class="day header">Fri</div><div class="day 
header">Sat</div></div><div class="row"><div data-date="November 27, 2011" class="day other"><a>27</a></div><div data-date="November 28, 2011" class="day other"><a>28</a></div><div data-date="November 29, 2011" class="day other"><a>29</a></div><div data-date="November 30, 2011" class="day other"><a>30</a></div><div data-date="December 1, 2011" class="day active"><a>1</a></div><div data-date="December 2, 2011" class="day"><a>2</a></div><div data-date="December 3, 2011" class="day"><a>3</a></div></div><div class="row"><div data-date="December 4, 2011" class="day"><a>4</a></div><div data-date="December 5, 2011" class="day"><a>5</a></div><div data-date="December 6, 2011" class="day"><a>6</a></div><div data-date="December 7, 2011" class="day"><a>7</a></div><div data-date="December 8, 2011" class="day"><a>8</a></div><div data-date="December 9, 2011" class="day"><a>9</a></div><div data-date="December 10, 2011" class="day"><a>10</a></div></div><div class="row"><div data-date="December 11, 2011" class="day"><a>11</a></div><div data-date="December 12, 2011" class="day"><a>12</a></div><div data-date="December 13, 2011" class="day"><a>13</a></div><div data-date="December 14, 2011" class="day"><a>14</a></div><div data-date="December 15, 2011" class="day"><a>15</a></div><div data-date="December 16, 2011" class="day"><a>16</a></div><div data-date="December 17, 2011" class="day"><a>17</a></div></div><div class="row"><div data-date="December 18, 2011" class="day"><a>18</a></div><div data-date="December 19, 2011" class="day"><a>19</a></div><div data-date="December 20, 2011" class="day"><a>20</a></div><div data-date="December 21, 2011" class="day"><a>21</a></div><div data-date="December 22, 2011" class="day"><a>22</a></div><div data-date="December 23, 2011" class="day"><a>23</a></div><div data-date="December 24, 2011" class="day"><a>24</a></div></div><div class="row"><div data-date="December 25, 2011" class="day"><a>25</a></div><div data-date="December 26, 2011" 
class="day"><a>26</a></div><div data-date="December 27, 2011" class="day"><a>27</a></div><div data-date="December 28, 2011" class="day"><a>28</a></div><div data-date="December 29, 2011" class="day"><a>29</a></div><div data-date="December 30, 2011" class="day"><a>30</a></div><div data-date="December 31, 2011" class="day"><a>31</a></div></div></div></div></div> </div> </div> <div class="half last"> <div class="pad"> <p class="title"><strong>End Date</strong> (<a data-pickerid="flexDatePicker_2" class="internal noneLink_Calendar_mainCal_enddate">none</a>)</p> <input data-enterhandled="1" data-pickerid="flexDatePicker_2" name="flex[Calendar_mainCal][enddate]" data-container="Calendar_mainCal_enddate" class="bginput flexDatePicker" value="December 1, 2011" data-range="2007,2015" data-nonehandler="noneLink_Calendar_mainCal_enddate" type="text"> <div class="minicalendar" id="flexDatePicker_Calendar_mainCal_enddate"><div class="pickerheader"><table class="menu"><tbody><tr><td><a class="calJump year back">«</a></td><td><a class="calJump month back">‹</a></td><td class="current">December 2011</td><td><a class="calJump month forward">›</a></td><td><a class="calJump year forward">»</a></td></tr></tbody></table></div><div class="pickercontainer"><div class="table"><div class="row header"><div class="day header">Sun</div><div class="day header">Mon</div><div class="day header">Tue</div><div class="day header">Wed</div><div class="day header">Thu</div><div class="day header">Fri</div><div class="day header">Sat</div></div><div class="row"><div data-date="November 27, 2011" class="day other"><a>27</a></div><div data-date="November 28, 2011" class="day other"><a>28</a></div><div data-date="November 29, 2011" class="day other"><a>29</a></div><div data-date="November 30, 2011" class="day other"><a>30</a></div><div data-date="December 1, 2011" class="day active"><a>1</a></div><div data-date="December 2, 2011" class="day"><a>2</a></div><div data-date="December 3, 2011" 
class="day"><a>3</a></div></div><div class="row"><div data-date="December 4, 2011" class="day"><a>4</a></div><div data-date="December 5, 2011" class="day"><a>5</a></div><div data-date="December 6, 2011" class="day"><a>6</a></div><div data-date="December 7, 2011" class="day"><a>7</a></div><div data-date="December 8, 2011" class="day"><a>8</a></div><div data-date="December 9, 2011" class="day"><a>9</a></div><div data-date="December 10, 2011" class="day"><a>10</a></div></div><div class="row"><div data-date="December 11, 2011" class="day"><a>11</a></div><div data-date="December 12, 2011" class="day"><a>12</a></div><div data-date="December 13, 2011" class="day"><a>13</a></div><div data-date="December 14, 2011" class="day"><a>14</a></div><div data-date="December 15, 2011" class="day"><a>15</a></div><div data-date="December 16, 2011" class="day"><a>16</a></div><div data-date="December 17, 2011" class="day"><a>17</a></div></div><div class="row"><div data-date="December 18, 2011" class="day"><a>18</a></div><div data-date="December 19, 2011" class="day"><a>19</a></div><div data-date="December 20, 2011" class="day"><a>20</a></div><div data-date="December 21, 2011" class="day"><a>21</a></div><div data-date="December 22, 2011" class="day"><a>22</a></div><div data-date="December 23, 2011" class="day"><a>23</a></div><div data-date="December 24, 2011" class="day"><a>24</a></div></div><div class="row"><div data-date="December 25, 2011" class="day"><a>25</a></div><div data-date="December 26, 2011" class="day"><a>26</a></div><div data-date="December 27, 2011" class="day"><a>27</a></div><div data-date="December 28, 2011" class="day"><a>28</a></div><div data-date="December 29, 2011" class="day"><a>29</a></div><div data-date="December 30, 2011" class="day"><a>30</a></div><div data-date="December 31, 2011" class="day"><a>31</a></div></div></div></div></div> </div> </div> <div class="full"> <div class="pad"> <ul class="periodshortcuts"> <li><a class="internal" data-range="April 12, 
2015|April 18, 2015">This Week</a></li> <li><a class="internal" data-range="April 19, 2015|April 25, 2015">Next Week</a></li> <li><a class="internal" data-range="April 5, 2015|April 11, 2015">Last Week</a></li> <li><a class="internal" data-range="April 1, 2015|April 30, 2015">This Month</a></li> <li><a class="internal" data-range="May 1, 2015|May 31, 2015">Next Month</a></li> <li><a class="internal" data-range="March 1, 2015|March 31, 2015">Last Month</a></li> </ul> </div> </div> </div> <table> <tbody><tr> <td class="flexOptionsError"></td> <td class="flexSubmitButtons"> <input class="button flexOptionsSubmit" name="flexSettings" value="Apply Settings" type="submit"> <input class="button flexCancelOptions" value="Cancel" type="button"> </td> <td class="flexDefaults"></td> </tr> </tbody></table> </div> </div> <div class="half last"> <div class="shell flexfilters"> <div class="frame"> <table class="pad"> <tbody><tr> <td class="pad expectedimpact" width="65%"> <div class="flexErrorCheck" data-type="requireOneCheck" data-error="No Impact Selected"> <p class="title"> <strong>Expected Impact</strong>
(<a class="toggleOptions internal" data-target="flex[Calendar_mainCal][impacts]" data-toggle="all">all</a>, <a class="toggleOptions internal" data-target="flex[Calendar_mainCal][impacts]" data-toggle="none">none</a>)
</p> <table class="arrayCheckbox requireOneCheck impacts"> <tbody><tr> <td> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][impacts][high]" id="flex[Calendar_mainCal][impacts]_high" value="high" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][impacts]_high"><span class="impact high" title="High Impact Expected"></span></label></td> </tr> </tbody></table> </td> <td valign="top"> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][impacts][medium]" id="flex[Calendar_mainCal][impacts]_medium" value="medium" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][impacts]_medium"><span class="impact medium" title="Medium Impact Expected"></span></label></td> </tr> </tbody></table> </td> <td valign="top"> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][impacts][low]" id="flex[Calendar_mainCal][impacts]_low" value="low" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][impacts]_low"><span class="impact low" title="Low Impact Expected"></span></label></td> </tr> </tbody></table> </td> <td valign="top"> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][impacts][holiday]" id="flex[Calendar_mainCal][impacts]_holiday" value="holiday" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][impacts]_holiday"><span class="impact holiday" title="Non-Economic"></span></label></td> </tr> </tbody></table> </td> <td valign="top"> <table class="items"> </table> </td> </tr> </tbody></table> <input name="flex[Calendar_mainCal][_cbarray_]" value="1" type="hidden"> </div> </td> <td class="pad currencies" rowspan="2" width="35%"> <div class="flexErrorCheck" 
data-type="requireOneCheck" data-error="No Currencies Selected"> <p class="title"> <strong>Currencies</strong>
(<a class="toggleOptions internal" data-target="flex[Calendar_mainCal][currencies]" data-toggle="all">all</a>, <a class="toggleOptions internal" data-target="flex[Calendar_mainCal][currencies]" data-toggle="none">none</a>)
</p> <table class="arrayCheckbox requireOneCheck"> <tbody><tr> <td> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][currencies][aud]" id="flex[Calendar_mainCal][currencies]_aud" value="aud" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_aud">AUD</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][cad]" id="flex[Calendar_mainCal][currencies]_cad" value="cad" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_cad">CAD</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][chf]" id="flex[Calendar_mainCal][currencies]_chf" value="chf" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_chf">CHF</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][cny]" id="flex[Calendar_mainCal][currencies]_cny" value="cny" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_cny">CNY</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][eur]" id="flex[Calendar_mainCal][currencies]_eur" value="eur" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_eur">EUR</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][gbp]" id="flex[Calendar_mainCal][currencies]_gbp" value="gbp" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_gbp">GBP</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][jpy]" id="flex[Calendar_mainCal][currencies]_jpy" value="jpy" 
checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_jpy">JPY</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][nzd]" id="flex[Calendar_mainCal][currencies]_nzd" value="nzd" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_nzd">NZD</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][usd]" id="flex[Calendar_mainCal][currencies]_usd" value="usd" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_usd">USD</label></td> </tr> </tbody></table> </td> </tr> </tbody></table> <input name="flex[Calendar_mainCal][_cbarray_]" value="1" type="hidden"> </div> </td> </tr> <tr> <td class="pad"> <div class="flexErrorCheck" data-type="requireOneCheck" data-error="No Types Selected"> <p class="title"> <strong>Event Types</strong>
(<a class="toggleOptions internal" data-target="flex[Calendar_mainCal][eventtypes]" data-toggle="all">all</a>, <a class="toggleOptions internal" data-target="flex[Calendar_mainCal][eventtypes]" data-toggle="none">none</a>)
</p> <table class="arrayCheckbox requireOneCheck"> <tbody><tr> <td> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][eventtypes][growth]" id="flex[Calendar_mainCal][eventtypes]_growth" value="growth" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_growth">Growth</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][inflation]" id="flex[Calendar_mainCal][eventtypes]_inflation" value="inflation" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_inflation">Inflation</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][employment]" id="flex[Calendar_mainCal][eventtypes]_employment" value="employment" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_employment">Employment</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][centralbank]" id="flex[Calendar_mainCal][eventtypes]_centralbank" value="centralbank" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_centralbank">Central Bank</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][bonds]" id="flex[Calendar_mainCal][eventtypes]_bonds" value="bonds" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_bonds">Bonds</label></td> </tr> </tbody></table> </td> <td valign="top"> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][eventtypes][housing]" id="flex[Calendar_mainCal][eventtypes]_housing" value="housing" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> 
<td class="full"><label for="flex[Calendar_mainCal][eventtypes]_housing">Housing</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][sentiment]" id="flex[Calendar_mainCal][eventtypes]_sentiment" value="sentiment" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_sentiment">Consumer Surveys</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][pmi]" id="flex[Calendar_mainCal][eventtypes]_pmi" value="pmi" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_pmi">Business Surveys</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][speeches]" id="flex[Calendar_mainCal][eventtypes]_speeches" value="speeches" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_speeches">Speeches</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][misc]" id="flex[Calendar_mainCal][eventtypes]_misc" value="misc" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_misc">Misc</label></td> </tr> </tbody></table> </td> </tr> </tbody></table> <input name="flex[Calendar_mainCal][_cbarray_]" value="1" type="hidden"> </div> </td> </tr> </tbody></table> </div> <table> <tbody><tr> <td class="flexFilterError"></td> <td class="flexSubmitButtons"> <input class="button flexFilterSubmit" name="flexFilters" value="Apply Filter" type="submit"> <input class="button flexCancelFilters" value="Cancel" type="button"> </td> <td class="flexDefaults"></td> </tr> </tbody></table> </div> </div> </div> </form> <table> <thead> <tr> <th class="col1">Date</th> <th class="col2"><a href="timezone.php" title="Time Options">9:34pm</a></th> <th 
class="col3">Currency</th> <th class="col4">Impact</th> <th class="col5"> </th> <th class="col6">Detail</th> <th class="col7">Actual</th> <th class="col8">Forecast</th> <th class="col9">Previous</th> <th class="col10">Graph</th> </tr> </thead> <tbody><tr class="borderfix"><td></td></tr> <tr class="calendar_row newday" data-eventid="36121"> <td class="date"><span class="date">Thu<span>Dec 1</span></span></td> <td class="time">1:30am</td> <td class="currency">AUD</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>Commodity Prices y/y</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual">
18.1%
</td> <td class="forecast"></td> <td class="previous"><span class="revised" title="Revised From 19.4%">19.6%</span></td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="35311"> <td class="date"></td> <td class="time">2:45am</td> <td class="currency">CHF</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>GDP q/q</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual">
0.2%
</td> <td class="forecast">0.2%</td> <td class="previous"><span class="revised better" title="Revised From 0.4%">0.5%</span></td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr><tr class="details " data-eventid="35311"><td align="center"></td><td colspan="8" class="calendar_detail_cell details nest" align="center"></td><td align="center"></td></tr> <tr class="calendar_row" data-eventid="41782"> <td class="date"></td> <td class="time">4:00am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>ECB President Draghi Speaks</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual"> </td> <td class="forecast"></td> <td class="previous"></td> <td class="graph"></td> </tr> <tr class="calendar_row" data-eventid="43848"> <td class="date"></td> <td class="time">4:15am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>Spanish Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual">
43.8
</td> <td class="forecast"></td> <td class="previous">43.9</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="35078"> <td class="date"></td> <td class="time">4:30am</td> <td class="currency">CHF</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual"> <span class="worse">44.8</span> </td> <td class="forecast">46.6</td> <td class="previous">46.9</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr><tr class="details " data-eventid="35078"><td align="center"></td><td colspan="8" class="calendar_detail_cell details nest" align="center"></td><td align="center"></td></tr> <tr class="calendar_row" data-eventid="43502"> <td class="date"></td> <td class="time">4:45am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>Italian Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual"> <span class="better">44.0</span> </td> <td class="forecast">42.8</td> <td class="previous">43.3</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="58942"> <td class="date"></td> <td class="time">4:50am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Low Impact Expected" class="low"></span> </td> <td class="event"><span>French Final Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual">
47.3
</td> <td class="forecast">47.6</td> <td class="previous">47.6</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="59001"> <td class="date"></td> <td class="time">4:55am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Low Impact Expected" class="low"></span> </td> <td class="event"><span>German Final Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual">
47.9
</td> <td class="forecast">48.0</td> <td class="previous">47.9</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="33221"> <td class="date"></td> <td class="time">5:00am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Low Impact Expected" class="low"></span> </td> <td class="event"><span>Final Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual">
46.4
</td> <td class="forecast">46.4</td> <td class="previous">46.4</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="33165"> <td class="date"></td> <td class="time">5:30am</td> <td class="currency">GBP</td> <td class="impact"> <span title="High Impact Expected" class="high"></span> </td> <td class="event"><span>Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual"> <span class="better">47.6</span> </td> <td class="forecast">47.1</td> <td class="previous"><span class="revised better" title="Revised From 47.4">47.8</span></td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row nogrid" data-eventid="57061"> <td class="date"></td> <td class="time"></td> <td class="currency">GBP</td> <td class="impact"> <span title="Low Impact Expected" class="low"></span> </td> <td class="event"><span>FPC Statement</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual"> </td> <td class="forecast"></td> <td class="previous"></td> <td class="graph"></td> </tr> <tr class="calendar_row" data-eventid="42399"> <td class="date"></td> <td class="time">6:01am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>French 10-y Bond Auction</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual">
3.18|3.1
</td> <td class="forecast"></td> <td class="previous">3.22|2.2</td> <td class="graph"></td> </tr> <tr class="calendar_row" data-eventid="35087"> <td class="date"></td> <td class="time">6:30am</td> <td class="currency">GBP</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>BOE Financial Stability Report</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual"> </td> <td class="forecast"></td> <td class="previous"></td> <td class="graph"></td> </tr> <tr class="calendar_row" data-eventid="42468"> <td class="date"></td> </tbody></table> <div class="foot"> <ul> <li class="more"> <a href="#" class="flexMore"><span>More</span><span class="loader"></span></a> </li> </ul> </div> </div> </div> </div> </div> </div> </div>
You can locate the table by the th elements it contains, for example:
response.xpath("//table[.//th[. = 'Date']]")
You can also locate the table through its parent element:
response.css("div#flexBox_flex_calendar_mainCal > table")
A working example in the Scrapy shell (printing the time value of each table row):
In [1]: for row in response.css("div#flexBox_flex_calendar_mainCal table tr.calendar_row"):
   ...:     print(row.xpath(".//td[@class='time']/text()").extract())
   ...:
[u'1:30am']
[u'2:45am']
[u'4:00am']
[u'4:15am']
[u'4:30am']
[u'4:45am']
[u'4:50am']
[u'4:55am']
[u'5:00am']
[u'5:30am']
[]
[u'6:01am']
[u'6:30am']
[u'6:36am']
class="day"><a>26</a></div><div data-date="December 27, 2011" class="day"><a>27</a></div><div data-date="December 28, 2011" class="day"><a>28</a></div><div data-date="December 29, 2011" class="day"><a>29</a></div><div data-date="December 30, 2011" class="day"><a>30</a></div><div data-date="December 31, 2011" class="day"><a>31</a></div></div></div></div></div> </div> </div> <div class="half last"> <div class="pad"> <p class="title"><strong>End Date</strong> (<a data-pickerid="flexDatePicker_2" class="internal noneLink_Calendar_mainCal_enddate">none</a>)</p> <input data-enterhandled="1" data-pickerid="flexDatePicker_2" name="flex[Calendar_mainCal][enddate]" data-container="Calendar_mainCal_enddate" class="bginput flexDatePicker" value="December 1, 2011" data-range="2007,2015" data-nonehandler="noneLink_Calendar_mainCal_enddate" type="text"> <div class="minicalendar" id="flexDatePicker_Calendar_mainCal_enddate"><div class="pickerheader"><table class="menu"><tbody><tr><td><a class="calJump year back">«</a></td><td><a class="calJump month back">‹</a></td><td class="current">December 2011</td><td><a class="calJump month forward">›</a></td><td><a class="calJump year forward">»</a></td></tr></tbody></table></div><div class="pickercontainer"><div class="table"><div class="row header"><div class="day header">Sun</div><div class="day header">Mon</div><div class="day header">Tue</div><div class="day header">Wed</div><div class="day header">Thu</div><div class="day header">Fri</div><div class="day header">Sat</div></div><div class="row"><div data-date="November 27, 2011" class="day other"><a>27</a></div><div data-date="November 28, 2011" class="day other"><a>28</a></div><div data-date="November 29, 2011" class="day other"><a>29</a></div><div data-date="November 30, 2011" class="day other"><a>30</a></div><div data-date="December 1, 2011" class="day active"><a>1</a></div><div data-date="December 2, 2011" class="day"><a>2</a></div><div data-date="December 3, 2011" 
class="day"><a>3</a></div></div><div class="row"><div data-date="December 4, 2011" class="day"><a>4</a></div><div data-date="December 5, 2011" class="day"><a>5</a></div><div data-date="December 6, 2011" class="day"><a>6</a></div><div data-date="December 7, 2011" class="day"><a>7</a></div><div data-date="December 8, 2011" class="day"><a>8</a></div><div data-date="December 9, 2011" class="day"><a>9</a></div><div data-date="December 10, 2011" class="day"><a>10</a></div></div><div class="row"><div data-date="December 11, 2011" class="day"><a>11</a></div><div data-date="December 12, 2011" class="day"><a>12</a></div><div data-date="December 13, 2011" class="day"><a>13</a></div><div data-date="December 14, 2011" class="day"><a>14</a></div><div data-date="December 15, 2011" class="day"><a>15</a></div><div data-date="December 16, 2011" class="day"><a>16</a></div><div data-date="December 17, 2011" class="day"><a>17</a></div></div><div class="row"><div data-date="December 18, 2011" class="day"><a>18</a></div><div data-date="December 19, 2011" class="day"><a>19</a></div><div data-date="December 20, 2011" class="day"><a>20</a></div><div data-date="December 21, 2011" class="day"><a>21</a></div><div data-date="December 22, 2011" class="day"><a>22</a></div><div data-date="December 23, 2011" class="day"><a>23</a></div><div data-date="December 24, 2011" class="day"><a>24</a></div></div><div class="row"><div data-date="December 25, 2011" class="day"><a>25</a></div><div data-date="December 26, 2011" class="day"><a>26</a></div><div data-date="December 27, 2011" class="day"><a>27</a></div><div data-date="December 28, 2011" class="day"><a>28</a></div><div data-date="December 29, 2011" class="day"><a>29</a></div><div data-date="December 30, 2011" class="day"><a>30</a></div><div data-date="December 31, 2011" class="day"><a>31</a></div></div></div></div></div> </div> </div> <div class="full"> <div class="pad"> <ul class="periodshortcuts"> <li><a class="internal" data-range="April 12, 
2015|April 18, 2015">This Week</a></li> <li><a class="internal" data-range="April 19, 2015|April 25, 2015">Next Week</a></li> <li><a class="internal" data-range="April 5, 2015|April 11, 2015">Last Week</a></li> <li><a class="internal" data-range="April 1, 2015|April 30, 2015">This Month</a></li> <li><a class="internal" data-range="May 1, 2015|May 31, 2015">Next Month</a></li> <li><a class="internal" data-range="March 1, 2015|March 31, 2015">Last Month</a></li> </ul> </div> </div> </div> <table> <tbody><tr> <td class="flexOptionsError"></td> <td class="flexSubmitButtons"> <input class="button flexOptionsSubmit" name="flexSettings" value="Apply Settings" type="submit"> <input class="button flexCancelOptions" value="Cancel" type="button"> </td> <td class="flexDefaults"></td> </tr> </tbody></table> </div> </div> <div class="half last"> <div class="shell flexfilters"> <div class="frame"> <table class="pad"> <tbody><tr> <td class="pad expectedimpact" width="65%"> <div class="flexErrorCheck" data-type="requireOneCheck" data-error="No Impact Selected"> <p class="title"> <strong>Expected Impact</strong>
(<a class="toggleOptions internal" data-target="flex[Calendar_mainCal][impacts]" data-toggle="all">all</a>, <a class="toggleOptions internal" data-target="flex[Calendar_mainCal][impacts]" data-toggle="none">none</a>)
</p> <table class="arrayCheckbox requireOneCheck impacts"> <tbody><tr> <td> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][impacts][high]" id="flex[Calendar_mainCal][impacts]_high" value="high" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][impacts]_high"><span class="impact high" title="High Impact Expected"></span></label></td> </tr> </tbody></table> </td> <td valign="top"> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][impacts][medium]" id="flex[Calendar_mainCal][impacts]_medium" value="medium" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][impacts]_medium"><span class="impact medium" title="Medium Impact Expected"></span></label></td> </tr> </tbody></table> </td> <td valign="top"> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][impacts][low]" id="flex[Calendar_mainCal][impacts]_low" value="low" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][impacts]_low"><span class="impact low" title="Low Impact Expected"></span></label></td> </tr> </tbody></table> </td> <td valign="top"> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][impacts][holiday]" id="flex[Calendar_mainCal][impacts]_holiday" value="holiday" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][impacts]_holiday"><span class="impact holiday" title="Non-Economic"></span></label></td> </tr> </tbody></table> </td> <td valign="top"> <table class="items"> </table> </td> </tr> </tbody></table> <input name="flex[Calendar_mainCal][_cbarray_]" value="1" type="hidden"> </div> </td> <td class="pad currencies" rowspan="2" width="35%"> <div class="flexErrorCheck" 
data-type="requireOneCheck" data-error="No Currencies Selected"> <p class="title"> <strong>Currencies</strong>
(<a class="toggleOptions internal" data-target="flex[Calendar_mainCal][currencies]" data-toggle="all">all</a>, <a class="toggleOptions internal" data-target="flex[Calendar_mainCal][currencies]" data-toggle="none">none</a>)
</p> <table class="arrayCheckbox requireOneCheck"> <tbody><tr> <td> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][currencies][aud]" id="flex[Calendar_mainCal][currencies]_aud" value="aud" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_aud">AUD</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][cad]" id="flex[Calendar_mainCal][currencies]_cad" value="cad" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_cad">CAD</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][chf]" id="flex[Calendar_mainCal][currencies]_chf" value="chf" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_chf">CHF</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][cny]" id="flex[Calendar_mainCal][currencies]_cny" value="cny" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_cny">CNY</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][eur]" id="flex[Calendar_mainCal][currencies]_eur" value="eur" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_eur">EUR</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][gbp]" id="flex[Calendar_mainCal][currencies]_gbp" value="gbp" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_gbp">GBP</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][jpy]" id="flex[Calendar_mainCal][currencies]_jpy" value="jpy" 
checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_jpy">JPY</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][nzd]" id="flex[Calendar_mainCal][currencies]_nzd" value="nzd" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_nzd">NZD</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][currencies][usd]" id="flex[Calendar_mainCal][currencies]_usd" value="usd" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][currencies]_usd">USD</label></td> </tr> </tbody></table> </td> </tr> </tbody></table> <input name="flex[Calendar_mainCal][_cbarray_]" value="1" type="hidden"> </div> </td> </tr> <tr> <td class="pad"> <div class="flexErrorCheck" data-type="requireOneCheck" data-error="No Types Selected"> <p class="title"> <strong>Event Types</strong>
(<a class="toggleOptions internal" data-target="flex[Calendar_mainCal][eventtypes]" data-toggle="all">all</a>, <a class="toggleOptions internal" data-target="flex[Calendar_mainCal][eventtypes]" data-toggle="none">none</a>)
</p> <table class="arrayCheckbox requireOneCheck"> <tbody><tr> <td> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][eventtypes][growth]" id="flex[Calendar_mainCal][eventtypes]_growth" value="growth" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_growth">Growth</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][inflation]" id="flex[Calendar_mainCal][eventtypes]_inflation" value="inflation" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_inflation">Inflation</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][employment]" id="flex[Calendar_mainCal][eventtypes]_employment" value="employment" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_employment">Employment</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][centralbank]" id="flex[Calendar_mainCal][eventtypes]_centralbank" value="centralbank" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_centralbank">Central Bank</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][bonds]" id="flex[Calendar_mainCal][eventtypes]_bonds" value="bonds" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_bonds">Bonds</label></td> </tr> </tbody></table> </td> <td valign="top"> <table class="items"> <tbody><tr> <td><input name="flex[Calendar_mainCal][eventtypes][housing]" id="flex[Calendar_mainCal][eventtypes]_housing" value="housing" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> 
<td class="full"><label for="flex[Calendar_mainCal][eventtypes]_housing">Housing</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][sentiment]" id="flex[Calendar_mainCal][eventtypes]_sentiment" value="sentiment" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_sentiment">Consumer Surveys</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][pmi]" id="flex[Calendar_mainCal][eventtypes]_pmi" value="pmi" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_pmi">Business Surveys</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][speeches]" id="flex[Calendar_mainCal][eventtypes]_speeches" value="speeches" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_speeches">Speeches</label></td> </tr> <tr> <td><input name="flex[Calendar_mainCal][eventtypes][misc]" id="flex[Calendar_mainCal][eventtypes]_misc" value="misc" checked="checked" data-isdefault="true" class="requireOneCheck" type="checkbox"></td> <td class="full"><label for="flex[Calendar_mainCal][eventtypes]_misc">Misc</label></td> </tr> </tbody></table> </td> </tr> </tbody></table> <input name="flex[Calendar_mainCal][_cbarray_]" value="1" type="hidden"> </div> </td> </tr> </tbody></table> </div> <table> <tbody><tr> <td class="flexFilterError"></td> <td class="flexSubmitButtons"> <input class="button flexFilterSubmit" name="flexFilters" value="Apply Filter" type="submit"> <input class="button flexCancelFilters" value="Cancel" type="button"> </td> <td class="flexDefaults"></td> </tr> </tbody></table> </div> </div> </div> </form> <table> <thead> <tr> <th class="col1">Date</th> <th class="col2"><a href="timezone.php" title="Time Options">9:34pm</a></th> <th 
class="col3">Currency</th> <th class="col4">Impact</th> <th class="col5"> </th> <th class="col6">Detail</th> <th class="col7">Actual</th> <th class="col8">Forecast</th> <th class="col9">Previous</th> <th class="col10">Graph</th> </tr> </thead> <tbody><tr class="borderfix"><td></td></tr> <tr class="calendar_row newday" data-eventid="36121"> <td class="date"><span class="date">Thu<span>Dec 1</span></span></td> <td class="time">1:30am</td> <td class="currency">AUD</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>Commodity Prices y/y</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual">
18.1%
</td> <td class="forecast"></td> <td class="previous"><span class="revised" title="Revised From 19.4%">19.6%</span></td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="35311"> <td class="date"></td> <td class="time">2:45am</td> <td class="currency">CHF</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>GDP q/q</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual">
0.2%
</td> <td class="forecast">0.2%</td> <td class="previous"><span class="revised better" title="Revised From 0.4%">0.5%</span></td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr><tr class="details " data-eventid="35311"><td align="center"></td><td colspan="8" class="calendar_detail_cell details nest" align="center"></td><td align="center"></td></tr> <tr class="calendar_row" data-eventid="41782"> <td class="date"></td> <td class="time">4:00am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>ECB President Draghi Speaks</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual"> </td> <td class="forecast"></td> <td class="previous"></td> <td class="graph"></td> </tr> <tr class="calendar_row" data-eventid="43848"> <td class="date"></td> <td class="time">4:15am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>Spanish Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual">
43.8
</td> <td class="forecast"></td> <td class="previous">43.9</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="35078"> <td class="date"></td> <td class="time">4:30am</td> <td class="currency">CHF</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual"> <span class="worse">44.8</span> </td> <td class="forecast">46.6</td> <td class="previous">46.9</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr><tr class="details " data-eventid="35078"><td align="center"></td><td colspan="8" class="calendar_detail_cell details nest" align="center"></td><td align="center"></td></tr> <tr class="calendar_row" data-eventid="43502"> <td class="date"></td> <td class="time">4:45am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>Italian Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual"> <span class="better">44.0</span> </td> <td class="forecast">42.8</td> <td class="previous">43.3</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="58942"> <td class="date"></td> <td class="time">4:50am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Low Impact Expected" class="low"></span> </td> <td class="event"><span>French Final Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual">
47.3
</td> <td class="forecast">47.6</td> <td class="previous">47.6</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="59001"> <td class="date"></td> <td class="time">4:55am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Low Impact Expected" class="low"></span> </td> <td class="event"><span>German Final Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual">
47.9
</td> <td class="forecast">48.0</td> <td class="previous">47.9</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="33221"> <td class="date"></td> <td class="time">5:00am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Low Impact Expected" class="low"></span> </td> <td class="event"><span>Final Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual">
46.4
</td> <td class="forecast">46.4</td> <td class="previous">46.4</td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row" data-eventid="33165"> <td class="date"></td> <td class="time">5:30am</td> <td class="currency">GBP</td> <td class="impact"> <span title="High Impact Expected" class="high"></span> </td> <td class="event"><span>Manufacturing PMI</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual"> <span class="better">47.6</span> </td> <td class="forecast">47.1</td> <td class="previous"><span class="revised better" title="Revised From 47.4">47.8</span></td> <td class="graph"><a title="Open Graph" class="calendar_chart"></a></td> </tr> <tr class="calendar_row nogrid" data-eventid="57061"> <td class="date"></td> <td class="time"></td> <td class="currency">GBP</td> <td class="impact"> <span title="Low Impact Expected" class="low"></span> </td> <td class="event"><span>FPC Statement</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual"> </td> <td class="forecast"></td> <td class="previous"></td> <td class="graph"></td> </tr> <tr class="calendar_row" data-eventid="42399"> <td class="date"></td> <td class="time">6:01am</td> <td class="currency">EUR</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>French 10-y Bond Auction</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level0" data-level="0"></a></td> <td class="actual">
3.18|3.1
</td> <td class="forecast"></td> <td class="previous">3.22|2.2</td> <td class="graph"></td> </tr> <tr class="calendar_row" data-eventid="35087"> <td class="date"></td> <td class="time">6:30am</td> <td class="currency">GBP</td> <td class="impact"> <span title="Medium Impact Expected" class="medium"></span> </td> <td class="event"><span>BOE Financial Stability Report</span></td> <td class="detail"><a title="Open Detail" class="calendar_detail level1" data-level="1"></a></td> <td class="actual"> </td> <td class="forecast"></td> <td class="previous"></td> <td class="graph"></td> </tr> <tr class="calendar_row" data-eventid="42468"> <td class="date"></td> </tbody></table> <div class="foot"> <ul> <li class="more"> <a href="#" class="flexMore"><span>More</span><span class="loader"></span></a> </li> </ul> </div> </div> </div> </div> </div> </div> </div>
You can locate the table by the th elements it contains, for example:
response.xpath("//table[.//th[. = 'Date']]")
You can also locate the table through its parent element:
response.css("div#flexBox_flex_calendar_mainCal > table")
A working example from the Scrapy shell (prints the time value of each row in the table):
In [1]: for row in response.css("div#flexBox_flex_calendar_mainCal table tr.calendar_row"):
   ...:     print(row.xpath(".//td[@class='time']/text()").extract())
[u'1:30am']
[u'2:45am']
[u'4:00am']
[u'4:15am']
[u'4:30am']
[u'4:45am']
[u'4:50am']
[u'4:55am']
[u'5:00am']
[u'5:30am']
[]
[u'6:01am']
[u'6:30am']
[u'6:36am']