Getting Search Data from Table - Elusive

Question

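A seemingly simple problem has turned into an absolute nightmare. I want to extract a person's gitscore using BeautifulSoup. The data sits in a table with a tr-td structure, and the values I need are inside the td tags.

The desired format is: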

[3499th of 9999, 90, 20, 70, 0]

This is the HTML:

<div id="results-pane" class="pane">
<h2 id="results-position">3499th of 9999</h2>
<div id="results-score">90</div>
<div id="results-close">×</div>
<table id="results-details">
  <tbody><tr>
    <th>Reputation</th>
    <th>Contribution</th>
    <th>Gist</th>
  </tr>
  <tr>
    <td id="social-score" class="detail-score">20</td>
    <td id="repo-score" class="detail-score">70</td>
    <td id="gist-score" class="detail-score">0</td>
  </tr>
</tbody></table>

I have tried several approaches. The latest one is:

scores = sopa.find("table", {"id": "results-details"})
for s in scores.find_all("td"):
    print s

The output is:

<td class="detail-score" id="social-score"></td>
<td class="detail-score" id="repo-score"></td>
<td class="detail-score" id="gist-score"></td>
>>>

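If I understand this correctly, the cells contain no data.

Also, when I add ".text" in the for loop, I get the following error message: AttributeError: 'ResultSet' object has no attribute 'text'

In case you want to check, the site is: http://www.gitscore.com/user/name

How can I solve this? Thanks in advance.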

Answer 1

You don't need BeautifulSoup here. Make a GET request to the http://www.gitscore.com/user/name/calculate endpoint and parse the resulting JSON:

import requests

headers = {
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.76 Safari/537.36'
}
response = requests.get('http://www.gitscore.com/user/name/calculate', headers=headers)

result = response.json()
scores = result['scores']
print([
    '%s of %s' % (result['position'], result['totalScores']),
    scores['total'],
    scores['repo'],
    scores['user'],
    scores['gist'],
])

Here is what it prints for octocat:

['2 of 9999', 128351, 127316, 914, 121]
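If you want the list in the same order and style as the format asked for in the question, a small formatting helper could be layered on top of the parsed JSON. This is only a sketch: `ordinal` and `format_scores` are hypothetical helper names, the key names are the ones used in the snippet above, and treating `scores['user']` as the "Reputation" (social-score) cell is an assumption.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 2 -> '2nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "%d%s" % (n, suffix)


def format_scores(result):
    """Build [position, total, social, repo, gist] from the parsed JSON dict."""
    scores = result["scores"]
    return [
        "%s of %s" % (ordinal(int(result["position"])), result["totalScores"]),
        scores["total"],
        scores["user"],  # assumption: corresponds to the "Reputation" cell
        scores["repo"],
        scores["gist"],
    ]


# Sample payload rebuilt from the octocat output shown above.
sample = {
    "position": 2,
    "totalScores": 9999,
    "scores": {"total": 128351, "repo": 127316, "user": 914, "gist": 121},
}
print(format_scores(sample))
```

With the real response you would call `format_scores(response.json())` instead of passing the sample dict.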

Answer 2

I think you want:

scores = sopa.find("table", {"id": "results-details"})
for s in scores.find_all("td"):
    print(s.string)
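As a rough illustration, here is how the whole list could be built when the cells are populated in the HTML you parse. The markup is copied from the question (trimmed, and with the closing div added); the variable names are made up for this sketch. Note that `find_all` returns a ResultSet, so the text has to be read from each element rather than from the ResultSet itself, which is what the AttributeError in the question is complaining about. If the fetched page really has empty td cells, as in the question's output, `.string` is None for each cell and this approach has nothing to extract.

```python
from bs4 import BeautifulSoup

# Markup taken from the question, with the scores filled in.
html = """
<div id="results-pane" class="pane">
<h2 id="results-position">3499th of 9999</h2>
<div id="results-score">90</div>
<table id="results-details">
  <tr>
    <td id="social-score" class="detail-score">20</td>
    <td id="repo-score" class="detail-score">70</td>
    <td id="gist-score" class="detail-score">0</td>
  </tr>
</table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"id": "results-details"})

# Position and total score live outside the table.
row = [
    soup.find(id="results-position").get_text(strip=True),
    int(soup.find(id="results-score").get_text(strip=True)),
]
# Read the text of each td individually, not of the ResultSet.
row += [int(td.get_text(strip=True)) for td in table.find_all("td")]
print(row)  # ['3499th of 9999', 90, 20, 70, 0]

# An empty cell has no string child, so .string is None and get_text() is ''.
empty_cell = BeautifulSoup('<td id="gist-score"></td>', "html.parser").td
print(empty_cell.string, repr(empty_cell.get_text()))
```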
