Python lxml not picking up tags

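Hello, I'm trying to scrape CNN's primary results for this election season and do some machine learning with them. I'm using Python 3.5, and after some research I saw that I could do this with lxml or BeautifulSoup together with requests. After failing with BeautifulSoup (I tried using XPath, but it didn't pick anything up), I tried lxml. On the Iowa primary page (and for every state so far), CNN breaks the results down by county and by each candidate's percentage of the vote. Looking at the html page, I saw that each county name is stored in an h2 tag that immediately follows a div tag (with class attributes), and the same pattern repeats for every county. So I used CSSSelector to try to capture it (since the h2 always comes right after the county div). The html part looks like this: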

<div class="race-results__county-header race-results__county-name section-header__column" data-reactid=".0.4.3.0.0.0.0.$0.0.$0">
    <h2 class="section-heading" data-reactid=".0.4.3.0.0.0.0.$0.0.$0.0">Adair</h2>
</div>

The code looks like this:

from lxml import html
import requests

page = requests.get('http://www.cnn.com/election/primaries/counties/ia/Rep').text
doc = html.fromstring(page)
link = doc.cssselect("div h2")
print(link)

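However, when I try to print link, I get nothing (just an empty list []). Is this a problem with the html layout, the code, or the parser? I'm using JetBrains' PyCharm, but I don't think that has anything to do with it. I'm pretty new to this stuff, so any alternative approaches would be appreciated.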

The problem is that the page does not contain the results you expect, because they are probably rendered by JavaScript.

When I download the content from the given url, there are no <h2> elements, but I do find this message: Please enable JavaScript to view CNN's 2016 Election Center.

You are not getting the data because it is not in the page.

Do not be confused by the fact that your browser shows you the <h2> elements: they are there only because JavaScript has already inserted them.
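
A quick way to confirm this for yourself is to count the <h2> tags in the raw text that requests returns, before any JavaScript has run. The following is only a minimal sketch: `TagCounter`, `count_tag` and `raw_page` are names I made up for illustration, and `raw_page` is a hypothetical stand-in for the downloaded page (it only mimics the "Please enable JavaScript" message), not the real CNN markup. It uses the standard library's `html.parser` so that it runs without extra packages; with lxml the equivalent check would be `len(doc.cssselect("h2"))` on the `doc` from the question.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags by name in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this hook.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tag(html_text, tag):
    """Return how many times `tag` is opened in `html_text`."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts.get(tag, 0)


# Hypothetical stand-in for requests.get(url).text: an empty application
# shell plus the "enable JavaScript" notice, with no county markup at all.
raw_page = (
    "<html><body><div id='app'></div>"
    "<noscript>Please enable JavaScript to view CNN's 2016 Election Center."
    "</noscript><script src='bundle.js'></script></body></html>"
)

print(count_tag(raw_page, "h2"))          # number of <h2> tags in the raw HTML
print("enable JavaScript" in raw_page)    # the fallback notice is present
```

If the count is zero for the text you actually downloaded while the browser's inspector shows the headings, the headings were added client-side and no HTML parser will find them in that response.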

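Tip: check which JSON files the page loads. Most likely, some of them provide ready-to-use data for your task. Using F12 in my web browser (and then refreshing the page), I see many JSON files, some of which provide data about the candidates.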

For example, the url http://data.cnn.com/ELECTION/2016primary/candidates/can1187.json returns the following (abbreviated):

{
  "candidateInfo": {
    "id": 1187,
    "fname": "Mike",
    "lname": "Huckabee",
    "party": "Rep",
    "rd": "1",
    "pd": "0",
    "td": "1",
    "d_nom": 1237,
    "inrace": true,
    "nominee": false,
    "rd_k": "1460",
    "td_k": 2472,
    "dpct": 0,
    "dpct_nom": 50,
    "states": [
      {
        "state": "Alabama",
        "code": "AL",
        "electiondate": "20160301",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "Alaska",
        "code": "AK",
        "electiondate": "20160301",
        "primarytype": "caucus",
        "candidates": []
      },
      {
        "state": "Arizona",
        "code": "AZ",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      },
      {
        "state": "Arkansas",
        "code": "AR",
        "electiondate": "20160301",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "Iowa",
        "code": "IA",
        "electiondate": "20160201",
        "primarytype": "caucus",
        "candidates": [
          {
            "id": 1187,
            "rd": "1",
            "pd": "0",
            "td": "1",
            "winner": false
          }
        ]
      },
      {
        "state": "Kansas",
        "code": "KS",
        "electiondate": "20160305",
        "primarytype": "caucus",
        "candidates": []
      },
      {
        "state": "Kentucky",
        "code": "KY",
        "electiondate": "20160305",
        "primarytype": "caucus",
        "candidates": []
      },
      {
        "state": "Louisiana",
        "code": "LA",
        "electiondate": "20160305",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "Maine",
        "code": "ME",
        "electiondate": "20160305",
        "primarytype": "caucus",
        "candidates": []
      },
      {
        "state": "Maryland",
        "code": "MD",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      },
      {
        "state": "Massachusetts",
        "code": "MA",
        "electiondate": "20160301",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "Michigan",
        "code": "MI",
        "electiondate": "20160308",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "Minnesota",
        "code": "MN",
        "electiondate": "20160301",
        "primarytype": "caucus",
        "candidates": []
      },
      {
        "state": "Mississippi",
        "code": "MS",
        "electiondate": "20160308",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "Missouri",
        "code": "MO",
        "electiondate": "20160315",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "Montana",
        "code": "MT",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      },
      {
        "state": "Nebraska",
        "code": "NE",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      },
      {
        "state": "Nevada",
        "code": "NV",
        "electiondate": "20160223",
        "primarytype": "caucus",
        "candidates": []
      },
      {
        "state": "New Hampshire",
        "code": "NH",
        "electiondate": "20160209",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "New Jersey",
        "code": "NJ",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      },
      {
        "state": "New Mexico",
        "code": "NM",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      },
      {
        "state": "New York",
        "code": "NY",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      },
      {
        "state": "North Carolina",
        "code": "NC",
        "electiondate": "20160315",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "North Dakota",
        "code": "ND",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      },
      {
        "state": "Ohio",
        "code": "OH",
        "electiondate": "20160315",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "Oklahoma",
        "code": "OK",
        "electiondate": "20160301",
        "primarytype": "primary",
        "candidates": []
      },
      {
        "state": "Oregon",
        "code": "OR",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      },
      {
        "state": "Virgin Islands",
        "code": "VI",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      },
      {
        "state": "Northern Marianas",
        "code": "MP",
        "electiondate": "",
        "primarytype": "",
        "candidates": []
      }
    ],
    "races": [
      {
        "status": "called",
        "code": "AR",
        "state": "Arkansas",
        "polltype": "exit",
        "primarytype": "primary",
        "cresults": true,
        "cmap": true,
        "xpoll": true,
        "electiondate": "20160301",
        "pctsrep": 100,
        "ts": 1457130949809,
        "racerank": 6,
        "winner": false,
        "vpct": 1,
        "pctDecimal": "1.2",
        "inc": false,
        "votes": 4703,
        "cvotes": "4,703",
        "rd": "0",
        "pd": "0",
        "sd": "0",
        "td": "0",
        "position": 13
      },
      {
        "status": "called",
        "code": "GA",
        "state": "Georgia",
        "polltype": "exit",
        "primarytype": "primary",
        "cresults": true,
        "cmap": true,
        "xpoll": true,
        "electiondate": "20160301",
        "pctsrep": 92,
        "ts": 1457130978961,
        "racerank": 8,
        "winner": false,
        "vpct": 0,
        "pctDecimal": "0.2",
        "inc": false,
        "votes": 2615,
        "cvotes": "2,615",
        "rd": "0",
        "pd": "0",
        "sd": "0",
        "td": "0",
        "position": 13
      },
      {
        "status": "called",
        "code": "TN",
        "state": "Tennessee",
        "polltype": "exit",
        "primarytype": "primary",
        "cresults": true,
        "cmap": true,
        "xpoll": true,
        "electiondate": "20160301",
        "pctsrep": 100,
        "ts": 1457131086792,
        "racerank": 7,
        "winner": false,
        "vpct": 0,
        "pctDecimal": "0.3",
        "inc": false,
        "votes": 2404,
        "cvotes": "2,404",
        "rd": "0",
        "pd": "0",
        "sd": "0",
        "td": "0",
        "position": 15
      },
      {
        "status": "called",
        "code": "IA",
        "state": "Iowa",
        "polltype": "entrance",
        "primarytype": "caucus",
        "cresults": true,
        "cmap": true,
        "xpoll": true,
        "electiondate": "20160201",
        "pctsrep": 99,
        "ts": 1454997428611,
        "racerank": 9,
        "winner": false,
        "vpct": 2,
        "pctDecimal": "1.8",
        "inc": false,
        "votes": 3345,
        "cvotes": "3,345",
        "rd": "1",
        "pd": "0",
        "sd": "1",
        "td": "1",
        "position": 14
      },
      {
        "status": "called",
        "code": "AL",
        "state": "Alabama",
        "polltype": "exit",
        "primarytype": "primary",
        "cresults": true,
        "cmap": true,
        "xpoll": true,
        "electiondate": "20160301",
        "pctsrep": 100,
        "ts": 1456958822650,
        "racerank": 8,
        "winner": false,
        "vpct": 0,
        "pctDecimal": "0.3",
        "inc": false,
        "votes": 2535,
        "cvotes": "2,535",
        "rd": "0",
        "pd": "0",
        "sd": "0",
        "td": "0",
        "position": 13
      }
    ],
    "lts": 1458233488340
  }
}
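
Once you have such a JSON document, no HTML parsing is needed at all. Here is a rough sketch of pulling one state's result out of the structure shown above. The helper names (`fetch_candidate`, `race_for_state`) and the `SAMPLE` constant are my own, `SAMPLE` is a trimmed copy of the response above so the example runs offline, and I am assuming the url still answers with the same layout, which may no longer be true. I use `urllib.request` here only to stay within the standard library; `requests.get(url).json()` would do the same job.

```python
import json
from urllib.request import urlopen

# URL quoted in the answer above; it may no longer be served.
CANDIDATE_URL = "http://data.cnn.com/ELECTION/2016primary/candidates/can1187.json"


def fetch_candidate(url=CANDIDATE_URL):
    """Download and decode one candidate JSON document (needs network access)."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def race_for_state(data, state_code):
    """Return the entry of candidateInfo.races for a state code, or None."""
    for race in data["candidateInfo"]["races"]:
        if race["code"] == state_code:
            return race
    return None


# Trimmed copy of the response shown above, so the example runs offline.
SAMPLE = json.loads("""
{"candidateInfo": {"id": 1187, "fname": "Mike", "lname": "Huckabee",
  "races": [
    {"code": "AR", "state": "Arkansas", "votes": 4703, "pctDecimal": "1.2"},
    {"code": "IA", "state": "Iowa", "votes": 3345, "pctDecimal": "1.8"}
  ]}}
""")

iowa = race_for_state(SAMPLE, "IA")
# pctDecimal is a string in the response, so convert it before doing arithmetic.
print(iowa["state"], iowa["votes"], float(iowa["pctDecimal"]))
```

To work with live data, replace `SAMPLE` with the result of `fetch_candidate()`. Note that this file holds statewide results per candidate; for the per-county numbers you would have to look in the browser's network tab for whichever JSON file the county page loads.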