Force xpath to return a string in lxml
I am using lxml, and I have a page scraped from Google Scholar. Below is a minimal working example along with what I have tried.
In [56]: seed = "https://scholar.google.com/citations?view_op=search_authors&hl=en&mauthors=label:machine_learning"
In [60]: page = urllib2.urlopen(seed).read()
In [63]: tree = html.fromstring(page)
In [64]: xpath = '(/html/body/div[1]/div[4]/div[2]/div/span/button[2]/@onclick)[1]'
In [65]: tree.xpath(xpath)
# the first match is still returned inside a list
Out[65]: ["window.location='/citations?view_op\x3dsearch_authors\x26hl\x3den\x26oe\x3dASCII\x26mauthors\x3dlabel:machine_learning\x26after_author\x3dVCoCALPY_v8J\x26astart\x3d10'"]
In [66]: xpath = '(/html/body/div[1]/div[4]/div[2]/div/span/button[2]/@onclick)[2]'
#there is no second element
In [67]: tree.xpath(xpath)
Out[67]: []
In [70]: xpath = '(/html/body/div[1]/div[4]/div[2]/div/span/button[2]/@onclick)'
#The list contains only one element
In [71]: tree.xpath(xpath)
Out[71]: ["window.location='/citations?view_op\x3dsearch_authors\x26hl\x3den\x26oe\x3dASCII\x26mauthors\x3dlabel:machine_learning\x26after_author\x3dVCoCALPY_v8J\x26astart\x3d10'"]
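According to the documentation, the return value can be a smart string, but I cannot get a string output from the xpath function. How should I write the XPath expression so that xpath returns a string?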
You can use the XPath expression string(/html/body/div[1]/div[4]/div[2]/div/span/button[2]/@onclick), in which case you get a plain string value instead of a list.
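A minimal sketch of the difference, assuming lxml is installed. The markup and the shortened path below are hypothetical stand-ins for the Google Scholar page, so the example runs without any network access.

```python
from lxml import html

# Hypothetical stand-in for the scraped page: two buttons, each with an onclick attribute.
doc = html.fromstring(
    "<html><body><div><span>"
    "<button onclick=\"a()\">Prev</button>"
    "<button onclick=\"window.location='/next'\">Next</button>"
    "</span></div></body></html>"
)

path = "/html/body/div/span/button[2]/@onclick"

# A plain node-set expression always comes back as a list, even for a single match.
as_list = doc.xpath(path)

# Wrapping the same path in string() makes XPath return the string value
# of the first matching node.
as_string = doc.xpath("string(%s)" % path)

# When nothing matches, string() yields an empty string rather than an empty list.
missing = doc.xpath("string(/html/body/div/span/button[3]/@onclick)")

print(as_list)
print(as_string)
print(repr(missing))
```

Note that string() cannot distinguish a missing attribute from an empty one, since both give an empty string. If that distinction matters, keeping the list form and checking its length before taking element [0] may be the safer choice.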