Lxml is returning an empty list
I am using lxml to try to get the current top 10 hits on Spotify (https://spotifycharts.com/regional). When I run the program, it returns an empty list [] instead of ['song 1', 'song 2', etc.].
import requests
import lxml.html
html = requests.get("https://spotifycharts.com/regional")
doc = lxml.html.fromstring(html.content)
songs = doc.xpath('//div[@id="content"]')[0]
titles = songs.xpath('.//div[@class="chart-table-track"]/text()')
print(titles)
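I'm not sure whether this is an XPath problem, but when I looked on the site for another id, there wasn't one. Also, the id "content" contains the text I need, and so does "chart-table-track". I'm not sure if I got the syntax wrong, but any help would be appreciated.
Thanks,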
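You can make a small fix to your second XPath: the class chart-table-track is on a td element, not a div, and the text sits inside that cell's child elements. Change this: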
titles = songs.xpath('.//div[@class="chart-table-track"]/text()')
to this:
titles = songs.xpath('.//td[@class="chart-table-track"]/*/text()')
That gets you the song names and artists, ready for you to do something with:
['Blinding Lights',
'by The Weeknd',
'The Box',
'by Roddy Ricch',
'Dance Monkey',
'by Tones And I',
"Don't Start Now",
'by Dua Lipa',
...
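One way to "do something with" that flat list is to pair each title with its artist. The sketch below is only an illustration: it assumes the list always alternates a title with a "by Artist" entry, as in the output above, and the helper name pair_tracks is made up for this example.

```python
# Flat list in the shape shown above: title, "by Artist", title, "by Artist", ...
flat = [
    "Blinding Lights", "by The Weeknd",
    "The Box", "by Roddy Ricch",
    "Dance Monkey", "by Tones And I",
    "Don't Start Now", "by Dua Lipa",
]


def pair_tracks(items):
    """Group a flat [title, 'by artist', ...] list into (title, artist) tuples.

    Assumption: after dropping whitespace-only strings, entries strictly
    alternate between a title and a 'by Artist' string.
    """
    cleaned = [s.strip() for s in items if s.strip()]
    titles = cleaned[0::2]
    # Strip the leading "by " only when it is actually present.
    artists = [a[3:] if a.startswith("by ") else a for a in cleaned[1::2]]
    return list(zip(titles, artists))


for title, artist in pair_tracks(flat)[:10]:
    print(f"{title} - {artist}")
```

If a row on the page ever produced more or fewer than two text nodes, the alternation would drift, so extracting row by row is the sturdier choice for anything beyond a quick script.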
Using requests-html is much simpler (if I remember correctly, it uses lxml internally):
from requests_html import HTMLSession
session = HTMLSession()
r = session.get('https://spotifycharts.com/regional')
songs_elements = r.html.find('td.chart-table-track')[:10]
titles = [songs_element.text for songs_element in songs_elements]
print('\n'.join(titles))
Output:
Blinding Lights by The Weeknd
Dance Monkey by Tones And I
Don't Start Now by Dua Lipa
Roses - Imanbek Remix by SAINt JHN
In Your Eyes by The Weeknd
death bed (coffee for your head) (feat. beabadoobee) by Powfu
Say So by Doja Cat
Intentions (feat. Quavo) by Justin Bieber
Falling by Trevor Daniel
requests-html is also by Kenneth Reitz, like requests.
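You can try something like the following to get the top ten hits (rank and name) from that page. I used BeautifulSoup rather than calling lxml directly (the "lxml" argument only selects lxml as BeautifulSoup's parser).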
import requests
from bs4 import BeautifulSoup
html = requests.get("https://spotifycharts.com/regional")
doc = BeautifulSoup(html.content,"lxml")
for items in doc.select('table.chart-table tr')[1:11]:
    rank = items.select_one("td.chart-table-position").get_text(strip=True)
    name = items.select_one("td.chart-table-track > strong").get_text(strip=True)
    print(rank,name)
Output:
1 Blinding Lights
2 The Box
3 Dance Monkey
4 Don't Start Now
5 Roses - Imanbek Remix
6 In Your Eyes
7 death bed (coffee for your head) (feat. beabadoobee)
8 Say So
9 Intentions (feat. Quavo)
10 Falling
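For anyone who wants to stay with lxml, as in the question, the same row-by-row idea can be sketched with XPath. The markup below is a hypothetical sample modelled on the class names mentioned in this thread (the strong and span tags in particular are assumptions); the real page may be structured differently. It also shows why the original div expression comes back empty.

```python
import lxml.html

# Hypothetical sample markup, NOT fetched from the site.
SAMPLE = """
<div id="content">
  <table class="chart-table">
    <tr><th>Position</th><th>Track</th></tr>
    <tr>
      <td class="chart-table-position">1</td>
      <td class="chart-table-track"><strong>Blinding Lights</strong> <span>by The Weeknd</span></td>
    </tr>
    <tr>
      <td class="chart-table-position">2</td>
      <td class="chart-table-track"><strong>The Box</strong> <span>by Roddy Ricch</span></td>
    </tr>
  </table>
</div>
"""

doc = lxml.html.fromstring(SAMPLE)

# Original expression: no <div> carries this class, so nothing matches.
empty = doc.xpath('.//div[@class="chart-table-track"]/text()')

# Corrected expression: match the <td>, then read text from its child elements.
fixed = doc.xpath('.//td[@class="chart-table-track"]/*/text()')

# Row by row: tr[td] skips the header row, which only has <th> cells.
rows = []
for tr in doc.xpath('//table[@class="chart-table"]//tr[td]'):
    rank = tr.xpath('string(td[@class="chart-table-position"])').strip()
    name = tr.xpath('string(td[@class="chart-table-track"]/strong)').strip()
    rows.append((rank, name))

print(empty)
print(fixed)
print(rows[:10])
```

Working per row keeps each rank tied to its own title, which a single flat text() query cannot guarantee.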