Getting an element via lxml by XPath
I'm writing a Discord bot in Python with the discord.py library. I don't need help with the bot itself, but I do need to scrape some information from a website.
import aiohttp
import requests
from bs4 import BeautifulSoup
from lxml import html

@commands.command(aliases=["rubyuserinfo"])
async def rubyinfo(self, ctx, input):
    HEADERS = {
        'User-Agent': 'Magic Browser'
    }
    url = f'https://rubyrealms.com/user/{input}/'
    async with aiohttp.request("GET", url, headers=HEADERS) as response:
        if response.status == 200:
            print("Site is working!")
            content = await response.text()
            soup = BeautifulSoup(content, "html.parser")
            page = requests.get(url)
            tree = html.fromstring(page.content)
            stuff = tree.xpath('/html/body/div[4]/div/div[3]/div[3]/div/div[2]/div[1]/div[2]/div/p')
            print(stuff)
        else:
            print(f"The request was invalid\nStatus code: {response.status}")
The site I'm scraping is "https://rubyrealms.com/user/{input}/"; running h!rubyinfo USERNAME changes the link to https://rubyrealms.com/user/username/.
On that page, what I want to get is the user's bio, whose XPath is
"//*[@id="content-wrap"]/div[3]/div[3]/div/div[2]/div[1]/div[2]/div/p"
where the element is:
<p class="margin-none font-color">
Hey! My name is KOMKO190, you maybe know me from the forums or discord. I am a programmer, I know a bit of JavaScript, small portion of C++, Python and html/css. Mostly python. My user ID is 7364. ||| 5th owner of Space Helmet :) </p>
Any help on how I can scrape this? The only response my bot gives is "[]".
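A likely reason for the empty list is that the absolute /html/body/... path does not match the HTML the server actually returns (server-rendered markup often differs from what browser DevTools shows). An illustrative sketch, using made-up stand-in markup, of why an expression anchored on the class attribute is more robust:

```python
from lxml import html

# Minimal stand-in for the profile page markup (assumed structure, not the real page)
doc = html.fromstring(
    '<html><body><div id="content-wrap">'
    '<p class="margin-none font-color">Hey! My name is KOMKO190 :)</p>'
    '</div></body></html>'
)

# A brittle absolute positional path that does not match this markup returns []
print(doc.xpath('/html/body/div[4]/div/p'))  # []

# Matching on the class attribute survives layout changes around the element
print(doc.xpath('//p[@class="margin-none font-color"]/text()'))
```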
How about the following, using the .select() method:
from bs4 import BeautifulSoup
html = '<p class="margin-none font-color">Hey! My name is KOMKO190 :) </p>'
soup = BeautifulSoup(html, features="lxml")
element = soup.select('p.margin-none')[0]
print(element.text)
This prints:
Hey! My name is KOMKO190 :)
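Since the target paragraph carries both classes, the selector can be tightened by chaining them, and select_one avoids the [0] indexing; a small sketch of that variant (the markup here is a made-up stand-in):

```python
from bs4 import BeautifulSoup

# Two paragraphs, only one of which has both classes
html_doc = (
    '<div>'
    '<p class="margin-none">Other text</p>'
    '<p class="margin-none font-color">Bio text here</p>'
    '</div>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# Chaining both classes in the CSS selector narrows the match to the bio paragraph;
# select_one returns the first match directly (or None if nothing matches)
element = soup.select_one("p.margin-none.font-color")
print(element.text)  # Bio text here
```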
import requests
from bs4 import BeautifulSoup as bs

url = 'https://rubyrealms.com/user/username/'
session = requests.Session()
request = session.get(url=url)
if request.status_code == 200:
    soup = bs(request.text, 'lxml')
    print(soup.find('p', class_='margin-none font-color').text)
else:
    print(request.status_code)
You need to install:
pip install lxml
pip install beautifulsoup4
Change your XPath expression to a relevant one:
from lxml import html
import requests
page = requests.get('https://www.rubyrealms.com/user/KOMKO190/')
tree = html.fromstring(page.content)
stuff = tree.xpath('normalize-space(//h3[.="Bio"]/following-sibling::p/text())')
print(stuff)
Output:
Hey! My name is KOMKO190, you maybe know me from the forums or discord. I am a programmer, I know a bit of JavaScript, small portion of C++, Python and html/css. Mostly python. My user ID is 7364. ||| 5th owner of Space Helmet :)
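The normalize-space() call in the expression above is what strips the padding and collapses internal runs of whitespace in the bio text; a minimal sketch of its effect on a stand-in fragment:

```python
from lxml import html

# A fragment with ragged whitespace, standing in for the served markup
doc = html.fromstring('<div><h3>Bio</h3><p>  Hey!   My name is KOMKO190  </p></div>')

# Without normalize-space(), the raw text node keeps its padding and extra spaces
raw = doc.xpath('//h3[.="Bio"]/following-sibling::p/text()')[0]

# normalize-space() trims both ends and collapses internal whitespace to single spaces
clean = doc.xpath('normalize-space(//h3[.="Bio"]/following-sibling::p/text())')
print(clean)  # Hey! My name is KOMKO190
```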