Dealing with hexadecimal values in BeautifulSoup response?
I am using Beautiful Soup to scrape some data:
url = "https://www.transfermarkt.co.uk/jorge-molina/profil/spieler/94447"
heads = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'}
response = requests.get(url,headers=heads)
soup = BeautifulSoup(response.text, "lxml")
Then I extract one specific piece of information with:
height = soup.find_all("th", string=re.compile("Height:"))[0].findNext("td").text
print(height)
This works as expected and prints
1,74 m
But when I try to evaluate that string with this function:
def format_height(height_string):
    return int(height_string.split(" ")[0].replace(',',''))
I get the following error:
format_height(height)
Traceback (most recent call last):
  File "get_player_info.py", line 73, in <module>
    player_info = get_player_info(url)
  File "get_player_info.py", line 39, in get_player_info
    format_height(height)
  File "/Users/kompella/Documents/la-segunda/util.py", line 49, in format_height
    return int(height_string.split(" ")[0].replace(',',''))
ValueError: invalid literal for int() with base 10: '174\xa0m'
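How should I handle the hexadecimal values I am getting?
Nothing is wrong with the response. Extract the digits from the string with a regular expression, and then you can format them however you like.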
import requests
import re
from bs4 import BeautifulSoup
url = "https://www.transfermarkt.co.uk/jorge-molina/profil/spieler/94447"
heads = {'User-Agent' : 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36'}
response = requests.get(url,headers=heads)
soup = BeautifulSoup(response.text, "lxml")
height = soup.find_all("th", string=re.compile("Height:"))[0].findNext("td").text
# Pull each run of digits out of the string; the comma and the \xa0 are skipped.
numerals = [int(s) for s in re.findall(r'\b\d+\b', height)]
print(numerals)
# output: [1, 74]
print("Height is: " + str(numerals[0]) + "." + str(numerals[1]) + "m")
# output: Height is: 1.74m
print("Height is: " + str(numerals[0]) + str(numerals[1]) + "cm")
# output: Height is: 174cm
In any case, this thread discusses the same error, so you may want to take a look:
ValueError: invalid literal for int() with base 10: ''
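The regex idea above can also be packaged as a small helper that runs without a network request. This is only a sketch: the name `parse_height_cm` is hypothetical, and it assumes the height always has the form `metres,decimals` followed by a unit, as in the `1,74 m` string from the question.

```python
import re

def parse_height_cm(height_string):
    """Return the height in whole centimetres from a string like '1,74\xa0m'."""
    # Capture the digits on both sides of the decimal comma; whatever follows
    # (a regular space, a non-breaking space, the unit) is ignored.
    match = re.search(r"(\d+),(\d+)", height_string)
    if match is None:
        raise ValueError(f"unexpected height format: {height_string!r}")
    metres = float(f"{match.group(1)}.{match.group(2)}")
    # round() absorbs any floating-point noise from the multiplication.
    return round(metres * 100)

print(parse_height_cm("1,74\xa0m"))  # 174
```

Going through a float rather than concatenating the two digit groups keeps the result sensible if the page ever shows a different number of decimals.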
Use an attribute=value selector to target the height, then use your function as is.
import requests
from bs4 import BeautifulSoup as bs
def format_height(height_string):
    return int(height_string.split(" ")[0].replace(',',''))
r = requests.get('https://www.transfermarkt.co.uk/jorge-molina/profil/spieler/94447', headers = {'User-Agent':'Mozilla.0'})
soup = bs(r.content,'lxml')
height_string = soup.select_one('[itemprop=height]').text
print(format_height(height_string))
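For context, the `\xa0` in the traceback is not a hexadecimal number. It is how Python's `repr` shows a non-breaking space (U+00A0), which looks like an ordinary space when printed. `split(" ")` splits only on the ASCII space, so the string from the question is never split. The sketch below is a variant of the asker's function, assuming Python 3, where `str.split()` with no argument splits on any Unicode whitespace, including U+00A0.

```python
def format_height(height_string):
    # split() with no argument treats the non-breaking space as a separator,
    # so "1,74\xa0m" becomes ["1,74", "m"].
    number_part = height_string.split()[0]
    return int(number_part.replace(",", ""))

# Splitting on a literal " " leaves the string in one piece:
print(repr("1,74\xa0m".split(" ")))  # ['1,74\xa0m']
print(format_height("1,74\xa0m"))    # 174
```

The same function still accepts a height separated by a regular space, so it does not matter which kind of space the page happens to return.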