How do I get a specific word phrase out of a word soup with Beautiful Soup?
I've parsed the page with BeautifulSoup, and the result is:
<bound method Tag.prettify of <script type="text/javascript">var LifeTimeStats = [{"Key":"Top 3","Value":"31"},{"Key":"Top 5s","Value":"36"},{"Key":"Top 3s","Value":"13"},{"Key":"Top 6s","Value":"27"},{"Key":"Top 12s","Value":"76"},{"Key":"Top 25s","Value":"58"},{"Key":"Score","Value":"99,788"},{"Key":"Matches Played","Value":"502"},{"Key":"Wins","Value":"9"},{"Key":"Win%","Value":"2%"},{"Key":"Kills","Value":"730"},{"Key":"K/d","Value":"1.48"}];</script>>
I'm trying to get the specific value "730" from:
{"Key":"Kills","Value":"730"}
Since there are no HTML tags left to filter on, I don't know how to get this particular value. Any ideas?
Maybe there is another solution...
Here is the full code:
#----WEB INPUT BASIC----
#import bs4
from urllib.request import urlopen as uReq
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
#setting my url
url = 'https://fortnitetracker.com/profile/psn/Rehgum'
#making my https page work
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req).read()
webpage = web_byte.decode('utf-8')
urlopen(req).close()
#html parsing
page_soup = soup(webpage, "html.parser")
lifetime = page_soup.findAll("script",{"type":"text/javascript"})
stats = lifetime[3]
specific = stats.prettify
value = specific.text
#from here there is just code to put that value in a .txt file
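The `<bound method Tag.prettify of ...>` wrapper in the output above is what Python prints when a method is referenced without being called: `stats.prettify` has no parentheses. A minimal illustration, using a hypothetical `Greeting` class as a stand-in for a bs4 `Tag`:

```python
class Greeting:
    """Hypothetical stand-in for a bs4 Tag, only to show the Python behaviour."""

    def render(self):
        return "<p>hello</p>"


g = Greeting()

not_called = g.render    # attribute access only: this is a bound method object
called = g.render()      # the parentheses actually run the method

print(repr(not_called))  # prints something like <bound method Greeting.render of <...Greeting object at 0x...>>
print(called)            # prints <p>hello</p>
```

In the question's code that would mean `stats.prettify()`, but that returns the whole tag as a formatted HTML string, and a plain `str` has no `.text` attribute. To get just the JavaScript inside the tag, `stats.string` or `stats.text` is probably what is wanted.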
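This is just an idea of what you could do:
- Extract the JS code into a Python variable.
- Use a regular expression to extract the variable's value.
- "JSONify" that value, i.e. parse it as JSON.
- Extract the data you need.
Snippet: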
import json
import re
a = '''var LifeTimeStats = [{"Key":"Top 3","Value":"31"},{"Key":"Top 5s","Value":"36"},{"Key":"Top 3s","Value":"13"},{"Key":"Top 6s","Value":"27"},{"Key":"Top 12s","Value":"76"},{"Key":"Top 25s","Value":"58"},{"Key":"Score","Value":"99,788"},{"Key":"Matches Played","Value":"502"},{"Key":"Wins","Value":"9"},{"Key":"Win%","Value":"2%"},{"Key":"Kills","Value":"730"},{"Key":"K/d","Value":"1.48"}];'''
# capture everything between "var NAME =" and the first ";" (the array literal)
b = re.findall(r'var.*?=\s*(.*?);', a)[0]
# parse it into a list of {"Key": ..., "Value": ...} dicts
c = json.loads(b)
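The snippet above stops after step 3, with `c` holding a list of `{"Key": ..., "Value": ...}` dicts. For step 4, one possible approach is to index that list by key and convert the display strings to numbers. This is a sketch: `to_number` is a hypothetical helper, the input is a shortened copy of the question's data, and the conversion assumes the site formats numbers exactly as shown there (thousands separators, a trailing `%`).

```python
import json
import re

# Shortened copy of the variable declaration from the question's <script> tag
js = 'var LifeTimeStats = [{"Key":"Score","Value":"99,788"},{"Key":"Win%","Value":"2%"},{"Key":"Kills","Value":"730"},{"Key":"K/d","Value":"1.48"}];'

# Steps 1-3: pull out the array literal and parse it as JSON
records = json.loads(re.findall(r'var.*?=\s*(.*?);', js)[0])

# Step 4: turn the list of Key/Value pairs into a dict for direct lookups
stats = {item["Key"]: item["Value"] for item in records}


def to_number(raw):
    """Hypothetical helper: convert '99,788', '2%' or '1.48' to a number.

    Assumes commas are thousands separators and '%' only ever appears at the end.
    """
    cleaned = raw.replace(",", "").rstrip("%")
    return float(cleaned) if "." in cleaned else int(cleaned)


kills = to_number(stats["Kills"])
print(kills)  # prints 730
```

Keeping the raw strings in `stats` and converting only on demand means a value the helper cannot parse does not break the lookup of the others.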
Have a look at the dummy full code I wrote.
UPDATE
After seeing your full code... this might be the solution to your problem.
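If only a single value is ever needed, a shorter but more brittle option is to search the script text for that key with one regular expression and skip the JSON step. The sketch below assumes the exact `"Key":"...","Value":"..."` layout from the question, with no whitespace; `find_stat` is a hypothetical name and the input is a shortened copy of the question's data.

```python
import re

# Shortened copy of the script text from the question
script_text = 'var LifeTimeStats = [{"Key":"Wins","Value":"9"},{"Key":"Kills","Value":"730"},{"Key":"K/d","Value":"1.48"}];'


def find_stat(text, key):
    """Return the Value string paired with `key`, or None if the key is absent.

    Relies on the exact '"Key":"...","Value":"..."' formatting; any change on the
    site breaks it, which is why parsing the JSON is the sturdier route.
    """
    # re.escape guards against keys containing regex metacharacters
    pattern = r'"Key":"' + re.escape(key) + r'","Value":"([^"]*)"'
    match = re.search(pattern, text)
    return match.group(1) if match else None


print(find_stat(script_text, "Kills"))  # prints 730
```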
I finally got it working!
What was causing my error was the "def loop():" part.
Here is the final working code:
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as soup
import json
import re
import time

def loop():
    #setting my url
    url = 'https://fortnitetracker.com/profile/psn/Rehgum'
    #making my https page work
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    # one request; the with-block closes the connection afterwards
    with urlopen(req) as response:
        webpage = response.read().decode('utf-8')
    #html parsing
    page_soup = soup(webpage, "html.parser")
    lifetime = page_soup.find_all("script", {"type": "text/javascript"})
    stats = lifetime[3]
    # pull the array literal out of "var LifeTimeStats = [...];" and parse it
    stats_var = re.findall(r'var.*?=\s*(.*?);', stats.text)[0]
    vals = json.loads(stats_var)
    num_kills = None  # stays None if the page has no "Kills" entry
    for val in vals:
        if val['Key'] == 'Kills':
            num_kills = val['Value']
            break
    print('Num kills = {}'.format(num_kills))
    with open('lifetime_wins.txt', 'w') as fd:
        fd.write(str(num_kills))

# poll every 30 seconds; a loop instead of loop() calling itself avoids hitting the recursion limit
while True:
    loop()
    time.sleep(30)
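The code above relies on the stats being in the fourth `<script type="text/javascript">` tag (`lifetime[3]`), which breaks as soon as the site adds or removes a script. A possibly sturdier option is to pick the script by its content instead. A sketch, assuming the `var LifeTimeStats` marker stays stable; `find_stats_script` and the sample `scripts` list are hypothetical:

```python
import json
import re


def find_stats_script(script_texts, marker="var LifeTimeStats"):
    """Return the first script body containing `marker`, or None.

    `script_texts` is any iterable of strings (None entries are skipped).
    """
    for text in script_texts:
        if text and marker in text:
            return text
    return None


# Hypothetical page scripts; the stats script is not necessarily at a fixed index
scripts = [
    "window.ads = [];",
    None,  # e.g. an external <script src=...> tag with no inline body
    'var LifeTimeStats = [{"Key":"Kills","Value":"730"}];',
]

body = find_stats_script(scripts)
vals = json.loads(re.findall(r'var.*?=\s*(.*?);', body)[0])
```

With BeautifulSoup, the call would look roughly like `find_stats_script(tag.string for tag in page_soup.find_all("script"))`; `tag.string` is `None` for tags without a single inline text child, which the function skips.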
A big "Thank you" to @kazbeel. You saved my day! +rep