Scrape variable inside CData with BeautifulSoup
I have a web page that contains the following data, and I want to scrape it from the page's CData section.
<script type="text/javascript">//<![CDATA[
car.app =
{"lat":26.175625,"lon":-80.13808,"zoom":"13","yellow":"\/img\/icons\/yellow.png","cars":[{"CAR_ID":"715383","ID":"538070521","UID":"0","CARNAME":"MAZDA","TYPE_COLOR":"0","LAT":"26.13437","LON":"-80.11906","COURSE":"100","SPEED":"0","LENGTH":"12","STATE":"OH"}]
...
...
//]]></script>
I want to get the car.app variable inside the CData, but I'm not sure how to parse it in Python.
import bs4 as bs
import urllib.request

# Custom opener that sends a browser-like User-Agent string.
class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"

opener = AppURLopener()
response = opener.open(url)  # url: address of the page (not shown here)
c = response.read()
soup = bs.BeautifulSoup(c, "html.parser")
print(soup)
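As far as I can tell, BeautifulSoup does not parse the JavaScript inside a script tag, so one way to solve this is to use BeautifulSoup to locate that specific tag and then do some string manipulation on its text to pull out the value.
Code: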
import bs4 as bs

# Raw string so the "\/" sequences in the sample are kept literally.
c = r'''
<script type="text/javascript">//<![CDATA[
car.app =
{"lat":26.175625,"lon":-80.13808,"zoom":"13","yellow":"\/img\/icons\/yellow.png","cars":[{"CAR_ID":"715383","ID":"538070521","UID":"0","CARNAME":"MAZDA","TYPE_COLOR":"0","LAT":"26.13437","LON":"-80.11906","COURSE":"100","SPEED":"0","LENGTH":"12","STATE":"OH"}]
...
...
//]]></script>
'''
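soup = bs.BeautifulSoup(c, "html.parser")
# First <script> tag in the document; its text is the raw JavaScript.
script = soup.find('script')
# Keep what follows 'car.app =' up to the first '...', then drop the newlines.
print(script.text.split('car.app =')[1].split('...')[0].replace('\n', ''))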
Output:
{"lat":26.175625,"lon":-80.13808,"zoom":"13","yellow":"\/img\/icons\/yellow.png","cars":[{"CAR_ID":"715383","ID":"538070521","UID":"0","CARNAME":"MAZDA","TYPE_COLOR":"0","LAT":"26.13437","LON":"-80.11906","COURSE":"100","SPEED":"0","LENGTH":"12","STATE":"OH"}]
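The string above is still plain text. If the goal is to read individual fields, one option is to hand the value to the standard `json` module. The sketch below is only an illustration under assumptions: the `html` sample is hypothetical (a shortened version of the snippet above with the object closed, since the original is elided with `...`), the helper name `extract_car_app` is made up, and it assumes the value assigned to `car.app` is valid JSON. It uses `json.JSONDecoder().raw_decode`, which parses one JSON value starting at a given index and ignores whatever JavaScript follows it.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical sample: same shape as the page in the question, but shortened
# and with the object closed so that it is valid JSON.
html = r'''
<script type="text/javascript">//<![CDATA[
car.app =
{"lat":26.175625,"lon":-80.13808,"zoom":"13","yellow":"\/img\/icons\/yellow.png","cars":[{"CAR_ID":"715383","CARNAME":"MAZDA","STATE":"OH"}]};
//]]></script>
'''


def extract_car_app(page):
    """Return the object assigned to car.app as a dict, or None if not found."""
    soup = BeautifulSoup(page, "html.parser")
    for script in soup.find_all("script"):
        text = script.string or ""
        # Find the assignment and skip the whitespace after the '=' sign.
        match = re.search(r"car\.app\s*=\s*", text)
        if match:
            # raw_decode parses a single JSON value beginning at match.end()
            # and returns (value, index_where_parsing_stopped).
            data, _ = json.JSONDecoder().raw_decode(text, match.end())
            return data
    return None


data = extract_car_app(html)
print(data["cars"][0]["CARNAME"])
```

With the parsed dict, values such as `data["lat"]` or the entries of `data["cars"]` can be read directly instead of slicing the string further. If the real page assigns something that is not strict JSON (single quotes, unquoted keys), this approach would raise `json.JSONDecodeError` and the string manipulation above may still be needed.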