How can I web-scrape data from a piece of JSON-LD code?
I'm trying to get the coordinates ('latitude' and 'longitude') from this JSON-LD code:
> <script type="application/ld+json">
> {"@context":"http://schema.org","@graph":[
> {"@type":"Place","address":
> {"@type":"PostalAddress","streetAddress":"XX, XX"},"geo":
> {"@type":"GeoCoordinates","latitude":50.08872,"longitude":20.0297}}]}
> </script>
The closest I've got is:
import json
import requests
from bs4 import BeautifulSoup
req = requests.get(link)
soup = BeautifulSoup(req.text, 'html.parser')
text_ = json.loads("".join(soup.find("script", {"type":"application/ld+json"}).contents))
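But even this script gives me an earlier JSON-LD block (the first one in the full HTML), not the one above.
I'd be grateful even just for a way to get the JSON-LD block as a string.
Thanks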
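`soup.find` returns only the first tag that matches, which likely explains why an earlier JSON-LD block comes back when the page contains more than one. A minimal sketch, assuming the page holds several `application/ld+json` scripts: collect all of them with `find_all` and keep the ones that mention `GeoCoordinates`. The first script in the sample HTML is a made-up placeholder, not taken from the original page, and the string match on `GeoCoordinates` is a simple heuristic.

```python
import json

from bs4 import BeautifulSoup

# Sample page with two JSON-LD blocks. The first one is a hypothetical
# placeholder standing in for "an earlier block" without coordinates.
html = """
<script type="application/ld+json">{"@context":"http://schema.org","@type":"WebSite","name":"Example"}</script>
<script type="application/ld+json">
{"@context":"http://schema.org","@graph":[
{"@type":"Place","address":
{"@type":"PostalAddress","streetAddress":"XX, XX"},"geo":
{"@type":"GeoCoordinates","latitude":50.08872,"longitude":20.0297}}]}
</script>
"""

soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match; find_all() returns every matching tag.
blocks = []
for tag in soup.find_all("script", {"type": "application/ld+json"}):
    # .string is the tag's text when it has a single text child.
    if tag.string:
        blocks.append(json.loads(tag.string))

# Heuristic: keep only blocks whose serialized JSON mentions GeoCoordinates.
geo_blocks = [b for b in blocks if "GeoCoordinates" in json.dumps(b)]

print(len(blocks))      # 2
print(len(geo_blocks))  # 1
```

If only the raw text is needed, `tag.string` on its own already gives the JSON-LD block as a string.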
import json
from bs4 import BeautifulSoup
# The JSON-LD snippet from the question, wrapped in its <script> tag
data = """<script type="application/ld+json">
{"@context":"http://schema.org","@graph":[
{"@type":"Place","address":
{"@type":"PostalAddress","streetAddress":"XX, XX"},"geo":
{"@type":"GeoCoordinates","latitude":50.08872,"longitude":20.0297}}]}
</script>"""
soup = BeautifulSoup(data, 'html.parser')
# .string returns the script's text content, i.e. the JSON-LD as a string
goal = soup.select_one("script").string
# Parse the JSON text into a Python dict
match = json.loads(goal)
print(type(match))
print(match)
<class 'dict'>
{'@context': 'http://schema.org', '@graph': [{'@type': 'Place', 'address': {'@type': 'PostalAddress', 'streetAddress': 'XX, XX'}, 'geo': {'@type': 'GeoCoordinates', 'latitude': 50.08872, 'longitude': 20.0297}}]}
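To pull out the coordinates themselves, index into the parsed dict. A minimal sketch, assuming the structure shown in the question (a `@graph` list whose first item has a `geo` object). The helper `find_coordinates` is hypothetical: it walks the whole document instead, so it still finds the values if the `GeoCoordinates` node sits somewhere else, and it assumes `@type` is a plain string as in the question.

```python
import json

# The JSON-LD text from the question (what .string returns for the script tag).
raw = """{"@context":"http://schema.org","@graph":[
{"@type":"Place","address":
{"@type":"PostalAddress","streetAddress":"XX, XX"},"geo":
{"@type":"GeoCoordinates","latitude":50.08872,"longitude":20.0297}}]}"""


def find_coordinates(node):
    """Recursively yield (latitude, longitude) from every GeoCoordinates node.

    Assumes "@type" is a plain string, as in the question's data.
    """
    if isinstance(node, dict):
        if node.get("@type") == "GeoCoordinates":
            yield node.get("latitude"), node.get("longitude")
        for value in node.values():
            yield from find_coordinates(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_coordinates(item)


data = json.loads(raw)

# Direct path for this exact structure.
geo = data["@graph"][0]["geo"]
print(geo["latitude"], geo["longitude"])  # 50.08872 20.0297

# Structure-agnostic alternative.
print(list(find_coordinates(data)))  # [(50.08872, 20.0297)]
```

The direct path is shorter but raises `KeyError` or `IndexError` if the page nests the data differently; the recursive walk trades a few lines for that robustness.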