How to get specific content with BeautifulSoup in Python?
I am new to Python and am writing a small crawler with BeautifulSoup to get an address from a web page. The relevant part of the page (originally attached as a screenshot) is:
</div>
</div>
<div data-integration-name="redux-container" data-payload='{"name":"LocationsMapList","props":{"locations":[{"id":17305,"company_id":106906,"description":"","city":"New York","country":"United States","address":"5 Crosby St 3rd Floor","state":"New York","region":"","latitude":40.719753,"longitude":-74.0001954,"hq":true,"created_at":"2015-01-19T01:32:16.317Z","updated_at":"2016-05-05T07:57:19.282Z","zip_code":"10013","country_code":"US","full_address":"5 Crosby St 3rd Floor, New York, 10013, New York, USA","dirty":false,"to_params":"new-york-us"}]},"storeName":null}' data-rwr-element="true">
I got the full content with BeautifulSoup, but I don't know how to extract the value of "full_address". I can see that it is inside a "div", but I don't know what to do next.
links = soup.find_all('div')
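Thank you very much!
The data-payload attribute of that div holds a JSON string, so you can parse it with the json module: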
#!/usr/bin/env python
from bs4 import BeautifulSoup
import json
data = '''
</div>
</div>
<div data-integration-name="redux-container" data-payload='{"name":"LocationsMapList","props":{"locations":[{"id":17305,"company_id":106906,"description":"","city":"New York","country":"United States","address":"5 Crosby St 3rd Floor","state":"New York","region":"","latitude":40.719753,"longitude":-74.0001954,"hq":true,"created_at":"2015-01-19T01:32:16.317Z","updated_at":"2016-05-05T07:57:19.282Z","zip_code":"10013","country_code":"US","full_address":"5 Crosby St 3rd Floor, New York, 10013, New York, USA","dirty":false,"to_params":"new-york-us"}]},"storeName":null}' data-rwr-element="true">
'''
soup = BeautifulSoup(data, 'html.parser')
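# Select only the div(s) that carry the JSON payload.
for div in soup.find_all('div', attrs={'data-integration-name': 'redux-container'}):
    # The attribute value is a JSON string; decode it into a dict.
    info = json.loads(div.get('data-payload'))
    for location in info['props']['locations']:
        print(location['full_address'])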
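If the page may contain several such divs, or divs whose payload is missing or malformed, the same idea can be wrapped in a small function. This is only a sketch under assumptions: the function name `extract_full_addresses` is hypothetical, the payload is assumed to keep the `props` / `locations` / `full_address` layout shown in the question, and the sample HTML is a trimmed copy of the question's snippet.

```python
import json

from bs4 import BeautifulSoup


def extract_full_addresses(html):
    """Collect every "full_address" found in data-payload attributes.

    Hypothetical helper: assumes the payload layout from the question
    (props -> locations -> full_address).
    """
    soup = BeautifulSoup(html, 'html.parser')
    addresses = []
    # attrs={'data-payload': True} matches only divs that have the attribute.
    for div in soup.find_all('div', attrs={'data-payload': True}):
        try:
            payload = json.loads(div['data-payload'])
        except ValueError:
            # The attribute did not contain valid JSON; skip this div.
            continue
        if not isinstance(payload, dict):
            continue
        props = payload.get('props') or {}
        for location in props.get('locations') or []:
            if 'full_address' in location:
                addresses.append(location['full_address'])
    return addresses


# Trimmed copy of the snippet from the question.
sample_html = '''
<div data-integration-name="redux-container" data-payload='{"name":"LocationsMapList","props":{"locations":[{"address":"5 Crosby St 3rd Floor","full_address":"5 Crosby St 3rd Floor, New York, 10013, New York, USA"}]},"storeName":null}' data-rwr-element="true"></div>
'''

print(extract_full_addresses(sample_html))
# ['5 Crosby St 3rd Floor, New York, 10013, New York, USA']
```

Returning a list instead of printing inside the loop makes it easier to reuse the result, for example to write the addresses to a file.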