BeautifulSoup 援助
BeautifulSoup Assistance
我正在尝试抓取以下网站 (https://www.english-heritage.org.uk/visit/blue-plaques/#?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0),最终有兴趣在每个 'li class="search-result-item"' 中存储一些数据以执行进一步分析。
一个“搜索结果项”的示例
我想在 python 字典中捕获 <h3>
、<span class="plaque-role">
和 <span class="plaque-location">
:
<li class="search-result-item"><a href="/visit/blue-plaques/helen-gwynne-vaughan/"><img class="search-result-image max-width" src="/siteassets/home/visit/blue-plaques/find-a-plaque/blue-plaques-f-j/helen-gwynne-vaughan-plaque.jpg?w=732&h=465&mode=crop&scale=both&cache=always&quality=60&anchor=&WebsiteVersion=20220516171525" alt="" title=""><div class="search-result-info"><h3>GWYNNE-VAUGHAN, Dame Helen (1879-1967)</h3><span class="plaque-role">Botanist and Military Officer</span><span class="plaque-location">Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden</span></div></a></li>
到目前为止,我正在尝试隔离所有“搜索结果项”,但我当前的代码绝对没有打印任何内容。如果有人能帮我解决这个问题并指出正确的方向将每个数据元素存储到 python 字典中,我将不胜感激。
from bs4 import BeautifulSoup
import requests
url = 'https://www.english-heritage.org.uk/visit/blue-plaques/#?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
#print(soup.prettify())
print(soup.find_all(class_='search-result-item')).get_text()
您没有得到任何结果,因为搜索结果是由 JavaScript 生成的。使用他们从中获取数据的 API 端点。
例如:
import requests
api_url = "https://www.english-heritage.org.uk/api/BluePlaqueSearch/GetMatchingBluePlaques?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0&_=1653043005731"
plaques = requests.get(api_url).json()["plaques"]
for plaque in plaques:
print(plaque["title"])
print(plaque["address"])
print(f"https://www.english-heritage.org.uk{plaque['path']}")
print("-" * 80)
输出:
GWYNNE-VAUGHAN, Dame Helen (1879-1967)
Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden
https://www.english-heritage.org.uk/visit/blue-plaques/helen-gwynne-vaughan/
--------------------------------------------------------------------------------
READING, Lady Stella (1894-1971)
41 Tothill Street, London, City of Westminster, SW1H 9LQ, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/stella-lady-reading/
--------------------------------------------------------------------------------
32 SOHO SQUARE
32 Soho Square, Soho, London, W1D 3AP, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/soho-square/
--------------------------------------------------------------------------------
14 BUCKINGHAM STREET
14 Buckingham Street, Covent Garden, London, WC2N 6DF, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/buckingham-street/
--------------------------------------------------------------------------------
ABRAHAMS, Harold (1899-1978)
Hodford Lodge, 2 Hodford Road, Golders Green, London, NW11 8NP, London Borough of Barnet
https://www.english-heritage.org.uk/visit/blue-plaques/abrahams-harold/
--------------------------------------------------------------------------------
ADAM, ROBERT and HOOD, THOMAS and GALSWORTHY, JOHN and BARRIE, SIR JAMES
1-3 Robert Street, Adelphi, Charing Cross, London, WC2N 6BN, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/adam-hood-galsworthy-barrie/
--------------------------------------------------------------------------------
ADAMS, Henry Brooks (1838-1918)
98 Portland Place, Marylebone, London, W1B 1ET, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/united-states-embassy/
--------------------------------------------------------------------------------
ADELPHI, The
The Adelphi Terrace, Charing Cross, London, WC2N 6BJ, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/adelphi/
--------------------------------------------------------------------------------
ALDRIDGE, Ira (1807-1867)
5 Hamlet Road, Upper Norwood, London, SE19 2AP, London Borough of Bromley
https://www.english-heritage.org.uk/visit/blue-plaques/aldridge-ira/
--------------------------------------------------------------------------------
ALEXANDER, Sir George (1858-1918)
57 Pont Street, Chelsea, London, SW1X 0BD, London Borough of Kensington And Chelsea
https://www.english-heritage.org.uk/visit/blue-plaques/george-alexander/
--------------------------------------------------------------------------------
ALLENBY, Field Marshal Edmund Henry Hynman, Viscount Allenby (1861-1936)
24 Wetherby Gardens, South Kensington, London, SW5 0JR, London Borough of Kensington And Chelsea
https://www.english-heritage.org.uk/visit/blue-plaques/field-marshal-viscount-allenby/
--------------------------------------------------------------------------------
ALMA-TADEMA, Sir Lawrence, O.M. (1836-1912)
44 Grove End Road, St John's Wood, London, NW8 9NE, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/lawrence-alma-tadema/
--------------------------------------------------------------------------------
内容由 JavaScript
动态生成,因此您无法使用 BeautifulSoup
找到您要查找的元素/信息,而是使用他们的 API.
例子
import requests
url = 'https://www.english-heritage.org.uk/api/BluePlaqueSearch/GetMatchingBluePlaques?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0'
page = requests.get(url).json()
data = []
for e in page['plaques']:
data.append(dict((k,v) for k,v in e.items() if k in ['title','professions','address']))
data
输出
[{'title': 'GWYNNE-VAUGHAN, Dame Helen (1879-1967)', 'address': 'Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden', 'professions': 'Botanist and Military Officer'}, {'title': 'READING, Lady Stella (1894-1971)', 'address': '41 Tothill Street, London, City of Westminster, SW1H 9LQ, City Of Westminster', 'professions': "Founder of the Women's Voluntary Service"}, {'title': '32 SOHO SQUARE', 'address': '32 Soho Square, Soho, London, W1D 3AP, City Of Westminster', 'professions': 'Botanists'}, {'title': '14 BUCKINGHAM STREET', 'address': '14 Buckingham Street, Covent Garden, London, WC2N 6DF, City Of Westminster', 'professions': 'Statesman, Diarist, Naval Official, Painter'}, {'title': 'ABRAHAMS, Harold (1899-1978)', 'address': 'Hodford Lodge, 2 Hodford Road, Golders Green, London, NW11 8NP, London Borough of Barnet', 'professions': 'Athlete'}, ...]
我正在尝试抓取以下网站 (https://www.english-heritage.org.uk/visit/blue-plaques/#?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0),最终有兴趣在每个 'li class="search-result-item"' 中存储一些数据以执行进一步分析。
一个“搜索结果项”的示例
我想在 python 字典中捕获 <h3>
、<span class="plaque-role">
和 <span class="plaque-location">
:
<li class="search-result-item"><a href="/visit/blue-plaques/helen-gwynne-vaughan/"><img class="search-result-image max-width" src="/siteassets/home/visit/blue-plaques/find-a-plaque/blue-plaques-f-j/helen-gwynne-vaughan-plaque.jpg?w=732&h=465&mode=crop&scale=both&cache=always&quality=60&anchor=&WebsiteVersion=20220516171525" alt="" title=""><div class="search-result-info"><h3>GWYNNE-VAUGHAN, Dame Helen (1879-1967)</h3><span class="plaque-role">Botanist and Military Officer</span><span class="plaque-location">Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden</span></div></a></li>
到目前为止,我正在尝试隔离所有“搜索结果项”,但我当前的代码绝对没有打印任何内容。如果有人能帮我解决这个问题并指出正确的方向将每个数据元素存储到 python 字典中,我将不胜感激。
from bs4 import BeautifulSoup
import requests
url = 'https://www.english-heritage.org.uk/visit/blue-plaques/#?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
#print(soup.prettify())
print(soup.find_all(class_='search-result-item')).get_text()
您没有得到任何结果,因为搜索结果是由 JavaScript 生成的。使用他们从中获取数据的 API 端点。
例如:
import requests
api_url = "https://www.english-heritage.org.uk/api/BluePlaqueSearch/GetMatchingBluePlaques?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0&_=1653043005731"
plaques = requests.get(api_url).json()["plaques"]
for plaque in plaques:
print(plaque["title"])
print(plaque["address"])
print(f"https://www.english-heritage.org.uk{plaque['path']}")
print("-" * 80)
输出:
GWYNNE-VAUGHAN, Dame Helen (1879-1967)
Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden
https://www.english-heritage.org.uk/visit/blue-plaques/helen-gwynne-vaughan/
--------------------------------------------------------------------------------
READING, Lady Stella (1894-1971)
41 Tothill Street, London, City of Westminster, SW1H 9LQ, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/stella-lady-reading/
--------------------------------------------------------------------------------
32 SOHO SQUARE
32 Soho Square, Soho, London, W1D 3AP, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/soho-square/
--------------------------------------------------------------------------------
14 BUCKINGHAM STREET
14 Buckingham Street, Covent Garden, London, WC2N 6DF, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/buckingham-street/
--------------------------------------------------------------------------------
ABRAHAMS, Harold (1899-1978)
Hodford Lodge, 2 Hodford Road, Golders Green, London, NW11 8NP, London Borough of Barnet
https://www.english-heritage.org.uk/visit/blue-plaques/abrahams-harold/
--------------------------------------------------------------------------------
ADAM, ROBERT and HOOD, THOMAS and GALSWORTHY, JOHN and BARRIE, SIR JAMES
1-3 Robert Street, Adelphi, Charing Cross, London, WC2N 6BN, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/adam-hood-galsworthy-barrie/
--------------------------------------------------------------------------------
ADAMS, Henry Brooks (1838-1918)
98 Portland Place, Marylebone, London, W1B 1ET, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/united-states-embassy/
--------------------------------------------------------------------------------
ADELPHI, The
The Adelphi Terrace, Charing Cross, London, WC2N 6BJ, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/adelphi/
--------------------------------------------------------------------------------
ALDRIDGE, Ira (1807-1867)
5 Hamlet Road, Upper Norwood, London, SE19 2AP, London Borough of Bromley
https://www.english-heritage.org.uk/visit/blue-plaques/aldridge-ira/
--------------------------------------------------------------------------------
ALEXANDER, Sir George (1858-1918)
57 Pont Street, Chelsea, London, SW1X 0BD, London Borough of Kensington And Chelsea
https://www.english-heritage.org.uk/visit/blue-plaques/george-alexander/
--------------------------------------------------------------------------------
ALLENBY, Field Marshal Edmund Henry Hynman, Viscount Allenby (1861-1936)
24 Wetherby Gardens, South Kensington, London, SW5 0JR, London Borough of Kensington And Chelsea
https://www.english-heritage.org.uk/visit/blue-plaques/field-marshal-viscount-allenby/
--------------------------------------------------------------------------------
ALMA-TADEMA, Sir Lawrence, O.M. (1836-1912)
44 Grove End Road, St John's Wood, London, NW8 9NE, City Of Westminster
https://www.english-heritage.org.uk/visit/blue-plaques/lawrence-alma-tadema/
--------------------------------------------------------------------------------
内容由 JavaScript
动态生成,因此您无法使用 BeautifulSoup
找到您要查找的元素/信息,而是使用他们的 API.
例子
import requests
url = 'https://www.english-heritage.org.uk/api/BluePlaqueSearch/GetMatchingBluePlaques?pageBP=1&sizeBP=12&borBP=0&keyBP=&catBP=0'
page = requests.get(url).json()
data = []
for e in page['plaques']:
data.append(dict((k,v) for k,v in e.items() if k in ['title','professions','address']))
data
输出
[{'title': 'GWYNNE-VAUGHAN, Dame Helen (1879-1967)', 'address': 'Flat 93, Bedford Court Mansions, Fitzrovia, London, WC1B 3AE, London Borough of Camden', 'professions': 'Botanist and Military Officer'}, {'title': 'READING, Lady Stella (1894-1971)', 'address': '41 Tothill Street, London, City of Westminster, SW1H 9LQ, City Of Westminster', 'professions': "Founder of the Women's Voluntary Service"}, {'title': '32 SOHO SQUARE', 'address': '32 Soho Square, Soho, London, W1D 3AP, City Of Westminster', 'professions': 'Botanists'}, {'title': '14 BUCKINGHAM STREET', 'address': '14 Buckingham Street, Covent Garden, London, WC2N 6DF, City Of Westminster', 'professions': 'Statesman, Diarist, Naval Official, Painter'}, {'title': 'ABRAHAMS, Harold (1899-1978)', 'address': 'Hodford Lodge, 2 Hodford Road, Golders Green, London, NW11 8NP, London Borough of Barnet', 'professions': 'Athlete'}, ...]