How do I scrape data from this webpage with Python using BeautifulSoup?
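How do I scrape data from this webpage with Python using BeautifulSoup? I specifically want to get the last table, which includes the data for all the neighbourhoods.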
Please follow this link; I think it will solve your problem.
https://towardsdatascience.com/web-scraping-scraping-table-data-1665b6b2271c
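Since the question asks for BeautifulSoup specifically, here is a minimal sketch of that route. It is not taken from the linked article. The helper name `last_table_rows` and the inline `sample_html` are hypothetical stand-ins; the real page's markup may differ, and in practice `html` would be the text of the downloaded page.

```python
from bs4 import BeautifulSoup


def last_table_rows(html):
    """Return the rows of the last <table> in `html` as lists of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    tables = soup.find_all("table")
    if not tables:
        return []
    rows = []
    for tr in tables[-1].find_all("tr"):
        # Header cells (<th>) and data cells (<td>) are treated alike.
        cells = [cell.get_text(" ", strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Hypothetical markup standing in for the downloaded page (assumed structure).
sample_html = """
<table><tr><td>ignored</td></tr></table>
<table>
  <tr><th>Neighbourhood</th><th>Active listings</th></tr>
  <tr><td><span>19</span> Agincourt North</td><td>25</td></tr>
  <tr><td><span>59</span> Alderwood</td><td>30</td></tr>
</table>
"""

rows = last_table_rows(sample_html)
print(rows)
```

Passing `" "` as the separator to `get_text` keeps the rank and the name apart when they sit in separate child elements, which is an assumption about how the page nests them.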
You can use this example to load the data:
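import io

import pandas as pd
import requests

# Browser-like headers; the site may reject requests that lack them.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0",
    "Accept-Language": "en-US,en;q=0.5",
}
r = requests.get(
    "https://www.zolo.ca/toronto-real-estate/neighbourhoods", headers=headers
)
# read_html returns every <table> on the page; the neighbourhood table is the last one.
df = pd.read_html(io.StringIO(r.text))[-1]
# Put a space between the rank and the name, e.g. "19 Agincourt North".
# regex=True is needed in pandas 2.0+, where str.replace defaults to literal matching.
df.iloc[:, 0] = df.iloc[:, 0].str.replace(r"(\d+)(.*)", r"\1 \2", regex=True)
print(df)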
Prints:
Neighbourhood (# Rank out of 143) Sold under 10d Sold above asking Average sale price Active listings
0 19 Agincourt North 72% 73% 1K 25
1 71 Agincourt South-Malvern West 53% 60% 2K 42
2 59 Alderwood 63% 53% .3M 30
3 141 Annex 31% 20% .5M 129
4 107 Banbury-Don Mills 46% 36% .3M 55
5 80 Bathurst Manor 51% 51% .3M 16
6 142 Bay Street Corridor 28% 19% 7K 154
7 113 Bayview Village 42% 38% 4K 72
8 66 Bayview Woods-Steeles 60% 48% .3M 9
9 91 Bedford Park-Nortown 54% 34% .4M 57
10 5 Beechborough-Greenbrook 86% 86% 2K 5
11 64 Bendale 58% 58% 6K 30
12 63 Birchcliffe-Cliffside 58% 59% .0M 40
13 112 Black Creek 40% 45% 0K 6
14 10 Blake-Jones 82% 73% .3M 6
15 70 Briar Hill-Belgravia 56% 50% 2K 13
16 140 Bridle Path-Sunnybrook-York Mills 31% 21% .4M 53
17 42 Broadview North 67% 57% .1M 6
18 134 Brookhaven-Amesbury 33% 33% 3K 11
19 83 Cabbagetown-South St. James Town 53% 45% .2M 20
20 45 Caledonia-Fairbank 63% 67% 8K 12
21 108 Casa Loma 45% 34% .7M 36
22 11 Centennial Scarborough 80% 75% .1M 8
23 135 Church-Yonge Corridor 35% 28% 6K 172
24 9 Clairlea-Birchmount 78% 89% 9K 13
25 96 Clanton Park 47% 47% .0M 36
26 35 Cliffcrest 68% 66% .3M 19
27 47 Corso Italia-Davenport 63% 63% .2M 18
28 114 Crescent Town 44% 32% 6K 16
29 8 Danforth 84% 74% .3M 6
30 6 Danforth Village-East York 85% 76% .3M 16
31 105 Don Valley Village 49% 33% 4K 25
32 73 Dorset Park 51% 60% 1K 20
33 38 Dovercourt-Wallace Emerson-Junction 69% 57% .1M 41
34 89 Downsview-Roding-CFB 50% 48% 9K 64
35 30 Dufferin Grove 70% 67% .1M 9
36 24 East End-Danforth 73% 67% .1M 13
37 7 East York 84% 76% .4M 12
38 136 Edenbridge-Humber Valley 37% 22% .5M 31
39 51 Eglinton East 60% 71% 6K 14
40 143 Elms-Old Rexdale 15% 30% 1K 10
41 77 Englemount-Lawrence 55% 42% .2M 27
42 44 Eringate-Centennial-West Deane 66% 58% .1M 15
43 103 Etobicoke West Mall 45% 45% 5K 16
44 132 Flemingdon Park 33% 36% 1K 31
45 68 Forest Hill North 60% 40% .8M 9
46 121 Forest Hill South 41% 34% .3M 29
47 40 Glenfield-Jane Heights 65% 63% 1K 14
48 20 Greenwood-Coxwell 71% 74% .3M 21
49 18 Guildwood 72% 76% 6K 16
50 119 Henry Farm 42% 35% 3K 40
51 81 High Park North 53% 44% .2M 16
52 94 High Park-Swansea 48% 47% .2M 23
53 27 Highland Creek 73% 64% .4M 20
54 16 Hillcrest Village 76% 76% .1M 25
55 85 Humber Heights 55% 36% 9K 22
56 87 Humber Summit 50% 50% 4K 9
57 1 Humberlea-Pelmo Park W4 100% 100% 2K 5
58 32 Humberlea-Pelmo Park W5 68% 71% 2K 10
59 95 Humbermede 45% 55% 3K 8
60 53 Humewood-Cedarvale 65% 55% .5M 4
61 50 Ionview 57% 79% 4K 7
62 Islington 0% 0% [=11=] 2
63 111 Islington-City Centre West 42% 41% 2K 65
64 34 Junction Area 67% 72% .0M 9
65 86 Keelesdale-Eglinton West 48% 57% 6K 13
66 54 Kennedy Park 62% 59% 9K 16
67 139 Kensington-Chinatown 29% 32% 9K 37
68 106 Kingsview Village-The Westway 46% 40% 2K 33
69 118 Kingsway South 42% 35% .3M 21
70 37 L'Amoreaux 65% 72% 3K 36
71 23 Lambton Baby Point 71% 71% .7M 4
72 131 Lansing-Westgate 35% 31% .2M 30
73 61 Lawrence Park North 61% 53% .9M 10
74 41 Lawrence Park South 71% 45% .1M 31
75 72 Leaside 57% 43% .1M 28
76 97 Little Portugal 48% 43% 9K 33
77 69 Long Branch 58% 45% 9K 23
78 58 Malvern 58% 65% 2K 51
79 101 Maple Leaf 47% 41% .2M 11
80 98 Markland Wood 48% 41% 0K 10
81 33 Milliken 66% 75% 0K 27
82 99 Mimico 48% 40% 0K 131
83 46 Morningside 59% 76% 7K 10
84 93 Moss Park 50% 43% 1K 53
85 110 Mount Dennis 35% 61% 6K 18
86 128 Mount Olive-Silverstone-Jamestown 33% 39% 7K 21
87 76 Mount Pleasant East 55% 45% .4M 34
88 133 Mount Pleasant West 35% 29% 0K 66
89 48 New Toronto 63% 63% .2M 11
90 120 Newtonbrook East 45% 25% .6M 44
91 88 Newtonbrook West 52% 43% .1M 60
92 92 Niagara 49% 44% 4K 78
93 21 North Riverdale 70% 78% .8M 7
94 125 North St. James Town 36% 36% 0K 8
95 43 O'Connor-Parkview 67% 57% .4M 12
96 36 Oakridge 69% 62% 8K 8
97 39 Oakwood-Vaughan 69% 54% .1M 25
98 60 Palmerston-Little Italy 67% 39% .3M 9
99 74 Parkwoods-Donalda 52% 55% 1K 29
100 14 Playter Estates-Danforth 80% 70% .6M 1
101 67 Pleasant View 56% 60% 7K 11
102 28 Princess-Rosethorn 73% 64% .8M 8
103 90 Regent Park 51% 43% 4K 19
104 26 Rexdale-Kipling 70% 73% 6K 9
105 79 Rockcliffe-Smythe 51% 54% 1K 22
106 29 Roncesvalles 72% 65% .4M 22
107 124 Rosedale-Moore Park 42% 24% .1M 73
108 2 Rouge E10 94% 94% .0M 7
109 22 Rouge E11 68% 81% 4K 24
110 3 Runnymede-Bloor West Village 91% 82% .6M 7
111 12 Rustic 71% 100% .0M 8
112 84 Scarborough Village 47% 61% 0K 28
113 100 South Parkdale 46% 46% .0M 5
114 25 South Riverdale 72% 69% .1M 52
115 102 St. Andrew-Windfields 48% 36% .6M 55
116 49 Steeles 63% 63% .1M 24
117 78 Stonegate-Queensway 53% 48% .3M 33
118 56 Tam O'Shanter-Sullivan 61% 62% 7K 18
119 15 The Beaches 80% 70% .7M 43
120 65 Thistletown-Beaumonde Heights 57% 57% .1M 8
121 138 Thorncliffe Park 36% 18% 2K 2
122 104 Trinity-Bellwoods 44% 46% .6M 21
123 116 University 44% 29% .2M 17
124 31 Victoria Village 71% 63% 0K 11
125 123 Waterfront Communities C1 41% 29% 7K 377
126 130 Waterfront Communities C8 35% 31% 2K 87
127 57 West Hill 60% 65% 9K 35
128 117 West Humber-Clairville 41% 38% 5K 27
129 115 Westminster-Branson 41% 40% 6K 27
130 62 Weston 58% 60% 0K 16
131 75 Weston-Pellam Park 53% 53% 9K 10
132 55 Wexford-Maryvale 62% 60% 0K 21
133 126 Willowdale East 36% 32% 3K 139
134 122 Willowdale West 42% 32% .1M 44
135 52 Willowridge-Martingrove-Richview 63% 63% .1M 19
136 82 Woburn 50% 54% 6K 36
137 13 Woodbine Corridor 79% 79% .2M 4
138 4 Woodbine-Lumsden 86% 86% .1M 8
139 17 Wychwood 78% 63% .4M 8
140 109 Yonge-Eglinton 44% 38% .5M 15
141 137 Yonge-St. Clair 36% 23% .6M 33
142 129 York University Heights 34% 33% 3K 23
143 127 Yorkdale-Glen Park 36% 33% 2K 43
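The percentage columns come back as strings such as "72%", so they cannot be compared or sorted numerically as they are. Below is a small sketch of one way to convert them, using three rows copied from the output above. The 60% threshold and the variable name `fast` are arbitrary choices for illustration.

```python
import pandas as pd

# Three rows copied from the printed output (percentage columns only).
df = pd.DataFrame(
    {
        "Neighbourhood": ["Agincourt North", "Alderwood", "Annex"],
        "Sold under 10d": ["72%", "63%", "31%"],
        "Sold above asking": ["73%", "53%", "20%"],
    }
)

# Strip the trailing "%" and convert each column to float.
for col in ["Sold under 10d", "Sold above asking"]:
    df[col] = df[col].str.rstrip("%").astype(float)

# Example filter: neighbourhoods where more than 60% of listings sold within 10 days.
fast = df[df["Sold under 10d"] > 60]
print(fast["Neighbourhood"].tolist())  # ['Agincourt North', 'Alderwood']
```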