How to go to the next page using Beautiful Soup?
I have to extract information from 5 pages of a website.
At the end of each page there is a "NEXT PAGE" button. This is the HTML for the next button:
<li class="pagination__next" data-reactid=".0.3.0.0.1.1.1.3.2">
<span class="icon-arrowright-thin--pagination" data-reactid=".0.3.0.0.1.1.1.3.2.0">
::before
</span>
</li>
I am using beautifulsoup4 to extract the information. How do I navigate to the next page?
Can I use mechanize for this kind of navigation?
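BeautifulSoup is an HTML parser, not a web browser; it cannot navigate or download pages. For that, you would typically use an HTTP library such as urllib or requests to fetch the HTML from a specific URL and feed it to BeautifulSoup. In your case, mechanize could be used to do this.
Unfortunately, the HTML you provided for the pagination button is not a link, so it has no href attribute. If it were, you could easily parse the URL from it and tell your HTTP library to fetch it.
Instead, you will need to use mechanize to simulate a click event on that button, wait a short time, assume the new page has loaded, and then pass the resulting HTML to BeautifulSoup.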
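For comparison, here is a minimal sketch of the easy case described above, where the next button is an ordinary link. The markup in `sample` is hypothetical (the question's button has no `<a>` inside it), and the names `NextLinkFinder` and `next_page_url` are made up for illustration. It uses the standard-library `html.parser` so it runs without extra packages; with BeautifulSoup the same lookup would be a single selector such as `soup.select_one("li.pagination__next a")`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Record the href of the first <a> inside <li class="pagination__next">."""

    def __init__(self):
        super().__init__()
        self.in_next_item = False
        self.href = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "pagination__next" in classes:
            self.in_next_item = True
        elif tag == "a" and self.in_next_item and self.href is None:
            self.href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_next_item = False


def next_page_url(html, base_url):
    """Return the absolute URL of the next page, or None if there is no link."""
    finder = NextLinkFinder()
    finder.feed(html)
    return urljoin(base_url, finder.href) if finder.href else None


# Hypothetical markup: the same list item, but wrapping a real link.
sample = '<ul><li class="pagination__next"><a href="/?page=2">Next</a></li></ul>'
print(next_page_url(sample, "https://www.example.com/list"))
```

For the button in the question, which contains only a `<span>`, the function returns `None`; that is the signal that the page has to be driven some other way.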
If "next page" involves JavaScript, then yes, it has to be automated. You can use selenium:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# Selenium 4.6+ can locate a matching driver itself; older setups need the chromedriver path configured.
browser = webdriver.Chrome()
url = "https://www.example.com"
browser.get(url)
# Wait until you see some element that signals the page is completely loaded
WebDriverWait(browser, timeout=10).until(lambda x: x.find_element(By.CLASS_NAME, 'Even'))
# Do your things with the first page
content = browser.page_source.encode('ascii', 'ignore').decode("utf-8")
# Now, if you are sure there is a next page
next_button_class = 'icon-arrowright-thin--pagination'  # insert the class of the 'next button' here
browser.find_element(By.CLASS_NAME, next_button_class).click()
time.sleep(3)
# Wait until you see some element that signals the page is completely loaded
WebDriverWait(browser, timeout=10).until(lambda x: x.find_element(By.CLASS_NAME, 'Even'))
content = browser.page_source.encode('ascii', 'ignore').decode("utf-8")
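You could mimic the post to https://colleges.niche.com/entity-search/, but an easier way is to get the total number of pages from the first page and then loop from 2 up to that page count. All that gets added to the start URL is &page=page_number: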
import requests
from bs4 import BeautifulSoup

start = "https://colleges.niche.com/?degree=4-year&sort=best"
url = "https://colleges.niche.com/?degree=4-year&sort=best&page={}"
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(requests.get(start).content, "html.parser")
# The last <option> of the page selector carries the total page count
pages = int(soup.select("select.pagination__pages__selector option")[-1].text.split(None, 1)[1])
print([a.text for a in soup.select("a.search__results__list__item__entity")])
# range() excludes its end, so use pages + 1 to include the last page
for page in range(2, pages + 1):
    soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
    print([a.text for a in soup.select("a.search__results__list__item__entity")])
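The two pieces of logic in that loop can be checked without touching the network. The sketch below assumes the last option's text looks like "Page 4" (the real label is not shown in this thread, so that format is an assumption), and the helper names `page_count` and `page_urls` are made up for illustration. Because `range` excludes its end value, the upper bound has to be `pages + 1` for the final page to be fetched.

```python
def page_count(last_option_text):
    """Parse the total from a label such as 'Page 4' (assumed format)."""
    # Split once on whitespace and convert what follows the first word.
    return int(last_option_text.split(None, 1)[1])


def page_urls(template, pages):
    """Build URLs for pages 2..pages inclusive; page 1 is the start URL itself."""
    return [template.format(page) for page in range(2, pages + 1)]


template = "https://colleges.niche.com/?degree=4-year&sort=best&page={}"
pages = page_count("Page 4")
urls = page_urls(template, pages)
print(len(urls))
print(urls[-1])
```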
If we run the code for a few iterations, you can see we get each page:
In [1]: import requests
   ...: from bs4 import BeautifulSoup
   ...: start = "https://colleges.niche.com/?degree=4-year&sort=best"
   ...: url = "https://colleges.niche.com/?degree=4-year&sort=best&page={}"
   ...: soup = BeautifulSoup(requests.get(start).content, "html.parser")
   ...: pages = int(soup.select("select.pagination__pages__selector option")[-1].text.split(None, 1)[1])
   ...: print([a.text for a in soup.select("a.search__results__list__item__entity")])
   ...: for page in range(2, pages):
   ...:     soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
   ...:     print([a.text for a in soup.select("a.search__results__list__item__entity")])
   ...:
[u'Stanford University', u'Massachusetts Institute of Technology', u'Yale University', u'Harvard University', u'Princeton University', u'Rice University', u'Bowdoin College', u'University of Pennsylvania', u'Washington University in St. Louis', u'Brown University', u'Duke University', u'Columbia University', u'Dartmouth College', u'Vanderbilt University', u'Pomona College', u'California Institute of Technology', u'University of Southern California', u'University of Notre Dame', u'University of Chicago', u'Washington & Lee University', u'Carleton College', u'Colgate University', u'University of Michigan - Ann Arbor', u'Northwestern University', u'Tufts University']
[u'Williams College', u'Georgetown University', u'Amherst College', u'Cornell University', u'Thomas Jefferson University', u'University of Texas - Health Science Center at Houston', u'Barnard College', u'Haverford College', u'Carnegie Mellon University', u'Emory University', u'University of California - Los Angeles', u'Harvey Mudd College', u'Medical University of South Carolina', u'Franklin W. Olin College of Engineering', u'Claremont McKenna College', u'Middlebury College', u'Swarthmore College', u'Bates College', u'University of Virginia', u'University of Texas - Austin', u'University of California - Berkeley', u'Virginia Tech', u'University of North Carolina at Chapel Hill', u'University of Texas - Medical Branch at Galveston', u'Davidson College']
[u'Colby College', u'Hamilton College', u'Samuel Merritt University', u'Georgia Institute of Technology', u'University of Richmond', u'Lehigh University', u'Grinnell College', u'Northeastern University', u'University of Illinois at Urbana-Champaign', u'New York University', u'University of Wisconsin', u'Wake Forest University', u'Reed College', u'Bucknell University', u'Oregon Health & Science University', u'Johns Hopkins University', u'Lafayette College', u'University of Texas - Health Science Center at San Antonio', u'Smith College', u'Wellesley College', u'University of Rochester', u'Scripps College', u'College of William & Mary', u'University of Florida', u'The Curtis Institute of Music']
[u'United States Coast Guard Academy', u'College of the Holy Cross', u'Penn State', u'Bryn Mawr College', u'Wesleyan University', u'Ohio State University', u'Colorado School of Mines', u'Texas A&M University', u'University of Maryland - Baltimore', u'Purdue University', u'University of California - Santa Barbara', u'University of Georgia', u'University of Miami', u'Tulane University', u'University of Tulsa', u'Boston College', u'The Juilliard School', u'Texas Tech University Health Sciences Center', u'Worcester Polytechnic Institute', u'Franklin & Marshall College', u'Brigham Young University', u'Southern Methodist University', u'Mount Holyoke College', u'Kenyon College', u'University of Washington']
If you do want to mimic the post, the following will work. Depending on the data you want, this may actually be preferable, since you get JSON back:
import requests
from bs4 import BeautifulSoup

start = "https://colleges.niche.com/?degree=4-year&sort=best"
post = "https://colleges.niche.com/entity-search/"
data = {"degreeType": ["4-year"], "sort": "best", "page": 1, "vertical": "colleges"}
soup = BeautifulSoup(requests.get(start).content, "html.parser")
pages = int(soup.select("select.pagination__pages__selector option")[-1].text.split(None, 1)[1])
for page in range(1, pages + 1):
    # Only the page number changes between requests
    data["page"] = page
    r = requests.post(post, json=data)
    print(r.json())
This gives you data like the following:
{u'count': 2854, u'results': [{u'reviewCount': 258, u'netPrice': 20315, u'reviewAvg': 3.7713178294573644, u'totalStudents': 2034, u'grade': 4.33, u'tagline': u'4 Year · Williamstown, MA', u'SATRange': u'1350-1560', u'label': u'Williams College', u'url': u'https://colleges.niche.com/williams-college/', u'ACTRange': u'31-34', u'location': {u'lat': 42.7117, u'lng': -73.2059}, u'guid': u'465D4A73-875C-498E-9C8F-E47568E156F2', u'type': u'College'}, {u'reviewCount': 1081, u'netPrice': 25786, u'reviewAvg': 3.698427382053654, u'totalStudents': 7226, u'grade': 4.33, u'tagline': u'4 Year · Washington, DC', u'SATRange': u'1320-1520', u'label': u'Georgetown University', u'url': u'https://colleges.niche.com/georgetown-university/', u'ACTRange': u'30-33', u'location': {u'lat': 38.9088, u'lng': -77.0735}, u'guid': u'34AF6312-6F20-4D90-B512-AC5CD720AB25', u'type': u'College'}, {u'reviewCount': 247, u'netPrice': 14687, u'reviewAvg': 3.8259109311740893, u'totalStudents': 1792, u'grade': 4.33, u'tagline': u'4 Year · Amherst, MA', u'SATRange': u'1350-1548', u'label': u'Amherst College', u'url': u'https://colleges.niche.com/amherst-college/', u'ACTRange': u'30-34', u'location': {u'lat': 42.3725, u'lng': -72.5185}, u'guid': u'127EC524-4BAC-4A5C-A7F5-1EAD9C309F44', u'type': u'College'}, {u'reviewCount': 1730, u'netPrice': 28537, u'reviewAvg': 3.654913294797688, u'totalStudents': 14269, u'grade': 4.33, u'tagline': u'4 Year · Ithaca, NY', u'SATRange': u'1330-1510', u'label': u'Cornell University', u'url': u'https://colleges.niche.com/cornell-university/', u'ACTRange': u'30-34', u'location': {u'lat': 42.4453, u'lng': -76.4827}, u'guid': u'C35E497B-10BC-4482-92E5-F27941433B02', u'type': u'College'}, {u'reviewCount': 254, u'netPrice': None, u'reviewAvg': 3.8149606299212597, u'totalStudents': 649, u'grade': 4.33, u'tagline': u'4 Year · Philadelphia, PA', u'SATRange': None, u'label': u'Thomas Jefferson University', u'url': u'https://colleges.niche.com/thomas-jefferson-university/', u'ACTRange': 
None, u'location': {u'lat': 39.9491, u'lng': -75.1581}, u'guid': u'E8C9EBC6-90C5-4CDF-A324-2CCE16060B61', u'type': u'College'}, {u'reviewCount': 131, u'netPrice': None, u'reviewAvg': 3.740458015267176, u'totalStudents': 539, u'grade': 4.33, u'tagline': u'4 Year · Houston, TX', u'SATRange': None, u'label': u'University of Texas - Health Science Center at Houston', u'url': u'https://colleges.niche.com/university-of-texas----health-science-center-at-houston/', u'ACTRange': None, u'location': {u'lat': 29.7029, u'lng': -95.4032}, u'guid': u'43EEDD7D-8204-4014-961B-BEDDBD4C6417', u'type': u'College'}, {u'reviewCount': 390, u'netPrice': 21791, u'reviewAvg': 3.776923076923077, u'totalStudents': 2537, u'grade': 4.33, u'tagline': u'4 Year · New York, NY', u'SATRange': u'1250-1440', u'label': u'Barnard College', u'url': u'https://colleges.niche.com/barnard-college/', u'ACTRange': u'28-32', u'location': {u'lat': 40.8091, u'lng': -73.964}, u'guid': u'DD4FCD82-8E4E-4F4C-A7DC-FADCEBB49681', u'type': u'College'}, {u'reviewCount': 190, u'netPrice': 22409, u'reviewAvg': 3.789473684210526, u'totalStudents': 1189, u'grade': 4.33, u'tagline': u'4 Year · Haverford, PA', u'SATRange': u'1330-1490', u'label': u'Haverford College', u'url': u'https://colleges.niche.com/haverford-college/', u'ACTRange': u'31-34', u'location': {u'lat': 40.0134, u'lng': -75.3026}, u'guid': u'271075B3-07A0-450B-B4F3-78EB1FC7C03A', u'type': u'College'}, {u'reviewCount': 1310, u'netPrice': 33670, u'reviewAvg': 3.6068702290076335, u'totalStudents': 5699, u'grade': 4.33, u'tagline': u'4 Year · Pittsburgh, PA', u'SATRange': u'1340-1540', u'label': u'Carnegie Mellon University', u'url': u'https://colleges.niche.com/carnegie-mellon-university/', u'ACTRange': u'30-34', u'location': {u'lat': 40.4446, u'lng': -79.9429}, u'guid': u'D8A17C0F-CC25-4D2A-B231-0303EA016427', u'type': u'College'}, {u'reviewCount': 1392, u'netPrice': 28203, u'reviewAvg': 3.757183908045977, u'totalStudents': 7732, u'grade': 4.33, u'tagline': u'4 
Year · Atlanta, GA', u'SATRange': u'1280-1460', u'label': u'Emory University', u'url': u'https://colleges.niche.com/emory-university/', u'ACTRange': u'29-32', u'location': {u'lat': 33.7988, u'lng': -84.3258}, u'guid': u'86AD5853-ED72-4EFD-855C-4746FF698941', u'type': u'College'}, {u'reviewCount': 4465, u'netPrice': 12510, u'reviewAvg': 3.838521836506159, u'totalStudents': 29033, u'grade': 4.33, u'tagline': u'4 Year · Los Angeles, CA', u'SATRange': u'1190-1460', u'label': u'University of California - Los Angeles', u'url': u'https://colleges.niche.com/university-of-california----los-angeles/', u'ACTRange': u'27-33', u'location': {u'lat': 34.0689, u'lng': -118.444}, u'guid': u'1D1D82CF-C659-49F0-A526-7AFB85BD3A4F', u'type': u'College'}, {u'reviewCount': 122, u'netPrice': 33137, u'reviewAvg': 3.6639344262295084, u'totalStudents': 802, u'grade': 4.33, u'tagline': u'4 Year · Claremont, CA', u'SATRange': u'1418-1570', u'label': u'Harvey Mudd College', u'url': u'https://colleges.niche.com/harvey-mudd-college/', u'ACTRange': u'33-35', u'location': {u'lat': 34.1061, u'lng': -117.711}, u'guid': u'20D662BE-8428-4DE2-BF0D-72D22F0A04B5', u'type': u'College'}, {u'reviewCount': 71, u'netPrice': None, u'reviewAvg': 4.014084507042253, u'totalStudents': 281, u'grade': 4.33, u'tagline': u'4 Year · Charleston, SC', u'SATRange': None, u'label': u'Medical University of South Carolina', u'url': u'https://colleges.niche.com/medical-university-of-south-carolina/', u'ACTRange': None, u'location': {u'lat': 32.786, u'lng': -79.9469}, u'guid': u'7CD7C977-D16A-4399-8D7E-3B1FA0DFAB7D', u'type': u'College'}, {u'reviewCount': 115, u'netPrice': 29979, u'reviewAvg': 4.095652173913043, u'totalStudents': 350, u'grade': 4.33, u'tagline': u'4 Year · Needham, MA', u'SATRange': u'1410-1550', u'label': u'Franklin W. 
Olin College of Engineering', u'url': u'https://colleges.niche.com/franklin-w-olin-college-of-engineering/', u'ACTRange': u'32-34', u'location': {u'lat': 42.2928, u'lng': -71.264}, u'guid': u'88A3438F-9304-481E-8022-0AE353991161', u'type': u'College'}, {u'reviewCount': 399, u'netPrice': 23982, u'reviewAvg': 3.87468671679198, u'totalStudents': 1298, u'grade': 4.33, u'tagline': u'4 Year · Claremont, CA', u'SATRange': u'1350-1520', u'label': u'Claremont McKenna College', u'url': u'https://colleges.niche.com/claremont-mckenna-college/', u'ACTRange': u'30-33', u'location': {u'lat': 34.1023, u'lng': -117.707}, u'guid': u'DAE7241A-4D00-4C50-B1A5-F33BAF3A6C3B', u'type': u'College'}, {u'reviewCount': 458, u'netPrice': 20903, u'reviewAvg': 3.7139737991266375, u'totalStudents': 2492, u'grade': 4.33, u'tagline': u'4 Year · Middlebury, VT', u'SATRange': u'1260-1470', u'label': u'Middlebury College', u'url': u'https://colleges.niche.com/middlebury-college/', u'ACTRange': u'30-33', u'location': {u'lat': 44.0091, u'lng': -73.1761}, u'guid': u'0E72BF23-A3CF-4995-9585-33B5BD0F9222', u'type': u'College'}, {u'reviewCount': 401, u'netPrice': 22557, u'reviewAvg': 3.56857855361596, u'totalStudents': 1534, u'grade': 4.33, u'tagline': u'4 Year · Swarthmore, PA', u'SATRange': u'1360-1540', u'label': u'Swarthmore College', u'url': u'https://colleges.niche.com/swarthmore-college/', u'ACTRange': u'29-34', u'location': {u'lat': 39.9041, u'lng': -75.3561}, u'guid': u'891F20E2-4B6F-4626-83F3-15D502B2E7C1', u'type': u'College'}, {u'reviewCount': 320, u'netPrice': 22062, u'reviewAvg': 3.878125, u'totalStudents': 1773, u'grade': 4.33, u'tagline': u'4 Year · Lewiston, ME', u'SATRange': None, u'label': u'Bates College', u'url': u'https://colleges.niche.com/bates-college/', u'ACTRange': None, u'location': {u'lat': 44.1053, u'lng': -70.2033}, u'guid': u'2C036559-5EBB-4C00-B3B8-6679A91FB040', u'type': u'College'}, {u'reviewCount': 1995, u'netPrice': 14069, u'reviewAvg': 3.800501253132832, 
u'totalStudents': 15622, u'grade': 4.33, u'tagline': u'4 Year · Charlottesville, VA', u'SATRange': u'1250-1460', u'label': u'University of Virginia', u'url': u'https://colleges.niche.com/university-of-virginia/', u'ACTRange': u'28-33', u'location': {u'lat': 38.0365, u'lng': -78.5026}, u'guid': u'9EA86CB5-E8A6-47E6-A219-FDCABC31AE51', u'type': u'College'}, {u'reviewCount': 5513, u'netPrice': 16832, u'reviewAvg': 3.8824596408489027, u'totalStudents': 36309, u'grade': 4.33, u'tagline': u'4 Year · Austin, TX', u'SATRange': u'1170-1410', u'label': u'University of Texas - Austin', u'url': u'https://colleges.niche.com/university-of-texas----austin/', u'ACTRange': u'26-32', u'location': {u'lat': 30.2847, u'lng': -97.7373}, u'guid': u'BC90E2B6-E112-43ED-AC5C-3548829EA3DD', u'type': u'College'}, {u'reviewCount': 3718, u'netPrice': 16655, u'reviewAvg': 3.5922538999462077, u'totalStudents': 26320, u'grade': 4.33, u'tagline': u'4 Year · Berkeley, CA', u'SATRange': u'1240-1500', u'label': u'University of California - Berkeley', u'url': u'https://colleges.niche.com/university-of-california----berkeley/', u'ACTRange': u'29-34', u'location': {u'lat': 37.8715, u'lng': -122.26}, u'guid': u'09E8CD9A-F401-4C8B-A79C-F02E10AC0201', u'type': u'College'}, {u'reviewCount': 3382, u'netPrice': 18398, u'reviewAvg': 3.8793613246599645, u'totalStudents': 23685, u'grade': 4.33, u'tagline': u'4 Year · Blacksburg, VA', u'SATRange': u'1110-1320', u'label': u'Virginia Tech', u'url': u'https://colleges.niche.com/virginia-tech/', u'ACTRange': None, u'location': {u'lat': 37.2286, u'lng': -80.4233}, u'guid': u'EEB0E829-996A-45B1-9671-3EF4AF096423', u'type': u'College'}, {u'reviewCount': 2138, u'netPrice': 10936, u'reviewAvg': 3.7787652011225443, u'totalStudents': 17570, u'grade': 4.33, u'tagline': u'4 Year · Chapel Hill, NC', u'SATRange': u'1220-1420', u'label': u'University of North Carolina at Chapel Hill', u'url': u'https://colleges.niche.com/university-of-north-carolina-at-chapel-hill/', u'ACTRange': 
u'28-32', u'location': {u'lat': 35.9122, u'lng': -79.051}, u'guid': u'5712B0C1-3A40-4EA1-A324-9C4F76FEFD10', u'type': u'College'}, {u'reviewCount': 110, u'netPrice': None, u'reviewAvg': 3.8545454545454545, u'totalStudents': 586, u'grade': 4.33, u'tagline': u'4 Year · Galveston, TX', u'SATRange': None, u'label': u'University of Texas - Medical Branch at Galveston', u'url': u'https://colleges.niche.com/university-of-texas----medical-branch-at-galveston/', u'ACTRange': None, u'location': {u'lat': 29.3113, u'lng': -94.7764}, u'guid': u'5FEEDB69-A566-4671-B821-28304A74F474', u'type': u'College'}, {u'reviewCount': 264, u'netPrice': 22457, u'reviewAvg': 3.8333333333333335, u'totalStudents': 1770, u'grade': 4.33, u'tagline': u'4 Year · Davidson, NC', u'SATRange': u'1230-1440', u'label': u'Davidson College', u'url': u'https://colleges.niche.com/davidson-college/', u'ACTRange': u'28-32', u'location': {u'lat': 35.5, u'lng': -80.8452}, u'guid': u'1AD50A05-6325-4392-B428-A08C944E61EF', u'type': u'College'}], u'page': 1, u'pageSize': 25, u'pageCount': 40}
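This may include dynamically created content that you would not get in the returned page source.
For the reviews URL https://colleges.niche.com/williams-college/reviews, you need to parse a token out of the source and then make a request much as before (the code below sends it as a GET with query parameters):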
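Since the JSON shown above carries its own `pageCount`, one possible variation is to read the page count from the first response instead of scraping it from the HTML. The sketch below is an assumption-laden illustration, not the answer's method: `iter_results`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names, and the fake payloads are trimmed stand-ins for what `requests.post(post, json=data).json()` returns, so no network is involved.

```python
def iter_results(fetch_page):
    """Yield every result, taking the page count from each JSON payload."""
    page = 1
    while True:
        payload = fetch_page(page)
        for item in payload["results"]:
            yield item
        # Stop once the page just read is the last one the server reports.
        if page >= payload["pageCount"]:
            break
        page += 1


# Trimmed, made-up stand-ins for the real responses (labels taken from the output above).
FAKE_PAGES = {
    1: {"results": [{"label": "Williams College"}, {"label": "Georgetown University"}],
        "page": 1, "pageCount": 2},
    2: {"results": [{"label": "Amherst College"}], "page": 2, "pageCount": 2},
}


def fake_fetch(page):
    """Hypothetical replacement for the POST request, for offline use."""
    return FAKE_PAGES[page]


labels = [item["label"] for item in iter_results(fake_fetch)]
print(labels)
```

In real use, `fetch_page` would set `data["page"]` and return `requests.post(post, json=data).json()`.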
import re

import requests
from bs4 import BeautifulSoup

# The entity GUID is embedded in the page's data layer script
patt = re.compile('"entityGuid":"(.*?)"')
url = "https://colleges.niche.com/williams-college/reviews/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data_tag = patt.search(soup.select_one("#dataLayerTag").text).group(1)
params = {"e": data_tag, "page": 2, "limit": "20"}
url = "https://niche.com/api/entity-reviews/"
resp = requests.get(url, params=params)
print(resp.json())
This gives you:
{u'reviews': [{u'body': u'I enjoy being in classes here, but the work gets overwhelming. People are great but very cliquy.', u'rating': 4, u'guid': u'35b6faeb-95b2-4385-b3ee-19e6c7984e1b', u'created': u'2016-04-20T22:24:56Z', u'author': u'College Sophomore'}, {u'body': u'The alumni network is great. Easy to use. But the career center sucks.', u'rating': 4, u'guid': u'beddcae1-d860-4a8a-a431-45bf7e7087e6', u'created': u'2016-04-20T22:24:56Z', u'author': u'College Sophomore'}, {u'body': u"It's hard for sophomores to get good housing. Even as a senior, the good housings are far away from campus. But almost everyone has singles, even freshman.", u'rating': 3, u'guid': u'fff99560-0b4f-499d-a95b-7b3b3f9826f0', u'created': u'2016-04-20T22:19:27Z', u'author': u'College Sophomore'}, {u'body': u"We don't have greek life.", u'rating': 1, u'guid': u'69e60cf0-ff3c-4b34-acf1-6315d878c205', u'created': u'2016-04-20T22:17:35Z', u'author': u'College Sophomore'}, {u'body': u"There's not a lot of team spirit here. Athletes are nice, but they tend to hang among themselves.", u'rating': 3, u'guid': u'b31ee366-1b68-4c0f-b262-ff628243887c', u'created': u'2016-04-20T22:17:02Z', u'author': u'College Sophomore'}, {u'body': u'Williams offer a lot of chances to study abroad, but the social scene is very very limited.', u'rating': 4, u'guid': u'11a3feb2-21fa-45d9-8ee0-e6e1e8cea0c0', u'created': u'2016-04-20T22:15:35Z', u'author': u'College Sophomore'}, {u'body': u"Most people will live on campus all four years. It's not a bad deal!", u'rating': 4, u'guid': u'4a845124-7cfd-4059-8d63-cb1d414ce0cc', u'created': u'2016-04-08T13:58:30Z', u'author': u'College Senior'}, {u'body': u'The facilities have everything you could need as a varsity or non-varsity athlete. With our new football/lacrosse field and track, we have it made! 
Still, with an active there is always competition for prime field time, and IM sports are relegated either to early/late hours or ungroomed fields.', u'rating': 4, u'guid': u'31c89c4d-91ee-4b92-a198-3e12c304d7e1', u'created': u'2016-04-08T13:55:12Z', u'author': u'College Senior'}, {u'body': u'I have loved my time at Williams! The best part of my experience has been the people here, and as a senior trying to figure out post graduate plans, I am comforted by the willingness to help and commitment to the College from alumni. Go Ephs!', u'rating': 4, u'guid': u'4458ed87-4183-4784-908a-6ae67582e82c', u'created': u'2016-04-08T13:51:51Z', u'author': u'College Senior'}, {u'body': u'Could be better but overall good.', u'rating': 4, u'guid': u'08327955-2698-4fe6-ac1f-13108327cc21', u'created': u'2016-01-01T22:51:16Z', u'author': u'College Junior'}, {u'body': u'Better this year than past years.', u'rating': 3, u'guid': u'1892de02-eb45-42b5-b728-34912499e5eb', u'created': u'2016-01-01T22:43:54Z', u'author': u'College Junior'}, {u'body': u'Could have better facilities. Otherwise, great.', u'rating': 4, u'guid': u'2dc48cb2-d21f-4fd6-a9c7-19a5e513e6d6', u'created': u'2016-01-01T22:40:45Z', u'author': u'College Junior'}, {u'body': u'Awesome experience. Very community-oriented school. I love this place. Great people. Everyone wants to help you, the professors are amazing.', u'rating': 5, u'guid': u'5fa28a31-9391-4db7-b70d-5e2aa58708b3', u'created': u'2016-01-01T22:39:06Z', u'author': u'College Junior'}, {u'body': u"Williams has been the perfect place for me. My professors have been incredible mentors--I've gone to three professors' houses for dinner. The location is beautiful, and perfect for focusing on academics. I've been able to get very involved in all my clubs and really find what makes me passionate. But best of all is the people. They're all smart and talented and wonderful. 
I am so lucky.", u'rating': 5, u'guid': u'81ff499b-4721-4625-bee1-acf1e9b21916', u'created': u'2015-08-25T13:08:28Z', u'author': u'College Junior'}, {u'body': u"I don't know much, only seniors can live off campus.", u'rating': 3, u'guid': u'd9dc2e2f-a08d-4a01-8fe2-410623f93d7a', u'created': u'2015-04-27T19:31:06Z', u'author': u'College Freshman'}, {u'body': u"Everything closes really early, but there's some good food. No chains really.", u'rating': 3, u'guid': u'5993a99e-a936-40c8-ae0d-4581c8d089ef', u'created': u'2015-04-27T19:30:01Z', u'author': u'College Freshman'}, {u'body': u"It's kind of sad. There's never more than a handful of things happening on fridays or satudays and there's nothing for the rest of the week", u'rating': 3, u'guid': u'65c83983-2f6f-4b08-b870-06c35fd2b0e9', u'created': u'2015-04-27T19:27:34Z', u'author': u'College Freshman'}, {u'body': u"Having visitors is pretty easy. One of the officers is the worst but otherwise they're generally lenient about weed and alcohol.", u'rating': 4, u'guid': u'bcd95788-22b7-4a23-b942-2493206d1734', u'created': u'2015-04-27T19:21:34Z', u'author': u'College Freshman'}, {u'body': u"They usually give you a good package, but a lot of it is work-study and students don't have the free time for that here.", u'rating': 3, u'guid': u'1a87483c-952c-479b-9a57-65fb09895e75', u'created': u'2015-04-27T19:19:35Z', u'author': u'College Freshman'}, {u'body': u"Food is kind of repetitive. Pretty much all the kitchens are very wasteful. We can't use meal plans anywhere off campus.", u'rating': 3, u'guid': u'361b725f-bedc-4452-843d-5dc284c18dcd', u'created': u'2015-04-27T19:17:22Z', u'author': u'College Freshman'}], u'total': 246, u'limit': 20, u'page': 2}
You should be able to work out the rest yourself based on the other parts of the answer.
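The token extraction step can be tried offline. In the sketch below the regular expression is the one from the answer, but the `sample` script text is a hypothetical stand-in for the contents of the `#dataLayerTag` element (its real layout is not shown here), and `entity_guid` is a made-up helper name. Returning `None` when the pattern is missing avoids the `AttributeError` that calling `.group(1)` on a failed search would raise.

```python
import re

patt = re.compile(r'"entityGuid":"(.*?)"')


def entity_guid(script_text):
    """Return the entityGuid embedded in the script text, or None if absent."""
    match = patt.search(script_text)
    return match.group(1) if match else None


# Hypothetical script content; the GUID is the Williams College one from the JSON above.
sample = 'dataLayer = [{"entityGuid":"465D4A73-875C-498E-9C8F-E47568E156F2","vertical":"colleges"}];'

# Query parameters in the same shape the answer passes to requests.get.
params = {"e": entity_guid(sample), "page": 2, "limit": "20"}
print(params)
```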
我必须从一个网站的 5 个页面中提取信息。 在每一页的末尾都有 "NEXT PAGE" 按钮。这是下一个按钮的 html 代码 -
<li class="pagination__next" data-reactid=".0.3.0.0.1.1.1.3.2">
<span class="icon-arrowright-thin--pagination" data-reactid=".0.3.0.0.1.1.1.3.2.0">
::before
</span>
</li>
我正在使用 beautifulsoup4 提取信息。如何导航到下一页。 我可以使用 mechanize 来导航这种类型吗
BeautifulSoup 是一个 HTML 解析器,不是网络浏览器,它不能导航或下载页面。为此,您通常会使用像 urllib
或 request
这样的 HTTP 库从特定的 URL 中获取 HTML,以便将其提供给 BeautifulSoup。在您的情况下,mechanize
可用于执行此操作。
很遗憾,您的分页按钮提供的 HTML 不是 link,因此它没有 href
属性。如果是这样,您就可以轻松地从中解析 URL 并告诉您的 HTTP 库去获取它。
相反,您需要使用 mechanize 来模拟该按钮上的点击事件,等待一小段时间,然后假定新页面已加载,然后将生成的 HTML 传递给 BeautifulSoup.
如果"next page"涉及到javascript,那么是的,只能机械化。你可以用 selenium
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
client = webbrowser.get('firefox')
browser = webdriver.Chrome('./chromedriver')
url = "www.example.com"
browser.get(url)
###### Wait until you see some element that signals the page is completely loaded
WebDriverWait(browser, timeout=10).until(lambda x: x.find_element_by_class_name('Even'))
############## do your things with the first page
content = browser.page_source.encode('ascii','ignore').decode("utf-8")
#### Now if you are sure there is next page
next_button_class = 'icon-arrowright-thin--pagination' ###here insert the class of 'next button'
browser.find_element_by_class_name(next_button_class).click()
time.sleep(3)
###### Wait until you see some element that signals the page is completely loaded
WebDriverWait(browser, timeout=10).until(lambda x: x.find_element_by_class_name('Even'))
content = browser.page_source.encode('ascii','ignore').decode("utf-8")
您可以模仿 post 到 https://colleges.niche.com/entity-search/,但更简单的方法是从第一页获取总页数,然后在范围 2 到页数之间循环。添加到开始 url 的所有内容是 &page=page_number:
import requests
from bs4 import BeautifulSoup
start = "https://colleges.niche.com/?degree=4-year&sort=best"
url = "https://colleges.niche.com/?degree=4-year&sort=best&page={}"
soup = BeautifulSoup(requests.get(start).content)
pages = int(soup.select("select.pagination__pages__selector option")[-1].text.split(None, 1)[1])
print([a.text for a in soup.select("a.search__results__list__item__entity")])
for page in range(2, pages):
soup = BeautifulSoup(requests.get(url.format(page)).content)
print([a.text for a in soup.select("a.search__results__list__item__entity")])
如果我们运行将代码迭代几次,您可以看到我们得到的每一页:
In [1]: import requests
...: from bs4 import BeautifulSoup
...: start = "https://colleges.niche.com/?degree=4-year&sort=best"
...: url = "https://colleges.niche.com/?degree=4-year&sort=best&page={}"
...: soup = BeautifulSoup(requests.get(start).content, "html.parser")
...: pages = int(soup.select("select.pagination__pages__selector option")[-1]
...: .text.split(None, 1)[1])
...: print([a.text for a in soup.select("a.search__results__list__item__entit
...: y")])
...: for page in range(2, pages):
...: soup = BeautifulSoup(requests.get(url.format(page)).content, "html.p
...: arser")
...: print([a.text for a in soup.select("a.search__results__list__item__e
...: ntity")])
...:
[u'Stanford University', u'Massachusetts Institute of Technology', u'Yale University', u'Harvard University', u'Princeton University', u'Rice University', u'Bowdoin College', u'University of Pennsylvania', u'Washington University in St. Louis', u'Brown University', u'Duke University', u'Columbia University', u'Dartmouth College', u'Vanderbilt University', u'Pomona College', u'California Institute of Technology', u'University of Southern California', u'University of Notre Dame', u'University of Chicago', u'Washington & Lee University', u'Carleton College', u'Colgate University', u'University of Michigan - Ann Arbor', u'Northwestern University', u'Tufts University']
[u'Williams College', u'Georgetown University', u'Amherst College', u'Cornell University', u'Thomas Jefferson University', u'University of Texas - Health Science Center at Houston', u'Barnard College', u'Haverford College', u'Carnegie Mellon University', u'Emory University', u'University of California - Los Angeles', u'Harvey Mudd College', u'Medical University of South Carolina', u'Franklin W. Olin College of Engineering', u'Claremont McKenna College', u'Middlebury College', u'Swarthmore College', u'Bates College', u'University of Virginia', u'University of Texas - Austin', u'University of California - Berkeley', u'Virginia Tech', u'University of North Carolina at Chapel Hill', u'University of Texas - Medical Branch at Galveston', u'Davidson College']
[u'Colby College', u'Hamilton College', u'Samuel Merritt University', u'Georgia Institute of Technology', u'University of Richmond', u'Lehigh University', u'Grinnell College', u'Northeastern University', u'University of Illinois at Urbana-Champaign', u'New York University', u'University of Wisconsin', u'Wake Forest University', u'Reed College', u'Bucknell University', u'Oregon Health & Science University', u'Johns Hopkins University', u'Lafayette College', u'University of Texas - Health Science Center at San Antonio', u'Smith College', u'Wellesley College', u'University of Rochester', u'Scripps College', u'College of William & Mary', u'University of Florida', u'The Curtis Institute of Music']
[u'United States Coast Guard Academy', u'College of the Holy Cross', u'Penn State', u'Bryn Mawr College', u'Wesleyan University', u'Ohio State University', u'Colorado School of Mines', u'Texas A&M University', u'University of Maryland - Baltimore', u'Purdue University', u'University of California - Santa Barbara', u'University of Georgia', u'University of Miami', u'Tulane University', u'University of Tulsa', u'Boston College', u'The Juilliard School', u'Texas Tech University Health Sciences Center', u'Worcester Polytechnic Institute', u'Franklin & Marshall College', u'Brigham Young University', u'Southern Methodist University', u'Mount Holyoke College', u'Kenyon College', u'University of Washington']
如果你要模仿 post,下面的方法就可以了。根据您想要的数据,当您返回 json 时,这实际上可能更可取:
import requests
from bs4 import BeautifulSoup
start = "https://colleges.niche.com/?degree=4-year&sort=best"
post = "https://colleges.niche.com/entity-search/"
data = {"degreeType": ["4-year"], "sort": "best", "page": 1, "vertical": "colleges"}
soup = BeautifulSoup(requests.get(start).content, "html.parser")
pages = int(soup.select("select.pagination__pages__selector option")[-1].text.split(None, 1)[1])
for page in range(1, pages+ 1):
data["page"] = page
r = requests.post(post, json=data)
print(r.json())
这为您提供如下数据:
{u'count': 2854, u'results': [{u'reviewCount': 258, u'netPrice': 20315, u'reviewAvg': 3.7713178294573644, u'totalStudents': 2034, u'grade': 4.33, u'tagline': u'4 Year · Williamstown, MA', u'SATRange': u'1350-1560', u'label': u'Williams College', u'url': u'https://colleges.niche.com/williams-college/', u'ACTRange': u'31-34', u'location': {u'lat': 42.7117, u'lng': -73.2059}, u'guid': u'465D4A73-875C-498E-9C8F-E47568E156F2', u'type': u'College'}, {u'reviewCount': 1081, u'netPrice': 25786, u'reviewAvg': 3.698427382053654, u'totalStudents': 7226, u'grade': 4.33, u'tagline': u'4 Year · Washington, DC', u'SATRange': u'1320-1520', u'label': u'Georgetown University', u'url': u'https://colleges.niche.com/georgetown-university/', u'ACTRange': u'30-33', u'location': {u'lat': 38.9088, u'lng': -77.0735}, u'guid': u'34AF6312-6F20-4D90-B512-AC5CD720AB25', u'type': u'College'}, {u'reviewCount': 247, u'netPrice': 14687, u'reviewAvg': 3.8259109311740893, u'totalStudents': 1792, u'grade': 4.33, u'tagline': u'4 Year · Amherst, MA', u'SATRange': u'1350-1548', u'label': u'Amherst College', u'url': u'https://colleges.niche.com/amherst-college/', u'ACTRange': u'30-34', u'location': {u'lat': 42.3725, u'lng': -72.5185}, u'guid': u'127EC524-4BAC-4A5C-A7F5-1EAD9C309F44', u'type': u'College'}, {u'reviewCount': 1730, u'netPrice': 28537, u'reviewAvg': 3.654913294797688, u'totalStudents': 14269, u'grade': 4.33, u'tagline': u'4 Year · Ithaca, NY', u'SATRange': u'1330-1510', u'label': u'Cornell University', u'url': u'https://colleges.niche.com/cornell-university/', u'ACTRange': u'30-34', u'location': {u'lat': 42.4453, u'lng': -76.4827}, u'guid': u'C35E497B-10BC-4482-92E5-F27941433B02', u'type': u'College'}, {u'reviewCount': 254, u'netPrice': None, u'reviewAvg': 3.8149606299212597, u'totalStudents': 649, u'grade': 4.33, u'tagline': u'4 Year · Philadelphia, PA', u'SATRange': None, u'label': u'Thomas Jefferson University', u'url': u'https://colleges.niche.com/thomas-jefferson-university/', u'ACTRange': 
None, u'location': {u'lat': 39.9491, u'lng': -75.1581}, u'guid': u'E8C9EBC6-90C5-4CDF-A324-2CCE16060B61', u'type': u'College'}, {u'reviewCount': 131, u'netPrice': None, u'reviewAvg': 3.740458015267176, u'totalStudents': 539, u'grade': 4.33, u'tagline': u'4 Year · Houston, TX', u'SATRange': None, u'label': u'University of Texas - Health Science Center at Houston', u'url': u'https://colleges.niche.com/university-of-texas----health-science-center-at-houston/', u'ACTRange': None, u'location': {u'lat': 29.7029, u'lng': -95.4032}, u'guid': u'43EEDD7D-8204-4014-961B-BEDDBD4C6417', u'type': u'College'}, {u'reviewCount': 390, u'netPrice': 21791, u'reviewAvg': 3.776923076923077, u'totalStudents': 2537, u'grade': 4.33, u'tagline': u'4 Year · New York, NY', u'SATRange': u'1250-1440', u'label': u'Barnard College', u'url': u'https://colleges.niche.com/barnard-college/', u'ACTRange': u'28-32', u'location': {u'lat': 40.8091, u'lng': -73.964}, u'guid': u'DD4FCD82-8E4E-4F4C-A7DC-FADCEBB49681', u'type': u'College'}, {u'reviewCount': 190, u'netPrice': 22409, u'reviewAvg': 3.789473684210526, u'totalStudents': 1189, u'grade': 4.33, u'tagline': u'4 Year · Haverford, PA', u'SATRange': u'1330-1490', u'label': u'Haverford College', u'url': u'https://colleges.niche.com/haverford-college/', u'ACTRange': u'31-34', u'location': {u'lat': 40.0134, u'lng': -75.3026}, u'guid': u'271075B3-07A0-450B-B4F3-78EB1FC7C03A', u'type': u'College'}, {u'reviewCount': 1310, u'netPrice': 33670, u'reviewAvg': 3.6068702290076335, u'totalStudents': 5699, u'grade': 4.33, u'tagline': u'4 Year · Pittsburgh, PA', u'SATRange': u'1340-1540', u'label': u'Carnegie Mellon University', u'url': u'https://colleges.niche.com/carnegie-mellon-university/', u'ACTRange': u'30-34', u'location': {u'lat': 40.4446, u'lng': -79.9429}, u'guid': u'D8A17C0F-CC25-4D2A-B231-0303EA016427', u'type': u'College'}, {u'reviewCount': 1392, u'netPrice': 28203, u'reviewAvg': 3.757183908045977, u'totalStudents': 7732, u'grade': 4.33, u'tagline': u'4 
Year · Atlanta, GA', u'SATRange': u'1280-1460', u'label': u'Emory University', u'url': u'https://colleges.niche.com/emory-university/', u'ACTRange': u'29-32', u'location': {u'lat': 33.7988, u'lng': -84.3258}, u'guid': u'86AD5853-ED72-4EFD-855C-4746FF698941', u'type': u'College'}, {u'reviewCount': 4465, u'netPrice': 12510, u'reviewAvg': 3.838521836506159, u'totalStudents': 29033, u'grade': 4.33, u'tagline': u'4 Year · Los Angeles, CA', u'SATRange': u'1190-1460', u'label': u'University of California - Los Angeles', u'url': u'https://colleges.niche.com/university-of-california----los-angeles/', u'ACTRange': u'27-33', u'location': {u'lat': 34.0689, u'lng': -118.444}, u'guid': u'1D1D82CF-C659-49F0-A526-7AFB85BD3A4F', u'type': u'College'}, {u'reviewCount': 122, u'netPrice': 33137, u'reviewAvg': 3.6639344262295084, u'totalStudents': 802, u'grade': 4.33, u'tagline': u'4 Year · Claremont, CA', u'SATRange': u'1418-1570', u'label': u'Harvey Mudd College', u'url': u'https://colleges.niche.com/harvey-mudd-college/', u'ACTRange': u'33-35', u'location': {u'lat': 34.1061, u'lng': -117.711}, u'guid': u'20D662BE-8428-4DE2-BF0D-72D22F0A04B5', u'type': u'College'}, {u'reviewCount': 71, u'netPrice': None, u'reviewAvg': 4.014084507042253, u'totalStudents': 281, u'grade': 4.33, u'tagline': u'4 Year · Charleston, SC', u'SATRange': None, u'label': u'Medical University of South Carolina', u'url': u'https://colleges.niche.com/medical-university-of-south-carolina/', u'ACTRange': None, u'location': {u'lat': 32.786, u'lng': -79.9469}, u'guid': u'7CD7C977-D16A-4399-8D7E-3B1FA0DFAB7D', u'type': u'College'}, {u'reviewCount': 115, u'netPrice': 29979, u'reviewAvg': 4.095652173913043, u'totalStudents': 350, u'grade': 4.33, u'tagline': u'4 Year · Needham, MA', u'SATRange': u'1410-1550', u'label': u'Franklin W. 
Olin College of Engineering', u'url': u'https://colleges.niche.com/franklin-w-olin-college-of-engineering/', u'ACTRange': u'32-34', u'location': {u'lat': 42.2928, u'lng': -71.264}, u'guid': u'88A3438F-9304-481E-8022-0AE353991161', u'type': u'College'}, {u'reviewCount': 399, u'netPrice': 23982, u'reviewAvg': 3.87468671679198, u'totalStudents': 1298, u'grade': 4.33, u'tagline': u'4 Year · Claremont, CA', u'SATRange': u'1350-1520', u'label': u'Claremont McKenna College', u'url': u'https://colleges.niche.com/claremont-mckenna-college/', u'ACTRange': u'30-33', u'location': {u'lat': 34.1023, u'lng': -117.707}, u'guid': u'DAE7241A-4D00-4C50-B1A5-F33BAF3A6C3B', u'type': u'College'}, {u'reviewCount': 458, u'netPrice': 20903, u'reviewAvg': 3.7139737991266375, u'totalStudents': 2492, u'grade': 4.33, u'tagline': u'4 Year · Middlebury, VT', u'SATRange': u'1260-1470', u'label': u'Middlebury College', u'url': u'https://colleges.niche.com/middlebury-college/', u'ACTRange': u'30-33', u'location': {u'lat': 44.0091, u'lng': -73.1761}, u'guid': u'0E72BF23-A3CF-4995-9585-33B5BD0F9222', u'type': u'College'}, {u'reviewCount': 401, u'netPrice': 22557, u'reviewAvg': 3.56857855361596, u'totalStudents': 1534, u'grade': 4.33, u'tagline': u'4 Year · Swarthmore, PA', u'SATRange': u'1360-1540', u'label': u'Swarthmore College', u'url': u'https://colleges.niche.com/swarthmore-college/', u'ACTRange': u'29-34', u'location': {u'lat': 39.9041, u'lng': -75.3561}, u'guid': u'891F20E2-4B6F-4626-83F3-15D502B2E7C1', u'type': u'College'}, {u'reviewCount': 320, u'netPrice': 22062, u'reviewAvg': 3.878125, u'totalStudents': 1773, u'grade': 4.33, u'tagline': u'4 Year · Lewiston, ME', u'SATRange': None, u'label': u'Bates College', u'url': u'https://colleges.niche.com/bates-college/', u'ACTRange': None, u'location': {u'lat': 44.1053, u'lng': -70.2033}, u'guid': u'2C036559-5EBB-4C00-B3B8-6679A91FB040', u'type': u'College'}, {u'reviewCount': 1995, u'netPrice': 14069, u'reviewAvg': 3.800501253132832, 
u'totalStudents': 15622, u'grade': 4.33, u'tagline': u'4 Year · Charlottesville, VA', u'SATRange': u'1250-1460', u'label': u'University of Virginia', u'url': u'https://colleges.niche.com/university-of-virginia/', u'ACTRange': u'28-33', u'location': {u'lat': 38.0365, u'lng': -78.5026}, u'guid': u'9EA86CB5-E8A6-47E6-A219-FDCABC31AE51', u'type': u'College'}, {u'reviewCount': 5513, u'netPrice': 16832, u'reviewAvg': 3.8824596408489027, u'totalStudents': 36309, u'grade': 4.33, u'tagline': u'4 Year · Austin, TX', u'SATRange': u'1170-1410', u'label': u'University of Texas - Austin', u'url': u'https://colleges.niche.com/university-of-texas----austin/', u'ACTRange': u'26-32', u'location': {u'lat': 30.2847, u'lng': -97.7373}, u'guid': u'BC90E2B6-E112-43ED-AC5C-3548829EA3DD', u'type': u'College'}, {u'reviewCount': 3718, u'netPrice': 16655, u'reviewAvg': 3.5922538999462077, u'totalStudents': 26320, u'grade': 4.33, u'tagline': u'4 Year · Berkeley, CA', u'SATRange': u'1240-1500', u'label': u'University of California - Berkeley', u'url': u'https://colleges.niche.com/university-of-california----berkeley/', u'ACTRange': u'29-34', u'location': {u'lat': 37.8715, u'lng': -122.26}, u'guid': u'09E8CD9A-F401-4C8B-A79C-F02E10AC0201', u'type': u'College'}, {u'reviewCount': 3382, u'netPrice': 18398, u'reviewAvg': 3.8793613246599645, u'totalStudents': 23685, u'grade': 4.33, u'tagline': u'4 Year · Blacksburg, VA', u'SATRange': u'1110-1320', u'label': u'Virginia Tech', u'url': u'https://colleges.niche.com/virginia-tech/', u'ACTRange': None, u'location': {u'lat': 37.2286, u'lng': -80.4233}, u'guid': u'EEB0E829-996A-45B1-9671-3EF4AF096423', u'type': u'College'}, {u'reviewCount': 2138, u'netPrice': 10936, u'reviewAvg': 3.7787652011225443, u'totalStudents': 17570, u'grade': 4.33, u'tagline': u'4 Year · Chapel Hill, NC', u'SATRange': u'1220-1420', u'label': u'University of North Carolina at Chapel Hill', u'url': u'https://colleges.niche.com/university-of-north-carolina-at-chapel-hill/', u'ACTRange': 
u'28-32', u'location': {u'lat': 35.9122, u'lng': -79.051}, u'guid': u'5712B0C1-3A40-4EA1-A324-9C4F76FEFD10', u'type': u'College'}, {u'reviewCount': 110, u'netPrice': None, u'reviewAvg': 3.8545454545454545, u'totalStudents': 586, u'grade': 4.33, u'tagline': u'4 Year · Galveston, TX', u'SATRange': None, u'label': u'University of Texas - Medical Branch at Galveston', u'url': u'https://colleges.niche.com/university-of-texas----medical-branch-at-galveston/', u'ACTRange': None, u'location': {u'lat': 29.3113, u'lng': -94.7764}, u'guid': u'5FEEDB69-A566-4671-B821-28304A74F474', u'type': u'College'}, {u'reviewCount': 264, u'netPrice': 22457, u'reviewAvg': 3.8333333333333335, u'totalStudents': 1770, u'grade': 4.33, u'tagline': u'4 Year · Davidson, NC', u'SATRange': u'1230-1440', u'label': u'Davidson College', u'url': u'https://colleges.niche.com/davidson-college/', u'ACTRange': u'28-32', u'location': {u'lat': 35.5, u'lng': -80.8452}, u'guid': u'1AD50A05-6325-4392-B428-A08C944E61EF', u'type': u'College'}], u'page': 1, u'pageSize': 25, u'pageCount': 40}
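This response can include dynamically created content that you would not get in the page source returned by a plain request.
For the reviews URL https://colleges.niche.com/williams-college/reviews, you need to parse a token out of the page source and then query the API much as before (the code here uses a GET with query parameters):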
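As a rough sketch of how a response shaped like the one above could be consumed: the `sample` dictionary here is a hand-trimmed copy of that output (two of the 25 results, a few fields each), and `summarize_results` is a hypothetical helper name, not something provided by requests or BeautifulSoup.

```python
# Hand-trimmed copy of the search response shown above (assumption: the
# live API still returns this shape; only two results are kept here).
sample = {
    "count": 2854,
    "page": 1,
    "pageSize": 25,
    "pageCount": 40,
    "results": [
        {
            "label": "Williams College",
            "url": "https://colleges.niche.com/williams-college/",
            "netPrice": 20315,
        },
        {
            "label": "Thomas Jefferson University",
            "url": "https://colleges.niche.com/thomas-jefferson-university/",
            "netPrice": None,  # some fields are None in the real output too
        },
    ],
}


def summarize_results(payload):
    """Return (label, url, netPrice) tuples for each entry in 'results'."""
    return [(r["label"], r["url"], r.get("netPrice")) for r in payload["results"]]


rows = summarize_results(sample)
for label, url, price in rows:
    print(label, url, price)
```

Note that `pageCount` is 40 in the output even though `count / pageSize` (2854 / 25) would suggest more pages, so it is probably safer to bound any loop by `pageCount` rather than computing the number of pages yourself.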
import re
import requests
from bs4 import BeautifulSoup

# The entity GUID appears in the element with id "dataLayerTag" as "entityGuid":"..."
patt = re.compile(r'"entityGuid":"(.*?)"')
url = "https://colleges.niche.com/williams-college/reviews/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data_tag = patt.search(soup.select_one("#dataLayerTag").text).group(1)
# Ask the reviews API for page 2, with 20 reviews per page
params = {"e": data_tag, "page": 2, "limit": "20"}
url = "https://niche.com/api/entity-reviews/"
resp = requests.get(url, params=params)
print(resp.json())
This gives you:
{u'reviews': [{u'body': u'I enjoy being in classes here, but the work gets overwhelming. People are great but very cliquy.', u'rating': 4, u'guid': u'35b6faeb-95b2-4385-b3ee-19e6c7984e1b', u'created': u'2016-04-20T22:24:56Z', u'author': u'College Sophomore'}, {u'body': u'The alumni network is great. Easy to use. But the career center sucks.', u'rating': 4, u'guid': u'beddcae1-d860-4a8a-a431-45bf7e7087e6', u'created': u'2016-04-20T22:24:56Z', u'author': u'College Sophomore'}, {u'body': u"It's hard for sophomores to get good housing. Even as a senior, the good housings are far away from campus. But almost everyone has singles, even freshman.", u'rating': 3, u'guid': u'fff99560-0b4f-499d-a95b-7b3b3f9826f0', u'created': u'2016-04-20T22:19:27Z', u'author': u'College Sophomore'}, {u'body': u"We don't have greek life.", u'rating': 1, u'guid': u'69e60cf0-ff3c-4b34-acf1-6315d878c205', u'created': u'2016-04-20T22:17:35Z', u'author': u'College Sophomore'}, {u'body': u"There's not a lot of team spirit here. Athletes are nice, but they tend to hang among themselves.", u'rating': 3, u'guid': u'b31ee366-1b68-4c0f-b262-ff628243887c', u'created': u'2016-04-20T22:17:02Z', u'author': u'College Sophomore'}, {u'body': u'Williams offer a lot of chances to study abroad, but the social scene is very very limited.', u'rating': 4, u'guid': u'11a3feb2-21fa-45d9-8ee0-e6e1e8cea0c0', u'created': u'2016-04-20T22:15:35Z', u'author': u'College Sophomore'}, {u'body': u"Most people will live on campus all four years. It's not a bad deal!", u'rating': 4, u'guid': u'4a845124-7cfd-4059-8d63-cb1d414ce0cc', u'created': u'2016-04-08T13:58:30Z', u'author': u'College Senior'}, {u'body': u'The facilities have everything you could need as a varsity or non-varsity athlete. With our new football/lacrosse field and track, we have it made! 
Still, with an active there is always competition for prime field time, and IM sports are relegated either to early/late hours or ungroomed fields.', u'rating': 4, u'guid': u'31c89c4d-91ee-4b92-a198-3e12c304d7e1', u'created': u'2016-04-08T13:55:12Z', u'author': u'College Senior'}, {u'body': u'I have loved my time at Williams! The best part of my experience has been the people here, and as a senior trying to figure out post graduate plans, I am comforted by the willingness to help and commitment to the College from alumni. Go Ephs!', u'rating': 4, u'guid': u'4458ed87-4183-4784-908a-6ae67582e82c', u'created': u'2016-04-08T13:51:51Z', u'author': u'College Senior'}, {u'body': u'Could be better but overall good.', u'rating': 4, u'guid': u'08327955-2698-4fe6-ac1f-13108327cc21', u'created': u'2016-01-01T22:51:16Z', u'author': u'College Junior'}, {u'body': u'Better this year than past years.', u'rating': 3, u'guid': u'1892de02-eb45-42b5-b728-34912499e5eb', u'created': u'2016-01-01T22:43:54Z', u'author': u'College Junior'}, {u'body': u'Could have better facilities. Otherwise, great.', u'rating': 4, u'guid': u'2dc48cb2-d21f-4fd6-a9c7-19a5e513e6d6', u'created': u'2016-01-01T22:40:45Z', u'author': u'College Junior'}, {u'body': u'Awesome experience. Very community-oriented school. I love this place. Great people. Everyone wants to help you, the professors are amazing.', u'rating': 5, u'guid': u'5fa28a31-9391-4db7-b70d-5e2aa58708b3', u'created': u'2016-01-01T22:39:06Z', u'author': u'College Junior'}, {u'body': u"Williams has been the perfect place for me. My professors have been incredible mentors--I've gone to three professors' houses for dinner. The location is beautiful, and perfect for focusing on academics. I've been able to get very involved in all my clubs and really find what makes me passionate. But best of all is the people. They're all smart and talented and wonderful. 
I am so lucky.", u'rating': 5, u'guid': u'81ff499b-4721-4625-bee1-acf1e9b21916', u'created': u'2015-08-25T13:08:28Z', u'author': u'College Junior'}, {u'body': u"I don't know much, only seniors can live off campus.", u'rating': 3, u'guid': u'd9dc2e2f-a08d-4a01-8fe2-410623f93d7a', u'created': u'2015-04-27T19:31:06Z', u'author': u'College Freshman'}, {u'body': u"Everything closes really early, but there's some good food. No chains really.", u'rating': 3, u'guid': u'5993a99e-a936-40c8-ae0d-4581c8d089ef', u'created': u'2015-04-27T19:30:01Z', u'author': u'College Freshman'}, {u'body': u"It's kind of sad. There's never more than a handful of things happening on fridays or satudays and there's nothing for the rest of the week", u'rating': 3, u'guid': u'65c83983-2f6f-4b08-b870-06c35fd2b0e9', u'created': u'2015-04-27T19:27:34Z', u'author': u'College Freshman'}, {u'body': u"Having visitors is pretty easy. One of the officers is the worst but otherwise they're generally lenient about weed and alcohol.", u'rating': 4, u'guid': u'bcd95788-22b7-4a23-b942-2493206d1734', u'created': u'2015-04-27T19:21:34Z', u'author': u'College Freshman'}, {u'body': u"They usually give you a good package, but a lot of it is work-study and students don't have the free time for that here.", u'rating': 3, u'guid': u'1a87483c-952c-479b-9a57-65fb09895e75', u'created': u'2015-04-27T19:19:35Z', u'author': u'College Freshman'}, {u'body': u"Food is kind of repetitive. Pretty much all the kitchens are very wasteful. We can't use meal plans anywhere off campus.", u'rating': 3, u'guid': u'361b725f-bedc-4452-843d-5dc284c18dcd', u'created': u'2015-04-27T19:17:22Z', u'author': u'College Freshman'}], u'total': 246, u'limit': 20, u'page': 2}
You should be able to work out the rest yourself from the other parts of the answer.
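One way the remaining step might look, as a minimal sketch: the reviews response above reports `total` and `limit`, so the number of pages can be derived from those and each page requested in turn. The names `page_count`, `iter_reviews` and `make_http_fetcher` are hypothetical, and the sketch assumes pages are numbered from 1 and that the API keeps returning the `reviews`, `total` and `limit` keys shown above.

```python
import math


def page_count(total, limit):
    """Number of pages needed to hold `total` items at `limit` per page."""
    return math.ceil(total / limit)


def iter_reviews(fetch_page, limit=20):
    """Yield every review, calling fetch_page(page, limit) once per page.

    fetch_page must return a dict shaped like the API output above:
    {"reviews": [...], "total": int, "limit": int, "page": int}.
    """
    first = fetch_page(1, limit)  # assumption: pages start at 1
    yield from first["reviews"]
    for page in range(2, page_count(first["total"], first["limit"]) + 1):
        yield from fetch_page(page, limit)["reviews"]


def make_http_fetcher(entity_guid):
    """Build a fetch_page function that calls the reviews API from the answer."""
    import requests  # imported lazily so the paging logic runs without it

    def fetch(page, limit):
        resp = requests.get(
            "https://niche.com/api/entity-reviews/",
            params={"e": entity_guid, "page": page, "limit": str(limit)},
        )
        resp.raise_for_status()
        return resp.json()

    return fetch
```

With the `data_tag` value parsed earlier, usage would be along the lines of `for review in iter_reviews(make_http_fetcher(data_tag)): print(review["body"])`. For the sample output above, `page_count(246, 20)` gives 13 pages.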