How do I extract data from linked pages in websites using Python?
I have been trying to scrape data from web pages for a data analysis project, and I have managed to get data from a single page.
import requests
from bs4 import BeautifulSoup
import concurrent.futures
from urllib.parse import urlencode
from scraper_api import ScraperAPIClient

client = ScraperAPIClient('key')

# fetch the undergraduate course-search results page through ScraperAPI and print the raw HTML
results = client.get(url="https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate").text
print(results)
Taking the site "https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate" as an example, I need to navigate into each course and pull out a single piece of data from that page, called the duration.
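What I imagine doing for each course, once I have its URL, is something like the sketch below; the course URL here is just a placeholder and the way I look for the "Duration" label is a guess at the markup, since I have not worked out the actual page structure:

from bs4 import BeautifulSoup
from scraper_api import ScraperAPIClient

client = ScraperAPIClient('key')

# hypothetical course URL; I don't know the real link pattern yet
course_url = "https://www.essex.ac.uk/courses/example-course"
html = client.get(url=course_url).text
soup = BeautifulSoup(html, "html.parser")

# look for the text node containing "Duration" (a guess at the markup)
label = soup.find(string=lambda s: s and "Duration" in s)
print(label.strip() if label else "Duration not found")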
Try the following:
from scraper_api import ScraperAPIClient

client = ScraperAPIClient('key')
results = []
for i in range(10):
    # start_rank = 1, 11, 21, ..., 91: one request per page of ten results
    start_rank = i * 10 + 1
    results.append(client.get(url=f"https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate&start_rank={start_rank}").text)
print(results)
This loops over the first 10 result pages and appends each text response to the results list.
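The duration itself lives on the individual course pages, so the next step is to pull the course links out of each result page. A rough sketch with BeautifulSoup, continuing from the results list above; the CSS selector and the link pattern are assumptions, so check them against the real HTML:

from urllib.parse import urljoin
from bs4 import BeautifulSoup

base_url = "https://www.essex.ac.uk/"
course_urls = []

for page_html in results:
    soup = BeautifulSoup(page_html, "html.parser")
    # pick out links that look like individual course pages (the selector is a guess)
    for a in soup.select("a[href*='/courses/']"):
        course_urls.append(urljoin(base_url, a["href"]))

# drop duplicates while keeping the original order
course_urls = list(dict.fromkeys(course_urls))
print(course_urls)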
from scraper_api import ScraperAPIClient

client = ScraperAPIClient('key')
total_pages = 12

for page_no in range(total_pages):
    # You control this page_no variable.
    # Look at how the site moves to the next page: it depends on the
    # 'start_rank' parameter at the end of the URL. For example,
    # start_rank=10, start_rank=20, ... fetch one page after another.
    rank = page_no * 10
    results = client.get(url="https://www.essex.ac.uk/course-search?query=&f.Level%7CcourseLevel=Undergraduate&start_rank={0}".format(rank)).text
    print(results)
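Once you have the list of course URLs, the per-course requests are independent of each other, so you could also fetch them in parallel with concurrent.futures, which is already imported in the question's snippet. A minimal sketch, assuming client is the ScraperAPIClient defined above and that each course page contains a text node mentioning "Duration" (an assumption about the markup):

import concurrent.futures
from bs4 import BeautifulSoup

def fetch_duration(url):
    # fetch one course page via the ScraperAPI client defined above and
    # try to read the text node that mentions "Duration" (markup is a guess)
    soup = BeautifulSoup(client.get(url=url).text, "html.parser")
    label = soup.find(string=lambda s: s and "Duration" in s)
    return url, (label.strip() if label else None)

course_urls = []  # fill with the course links collected from the result pages

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    durations = dict(executor.map(fetch_duration, course_urls))

print(durations)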