Web scraping with Python and BeautifulSoup
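I'm writing Python code to find the cheapest Eurostar fares. I'm new to BeautifulSoup, so I don't know it very well. For some reason, the code, which in theory should retrieve the information from the "ul" tables, doesn't find anything.
Here is the code: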
import requests
from bs4 import BeautifulSoup

input_parser = InputParser()
input_parser.inputDestinations("London", "Paris")
input_parser.inputAdults(2)
input_parser.inputDates("2021-10-08", "2021-10-10")
URL = input_parser.createURL()

page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
# look for every <ul> element whose class is "train-table"
results = soup.find_all("ul", {"class": "train-table"})
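One detail worth knowing when debugging this: in BeautifulSoup, find_all() never returns None. It returns a list-like ResultSet that is simply empty when nothing matches, while find() is the method that returns None. The sketch below uses a made-up HTML string (an assumption standing in for a page whose table is only filled in later by JavaScript) to show both cases, plus a second hypothetical snippet where the same selector does match.

```python
from bs4 import BeautifulSoup

# Hypothetical raw HTML: the <ul class="train-table"> would only be created by a
# script running in the browser, so it is absent from the downloaded markup.
html = """
<html>
  <body>
    <div id="app"></div>
    <script src="/bundle.js"></script>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

all_tables = soup.find_all("ul", {"class": "train-table"})  # empty ResultSet
first_table = soup.find("ul", {"class": "train-table"})      # None

print(len(all_tables))  # 0
print(first_table)      # None

# The selector itself works once the element is actually in the HTML.
rendered = BeautifulSoup('<ul class="train-table"><li>07:01</li></ul>', "html.parser")
rendered_tables = rendered.find_all("ul", {"class": "train-table"})
print(len(rendered_tables))  # 1
```

If find_all() comes back empty for a selector that matches in the browser's inspector, the element is most likely being added after the page loads.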
The InputParser class basically returns a URL built from the given data:
class InputParser:
    def __init__(self):
        # Eurostar station codes used in the search URL
        self.mapOfDestinations = {"London": "7015400", "Paris": "8727100", "Brussels": "8814001"}
        self.destinations = []
        self.adults = 0
        self.departureDate = ""
        self.arrivalDate = ""

    def inputDestinations(self, departureDestination, arrivalDestination):
        # store the station codes in order: [origin, destination]
        self.destinations.append(self.mapOfDestinations[departureDestination])
        self.destinations.append(self.mapOfDestinations[arrivalDestination])

    def inputDates(self, departureDate, arrivalDate):
        self.departureDate = departureDate
        self.arrivalDate = arrivalDate

    def inputAdults(self, numberOfAdults):
        self.adults = numberOfAdults

    def createURL(self):
        default_URL = (
            "https://booking.eurostar.com/uk-en/train-search"
            "?origin={0}&destination={1}&adult={2}&outbound-date={3}&inbound-date={4}"
        ).format(self.destinations[0], self.destinations[1], self.adults,
                 self.departureDate, self.arrivalDate)
        return default_URL
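As an aside, the same query string can be assembled with the standard library's urllib.parse.urlencode, which takes care of escaping the values. A minimal sketch, assuming the parameter names used by createURL above; build_search_url and BASE_URL are hypothetical names introduced only for this illustration.

```python
from urllib.parse import urlencode

# Hypothetical constant: the base path used by createURL() above.
BASE_URL = "https://booking.eurostar.com/uk-en/train-search"


def build_search_url(origin, destination, adults, outbound_date, inbound_date):
    """Build the same search URL as createURL(), letting urlencode escape the values."""
    params = {
        "origin": origin,
        "destination": destination,
        "adult": adults,
        "outbound-date": outbound_date,
        "inbound-date": inbound_date,
    }
    # dicts keep insertion order, so the parameters appear in the order above
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("7015400", "8727100", 1, "2021-10-08", "2021-10-10")
print(url)
```

With these arguments it produces the same URL that is quoted in the question.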
My code should return the "ul" table with the class "train-table", but it returns None. Any idea what I'm doing wrong?
If you want to look at the page source, the code produces the following URL: https://booking.eurostar.com/uk-en/train-search?origin=7015400&destination=8727100&adult=1&outbound-date=2021-10-08&inbound-date=2021-10-10
Thanks a lot!
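The data you see is loaded dynamically from an external URL after the page opens, so it is not in the HTML that requests downloads, and BeautifulSoup cannot see it. However, you can use the requests module to reproduce that query directly: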
import json
import requests

# station codes, the same ones used in the question (London -> Paris)
origin = "7015400"
destination = "8727100"
api_url = f"https://api.prod.eurostar.com/bpa/train-search/uk-en/{origin}/{destination}"

params = {
    "outbound-date": "2021-10-08",
    "inbound-date": "2021-10-10",
    "adult": "1",
    "booking-type": "standard",
}
headers = {"X-apikey": "0aa3d4b7e805493c8e310cfb871c4344"}

response = requests.get(api_url, params=params, headers=headers)
response.raise_for_status()  # fail loudly instead of trying to parse an error page
data = response.json()

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

# each outbound journey lists several fare classes; print only the priced ones
for j in data["outbound"]["journey"]:
    for c in j["class"]:
        if "price" in c:
            print(
                "{:<10} {:<10} {:<10} {:<10}".format(
                    j["departureTime"],
                    j["arrivalTime"],
                    c["remaining"],
                    c["price"]["adult"],
                )
            )
Prints:
07:01 10:17 150 134.5
07:01 10:17 20 149.5
07:01 10:17 47 245
08:01 11:17 70 134.5
08:01 11:17 2 179.5
08:01 11:17 30 245
10:24 13:47 27 134.5
10:24 13:47 10 179.5
10:24 13:47 31 245
12:24 15:47 70 134.5
12:24 15:47 50 219.5
12:24 15:47 13 245
16:31 19:47 7 134.5
16:31 19:47 41 219.5
16:31 19:47 31 245
19:01 22:17 45 134.5
19:01 22:17 8 149.5
19:01 22:17 42 245
20:01 23:17 35 74.5
20:01 23:17 19 119.5
20:01 23:17 51 245
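Since the goal is the lowest fare, the nested loop above can be reduced to a single min() over the same fields. A minimal sketch: cheapest_outbound is a hypothetical helper, and the sample dictionary is hypothetical too, shaped only after the keys the answer's code reads (outbound, journey, class, price, adult), with values copied from the printed output. With a live response you would pass the data returned by requests instead.

```python
def cheapest_outbound(data):
    """Return (departure, arrival, price) of the cheapest priced outbound fare, or None.

    Assumes the structure read by the loop above:
    data["outbound"]["journey"][i]["class"][j]["price"]["adult"].
    """
    fares = [
        # float() accepts both numbers and numeric strings, whichever the API sends
        (j["departureTime"], j["arrivalTime"], float(c["price"]["adult"]))
        for j in data["outbound"]["journey"]
        for c in j["class"]
        if "price" in c  # skip classes without a price, as the loop above does
    ]
    if not fares:
        return None
    return min(fares, key=lambda fare: fare[2])


# Hypothetical sample shaped like the API response; values taken from the output above.
sample = {
    "outbound": {
        "journey": [
            {
                "departureTime": "07:01",
                "arrivalTime": "10:17",
                "class": [{"remaining": 150, "price": {"adult": 134.5}}, {"remaining": 0}],
            },
            {
                "departureTime": "20:01",
                "arrivalTime": "23:17",
                "class": [{"remaining": 35, "price": {"adult": 74.5}}],
            },
        ]
    }
}

print(cheapest_outbound(sample))  # ('20:01', '23:17', 74.5)
```

Collecting the fares into a list first keeps the "no priced class at all" case explicit, instead of letting min() raise on an empty sequence.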