How do I scrape the date links off this website: https://flight-data.adsbexchange.com/activity?inputSelect=registration&registration=N12345
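I'm trying to print the dates that appear in the list of links at the bottom of this website. I don't know what is going wrong, because no error appears. I've tried simpler approaches that work on sites such as the New York Times to retrieve all of their hrefs, but none of them worked here, so I looked into the user agent.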
import urllib
import lxml.html
import urllib2
from urllib import URLopener
URLopener.version
from urllib import FancyURLopener
class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
MyOpener.version
myopener = MyOpener()
page = myopener.open('https://flight-data.adsbexchange.com/activity?inputSelect=registration&registration=N12345')
page.read()
from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "lxml")
for line in soup.find_all('a'):
    print(line.get('href'))
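One possible reason the attempt above prints nothing and raises no error, assuming the code ran exactly as posted: `page.read()` is called and its result discarded, so the response stream may already be exhausted by the time `BeautifulSoup(page, "lxml")` reads from it. The sketch below shows that behaviour with `io.BytesIO` standing in for the response object (the stand-in and its HTML content are assumptions for illustration, not the real response).

```python
import io

# Hypothetical stand-in for the file-like object returned by myopener.open(...).
response = io.BytesIO(b"<html><body><a href='/map'>link</a></body></html>")

first = response.read()   # consumes the entire stream
second = response.read()  # the stream is now at its end, so nothing is left

print(len(first) > 0)  # the first read returned the markup
print(second == b"")   # the second read returned an empty bytes object
```

If this is what happened, storing the result of the first read (for example `html = page.read()`) and passing `html` to BeautifulSoup would avoid parsing an empty document.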
Run the script below. It will give you all the links you need:
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests

page_url = "https://flight-data.adsbexchange.com/activity?inputSelect=registration&registration=N12345"
# Download the page and keep the HTML as text
page = requests.get(page_url).text
soup = BeautifulSoup(page, "lxml")
# Each date link carries the "dates" class; resolve its href against the page URL
for items in soup.select(".dates"):
    print(urljoin(page_url, items['href']))
Partial output:
https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-14
https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-09
https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-08
https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-05
https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-10-31
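Since the question asks for the dates themselves, here is a minimal sketch of pulling the `date` query parameter out of links like the ones in the output above. It uses only the standard library and works on the sample URLs copied from that output; the helper name `extract_date` is hypothetical and not part of the original answer.

```python
from urllib.parse import urlparse, parse_qs

# Sample links copied from the partial output above.
links = [
    "https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-14",
    "https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-09",
    "https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-08",
    "https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-11-05",
    "https://flight-data.adsbexchange.com/map?icao=A061D9&date=2017-10-31",
]


def extract_date(link):
    """Return the value of the `date` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(link).query)
    values = query.get("date")
    return values[0] if values else None


dates = [extract_date(link) for link in links]
for date in dates:
    print(date)
```

In the scraping loop, the same helper could be applied to each joined URL instead of printing the full link.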