beautifulsoup webscraper problem: can't find tables on webpage
I want to get the tables from this website using the following code:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.flashscore.pl/pilka-nozna/'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all('table', {'class': 'soccer'})
print(len(containers))
But when I check how many tables I got with print(len(containers)), I get 0.
Is there a way to fix this?
Edit:
The page is probably dynamic. You can use requests-html, which lets the page render before you pull the HTML, or you can use Selenium, as I did here.
That gave 42 elements matching table class="soccer":
import bs4
from selenium import webdriver
url = 'https://www.flashscore.pl/pilka-nozna/'
# Raw string so the backslashes in the Windows path are not read as escape sequences
browser = webdriver.Chrome(r'C:\chromedriver_win32\chromedriver.exe')
browser.get(url)
html = browser.page_source
soup = bs4.BeautifulSoup(html,'html.parser')
containers = soup.find_all('table', {'class': 'soccer'})
browser.close()
In [11]: print(len(containers))
42
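
To illustrate why the urlopen version finds nothing, here is a minimal offline sketch. It assumes the tables are injected by JavaScript after the page loads, which is a guess about the site and not something confirmed here. RAW_HTML and RENDERED_HTML are made-up snippets, and TableCounter and count_tables are hypothetical helpers built on the standard library's html.parser so the example runs without a browser or network access. BeautifulSoup's find_all should report the same counts on these two strings, since it does not execute scripts either.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Count <table> start tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        # The class attribute may hold several space-separated names.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_tables(html, css_class="soccer"):
    """Return how many <table class="..."> tags appear in the markup."""
    parser = TableCounter(css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Hypothetical markup: what a server might send before any JavaScript runs.
# The table exists only as text inside a <script> element.
RAW_HTML = """
<html><body>
  <div id="live-table"></div>
  <script>
    document.getElementById("live-table").innerHTML =
      '<table class="soccer"><tr><td>1:0</td></tr></table>';
  </script>
</body></html>
"""

# Hypothetical markup: the same page after a browser has executed the script.
RENDERED_HTML = """
<html><body>
  <div id="live-table"><table class="soccer"><tr><td>1:0</td></tr></table></div>
</body></html>
"""

print(count_tables(RAW_HTML))       # 0: script contents are not parsed as tags
print(count_tables(RENDERED_HTML))  # 1: the table is now real markup
```

This is the difference between uClient.read(), which returns the unrendered source, and browser.page_source, which returns the DOM after the scripts have run.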