Selenium wait for JavaScript timing out
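What I'm trying to do is scrape the following site, https://wiki.openstreetmap.org/wiki/Key:office, specifically the table containing all the tags, i.e. everything contained in:
<table class="wikitable taginfo-taglist">...</table>
Because everything inside
<div class="taglist" ...> ... </div>
(the parent of the table) is generated by JavaScript, I thought this code would work: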
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

options = Options()
options.add_argument("--headless")
caps = webdriver.DesiredCapabilities().FIREFOX
caps["marionette"] = True
driver = webdriver.Firefox(options=options, capabilities=caps, executable_path='../statics/geckodriver')

def get_tag_soup(url):
    driver.get(url)
    try:
        table = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "wikitable taginfo-taglist")))
        soup = BeautifulSoup(table.get_attribute('innerHTML'), 'lxml')
    except Exception as e:
        soup = e
    return soup

get_tag_soup('https://wiki.openstreetmap.org/wiki/Key:office')
But when I run this code it times out. However, if I WebDriverWait for the parent of "wikitable taginfo-taglist" instead, using EC.presence_of_element_located((By.CLASS_NAME, "taglist")), it works.
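To extract the table containing all the tags, instead of presence_of_element_located() you have to induce WebDriverWait for visibility_of_element_located(). Also note that By.CLASS_NAME expects a single class name: with the space-separated value "wikitable taginfo-taglist" the lookup never matches the table, so the wait times out. Locate the table by both classes instead, using either of the following locator strategies:
Using CSS_SELECTOR: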
driver.get("https://wiki.openstreetmap.org/wiki/Key:office")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.wikitable.taginfo-taglist"))).text)
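The CSS selector above chains one .class per class in the attribute, so it matches a table carrying both wikitable and taginfo-taglist, in any order. A minimal sketch of building such a selector from a space-separated class attribute follows; class_attr_to_css is a hypothetical helper, and it assumes the class names contain no characters that would need CSS escaping.

```python
def class_attr_to_css(tag: str, class_attr: str) -> str:
    """Turn a space-separated class attribute into a compound CSS selector.

    Hypothetical helper: assumes the class names need no CSS escaping.
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class attribute must contain at least one class")
    # One ".name" per class: the element must carry all of them, in any order.
    return tag + "".join(f".{name}" for name in classes)


print(class_attr_to_css("table", "wikitable taginfo-taglist"))
# table.wikitable.taginfo-taglist
```

The result can then be passed to the wait as (By.CSS_SELECTOR, selector).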
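Using XPATH (the @class comparison is an exact string match, so the attribute must read exactly "wikitable taginfo-taglist"):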
driver.get("https://wiki.openstreetmap.org/wiki/Key:office")
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//table[@class='wikitable taginfo-taglist']"))).text)
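Because the rows are inserted by JavaScript, the table element may exist before it has been filled. If that turns out to matter, WebDriverWait.until() accepts any callable that takes the driver and returns a truthy value, so one option is a custom condition that waits for a minimum number of rows. The sketch below is an assumption, not part of the answer above: rows_loaded is a hypothetical name, and the default of two rows (header plus one data row) is a guess about the table's layout.

```python
# Same string as selenium's By.CSS_SELECTOR, used directly so this sketch
# can be imported without Selenium installed.
CSS_SELECTOR = "css selector"


def rows_loaded(table_selector: str, min_rows: int = 2):
    """Return a wait condition for WebDriverWait.until().

    The condition is satisfied once the table matched by `table_selector`
    contains at least `min_rows` <tr> elements (the header row counts).
    """
    row_selector = f"{table_selector} tr"

    def _condition(driver):
        # find_elements returns an empty list instead of raising when
        # nothing matches, so the wait simply keeps polling.
        rows = driver.find_elements(CSS_SELECTOR, row_selector)
        return rows if len(rows) >= min_rows else False

    return _condition


# Usage with a live driver (not executed here):
# rows = WebDriverWait(driver, 20).until(
#     rows_loaded("table.wikitable.taginfo-taglist"))
```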
Console output:
Key Value Element Description Map rendering Image Count
office accountant An office for an accountant.
6 895
1 967
14
office advertising_agency A service-based business dedicated to creating, planning, and handling advertising.
3 916
580
3
office architect An office for an architect or group of architects.
5 715
1 239
12
office association An office of a non-profit organisation, society, e.g. student, sport, consumer, automobile, bike association, etc.
13 054
3 286
50
office charity An office of a charitable organization
696
384
7
office company An office of a private company
129 801
36 951
608
office consulting An office for a consulting firm, providing expert professional advice to other companies or organisations.
1 341
162
4
office coworking An office where people can go to work (might require a fee); not limited to a single employer
1 297
320
7
office diplomatic
6 634
4 065
95
office educational_institution An office for an educational institution.
14 172
8 563
175
office employment_agency An office for an employment service.
7 300
1 771
43
office energy_supplier An office for a energy supplier.
2 237
1 112
19
office engineer An office for an engineer or group of engineers.
454
98
2
office estate_agent A place where you can rent or buy a house.
44 813
8 042
39
office financial An office of a company in the financial sector
4 891
1 588
24
office forestry A forestry office
523
741
9
office foundation An office of a foundation
1 757
542
10
office government An office of a (supra)national, regional or local government agency or department
98 289
70 569
2 300
office guide An office for tour guides, mountain guides, dive guides, etc.
587
168
1
office insurance An office at which you can take out insurance policies.
34 693
6 475
91
office it An office for an IT specialist.
9 486
2 039
51
office lawyer An office for a lawyer.
22 881
4 841
22
office logistics An office for a forwarder / hauler.
2 796
677
8
office moving_company An office which offers a relocation service.
605
252
4
office newspaper An office of a newspaper
3 511
1 450
27
office ngo An office for a non-profit, non-governmental organisation (NGO).
12 693
3 565
58
office notary An office for a notary public (common law)
3 860
548
9
office political_party An office of a political party
3 354
1 017
8
office property_management Office of a company, which manages a real estate property.
796
162
2
office quango An office of a quasi-autonomous non-governmental organisation.
366
233
4
office religion office of a community of faith
5 807
2 172
43
office research An office for research and development
3 667
4 545
348
office surveyor An office of a person doing surveys, this can be risk and damage evaluations of properties and equipment, opinion surveys or statistics.
451
109
1
office tax_advisor An office for a financial expert specially trained in tax law
5 053
823
4
office telecommunication An office for a telecommunication company
16 968
4 335
77
office visa An office of an organisation or business which offers visa assistance
95
1
0
office water_utility The office for a water utility company or water board.
743
908
20
office yes Generic tag for unspecified office type.
27 434
36 155
420
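The .text output flattens the table: each row's key, value and description come out on one line and the count figures on the following lines. If structured rows are needed, one option is to fetch the table's markup with table.get_attribute("outerHTML") and parse the cells. The sketch below uses the standard library's html.parser rather than BeautifulSoup to stay dependency-free; TableRowParser and parse_table_rows are hypothetical names, and it assumes the table contains no nested tables.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, grouped by <tr>.

    Assumes no nested tables; text inside any tag within a cell
    (links, spans, ...) is joined into that cell's text.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace (including non-breaking spaces).
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_table_rows(table_html: str) -> list[list[str]]:
    """Return the table's rows as lists of cell texts."""
    parser = TableRowParser()
    parser.feed(table_html)
    parser.close()
    return parser.rows


sample = (
    '<table class="wikitable taginfo-taglist">'
    "<tr><th>Key</th><th>Value</th></tr>"
    "<tr><td>office</td><td><a href='#'>accountant</a>\n</td></tr>"
    "</table>"
)
print(parse_table_rows(sample))
# [['Key', 'Value'], ['office', 'accountant']]
```

With a live driver this would be called as parse_table_rows(table.get_attribute("outerHTML")), where table is the element returned by the wait.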
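Note: ensure that the browser window is maximized, as follows (start-maximized is a Chrome/Chromium switch; with Firefox, as in the question, it has no effect, so call driver.maximize_window() after creating the driver instead):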
options.add_argument("start-maximized")