Dynamic content from table - can't scrape with Selenium
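My main goal is to scrape the contents of the table on this site:
polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d
As a first step I tried to select content from the table, and after that I want to scrape all of the table's data into a .csv file, but I ran into problems right at the start of this task. I tried to select the content of the first row, but it looks like Selenium doesn't see any HTML content in the table area. My code is below: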
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
# Options
chrome_options = Options()
chrome_options.add_argument("--headless")
# Set driver (Selenium 4 takes the chromedriver path through a Service object)
chrome_driver_path = r"C:\Users\kacpe\OneDrive\Pulpit\Python\Projekty\chromedriver.exe"
driver = webdriver.Chrome(service=Service(chrome_driver_path), options=chrome_options)
driver.get("https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d")
try:
    # XPath positions are 1-based: tr[1] is the first row, tr[0] never matches
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.XPATH, "//table/tbody/tr[1]")))
    print(element.text)
except TimeoutException as e:
    print(e)
finally:
    driver.quit()
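One thing worth checking in the code above: XPath positional predicates are 1-based, so //table/tbody/tr[0] can never match a row, and the first row is tr[1]. The sketch below shows this with Python's built-in xml.etree.ElementTree (its limited XPath subset, not the browser's XPath engine) on a small hypothetical snippet instead of the real page; SNIPPET and first_row_text are illustrative names only.

```python
import xml.etree.ElementTree as ET

# Hypothetical static snippet standing in for the real table markup.
SNIPPET = """<table>
  <tbody>
    <tr><td>row one</td></tr>
    <tr><td>row two</td></tr>
  </tbody>
</table>"""


def first_row_text(markup: str) -> str:
    """Return the text of the first <tr>, selected with a 1-based XPath position."""
    table = ET.fromstring(markup)
    # [1] selects the first <tr> under <tbody>; XPath positions start at 1, not 0.
    rows = table.findall("./tbody/tr[1]")
    return rows[0].findtext("td")


print(first_row_text(SNIPPET))  # row one

# ElementTree rejects position 0 outright; a browser's XPath engine instead
# just matches nothing, which is enough on its own to make a wait time out.
try:
    ET.fromstring(SNIPPET).findall("./tbody/tr[0]")
except SyntaxError as exc:
    print("tr[0] rejected:", exc)
```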
And the error I get:
  File "C:\Users\kacpe\OneDrive\Pulpit\Python\Projekty\nc-coin-scraper\nc_scraper\nc_scraper\spiders\aaa.py", line 29, in <module>
    a = driver.find_element(By.XPATH, '//*[@id="maindiv"]/div[1]/p').text
  File "C:\Users\kacpe\OneDrive\Pulpit\Python\Projekty\nc-coin-scraper\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1244, in find_element
    return self.execute(Command.FIND_ELEMENT, {
  File "C:\Users\kacpe\OneDrive\Pulpit\Python\Projekty\nc-coin-scraper\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 424, in execute
    self.error_handler.check_response(response)
  File "C:\Users\kacpe\OneDrive\Pulpit\Python\Projekty\nc-coin-scraper\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 247, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//*[@id="maindiv"]/div[1]/p"}
(Session info: headless chrome=97.0.4692.71)
Stacktrace:
Backtrace:
Ordinal0 [0x00EC6903+2517251]
Ordinal0 [0x00E5F8E1+2095329]
Ordinal0 [0x00D62848+1058888]
Ordinal0 [0x00D8D448+1233992]
Ordinal0 [0x00D8D63B+1234491]
Ordinal0 [0x00DB7812+1406994]
Ordinal0 [0x00DA650A+1336586]
Ordinal0 [0x00DB5BBF+1399743]
Ordinal0 [0x00DA639B+1336219]
Ordinal0 [0x00D827A7+1189799]
Ordinal0 [0x00D83609+1193481]
GetHandleVerifier [0x01055904+1577972]
GetHandleVerifier [0x01100B97+2279047]
GetHandleVerifier [0x00F56D09+534521]
GetHandleVerifier [0x00F55DB9+530601]
Ordinal0 [0x00E64FF9+2117625]
Ordinal0 [0x00E698A8+2136232]
Ordinal0 [0x00E699E2+2136546]
Ordinal0 [0x00E73541+2176321]
BaseThreadInitThunk [0x75DAFA29+25]
RtlGetAppContainerNamedObjectPath [0x77DE7A9E+286]
RtlGetAppContainerNamedObjectPath [0x77DE7A6E+238]
Any ideas on how to handle this?
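To extract the data from the Transfers table on the Token Natluk Community - polygonscan webpage, you need to induce WebDriverWait for the cookie button to become clickable and click it, then wait for the iframe that holds the table (iframe#tokentxnsiframe) to become available and switch into it, and finally wait for the table to become visible. The table is rendered inside that iframe, which is why locators run against the main page find nothing. Then, using a DataFrame from Pandas, you can use the following: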
Code block:
from io import StringIO

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
options = Options()
options.add_argument("start-maximized")
# Pass both switches in one call: a second "excludeSwitches" call would overwrite the first
options.add_experimental_option("excludeSwitches", ["enable-automation", "enable-logging"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
# Raw string so the backslashes in the Windows path are not read as escape sequences
s = Service(r'C:\BrowserDrivers\chromedriver.exe')
driver = webdriver.Chrome(service=s, options=options)
driver.get("https://polygonscan.com/token/0x64a795562b02830ea4e43992e761c96d208fc58d")
# Dismiss the cookie banner
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "button#btnCookie"))).click()
# The Transfers table lives inside this iframe, so switch into it first
WebDriverWait(driver, 20).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe#tokentxnsiframe")))
# Wait until the table is visible, then take its complete HTML
data = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table.table.table-md-text-normal"))).get_attribute("outerHTML")
driver.quit()
# read_html returns a list of DataFrames (one per <table>); newer pandas expects literal HTML wrapped in StringIO
df = pd.read_html(StringIO(data))
print(df)
Console output:
[ Txn Hash Method ... Quantity Unnamed: 7
0 0x75411962e2e6527f5a032198816cafe4e1a475a4ebdf... Add Liquidity ET... ... 37929.272725 NaN
1 0x27f61026e9df4c0c14c6259f624917a12ce7f6c20eb7... Swap Exact ETH F... ... 50814.040553 NaN
2 0xd9ee0ed46ef8ce891e81787b25176530a30df6d2b98e... Add Liquidity ET... ... 55288.744543 NaN
3 0x3f3982a38ff3f5c5890eff12a9d3f7061fea88942d96... Add Liquidity ET... ... 978.219682 NaN
4 0x503fad1b044b98c58700d185eb8cb9c16a483fd748d7... Unstake ... 8884.911763 NaN
5 0x503fad1b044b98c58700d185eb8cb9c16a483fd748d7... Unstake ... 9026.302437 NaN
6 0xdc75ad4e37e232f8536305ef8c628fd9391c1f2c5d25... Transfer ... 114000.000000 NaN
7 0x218ae4183e632c47edf581705871a3f16dc32cc513ef... Add Liquidity ET... ... 45125.111655 NaN
8 0x9fbe017ebf37aea501050a68c8ab1d78734b576b5585... Swap Exact ETH F... ... 2563.443420 NaN
9 0xd30adcf551285d4b72495d55cc59ffaed82a224b138c... Claim ... 14923.359293 NaN
10 0x65c733e468df90eaed701bc4f1e21a4090924b1225c1... Swap Exact ETH F... ... 33055.752836 NaN
11 0x82c215000f9807a3a40fe3ef3e461ceac007513b49ff... Swap Exact ETH F... ... 6483.182959 NaN
12 0x6155da0b5b206a8ffffa300a5d75e23fa3833b9b079b... Swap Exact ETH F... ... 13005.174783 NaN
13 0x3435579c22e9fc42f6921229449c8cb18d133a207a66... Transfer ... 47500.000000 NaN
14 0x7a57be9b538e0c73df4b608a8323c2f678ba6136f9a9... Swap Exact ETH F... ... 19605.381370 NaN
15 0x8fe7787039c4a382f6420c78b48933dd59b0843c6ab4... Transfer ... 237500.000000 NaN
16 0x0e55aa0740f6c964db13efe52e1af58a35497f9a292d... Swap Exact ETH F... ... 6561.223602 NaN
17 0x9897d4a2f56a49a935a36183eee3dc846fc19610812c... Swap Exact ETH F... ... 19762.821100 NaN
18 0xf9c7d67bf679624640f20d69636f58f634bf66e7daed... Add Liquidity ET... ... 74224.394200 NaN
19 0x89b490947952e37e10a3619f8fbcb5a80b15f0e2f4aa... Add Liquidity ET... ... 14589.910231 NaN
20 0xc94e56bb3be04e610c6a89e934fb84bba58922f6641a... Transfer ... 142500.000000 NaN
21 0x68a5c142bbfa86b0aa4f469eb17f58e26b5251bd83e9... Swap Exact ETH F... ... 3307.607665 NaN
22 0x2597e521fd0a7e4edffe66007129c93d1dc22485b86a... Swap Exact ETH F... ... 66868.030051 NaN
23 0x14cc91039f59fd9143bc94132b9f053970947b79a16f... Swap Exact Token... ... 42683.069577 NaN
24 0xa5ab4179af827c6883e52cbc010509b701795a8136a0... Swap Exact ETH F... ... 3423.618394 NaN
[25 rows x 8 columns]]
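The printed result is a list (note the outer brackets) holding one 25-row DataFrame whose last column, "Unnamed: 7", has an empty header and only NaN values. To reach the original goal of a .csv file with pandas, something like df[0].dropna(axis=1, how="all").to_csv("transfers.csv", index=False) would drop that column and write the file ("transfers.csv" is a hypothetical name). pd.read_html itself needs lxml, or BeautifulSoup4 plus html5lib, installed. If those are not available, the following is a stdlib-only sketch, a swapped-in technique rather than what the answer uses: it parses the table's outerHTML (the data string from the answer) with html.parser, writes CSV with the csv module, and drops columns that are empty in every row. It assumes a simple table without rowspan/colspan; TableRowParser and table_html_to_csv are hypothetical names.

```python
import csv
import io
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> (assumes no rowspan/colspan)."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the <tr> currently being read
        self._cell = None # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments from nested tags (<a>, <span>, ...) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_html_to_csv(outer_html: str) -> str:
    """Convert a table's outerHTML to CSV text, dropping columns that are empty in every row."""
    parser = TableRowParser()
    parser.feed(outer_html)
    parser.close()
    rows = parser.rows
    width = max((len(r) for r in rows), default=0)
    # Pad short rows so every row has the same number of cells.
    padded = [r + [""] * (width - len(r)) for r in rows]
    # Keep only columns with at least one non-empty cell (drops the "Unnamed: 7" style column).
    keep = [i for i in range(width) if any(row[i] for row in padded)]
    buf = io.StringIO()
    writer = csv.writer(buf)
    for row in padded:
        writer.writerow([row[i] for i in keep])
    return buf.getvalue()


if __name__ == "__main__":
    # Tiny hypothetical table; in the answer's script you would pass `data` instead.
    sample = ("<table><tr><th>Txn Hash</th><th>Quantity</th><th></th></tr>"
              "<tr><td><a href='#'>0xabc</a></td><td>10.5</td><td></td></tr></table>")
    print(table_html_to_csv(sample), end="")
```

When saving the result, open the file with newline="" (for example open("transfers.csv", "w", newline="", encoding="utf-8")), because the csv module already emits "\r\n" line endings and text mode on Windows would otherwise double them.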